1       BBCP

 

 

bbcp [ options ] [ srcspec [ . . . ] ] snkspec

 

srcspec:            [[sid@]shost:]sourcefile

 

snkspec:            [[tid@]thost:]target

 

options:            [ advanced ]  -c [lvl]  -C fname  -f  -I fname  -k 

       

        -l logfn  -m mode  -p  -P sec  -v  -w wsz  --

 

-------------------------------------------------------------------

 

advanced: -a [dir]  -b blkf  -B bfsz  -d path  -D  -e  -i fname 

 

        -L lopts[@lurl]  -q qos  -s strms -S srccmd  -T trgcmd 

 

        -t tlim  -V  -W wsz  -x rate  -z

 

lopts:            a | b | c | i | o | r | w | x | [ lopts ]

 

lurl:            file://path/filename | x-netlog://host:port |

 

        x-syslog://localhost

 

 

Function

Securely and quickly copy files from a source to a destination. Refer to the list of known problems (see Notes) for current defects and limitations.

 

Parameters

 

sid

specifies the ssh loginid for the source host. The default is to use your current loginid.

 

shost

specifies the name of the host that holds the source file. By default, the local host is assumed.

 

sourcefile

specifies the name of the file to be copied. Any number of source files from any number of hosts may be specified on the command line. If the -I option is specified, no source files need to be specified on the command line.

 

tid

specifies the ssh loginid for the target host. The default is to use your current loginid.

 

thost

specifies the name of the host to which the file is to be copied. By default, the local host is assumed.

 

target

specifies the name of the target location of the file to be copied. If a single source file is specified, target may be a file name or a directory. If more than one source file is specified, target must be the name of a directory.

 

Options

 

-c [lvl]

compresses the data prior to sending it across the network. Specify an integer value from 1 through 9 for lvl. A value of 1 gives the best speed while a value of 9 gives the best compression. The default value is 1. If lvl is omitted, -c may not be the last option on the command line.

 

-C fname

specifies the name of the configuration file. The configuration file is processed when it is encountered on the command line. Refer to the section “Configuring New Defaults” for more information about configuration files.

 

-f    forces the copy by erasing the target prior to copying the source file. By default, the copy fails if the target file already exists. The -f option is mutually exclusive with the -a option.

 

-I fname

includes a list of source file specifications from the file identified by fname. Each newline-terminated record in fname must contain a single source file specification. If -I is specified, you need not specify any source files on the command line.

 

-k    keeps any partially created target files. When combined with -a, the -k option allows a failed copy to be fully resumed. By default, partial files are removed after a copy fails.

 

-l logfn

logs standard error to the indicated file, logfn. By default, standard error output is written to the terminal.

 

-m mode

sets the final mode for the target file. Specify mode as a 3- or 4-digit octal value. The default mode is 0644.

 

-p    preserves the source file’s mode, group name, access time, and modification time. That is, the target file’s mode, group, atime, and mtime values are set to match those of the corresponding source file.

 

-P sec

produces progress messages every sec seconds. Specify a sec value no less than 1 second.

 

-v      produces additional output during execution.

 

-w wsz

sets the size of the disk I/O buffers. Specify for wsz a value no less than 1024 (i.e., 1k). Numbers suffixed by k, m, or g are multiplied by 2^10, 2^20, or 2^30, respectively. The TCP/IP socket buffer is set to wsz plus 32 bytes to account for network overhead, which effectively sets the TCP/IP window size for the associated connection. The default is 64k.
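The size rule above (shared by -w, -B, and -x) can be illustrated with a short Python sketch. It is not bbcp's own parser: the name parse_size is hypothetical, and the sketch assumes the number is a plain decimal integer and that only the lowercase suffixes k, m, and g are accepted.

```python
# Multipliers for the k, m, and g suffixes: 2^10, 2^20, and 2^30.
SUFFIXES = {"k": 2**10, "m": 2**20, "g": 2**30}


def parse_size(text: str, minimum: int = 1024) -> int:
    """Convert a size such as '64k' or '1m' to bytes (hypothetical helper)."""
    text = text.strip()
    multiplier = 1
    if text and text[-1] in SUFFIXES:
        multiplier = SUFFIXES[text[-1]]
        text = text[:-1]
    if not text.isdigit():
        raise ValueError(f"not a valid size: {text!r}")
    value = int(text) * multiplier
    # -w, -B, and -x all require a value no less than 1024 (1k).
    if value < minimum:
        raise ValueError(f"{value} is below the minimum of {minimum}")
    return value


print(parse_size("64k"))  # 65536, the -w default
print(parse_size("1m"))   # 1048576, the -B default
```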

 

--    is a null option. Use it to prevent an option with an optional argument from being the last option on the command line.

 

Advanced Options

 

-a [ dir ] 

appends data to the end of the target file if the target is found to be incomplete due to a previously failed copy operation. The optional dir specifies the directory on the target host where checkpoint information is to be written (the default is the .bbcp subdirectory of your home directory). The -a option is mutually exclusive with the -f option. Refer to the section “Resuming Failed Copies” for more details. If dir is omitted, -a may not be the last option on the command line.

 

-b blkf

specifies the read blocking factor. That is, blkf data blocks are always read from disk and then queued for sending across the network. The maximum value is limited by the number of scatter/gather buffers allowed in a readv() system call. By default, blkf equals the -s strms value. If -c is specified, blkf determines the number of disk I/O.

 

 

-B bfsz

specifies the disk I/O buffer size when compression is enabled. Normally, blkf times wsz bytes are read, and wsz bytes are written, at one time. When compression is enabled, bfsz bytes are read and written at one time. Specify for bfsz a value no less than 1024 (i.e., 1k). Numbers suffixed by k, m, or g are multiplied by 2^10, 2^20, or 2^30, respectively. The default is 1m.

 

-d path

specifies source-relative addressing. Each relative srcspec is prefixed by path. When the file is copied to the target, its destination path is snkspec/srcspec; that is, the relative path in srcspec is created on the target host relative to snkspec and the file is then copied into it. Refer to the section “Multi-Target Copying” for more information.

 

-D    turns on debugging.

 

-e    enables extensive error checking by calculating an MD5 checksum for each block of data sent. The receiving end validates the checksum to ensure that the data was not altered while in transit.
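The per-block check that -e performs can be sketched with Python's hashlib. The function names and the in-memory (block, digest) pairs are illustrative assumptions: the sketch shows the idea of checksumming each block on the sending side and validating it on the receiving side, not bbcp's actual wire format.

```python
import hashlib


def checksummed_blocks(data: bytes, block_size: int):
    """Sending side: yield each block together with its MD5 digest."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield block, hashlib.md5(block).digest()


def receive(pairs) -> bytes:
    """Receiving side: recompute each digest and reject altered blocks."""
    received = bytearray()
    for index, (block, digest) in enumerate(pairs):
        if hashlib.md5(block).digest() != digest:
            raise ValueError(f"checksum mismatch in block {index}")
        received += block
    return bytes(received)


payload = bytes(range(256)) * 40  # 10240 bytes of sample data
assert receive(checksummed_blocks(payload, 4096)) == payload

# Alter one byte of the second block "in transit"; the receiver notices.
pairs = list(checksummed_blocks(payload, 4096))
block, digest = pairs[1]
pairs[1] = (bytes([block[0] ^ 0xFF]) + block[1:], digest)
try:
    receive(pairs)
except ValueError as err:
    print(err)  # checksum mismatch in block 1
```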

 

-i fname

specifies the name of the ssh identity file if one has been specifically created for bbcp. The identity filename, prefixed by -i, is included in the ssh command line used to start bbcp on the source and target nodes.

 

-L lopts[@lurl]

enables detailed logging of actions via the NetLogger interface. The lopts value specifies what is to be logged, while lurl determines how the information is logged. For lopts, specify one or more of the following (including at least one of c, i, o, r, w, and x):

a   -   append to data file

b   -   buffer information in memory

c   -   log data compression

i   -   log network reads

o   -   log network writes

r   -   log disk reads

w   -   log disk writes

x   -   log data expansion

 

If lurl is not specified, the logging interface uses the value of NETLOGGER_DEST as the lurl value. Specify one of three destination protocols:

file   -   data is written to the file identified by path/filename

x-netlog   -   data is sent to the host listening on port

x-syslog   -   data is sent to the system log on the local host
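The lopts/lurl grammar above can be checked with a small parser. This sketch is hypothetical (parse_log_spec is not part of bbcp); it only validates the letters and the destination scheme, and it assumes NETLOGGER_DEST holds a value in the same lurl syntax.

```python
import os

LOG_LETTERS = set("abciorwx")
REQUIRED_ONE_OF = set("ciorwx")  # at least one must appear in lopts
SCHEMES = ("file://", "x-netlog://", "x-syslog://")


def parse_log_spec(spec: str, environ=os.environ):
    """Split an -L value such as 'rw@x-syslog://localhost' into (lopts, lurl)."""
    lopts, _, lurl = spec.partition("@")
    unknown = set(lopts) - LOG_LETTERS
    if unknown:
        raise ValueError(f"unknown lopts letters: {''.join(sorted(unknown))}")
    if not set(lopts) & REQUIRED_ONE_OF:
        raise ValueError("lopts needs at least one of c, i, o, r, w, x")
    if not lurl:
        # No @lurl given: fall back to NETLOGGER_DEST.
        lurl = environ.get("NETLOGGER_DEST", "")
    if not lurl.startswith(SCHEMES):
        raise ValueError(f"unsupported log destination: {lurl!r}")
    return lopts, lurl


print(parse_log_spec("rw@x-syslog://localhost"))  # ('rw', 'x-syslog://localhost')
```

Note that "ab" alone is rejected: a and b only modify how logging is done, so at least one of the letters that selects an action to log must be present.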

 

-q qos

specifies the quality of service to be used. This is router-implementation dependent and may be ignored. Specify a value between 0 and 255, inclusive.

 

-s strms

sets the number of parallel network streams to be used for the transfer. Specify a strms value from 1 to 32, inclusive. The default is 4.

 


-S srccmd

is the command used to start bbcp on the source host. The default is “/usr/local/bin/ssh %I -l %U %H bbcp”. See the Notes and the section “Modifying Startup” for more information.

 

-T trgcmd

is the command used to start bbcp on the target host. The default is “/usr/local/bin/ssh %I -l %U %H bbcp”. See the Notes and the section “Modifying Startup” for more information.

 

-t tlim

is the maximum amount of time that the copy may take before it is aborted. The time limit applies to each source host regardless of the number of files that host supplies.  Specify a number greater than zero and optionally suffixed by s (the default), m, or h for seconds, minutes, and hours, respectively. The default is to not apply any time limit.
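The tlim syntax can be sketched as a small conversion to seconds. The name parse_time_limit is hypothetical, and the sketch assumes the number is a whole integer. Because the limit applies per source host, a caller would presumably start one timer per source host rather than one per file.

```python
UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_time_limit(text: str) -> int:
    """Convert a -t value such as '90', '30m', or '2h' to seconds."""
    unit = "s"  # seconds are the default unit
    if text and text[-1] in UNITS:
        text, unit = text[:-1], text[-1]
    if not text.isdigit() or int(text) <= 0:
        raise ValueError("the time limit must be a number greater than zero")
    return int(text) * UNITS[unit]


print(parse_time_limit("30m"))  # 1800
```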

 

-V      produces even more output than -v allows, including detailed transfer speed statistics.

 

-W wsz

like -w, sets the size of all I/O buffers, including the TCP/IP socket buffer. This option is identical to the -w option except that the disk buffers are set to wsz less 32 bytes to account for network overhead.
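The arithmetic that distinguishes -w from -W (see also Note 6) fits in a few lines. The function buffer_sizes is a hypothetical illustration; 65536 is the documented 64k default.

```python
NETWORK_OVERHEAD = 32  # bytes reserved for network overhead


def buffer_sizes(wsz: int, network_side: bool = False):
    """Return (disk_buffer, socket_buffer) for -w wsz, or for -W wsz if network_side."""
    if network_side:
        # -W: wsz is the socket buffer; the disk buffer is 32 bytes smaller.
        return wsz - NETWORK_OVERHEAD, wsz
    # -w: wsz is the disk buffer; the socket buffer is 32 bytes larger.
    return wsz, wsz + NETWORK_OVERHEAD


print(buffer_sizes(65536))                     # (65536, 65568)
print(buffer_sizes(65536, network_side=True))  # (65504, 65536)
```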

 

-x rate

sets the maximum transfer rate. Specify for rate a value no less than 1024 (i.e., 1k). Numbers suffixed by k, m, or g are multiplied by 2^10, 2^20, or 2^30, respectively. Data is clocked out from the source at the specified rate per second.
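One common way to clock data out at a fixed rate is to sleep after each chunk until the bytes sent so far are "due". The sketch below shows that technique under stated assumptions: rate is taken to be bytes per second, the name paced_send is hypothetical, and bbcp's own clocking may be implemented differently. The clock and sleep functions are parameters so the pacing can be tested without actually waiting.

```python
import time


def paced_send(chunks, rate, send, clock=time.monotonic, sleep=time.sleep):
    """Send byte chunks so that average throughput stays at or below rate bytes/s."""
    start = clock()
    sent = 0
    for chunk in chunks:
        send(chunk)
        sent += len(chunk)
        # Earliest moment by which `sent` bytes may have left the source.
        due = start + sent / rate
        delay = due - clock()
        if delay > 0:
            sleep(delay)
    return sent
```

With a rate of 1024 and four 1024-byte chunks, the loop waits about one second after each chunk, so the whole transfer takes roughly four seconds.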

 

-z    uses the reverse connection protocol. Refer to the section “Dealing With Firewalls” for more information.

 

Success

The program exits with a status code of 0.

 

Failure

The program exits with a non-zero status code.

 


Notes

1)      A list of known problems is detailed on the following web page: “http://www.slac.stanford.edu/~abh/bbcp/bbcp_bugs.html”.

2)      Files are copied in the order specified. To minimize start-up and shutdown time, adjacent files are grouped by source host and treated as a copy set (i.e., a related group of files). Avoid intermixing different source locations; always specify all the required files from one source location before specifying files from another location.

3)      The destination file system must have sufficient space to comfortably hold all of the source files. If sufficient space does not exist at the start of a copy set, the copy is terminated.

4)      While the copy is in progress, the target file has 0200 as its mode (i.e., owner write-only). The mode is changed only after the copy succeeds.

5)      When you specify the -c option, bbcp uses zlib, written by Jean-loup Gailly and Mark Adler, to compress and decompress the file.

6)      The -w option is used to optimize disk transfer operations: you specify the preferred disk I/O buffer size and bbcp calculates the required network buffer size. This effectively sets the minimum window size. The -W option is used to optimize network transfer operations: you specify the preferred network I/O buffer size and bbcp calculates the required disk buffer size. This effectively sets the maximum window size.

7)      By default, bbcp uses /usr/local/bin/ssh for authentication on every host specified in the source and target specifications. The rules governing normal ssh use also apply to bbcp. When in doubt, simply ssh to the host in question to verify that you can copy files to or from that host.

8)      bbcp executes a copy of itself on the source node as “bbcp SRC” and a mirror copy on the sink node as “bbcp SNK”. Because the commands are well known, you may restrict ssh usage to exactly these commands when a password-less key-file is used to gain access to a host.

9)      Because bbcp invokes itself without an absolute path, you must make sure that bbcp can be found in one of the directories listed in your PATH environment variable. Otherwise, you must specify where bbcp can be found (see the next note).

10)  The -S and -T options allow you to specify different commands to start bbcp on the source and sink nodes. Refer to the section “Modifying Startup” for details on how to change the default location of bbcp and ssh.

11)  Refer to http://www-didc.lbl.gov/NetLogger/ for complete information on NetLogger.

12)  You can easily Grid-enable bbcp from a security standpoint by specifying GSI-OpenSSH as the authentication and launch vehicle using the -S and -T options.
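The copy-set grouping in Note 2 can be sketched with itertools.groupby, which groups only adjacent items, matching the rule that only adjacent files from the same source host form a copy set. The helper names are hypothetical, and split_spec assumes that any colon marks a host prefix, so local file names that contain a colon are not handled.

```python
from itertools import groupby


def split_spec(spec: str):
    """Split '[[user@]host:]file' into (user, host, file); host is None if local."""
    prefix, colon, path = spec.partition(":")
    if not colon:
        return None, None, spec
    user, at, host = prefix.rpartition("@")
    return (user if at else None), host, path


def copy_sets(srcspecs):
    """Group adjacent source specs by source host, keeping command-line order."""
    with_hosts = [(split_spec(spec)[1], spec) for spec in srcspecs]
    return [(host, [spec for _, spec in group])
            for host, group in groupby(with_hosts, key=lambda pair: pair[0])]


specs = ["a:/d/f1", "a:/d/f2", "b:/d/f3", "a:/d/f4"]
for host, files in copy_sets(specs):
    print(host, files)
# a ['a:/d/f1', 'a:/d/f2']
# b ['b:/d/f3']
# a ['a:/d/f4']   <- intermixing hosts forces a second copy set for host a
```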

 


1.1       Resuming Failed Copies

 

You can resume failed copies in most cases by consistently using the -a option. When -a is specified, the following occurs:

 

1)      If the target file does not already exist, a new copy is initiated by

a.      creating a checkpoint record to pair the source and target files together,

b.      transmitting all source bytes to the target location, and

c.      upon successful transmission of the source file, erasing the checkpoint file.

2)      If the target file exists and is identical in size to the source file and a copy checkpoint record is not found for the file, the copy is assumed to have completed normally and the file is skipped.

3)      If the target file is larger than the source file, if it is smaller and no checkpoint record can be found, or if the checkpoint record does not pair the source and target files, bbcp terminates with an error.

4)      Otherwise, the copy is resumed by appending all untransmitted source file bytes to the target file.
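The four rules above can be condensed into a single decision function. This is a sketch, not bbcp's code: resume_action is hypothetical, and the checkpoint_found and checkpoint_pairs flags stand in for bbcp's own checkpoint lookup and pairing check.

```python
from typing import Optional


def resume_action(source_size: int, target_size: Optional[int],
                  checkpoint_found: bool, checkpoint_pairs: bool) -> str:
    """Return 'copy', 'skip', 'error', or 'append' for one file under -a.

    target_size is None when the target file does not exist; checkpoint_pairs
    says whether a found checkpoint record pairs this source and target.
    """
    if target_size is None:
        return "copy"    # rule 1: new copy, guarded by a checkpoint record
    if target_size == source_size and not checkpoint_found:
        return "skip"    # rule 2: assume an earlier copy completed
    if (target_size > source_size
            or (target_size < source_size and not checkpoint_found)
            or (checkpoint_found and not checkpoint_pairs)):
        return "error"   # rule 3
    return "append"      # rule 4: send bytes from offset target_size onward
```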

 

The -k option maximizes bbcp’s ability to resume failed copies. If -k is not specified and an error occurs, bbcp erases the partially transmitted file. The -a option is still useful without -k; however, bbcp will then merely skip over fully copied files and will rarely be able to resume copying where it left off. The -k option forces partially completed files to remain on disk so that a partial copy can be resumed after the fault condition that terminated the copy is corrected.

 

Proper resumption of partially transmitted files relies on a checkpoint record. By default, this record is written to the “.bbcp” subdirectory of the home directory of the command owner (i.e., the user running bbcp). This subdirectory is automatically created if it does not already exist. The file names in this subdirectory have the format

 

bbcp.srchost.trgid.trgfn

 

Where srchost is the DNS name of the host that holds the source file, trgid is the unique identification of the target location, and trgfn is the name of the target file. The contents of the file uniquely identify the source file at srchost. Proper pairing requires that the conditions that created the checkpoint file are still true at the time the copy is resumed. This essentially means that the copy cannot be resumed if any changes have occurred to the source file or if the source or target files have changed location since the copy was terminated.

 

Users with home directories in AFS may wish to change the default location for checkpoint files, especially should they run in batch-mode without an AFS token. Refer to the section “Configuring New Defaults” on how to set a new default location.

 

1.2       Multi-Target Copying

 

You may use bbcp to copy source files to multiple destinations. The -d option enables source-relative addressing, which in turn allows multi-target copying. The following steps are taken when you specify -d path:

 

1)      Each relative source file specification (i.e., one that does not start with a slash) is prefixed by path. The source file must be found at the resulting location.

2)      The file is transferred to the sink (i.e., target) host along with its associated relative path.

3)      The sink host creates the source-relative path, if it does not already exist, under the path given in the sink specification.

4)      The file is then created with a file name identical to the source file name.

 

For example,

 

bbcp -d /usr/abh/data dir1/data1 dir2/data2 batch:/usr/temp

 

would copy

 

/usr/abh/data/dir1/data1 to /usr/temp/dir1/data1

/usr/abh/data/dir2/data2 to /usr/temp/dir2/data2

 

The directories dir1 and dir2 are automatically created starting at path /usr/temp on host batch should they not exist.

 

You may mix relative paths with absolute paths. Absolute source paths are not prefixed by the -d path and are copied to the directory identified by the sink specification.
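The path mapping in the example above can be expressed as a small function. map_relative is hypothetical; it only computes the two paths (the sink would still have to create the target's parent directory if it is missing), and it assumes that an absolute source keeps its base name when it lands in the sink directory.

```python
import posixpath


def map_relative(srcspec: str, d_path: str, sink_dir: str):
    """Return (source path, target path) for one file copied with -d d_path."""
    if srcspec.startswith("/"):
        # Absolute specs are not prefixed and land directly in the sink directory.
        return srcspec, posixpath.join(sink_dir, posixpath.basename(srcspec))
    # Relative specs: prefix with d_path on the source, recreate under sink_dir.
    return posixpath.join(d_path, srcspec), posixpath.join(sink_dir, srcspec)


for spec in ("dir1/data1", "dir2/data2"):
    print(*map_relative(spec, "/usr/abh/data", "/usr/temp"), sep=" to ")
# /usr/abh/data/dir1/data1 to /usr/temp/dir1/data1
# /usr/abh/data/dir2/data2 to /usr/temp/dir2/data2
```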

 

 


1.3       Modifying Startup

 

At times, you may need to specify different commands to start bbcp on the source node, as well as the sink node. The -S and -T options allow you to do this. You may also specify the default -S and -T options using a configuration file. See “Configuring New Defaults” for more information.

 

Because certain information needs to be substituted in the command line, bbcp defines certain character sequences to indicate the location of a substitution. These are:

 

%I       - substituted by -i fname (i.e., the ssh identity file option), should one exist,

%H       - the source or target host name, and

%U       - the source or target user name.

 

For instance, the command

 

bbcp -S '/bin/ssh %I -l %U %H /bin/bbcp' /tmp/fn abh@host:/tmp

 

would start bbcp on the source node using the command

 

/bin/ssh -l abh host /bin/bbcp SRC

 

Since the ssh identity file was not specified, the %I was deleted. If the identity file were specified as

 

 

bbcp -S '/bin/ssh %I -l %U %H /bin/bbcp' -i foo /tmp/fn abh@host:/tmp

 

then the command used to start bbcp on the source node would be

 

/bin/ssh -i foo -l abh host /bin/bbcp SRC

 

Identical rules apply to the -T option, which specifies the command to start bbcp on the sink (i.e., target) node.
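The substitution rules can be sketched as plain text replacement followed by appending the role argument (SRC for the source node, SNK for the sink node, per Note 8). expand_start_command is a hypothetical helper; it assumes simple replacement of %I, %H, and %U, and because it collapses whitespace, a template argument that itself contains spaces would not survive intact.

```python
from typing import Optional


def expand_start_command(template: str, host: str, user: str,
                         identity: Optional[str] = None, role: str = "SRC") -> str:
    """Substitute %I, %H, and %U, then append the role (SRC or SNK)."""
    text = (template.replace("%I", f"-i {identity}" if identity else "")
                    .replace("%H", host)
                    .replace("%U", user))
    # Collapse the extra blanks left behind when %I expands to nothing.
    return " ".join(text.split() + [role])


template = "/bin/ssh %I -l %U %H /bin/bbcp"
print(expand_start_command(template, "host", "abh"))
# /bin/ssh -l abh host /bin/bbcp SRC
print(expand_start_command(template, "host", "abh", identity="foo"))
# /bin/ssh -i foo -l abh host /bin/bbcp SRC
```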

 


1.4       Dealing With Firewalls

 

bbcp is a peer-to-peer application. This means that the copies of bbcp on the source and sink nodes act as both client and server applications. This may not be possible at some sites because of firewall restrictions. Specifically, some installations may prohibit incoming TCP/IP connections at arbitrary ports.

 

Normally, the bbcp source node connects to its counterpart running on the target node. If the target host prohibits incoming connections, the copy will fail. However, if the source host allows arbitrary connections, you can specify the -z option. This option reverses the connection protocol so that the bbcp sink node always tries to connect to its counterpart running on the source host.

 

When both the source and target nodes prohibit arbitrary connections, you will need the assistance of an administrator at either node. By default, bbcp checks the /etc/services file for the existence of two services: bbcpfirst and bbcplast. The bbcpfirst service identifies the starting port number and bbcplast identifies the ending port number that can be used for incoming connections. When neither service name can be found, bbcp resorts to using an arbitrary port number. If the services are found, bbcp restricts its port usage to one of the ports in the indicated range.

 

Ask the administrator at the source or target node to allow a range of well-known port numbers to be used for incoming connections (i.e., allowed to pass through the firewall). The administrator must also register these port numbers in the /etc/services file using the names bbcpfirst and bbcplast (the default names can be changed). Make sure that at least 8 port numbers exist in the range (more if possible). If restricted port access is allowed only at the source site, you must specify the -z option when invoking bbcp.
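The port-range lookup can be sketched with Python's socket module, whose getservbyname function reads the services database (normally /etc/services on Unix systems). The helper names are hypothetical, and treating a single missing name the same as both names missing is an assumption; the text above only covers the case where neither name is found.

```python
import socket


def port_range(first_name="bbcpfirst", last_name="bbcplast",
               lookup=socket.getservbyname):
    """Return (first, last) TCP ports from the services database, or None."""
    try:
        return lookup(first_name, "tcp"), lookup(last_name, "tcp")
    except OSError:
        return None  # not registered: fall back to an arbitrary port


def listen_in_range(ports):
    """Listen on the first free port in ports=(first, last), or any port if None."""
    first, last = ports if ports else (0, 0)  # port 0 lets the OS choose
    for port in range(first, last + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("", port))
            sock.listen()
            return sock
        except OSError:
            sock.close()
    raise OSError(f"no free port between {first} and {last}")


listener = listen_in_range(port_range())
print("listening on port", listener.getsockname()[1])
listener.close()
```

Walking the range in order is also why a range of only a few ports is fragile: each concurrent transfer consumes one port, and the search fails once all of them are busy.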

 

 


1.5       Configuring New Defaults

 

When starting, bbcp checks the environment variable bbcp_CONFIGFN. If this variable is set, its value is used as the location of the configuration file. Otherwise, bbcp checks whether the file .bbcp.cf exists in your home directory; if it does, that file is used as the initial configuration file. A configuration file may also be specified on the command line using the -C option. Command line configuration files are processed when they are encountered. Thus, any option specified prior to -C may be overridden by the configuration file, and the file’s values may be overridden by subsequent options. The -C option, when specified, should be the first option on the command line.

 

Each line in the configuration file may contain an option-value pair. The option name is identical to that specified on the command line (e.g., -a, -b, -c, etc.). The value is the value, if any, that would be specified along with the corresponding option. The only difference between options specified on the command line and those specified in the configuration file is that each option must be on a separate line and option values must not be quoted.
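The lookup order and the in-place expansion of -C can be sketched as follows. All function names are hypothetical, the environment variable name is spelled exactly as in the text above, and two behaviors are assumptions: blank lines are ignored, and when an option is repeated, the value given last takes effect. The file reader is passed in so the sketch does not depend on real files.

```python
import os
from typing import Callable, List, Mapping, Optional


def default_config_path(environ: Mapping[str, str]) -> Optional[str]:
    """bbcp_CONFIGFN wins; otherwise use ~/.bbcp.cf if that file exists."""
    if environ.get("bbcp_CONFIGFN"):
        return environ["bbcp_CONFIGFN"]
    home = environ.get("HOME")
    if home and os.path.isfile(os.path.join(home, ".bbcp.cf")):
        return os.path.join(home, ".bbcp.cf")
    return None


def config_to_args(text: str) -> List[str]:
    """Turn 'option value' lines into the equivalent command-line arguments."""
    args: List[str] = []
    for line in text.splitlines():
        # Split once: the rest of the line is the (unquoted) value, if any.
        args.extend(line.strip().split(None, 1))  # blank lines yield nothing
    return args


def expand_config_options(argv: List[str],
                          read_file: Callable[[str], str]) -> List[str]:
    """Replace each '-C fname' in place with the options read from fname."""
    out: List[str] = []
    i = 0
    while i < len(argv):
        if argv[i] == "-C" and i + 1 < len(argv):
            out.extend(config_to_args(read_file(argv[i + 1])))
            i += 2
        else:
            out.append(argv[i])
            i += 1
    return out
```

Because each value is the remainder of its line, a line such as "-S /bin/ssh %I -l %U %H /bin/bbcp" becomes the two arguments "-S" and "/bin/ssh %I -l %U %H /bin/bbcp", which is why no quoting is needed in the file.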

 

It is critical to remember that bbcp is a peer-to-peer application. Therefore, it can have up to three different execution locations at the same time: the host that initiated the bbcp command (i.e., agent), the host that holds the source data (i.e., source), and the host that is to receive the source data (i.e., target). In order to simplify the management of this environment, the configuration file is only read on the agent’s host (i.e., the host that initiated the copy) and the values are transmitted to the source and target hosts.

 

1.6       Problem Reports & Enhancement Requests

 

Please direct all problem reports, modifications, and requests for enhancements to:

 

Andrew Hanushevsky abh@stanford.edu

 

 


1.7       Downloading

 

First, please read the legal notice (see below). Use of this software implies that you have read and agreed to all of the terms and conditions for use.

 

If you have access to AFS, you can find the platform-specific bbcp executable at

 

/afs/slac.stanford.edu/public/software/bbcp/bbcp

 

Otherwise, download one or more of the following bbcp executables:

 

AIX 4.x (not yet available)

 

HP/UX (not yet available)

 

OSF/1 (not yet available)

 

Redhat Linux 6.2 (Zoot 2.2.19-6)

 

Redhat Linux 7.2 (Enigma 2.4.18-3)

 

Solaris 5.6

 

Solaris 5.7

 

Solaris 5.8

 

Each file above is a bbcp executable compiled for the indicated operating system. It may or may not work on other versions of the same operating system. Should you run into trouble or wish to extend the range of operating systems available, feel free to download the source and send back any required modifications.

 


Legal Notice

 

Copyright © 2002, Board of Trustees of the Leland Stanford, Jr. University.

Produced under contract DE-AC03-76-SF00515 with the US Department of Energy.

All rights reserved.

 

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

 

a.       Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

b.      Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

c.       Neither the name of the Leland Stanford, Jr. University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

 

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.