bbcp [ options ] [ srcspec [ . . . ] ] snkspec
srcspec: [[sid@]shost:]sourcefile
snkspec: [[tid@]thost:]target
options: [ advanced ] -c [lvl] -C fname -f -I fname -k -l logfn -m mode -p -P sec -v -w wsz --
-------------------------------------------------------------------
advanced: -a [dir] -b blkf -B bfsz -d path -D -e -i fname -L lopts[@lurl] -q qos -s strms -S srccmd -T trgcmd -t tlim -V -W wsz -x rate -z
lopts: a | b | c | i | o | r | w | x [ lopts ]
lurl: file://path/filename | x-netlog://host:port | x-syslog://localhost
Function
Securely and quickly copy a file from source to destination. Please refer to known problems (see usage notes) for a list of current defects and limitations.
sid
specifies the ssh loginid for the source host. The default is to use your current loginid.
shost
specifies the name of the host that holds the source file. By default, the local host is assumed.
sourcefile
specifies the name of the file to be copied. Any number of source files from any number of hosts may be specified on the command line. If the -I option is specified, no source files need to be specified on the command line.
tid
specifies the ssh loginid for the target host. The default is to use your current loginid.
thost
specifies the name of the host to which the file is to be copied. By default, the local host is assumed.
target
specifies the target location of the file to be copied. If a single source file is specified, target may be a filename or a directory. If more than one source file is specified, target must be the name of a directory.
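To make the srcspec and snkspec grammar concrete, the following Python sketch splits a specification of the form [[user@]host:]path into its three parts. It is an illustration only, not bbcp's own parser; in particular, treating a colon that appears after a slash as part of a local file name is an assumption of this sketch, and the names FileSpec and parse_spec are hypothetical.

```python
from typing import NamedTuple, Optional


class FileSpec(NamedTuple):
    user: Optional[str]  # sid or tid; None means "use your current loginid"
    host: Optional[str]  # shost or thost; None means the local host
    path: str            # sourcefile or target


def parse_spec(spec: str) -> FileSpec:
    """Split '[[user@]host:]path' into its parts (illustrative, not bbcp's parser)."""
    head, sep, path = spec.partition(":")
    if not sep or "/" in head:
        # No colon, or the colon follows a slash (assumption): a local path.
        return FileSpec(None, None, spec)
    user, at, host = head.rpartition("@")
    return FileSpec(user if at else None, host, path)
```

For example, parse_spec("abh@host:/tmp") yields user abh, host host, and path /tmp, while parse_spec("/tmp/fn") is treated as a file on the local host with the current loginid.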
Options
-c [lvl]
compresses the data prior to sending it across the network. Specify an integer value from 1 through 9 for lvl. A value of 1 gives the best speed while a value of 9 gives the best compression. The default value is 1. If lvl is omitted, -c must not be the last option on the command line.
-C fname
specifies the name of the configuration file. The configuration file is processed when it is encountered on the command line. Refer to the section “Configuring New Defaults” for more information about configuration files.
-f forces the copy by erasing the target prior to copying the source file. By default, if the target file already exists, the copy fails. The -f option is mutually exclusive with the -a option.
-I fname
includes a list of source file specifications from the file identified by fname. Each newline-terminated record in fname must contain a single source file specification. If -I is specified, you need not specify any source files on the command line.
-k keeps any partially created target files. Used together with -a, the -k option allows full recovery after a copy failure. By default, partial files are removed after a copy fails.
-l logfn
logs standard error to the indicated file, logfn. By default, standard error output is written to the terminal.
-m mode
sets the final mode for the target file. Specify for mode a 3- or 4-digit mode in octal. The default mode is 0644.
-p preserves the source file’s mode, group name, access time, and modification time. That is, the target file’s mode, group, atime, and mtime values are set to match those of the corresponding source file.
-P sec
produces progress messages every sec seconds. Specify a sec value no less than 1 second.
-v produces additional output during execution.
-w wsz
sets the size of the disk I/O buffers. Specify for wsz a value no less than 1024 (i.e., 1k). Numbers suffixed by k, m, or g are multiplied by 2^10, 2^20, or 2^30, respectively. The TCP/IP socket buffer is set to wsz plus 32 bytes to account for network overhead. This effectively sets the TCP/IP window size for the associated connection. The default is 64k.
-- is a null option. Use -- to prevent an option with an optional argument from being the last option on the command line.
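The following Python sketch shows how the size values accepted by -w (and by -B and -x) expand, and how the disk and socket buffer sizes relate under -w and under the -W option described under Advanced Options. It assumes only the rules stated in the text (k, m, and g multipliers, a 1024-byte minimum, and 32 bytes of network overhead); parse_size and buffer_sizes are hypothetical names, not part of bbcp.

```python
from typing import Tuple

MULTIPLIERS = {"k": 2**10, "m": 2**20, "g": 2**30}
NETWORK_OVERHEAD = 32  # bytes, as stated for -w and -W


def parse_size(text: str, minimum: int = 1024) -> int:
    """Expand a size such as '64k' or '8192' to bytes, enforcing the minimum."""
    text = text.strip()
    multiplier = MULTIPLIERS.get(text[-1:], 1)
    if multiplier != 1:
        text = text[:-1]  # drop the k, m, or g suffix
    value = int(text) * multiplier
    if value < minimum:
        raise ValueError(f"{value} is below the minimum of {minimum} bytes")
    return value


def buffer_sizes(option: str, wsz: str) -> Tuple[int, int]:
    """Return (disk_buffer, socket_buffer) in bytes for '-w wsz' or '-W wsz'."""
    size = parse_size(wsz)
    if option == "-w":
        # wsz is the disk buffer; the socket buffer adds the overhead.
        return size, size + NETWORK_OVERHEAD
    if option == "-W":
        # wsz is the socket buffer; the disk buffer subtracts the overhead.
        return size - NETWORK_OVERHEAD, size
    raise ValueError(f"unknown option {option}")
```

With the default of 64k, -w gives a 65536-byte disk buffer and a 65568-byte socket buffer, whereas -W 64k gives a 65504-byte disk buffer and a 65536-byte socket buffer.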
Advanced Options
-a [dir]
appends data to the end of the target file if the target is found to be incomplete due to a previously failed copy operation. The optional dir specifies the directory on the target host where checkpoint information is to be written (the default is the .bbcp subdirectory of your home directory). The -a option is mutually exclusive with the -f option. Refer to the section “Resuming Failed Copies” for more details. If dir is omitted, -a may not be the last option on the command line.
-b blkf
specifies the read blocking factor. That is, blkf data blocks are always read from disk and then queued for sending across the network. The maximum is determined by the maximum number of scatter/gather buffers allowed in a readv() system call. By default, blkf is set to equal the -s strms value. If -c is specified, blkf determines the number of disk I/O.
-B bfsz
specifies the disk I/O buffer size when compression is enabled. Normally, blkf times wsz amount of data are read, while wsz amount of data is written, at one time. When compression is enabled, bfsz amount of data is read and written at one time. Specify for bfsz a value no less than 1024 (i.e., 1k). Numbers suffixed by k, m, or g are multiplied by 2^10, 2^20, or 2^30, respectively. The default is 1m.
-d path
specifies source relative addressing. Each relative srcspec is prefixed by path. When the file is copied to the target, the destination path is snkspec/srcspec. That is, the relative path in srcspec is created on the target host relative to snkspec, and then the file is copied. Refer to the section “Multi-Target Copying” for more information.
-D turns on debugging.
-e enables extensive error checking by calculating an MD5 checksum for each block of data sent. The receiving end validates the checksum to ensure that the data was not altered while in transit.
-i fname
specifies the name of the ssh identity file if one has been specifically created for bbcp. The identity filename, prefixed by -i, is included in the ssh command line when starting the source and target nodes.
-L lopts[@lurl]
enables detailed logging of actions via the NetLogger interface. The lopts specify what is to be logged while lurl determines how the information is logged. For lopts specify one or more of the following (including at least one of c, i, o, r, w, or x):
a - append to data file
b - buffer information in memory
c - log data compression
i - log network reads
o - log network writes
r - log disk reads
w - log disk writes
x - log data expansion
If lurl is not specified, the logging interface uses the value of NETLOGGER_DEST as the lurl value. Specify one of three destination protocols:
file - data is written to the file identified by path/filename
x-netlog - data is sent to the host listening on port
x-syslog - data is sent to the system log on the local host
-q qos
specifies the quality of service to be used. This is router-implementation dependent and may be ignored. Specify a value between 0 and 255, inclusive.
-s strms
sets the number of parallel network streams to be used for the transfer. Specify a strms value from 1 to 32, inclusive. The default is 4.
-S srccmd
is the command to be used to start bbcp on the source host. The default is “/usr/local/bin/ssh %I -l %U %H bbcp”. See the usage notes for more information.
-T trgcmd
is the command to be used to start bbcp on the target host. The default is “/usr/local/bin/ssh %I -l %U %H bbcp”. See the usage notes for more information.
-t tlim
is the maximum amount of time that the copy may take before it is aborted. The time limit applies to each source host regardless of the number of files that host supplies. Specify a number greater than zero and optionally suffixed by s (the default), m, or h for seconds, minutes, and hours, respectively. The default is to not apply any time limit.
-V produces even more output than -v allows, including detailed transfer speed statistics.
-W wsz
like -w, sets the size of all I/O buffers, including the TCP/IP socket buffer. This option is identical to the -w option except that disk buffers are set to wsz less 32 bytes to account for network overhead.
-x rate
sets the maximum transfer rate. Specify for rate a value no less than 1024 (i.e., 1k). Numbers suffixed by k, m, or g are multiplied by 2^10, 2^20, or 2^30, respectively. Data is clocked out from the source at the specified rate per second.
-z uses reverse connection protocol. Refer to the section “Dealing With Firewalls” for more information.
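The per-block checking enabled by -e can be pictured with the following Python sketch: the sender attaches an MD5 digest to every block, and the receiver recomputes and compares it. The block framing, the function names, and the use of Python's hashlib are assumptions for illustration; the sketch does not reproduce bbcp's wire format.

```python
import hashlib
from typing import Iterable, Iterator, Tuple


def checksummed_blocks(data: bytes, block_size: int) -> Iterator[Tuple[bytes, bytes]]:
    """Sender side: yield (block, MD5 digest of that block)."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield block, hashlib.md5(block).digest()


def receive(blocks: Iterable[Tuple[bytes, bytes]]) -> bytes:
    """Receiver side: recompute each digest and fail on the first mismatch."""
    out = []
    for index, (block, digest) in enumerate(blocks):
        if hashlib.md5(block).digest() != digest:
            raise ValueError(f"checksum mismatch in block {index}")
        out.append(block)
    return b"".join(out)
```

Because each block carries its own digest, a corrupted block is detected as soon as it arrives rather than only after the whole file has been transferred.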
Success
The program exits with a status code of 0.
Failure
The program exits with a non-zero status code.
Notes
1) A list of known problems is detailed on the following web page: “http://www.slac.stanford.edu/~abh/bbcp/bbcp_bugs.html”.
2) Files are copied in the order specified. To minimize start-up and shutdown time, adjacent files are grouped by source host and treated as a copy set (i.e., a related group of files). Avoid inter-mixing different source locations. That is, always specify all the required files from one source location before specifying files from another location.
3) The destination file system must have sufficient space to comfortably hold all of the source files. If sufficient space does not exist at the start of a copy set, the copy is terminated.
4) While the copy is in progress, the target file has 0200 as its mode (i.e., owner write-only). The mode is changed only after the copy succeeds.
5) When you specify the -c option, bbcp uses zlib, written by Jean-loup Gailly and Mark Adler, to compress and decompress the file.
6) The -w option is used to optimize disk transfer operations since you specify the preferred disk I/O buffer and bbcp calculates the required network buffer. This effectively sets the minimum window size. The -W option is used to optimize network transfer operations since you specify the preferred network I/O buffer and bbcp calculates the required disk buffer. This effectively sets the maximum window size.
7) By default, bbcp uses /usr/local/bin/ssh for authentication on every host that was specified in the source and target specifications. The rules governing normal ssh use always apply to bbcp. When in doubt, simply ssh to the host in question to validate your ability to copy files to or from that host.
8) bbcp executes a copy of itself on the source node as “bbcp SRC” and a mirror copy on the sink node as “bbcp SNK”. Because the commands are well known, you may restrict ssh usage to exactly these commands when a password-less key-file is used to gain access to a host.
9) Because bbcp invokes itself without an absolute path, you must make sure that bbcp can be found in one of the directories listed in your PATH environment variable. Otherwise, you must specify where bbcp can be found (see the next note).
10) The -S and -T options allow you to specify different commands to start bbcp on the source and sink nodes. Refer to the section “Modifying Startup” for details on how to change the default location of bbcp and ssh.
11) Refer to http://www-didc.lbl.gov/NetLogger/ for complete information on NetLogger.
12) You can easily GRID enable bbcp from the security standpoint by specifying GSI-OpenSSH as the authentication and launch vehicle using the -S and -T options.
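Note 2 can be illustrated with a short Python sketch that groups adjacent source specifications by host into copy sets, preserving command-line order. Grouping by host name alone (ignoring the login id) and the helper names source_host and copy_sets are assumptions of this sketch, not bbcp internals.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def source_host(spec: str) -> Optional[str]:
    """Host part of '[[user@]host:]path', or None for a local file."""
    head, sep, _ = spec.partition(":")
    if not sep or "/" in head:
        return None
    return head.rpartition("@")[2]


def copy_sets(specs: List[str]) -> List[Tuple[Optional[str], List[str]]]:
    """Group adjacent source specifications by host, keeping command-line order."""
    # groupby only merges consecutive items, which mirrors "adjacent files".
    return [(host, list(group)) for host, group in groupby(specs, key=source_host)]
```

Interleaving hosts, as in a:/f1 b:/f3 a:/f4, yields three copy sets instead of two, each with its own start-up and shutdown cost.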
Resuming Failed Copies
You can resume failed copies in most cases by consistently using the -a option. When -a is specified, the following occurs:
1) If the target file does not already exist, a new copy is initiated by
a. creating a checkpoint record to pair the source and target files together,
b. transmitting all source bytes to the target location, and
c. upon successful transmission of the source file, erasing the checkpoint file.
2) If the target file exists and is identical in size to the source file and a copy checkpoint record is not found for the file, the copy is assumed to have completed normally and the file is skipped.
3) If the target file is larger than the source file, if it is smaller and no checkpoint record can be found, or if the checkpoint record does not pair the source and target files together, bbcp terminates with an error.
4) Otherwise, the copy is resumed by appending all untransmitted source file bytes to the target file.
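The four rules above can be summarized as a decision function. In this Python sketch, the checkpoint argument is None when no checkpoint record is found, False when a record exists but does not pair the source and target files, and True when it does; these encodings and all names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NEW_COPY = "new copy"  # rule 1: target does not exist
    SKIP = "skip"          # rule 2: same size, no checkpoint record
    ERROR = "error"        # rule 3: inconsistent state
    APPEND = "append"      # rule 4: resume by appending the remaining bytes


def resume_action(src_size: int, tgt_size: Optional[int],
                  checkpoint: Optional[bool]) -> Action:
    """Decide what -a does; tgt_size is None when the target file is absent."""
    if tgt_size is None:
        return Action.NEW_COPY
    if tgt_size == src_size and checkpoint is None:
        return Action.SKIP
    # Larger target, smaller target without a record, or a non-pairing record.
    if tgt_size > src_size or not checkpoint:
        return Action.ERROR
    return Action.APPEND
```

When the result is APPEND, the bytes still to be sent are the last src_size - tgt_size bytes of the source file.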
The -k option maximizes bbcp’s ability to resume failed copies. If -k is not specified and an error occurs, bbcp erases the partially transmitted file. The -a option is still useful without -k; however, bbcp will then merely skip over fully copied files and will rarely be able to resume copying where it left off. The -k option forces partially completed files to remain on disk so that a partial copy can be resumed after the fault condition that terminated the copy is corrected.
Proper resumption of partially transmitted files relies on a checkpoint record. By default, this record is written in the command owner’s (i.e., the user running bbcp) home directory in the “.bbcp” subdirectory. This subdirectory is automatically created if it does not already exist. The file names in this subdirectory have the format
bbcp.srchost.trgid.trgfn
Where srchost is the DNS name of the host that holds the source file, trgid is the unique identification of the target location, and trgfn is the name of the target file. The contents of the file uniquely identify the source file at srchost. Proper pairing requires that the conditions that created the checkpoint file are still true at the time the copy is resumed. This essentially means that the copy cannot be resumed if any changes have occurred to the source file or if the source or target files have changed location since the copy was terminated.
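The following Python sketch assembles a checkpoint file name in the format shown above. How bbcp derives trgid, and whether trgfn is a base name or a full path, is not stated here, so both are treated as opaque strings supplied by the caller; checkpoint_path is a hypothetical helper, and its optional ckdir argument stands in for the dir argument of -a.

```python
import os
from typing import Optional


def checkpoint_path(srchost: str, trgid: str, trgfn: str,
                    ckdir: Optional[str] = None) -> str:
    """Return the path of the checkpoint file bbcp.srchost.trgid.trgfn."""
    if ckdir is None:
        # Default location: the .bbcp subdirectory of the user's home directory.
        ckdir = os.path.join(os.path.expanduser("~"), ".bbcp")
    return os.path.join(ckdir, f"bbcp.{srchost}.{trgid}.{trgfn}")
```

Because every component of the name must match when the copy is resumed, moving the source or target file changes the expected name and the old checkpoint record no longer pairs the files.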
Users with home directories in AFS may wish to change the default location for checkpoint files, especially should they run in batch-mode without an AFS token. Refer to the section “Configuring New Defaults” on how to set a new default location.
Multi-Target Copying
You may use bbcp to copy source files to multiple destinations. The -d option enables source relative addressing that, in turn, allows multi-target copying. The following steps are taken when you specify -d path:
1) Each relative source file specification (i.e., one that does not start with a slash) is prefixed by path. The source file must be found at the resulting location.
2) The file is transferred to the sink (i.e., target) host along with its associated relative path.
3) The sink host creates the source relative path, if it does not exist, prefixed with the path in the sink specification.
4) The file is then created with a file name identical to the source file name.
For example,
bbcp -d /usr/abh/data dir1/data1 dir2/data2 batch:/usr/temp
would copy
/usr/abh/data/dir1/data1 to /usr/temp/dir1/data1
/usr/abh/data/dir2/data2 to /usr/temp/dir2/data2
The directories dir1 and dir2 are automatically created starting at path /usr/temp on host batch should they not exist.
You may mix relative paths with absolute paths. Absolute source paths are not prefixed by the -d path and are copied to the directory identified by the sink specification.
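The path mapping performed by -d can be sketched in Python as follows. It follows the dir1/data1 example and the rule for absolute paths given above; map_source is a hypothetical name, and POSIX-style paths are assumed.

```python
import posixpath
from typing import Tuple


def map_source(spec_path: str, d_path: str, sink_dir: str) -> Tuple[str, str]:
    """Return (source location, target location) for one file under -d."""
    if spec_path.startswith("/"):
        # Absolute sources ignore -d and land directly in the sink directory.
        return spec_path, posixpath.join(sink_dir, posixpath.basename(spec_path))
    # Relative sources are found under -d path and keep their relative path on the sink.
    return posixpath.join(d_path, spec_path), posixpath.join(sink_dir, spec_path)
```

The directory that the sink host must create, if it does not exist, is then posixpath.dirname of the returned target path, for example /usr/temp/dir1.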
Modifying Startup
At times, you may need to specify different commands to start bbcp on the source node, as well as the sink node. The -S and -T options allow you to do this. You may also specify the default -S and -T options using a configuration file. See “Configuring New Defaults” for more information.
Because certain information needs to be substituted in the command line, bbcp defines certain character sequences to indicate the location of a substitution. These are:
%I - substituted by the -i fname (i.e., ssh identity file option) should one exist,
%H - the source or target host name, and
%U - the source or target user name.
For instance, the command
bbcp -S '/bin/ssh %I -l %U %H /bin/bbcp' /tmp/fn abh@host:/tmp
would start bbcp on the source node using the command
/bin/ssh -l abh host /bin/bbcp SRC
Since the ssh identity file was not specified, the %I was deleted. If the identity file were specified as
bbcp -S '/bin/ssh %I -l %U %H /bin/bbcp' -i foo /tmp/fn abh@host:/tmp
then the command used to start bbcp on the source node would be
/bin/ssh -i foo -l abh host /bin/bbcp SRC
Identical rules apply to the -T option, which specifies the command to start bbcp on the sink (i.e., target) node.
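The substitution rules can be modeled with a small Python sketch that applies them to the template used in the examples above. Collapsing the extra blank left when %I expands to nothing is an assumption, and because the sketch splits on whitespace it does not handle quoted arguments containing spaces; startup_command is a hypothetical name.

```python
from typing import Optional


def startup_command(template: str, host: str, user: str,
                    identity: Optional[str], role: str) -> str:
    """Expand %I, %H, and %U in a -S/-T template and append SRC or SNK."""
    expanded = (template
                .replace("%I", f"-i {identity}" if identity else "")
                .replace("%H", host)
                .replace("%U", user))
    # Re-join the words so an empty %I leaves no double blank.
    return " ".join(expanded.split() + [role])
```

For instance, startup_command("/bin/ssh %I -l %U %H /bin/bbcp", "host", "abh", "foo", "SRC") produces /bin/ssh -i foo -l abh host /bin/bbcp SRC.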
Dealing With Firewalls
bbcp is a peer-to-peer application. Mainly, this means that the copies of bbcp on the source and sink nodes act as both client and server applications. This may not be possible at some sites because of firewall restrictions. Specifically, some installations may prohibit incoming TCP/IP connections at arbitrary ports.
Normally, bbcp source nodes will connect to their counterpart running on the target node. If the target host prohibits incoming connections, the copy will fail. However, should the source host allow arbitrary connections, you can specify the -z option. This option reverses the connection protocol so that the bbcp sink node will always try to connect to its counterpart running on the source host.
When the source and target nodes prohibit arbitrary connections, you will need the assistance of an administrator at either node. By default, bbcp checks the /etc/services file for the existence of two services: bbcpfirst and bbcplast. The bbcpfirst service identifies the starting port number and bbcplast identifies the ending port number that can be used for incoming connections. When neither service name can be found, bbcp resorts to using an arbitrary port number. If the services are found, bbcp restricts its port usage to one of the ports in the indicated range.
Ask the administrator at the source or target nodes to allow a range of well-known port numbers to be used for incoming connections (i.e., allowed to pass through the firewall). This will require that the administrator register these port numbers in the /etc/services file using the names bbcpfirst and bbcplast (the default names can be changed). Make sure that at least 8 port numbers exist in the range (more if possible). If restricted port access is only allowed at the source site, you must specify the -z option when invoking bbcp.
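To show how the bbcpfirst and bbcplast entries might be read, the following Python sketch extracts the port range from /etc/services-style text and checks that it holds at least 8 ports. Returning None when either entry is missing is an assumption (the text only covers the case where neither is found), the function names are hypothetical, and the port numbers used below are made up for illustration.

```python
from typing import Dict, Optional, Tuple


def bbcp_port_range(services_text: str, first: str = "bbcpfirst",
                    last: str = "bbcplast") -> Optional[Tuple[int, int]]:
    """Return (first_port, last_port) from /etc/services-style text, or None."""
    ports: Dict[str, int] = {}
    for line in services_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split columns
        if len(fields) >= 2 and fields[0] in (first, last):
            ports[fields[0]] = int(fields[1].split("/", 1)[0])  # "5031/tcp" -> 5031
    if first in ports and last in ports:
        return ports[first], ports[last]
    return None


def has_enough_ports(port_range: Optional[Tuple[int, int]], minimum: int = 8) -> bool:
    """True when the inclusive range contains at least `minimum` ports."""
    return port_range is not None and port_range[1] - port_range[0] + 1 >= minimum
```

For instance, entries such as "bbcpfirst 5031/tcp" and "bbcplast 5040/tcp" are read by this sketch as the ten-port range 5031 through 5040.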
Configuring New Defaults
When starting, bbcp checks the environment variable bbcp_CONFIGFN. When this variable is set, its contents are used as the location of the configuration file. Otherwise, bbcp looks to see if the file .bbcp.cf exists in the home directory. If it does, then this file is used as the initial configuration file. A configuration file may also be specified on the command line using the -C option. Command line configuration files are processed when they are encountered. Thus, any option specified prior to -C may be overridden by the configuration file, and the file’s values may be overridden by subsequent options. The -C option, when specified, should be the first option on the command line.
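The lookup order for the initial configuration file can be sketched in Python as follows. The environment variable name is taken verbatim from the text; treating the variable as set even when it is empty, and the function name initial_config_file, are assumptions of this sketch.

```python
import os
from typing import Mapping, Optional


def initial_config_file(environ: Mapping[str, str] = os.environ,
                        home: Optional[str] = None) -> Optional[str]:
    """Return the initial configuration file, or None if there is none."""
    configured = environ.get("bbcp_CONFIGFN")
    if configured is not None:
        return configured  # the variable, when set, names the file
    if home is None:
        home = os.path.expanduser("~")
    candidate = os.path.join(home, ".bbcp.cf")
    return candidate if os.path.exists(candidate) else None
```

Any -C files named on the command line are then processed in order after this initial file, so later options can still override its values.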
Each line in the configuration file may contain an option-value pair. The option name is identical to that specified on the command line (e.g., -a, -b, -c, etc.). The value is the value, if any, that would be specified along with the corresponding option. The only difference between options specified on the command line and those specified in the configuration file is that each option must be on a separate line and option values must not be quoted.
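The following Python sketch converts configuration-file lines into the equivalent command-line tokens, showing why values must not be quoted: everything after the option name is taken as a single value. Skipping blank lines is an assumption, as is the absence of any comment syntax; config_to_argv is a hypothetical name.

```python
from typing import List


def config_to_argv(text: str) -> List[str]:
    """Convert 'option value' lines into command-line tokens."""
    argv: List[str] = []
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if not parts:
            continue  # assumption: blank lines are ignored
        argv.append(parts[0])  # the option name, e.g. -s
        if len(parts) == 2:
            # Values are unquoted, so the rest of the line is one value.
            argv.append(parts[1])
    return argv
```

For example, a line such as -S /bin/ssh %I -l %U %H /bin/bbcp in the file corresponds to -S '/bin/ssh %I -l %U %H /bin/bbcp' on the command line.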
It is critical to remember that bbcp is a peer-to-peer application. Therefore, it can have up to three different execution locations at the same time: the host that initiated the bbcp command (i.e., agent), the host that holds the source data (i.e., source), and the host that is to receive the source data (i.e., target). In order to simplify the management of this environment, the configuration file is only read on the agent’s host (i.e., the host that initiated the copy) and the values are transmitted to the source and target hosts.
Please direct all problem reports, modifications, and requests for enhancements to:
Andrew Hanushevsky abh@stanford.edu
First, please read the legal notice (see below). Use of this software implies that you have read and agreed to all of the terms and conditions for use.
If you have access to AFS, you can find the platform-specific bbcp executable at
/afs/slac.stanford.edu/public/software/bbcp/bbcp
Otherwise, download one or more of the following bbcp executables:
AIX 4.x (not yet available)
HP/UX (not yet available)
OSF/1 (not yet available)
Redhat Linux 6.2 (Zoot 2.2.19-6)
Redhat Linux 7.2 (Enigma 2.4.18-3)
The above are actual executable programs to retrieve and store a file. Each program is compiled for the indicated operating system. The program may or may not work in other versions of the same operating system. Should you run into trouble or wish to extend the range of operating systems available, feel free to download the source and send back any required modification.
Legal Notice
Copyright © 2002, Board of Trustees of the Leland Stanford, Jr. University.
Produced under contract DE-AC03-76-SF00515 with the US Department of Energy.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
a. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
b. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
c. Neither the name of the Leland Stanford, Jr. University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.