join - relational database operator
join [ -a file_number | -v file_number ][-e string][-o list][-t char] [-1 field][-2 field] file1 file2
join [-a file_number][-e string][-j field][-j1 field][-j2 field] [-o list...][-t char] file1 file2
The join utility will perform an equality join on the files file1 and file2. The joined files will be written to the standard output. The join field is a field in each file on which the files are compared. By default, join writes one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line by default will consist of the join field, then the remaining fields from file1, then the remaining fields from file2. This format can be changed by using the -o option (see below). The -a option can be used to add unmatched lines to the output. The -v option can be used to output only unmatched lines.
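As a minimal sketch of the default behaviour described above, the following joins two small files on their first field. The file names fruits and colors and their contents are hypothetical, and mktemp -d is assumed to be available for creating a scratch directory.

```shell
# Work in a scratch directory so no existing files are touched.
tmpdir=$(mktemp -d)

# Two hypothetical inputs, already ordered on field 1.
printf '%s\n' '1 apple' '2 banana' '3 cherry' > "$tmpdir/fruits"
printf '%s\n' '1 red' '3 dark-red' '4 purple' > "$tmpdir/colors"

# Default join: compare field 1 of each file and write only paired lines.
# Each output line is: join field, rest of file1, rest of file2.
basic_out=$(LC_ALL=C join "$tmpdir/fruits" "$tmpdir/colors")
printf '%s\n' "$basic_out"

rm -rf "$tmpdir"
```

Keys 2 and 4 each appear in only one file, so they produce no output line here.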
By default, the files file1 and file2 should be ordered in the collating sequence of sort -b on the fields on which they are to be joined, by default the first in each line. All selected output will be written in the same collating sequence.
The default input field separators will be blank characters. In this case, multiple separators will count as one field separator, and leading separators will be ignored. The default output field separator will be a space character.
The field separator and collating sequence can be changed by using the -t option (see below).
If the input files are not in the appropriate collating sequence, the results are unspecified.
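One way to satisfy the ordering requirement is to sort each input on its join field before joining, under the same locale that join will run in. The sketch below assumes the join field is field 1 and uses hypothetical files ages and cities; mktemp -d is assumed to be available.

```shell
tmpdir=$(mktemp -d)

# Hypothetical unsorted inputs keyed by name in field 1.
printf '%s\n' 'carol 30' 'alice 25' 'bob 41' > "$tmpdir/ages"
printf '%s\n' 'bob Lyon' 'carol Oslo' 'alice Rome' > "$tmpdir/cities"

# Order both files on the join field with sort -b, using one fixed locale
# so that sort and join agree on the collating sequence.
LC_ALL=C sort -b -k 1,1 "$tmpdir/ages" > "$tmpdir/ages.sorted"
LC_ALL=C sort -b -k 1,1 "$tmpdir/cities" > "$tmpdir/cities.sorted"

sorted_out=$(LC_ALL=C join "$tmpdir/ages.sorted" "$tmpdir/cities.sorted")
printf '%s\n' "$sorted_out"

rm -rf "$tmpdir"
```

Pinning LC_ALL for both commands is a design choice for reproducibility; any locale works as long as the sort and the join use the same one.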
The join utility supports the XBD specification, Utility Syntax Guidelines. The obsolescent version does not follow the utility argument syntax guidelines: the -j1 and -j2 options are multi-character options and the -o option takes multiple arguments.
The following options are supported:
- -a file_number
- Produce a line for each unpairable line in file file_number, where file_number is 1 or 2, in addition to the default output. If both -a 1 and -a 2 are specified, all unpairable lines will be output.
- -e string
- Replace empty output fields in the list selected by -o with the string string.
- -j field
- Equivalent to: -1 field -2 field.
- -j1 field
- Equivalent to: -1 field.
- -j2 field
- Equivalent to: -2 field.
- -o list
- Construct the output line to comprise the fields specified in list, each element of which has one of the following two forms:
- file_number.field, where file_number is a file number and field is a decimal integer field number
- 0 (zero), representing the join field.
The elements of list are either comma- or blank-separated, as specified in Guideline 8 of the XBD specification, Utility Syntax Guidelines. The fields specified by list will be written for all selected output lines. Fields selected by list that do not appear in the input will be treated as empty output fields. (See the -e option.) Only specifically requested fields are written. The list must be a single command line argument. However, as an obsolescent feature, the argument list can be multiple arguments on the command line.
- -t char
- Use character char as a separator, for both input and output. Every appearance of char in a line will be significant. When this option is specified, the collating sequence should be the same as sort without the -b option.
- -v file_number
- Instead of the default output, produce a line only for each unpairable line in file_number, where file_number is 1 or 2. If both -v 1 and -v 2 are specified, all unpairable lines will be output.
- -1 field
- Join on the fieldth field of file 1. Fields are decimal integers starting with 1.
- -2 field
- Join on the fieldth field of file 2. Fields are decimal integers starting with 1.
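The effect of the -a, -v and -t options can be illustrated with a short sketch. All file names and contents below are hypothetical, and mktemp -d is assumed to be available.

```shell
tmpdir=$(mktemp -d)

printf '%s\n' '1 apple' '2 banana' '3 cherry' > "$tmpdir/fruits"
printf '%s\n' '1 red' '3 dark-red' '4 purple' > "$tmpdir/colors"

# -a 1: the default output plus each unpairable line from file 1.
with_unpaired=$(LC_ALL=C join -a 1 "$tmpdir/fruits" "$tmpdir/colors")

# -v 2: only the unpairable lines from file 2, no paired output.
only_unpaired=$(LC_ALL=C join -v 2 "$tmpdir/fruits" "$tmpdir/colors")

# -t ,: a comma separates fields on input and on output.
printf '%s\n' '1,apple' '2,banana' > "$tmpdir/fruits.csv"
printf '%s\n' '1,red' '2,yellow' > "$tmpdir/colors.csv"
csv_out=$(LC_ALL=C join -t , "$tmpdir/fruits.csv" "$tmpdir/colors.csv")

printf '%s\n' "$with_unpaired" "$only_unpaired" "$csv_out"

rm -rf "$tmpdir"
```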
The following operands are supported:
- file1
- file2
- A pathname of a file to be joined. If either of the file1 or file2 operands is "-", the standard input is used in its place.
The standard input will be used only if the file1 or file2 operand is "-". See the INPUT FILES section.
The input files must be text files.
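A sketch of reading one input from the standard input by passing "-" as file1. The file name colors and the data are hypothetical; mktemp -d is assumed to be available.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '1 red' '3 dark-red' > "$tmpdir/colors"

# The first operand is "-", so file1 is read from the pipeline.
stdin_out=$(printf '%s\n' '1 apple' '2 banana' '3 cherry' |
    LC_ALL=C join - "$tmpdir/colors")
printf '%s\n' "$stdin_out"

rm -rf "$tmpdir"
```

This form is convenient when one input has to be sorted or filtered first, since the result can be piped straight into join.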
The following environment variables affect the execution of join:
- LANG
- Provide a default value for the internationalisation variables that are unset or null. If LANG is unset or null, the corresponding value from the implementation-dependent default locale will be used. If any of the internationalisation variables contains an invalid setting, the utility will behave as if none of the variables had been defined.
- LC_ALL
- If set to a non-empty string value, override the values of all the other internationalisation variables.
- LC_COLLATE
- Determine the locale of the collating sequence join expects to have been used when the input files were sorted.
- LC_CTYPE
- Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- as opposed to multi-byte characters in arguments and input files).
- LC_MESSAGES
- Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error.
- NLSPATH
- Determine the location of message catalogues for the processing of LC_MESSAGES.
Default.
The join utility output will be a concatenation of selected character fields. When the -o option is not specified, the output will be:
"%s%s%s\n", <join field>, <other file1 fields>, <other file2 fields>
If the join field is not the first field in a file, the <other file fields> for that file are:
<fields preceding join field>, <fields following join field>
When the -o option is specified, the output format will be:
"%s\n", <concatenation of fields>
where the concatenation of fields is described by the -o option, above.
For either format, each field (except the last) will be written with its trailing separator character. If the separator is the default (blank characters), a single space character will be written after each field (except the last).
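The two output formats can be compared with a small sketch in which the join field is the second field of file1. The file names left and right and their contents are hypothetical; mktemp -d is assumed to be available.

```shell
tmpdir=$(mktemp -d)

# file1 is ordered on its second field, file2 on its first.
printf '%s\n' 'a k1 b' 'c k2 d' > "$tmpdir/left"
printf '%s\n' 'k1 x' 'k2 y' > "$tmpdir/right"

# Without -o: join field first, then the fields of file1 that precede and
# follow it, then the remaining fields of file2.
default_fmt=$(LC_ALL=C join -1 2 "$tmpdir/left" "$tmpdir/right")

# With -o: only the requested fields, in the requested order.
custom_fmt=$(LC_ALL=C join -1 2 -o 2.2,0,1.1 "$tmpdir/left" "$tmpdir/right")

printf '%s\n' "$default_fmt" "$custom_fmt"

rm -rf "$tmpdir"
```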
Used only for diagnostic messages.
None.
None.
The following exit values are returned:
- 0
- All input files were output successfully.
- >0
- An error occurred.
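A script can branch on these exit values. The sketch below assumes that a missing input file is reported as an error, which is the usual behaviour; the file names are hypothetical and mktemp -d is assumed to be available.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' '1 apple' > "$tmpdir/fruits"
printf '%s\n' '1 red' > "$tmpdir/colors"

# Successful join: the status stays 0.
ok_status=0
LC_ALL=C join "$tmpdir/fruits" "$tmpdir/colors" > /dev/null || ok_status=$?

# The second operand does not exist, so join is expected to fail.
err_status=0
LC_ALL=C join "$tmpdir/fruits" "$tmpdir/no-such-file" > /dev/null 2>&1 ||
    err_status=$?

printf 'ok=%s error=%s\n' "$ok_status" "$err_status"

rm -rf "$tmpdir"
```

The exact non-zero value is not specified beyond being greater than 0, so tests should compare against 0 rather than a particular number.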
Default.
Pathnames consisting of numeric digits or of the form string.string should not be specified directly following the -o list.
The -o 0 field essentially selects the union of the join fields. For example, given file phone:
!Name           Phone Number
Don             +1 123-456-7890
Hal             +1 234-567-8901
Yasushi         +2 345-678-9012
and file fax:
!Name           Fax Number
Don             +1 123-456-7899
Keith           +1 456-789-0122
Yasushi         +2 345-678-9011
(where the large expanses of white space are meant to each represent a single tab character), the command:
join -t "<tab>" -a 1 -a 2 -e '(unknown)' -o 0,1.2,2.2 phone fax
would produce:
!Name           Phone Number            Fax Number
Don             +1 123-456-7890         +1 123-456-7899
Hal             +1 234-567-8901         (unknown)
Keith           (unknown)               +1 456-789-0122
Yasushi         +2 345-678-9012         +2 345-678-9011
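The phone and fax example can be reproduced with a short script. This is a sketch under a few assumptions: the tab character is generated with printf rather than typed literally, the files are created in a scratch directory made with mktemp -d, and the C locale is used so that the "!Name" header line collates before the names.

```shell
tmpdir=$(mktemp -d)
tab=$(printf '\t')

# Each printf call writes one tab-separated record per pair of arguments.
printf '%s\t%s\n' '!Name' 'Phone Number' \
    'Don' '+1 123-456-7890' \
    'Hal' '+1 234-567-8901' \
    'Yasushi' '+2 345-678-9012' > "$tmpdir/phone"
printf '%s\t%s\n' '!Name' 'Fax Number' \
    'Don' '+1 123-456-7899' \
    'Keith' '+1 456-789-0122' \
    'Yasushi' '+2 345-678-9011' > "$tmpdir/fax"

# -a 1 -a 2 keeps unpairable lines from both files, -o 0 prints the join
# field from whichever file has it, and -e fills the missing number.
phone_fax_out=$(LC_ALL=C join -t "$tab" -a 1 -a 2 -e '(unknown)' \
    -o 0,1.2,2.2 "$tmpdir/phone" "$tmpdir/fax")
printf '%s\n' "$phone_fax_out"

rm -rf "$tmpdir"
```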
The obsolescent -j options and the multi-argument -o option may be withdrawn in a future issue.
awk, comm, sort, uniq.