sort - sort, merge or sequence check text files
sort [-m][-o output][-bdfinru][-t char][-k keydef]...[-z recsz][file...]
sort -c [-bdfinru][-t char][-k keydef]...[-z recsz][file]
Obsolescent forms:
sort [-mu][-o output][-bdfinr][-t char][+pos1[-pos2]]...[-z recsz][file...]
sort -c[-u][-bdfinr][-t char][+pos1[-pos2]]...[-z recsz][file]
The sort utility performs one of the following functions:
- Sorts lines of all the named files together and writes the result to the specified output.
- Merges lines of all the named (presorted) files together and writes the result to the specified output.
- Checks that a single input file is correctly presorted.
Comparisons are based on one or more sort keys extracted from each line of input (or the entire line if no sort keys are specified), and are performed using the collating sequence of the current locale.
The sort utility supports the XBD specification, Utility Syntax Guidelines, with three exceptions. First, the notation +pos1 -pos2 in the obsolescent versions uses a non-standard prefix and multi-digit option names. Second, in both versions, when the -c option is not specified, the -o output option is recognised after a file operand as an obsolescent feature. Third, the -k keydef option should follow the -b, -d, -f, -i, -n and -r options.
The following options are supported:
- -c
- Check that the single input file is ordered as specified by the arguments and the collating sequence of the current locale. No output is produced; only the exit code is affected.
- -m
- Merge only; the input files are assumed to be already sorted.
- -o output
- Specify the name of an output file to be used instead of the standard output. This file can be the same as one of the input files.
- -u
- Unique: suppress all but one in each set of lines having equal keys. If used with the -c option, check that there are no lines with duplicate keys, in addition to checking that the input file is sorted.
- -z recsz
- The size of the longest line read in the sort phase is recorded so that buffers of the correct size can be allocated during the merge phase. If the sort phase is omitted via the -c or -m options, a system-dependent default size will be used. Lines longer than the buffer size will cause sort to terminate abnormally. Supplying the actual number of bytes in the longest line to be merged (or some larger value) will prevent abnormal termination.
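The -c, -u and -m behaviour described above can be sketched in Python. This is a minimal illustration, not any implementation's code. It assumes that the hypothetical `key` argument returns the full comparison value for a line (by default, the whole line). It also assumes that Python's code-point string order stands in for collation in the POSIX locale. The names `check_sorted` and `merge_sorted` are illustrative.

```python
import heapq
from typing import Callable, Iterable, List

# Assumption: `key` returns the full comparison value for a line (the default
# is the whole line); Python string order stands in for POSIX-locale collation.
Key = Callable[[str], object]


def check_sorted(lines: List[str], key: Key = lambda s: s, unique: bool = False) -> int:
    """Return the exit status -c would produce: 0 if ordered, 1 otherwise."""
    for prev, cur in zip(lines, lines[1:]):
        kp, kc = key(prev), key(cur)
        if kc < kp:
            return 1  # out of order
        if unique and kc == kp:
            return 1  # -c -u: two input lines with equal keys
    return 0


def merge_sorted(inputs: Iterable[List[str]], key: Key = lambda s: s,
                 unique: bool = False) -> List[str]:
    """-m: merge inputs that are already sorted, without re-sorting them."""
    out: List[str] = []
    for line in heapq.merge(*inputs, key=key):
        # -u: keep only the first line of each run of equal keys
        if unique and out and key(line) == key(out[-1]):
            continue
        out.append(line)
    return out
```

The spec does not say which line of an equal-key set -u keeps. This sketch keeps the first one that `heapq.merge` yields; that choice is an assumption.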
The following options override the default ordering rules. When ordering options appear independent of any key field specifications, the requested field ordering rules are applied globally to all sort keys. When attached to a specific key (see -k), the specified ordering options override all global ordering options for that key. In the obsolescent forms, if one or more of these options follows a +pos1 option, it will affect only the key field specified by that preceding option.
- -d
- Specify that only blank characters and alphanumeric characters, according to the current setting of LC_CTYPE, are significant in comparisons. The behaviour is undefined for a sort key to which -i or -n also applies.
- -f
- Consider all lower-case characters that have upper-case equivalents, according to the current setting of LC_CTYPE, to be the upper-case equivalent for the purposes of comparison.
- -i
- Ignore all characters that are non-printable, according to the current setting of LC_CTYPE.
- -n
- Restrict the sort key to an initial numeric string, consisting of optional blank characters, optional minus sign, and zero or more digits with an optional radix character and thousands separators (as defined in the current locale), which will be sorted by arithmetic value. An empty digit string is treated as zero. Leading zeros and signs on zeros do not affect ordering.
- -r
- Reverse the sense of comparisons.
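The -d, -f, -i and -n rules above amount to transforms applied to a key before it is compared. The following Python sketch assumes the POSIX locale: ASCII character classes, '.' as the radix character and no thousands separator. The function names are hypothetical. `Decimal` is used so that long digit strings compare by exact arithmetic value.

```python
import re
from decimal import Decimal

# Assumption: POSIX locale (ASCII classes, '.' radix, no thousands separator).
BLANKS = " \t"


def key_d(key: str) -> str:
    """-d: keep only blank and alphanumeric characters."""
    return "".join(c for c in key if c in BLANKS or (c.isascii() and c.isalnum()))


def key_f(key: str) -> str:
    """-f: treat lower-case letters as their upper-case equivalents."""
    return "".join(c.upper() if "a" <= c <= "z" else c for c in key)


def key_i(key: str) -> str:
    """-i: ignore non-printable characters (outside 0x20..0x7E here)."""
    return "".join(c for c in key if " " <= c <= "~")


# Optional blanks, optional minus sign, digits with an optional radix point.
_NUMERIC = re.compile(r"[ \t]*(-?)([0-9]*(?:\.[0-9]*)?)")


def key_n(key: str) -> Decimal:
    """-n: arithmetic value of the initial numeric string; empty means 0."""
    m = _NUMERIC.match(key)  # always matches, possibly the empty string
    sign, digits = m.group(1), m.group(2)
    if not digits.strip("."):
        return Decimal(0)  # an empty digit string is treated as zero
    value = Decimal(digits)
    return -value if sign else value  # -0 == 0, so signs on zeros do not matter
```

For example, `sorted(["10", "9", "100"], key=key_n)` orders the lines by value rather than character by character.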
The treatment of field separators can be altered using the options:
- -b
- Ignore leading blank characters when determining the starting and ending positions of a restricted sort key. If the -b option is specified before the first -k option, it is applied to all -k options. Otherwise, the -b option can be attached independently to each -k field_start or field_end option-argument (see below).
- -t char
- Use char as the field separator character; char is not considered to be part of a field (although it can be included in a sort key). Each occurrence of char is significant (for example, <char><char> delimits an empty field). If -t is not specified, blank characters are used as default field separators; each maximal non-empty sequence of blank characters that follows a non-blank character is a field separator.
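The default and -t field-splitting rules above can be sketched in Python. This is an illustration under stated assumptions, not any implementation's code. It assumes the POSIX locale, where the blank characters are space and tab, and `split_fields` is a hypothetical name.

```python
from typing import List, Optional

BLANKS = " \t"  # assumption: the blank characters of the POSIX locale


def split_fields(line: str, sep: Optional[str] = None) -> List[str]:
    """Split a line into fields using -t sep, or the default blank rules."""
    if sep is not None:
        # -t: every occurrence of sep is significant and is not part of a field.
        return line.split(sep)
    fields: List[str] = []
    start = i = 0
    n = len(line)
    # Leading blanks do not follow a non-blank, so they are not a separator;
    # they stay in field 1.
    while i < n and line[i] in BLANKS:
        i += 1
    while i < n:
        while i < n and line[i] not in BLANKS:  # non-blank run
            i += 1
        if i == n:
            break
        # A blank run after a non-blank is a separator. By default it is kept
        # at the front of the next field.
        fields.append(line[start:i])
        start = i
        while i < n and line[i] in BLANKS:
            i += 1
    fields.append(line[start:])
    return fields
```

With the default rules, "<space><space>foo" yields a single field. With `sep=" "` it yields the fields "", "" and "foo".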
Sort keys can be specified using the options:
- -k keydef
- The keydef argument is a restricted sort key field definition. The format of this definition is:
    field_start[type][,field_end[type]]
where field_start and field_end define a key field restricted to a portion of the line (see the EXTENDED DESCRIPTION section), and type is a modifier from the list of characters b, d, f, i, n, r. The b modifier behaves like the -b option, but applies only to the field_start or field_end to which it is attached. The other modifiers behave like the corresponding options, but apply only to the key field to which they are attached; they have this effect if specified with field_start, field_end or both. If any modifier is attached to a field_start or to a field_end, no option applies to either. Implementations support at least nine occurrences of the -k option, which are significant in command line order. If no -k option is specified, a default sort key of the entire line is used. When there are multiple key fields, later keys are compared only after all earlier keys compare equal. Except when the -u option is specified, lines that otherwise compare equal are ordered as if none of the options -d, -f, -i, -n or -k were present (but with -r still in effect, if it was specified) and with all bytes in the lines significant to the comparison. The order in which lines that still compare equal are written is unspecified.
- +pos1
- Specify the start position of a key field. See the EXTENDED DESCRIPTION section.
- -pos2
- Specify the end position of a key field. See the EXTENDED DESCRIPTION section.
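The multi-key comparison and last-resort ordering described for -k can be sketched in Python. In this minimal sketch each key is a hypothetical `(extract, reverse)` pair. `extract` stands in for field selection and the b, d, f, i and n modifiers, and `reverse` is that key's r modifier. `reverse_all` models a global -r. Byte order of the encoded lines approximates "all bytes significant".

```python
import functools
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical key representation: (extract, reverse).
Key = Tuple[Callable[[str], Any], bool]


def _cmp(a: Any, b: Any) -> int:
    return (a > b) - (a < b)


def make_comparator(keys: Sequence[Key], reverse_all: bool = False,
                    unique: bool = False) -> Callable[[str, str], int]:
    def compare(x: str, y: str) -> int:
        for extract, rev in keys:
            c = _cmp(extract(x), extract(y))
            if c:  # a later key is consulted only if all earlier keys tie
                return -c if rev else c
        if unique:
            return 0  # -u: lines with equal keys are simply equal
        # Last resort: whole line, all bytes significant, global -r still applies.
        c = _cmp(x.encode(), y.encode())
        return -c if reverse_all else c
    return compare


def sort_lines(lines: List[str], keys: Sequence[Key], reverse_all: bool = False,
               unique: bool = False) -> List[str]:
    cmp = make_comparator(keys, reverse_all, unique)
    out: List[str] = []
    for line in sorted(lines, key=functools.cmp_to_key(cmp)):
        if unique and out and cmp(out[-1], line) == 0:
            continue  # -u: keep one line from each set of equal keys
        out.append(line)
    return out
```

For instance, with a numeric key on the second word, "a 1" and "b 1" tie on the key and fall back to whole-line byte order. That fallback is reversed only when the global `reverse_all` is set.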
The following operand is supported:
- file
- A pathname of a file to be sorted, merged or checked. If no file operands are specified, or if a file operand is "-", the standard input will be used.
The standard input will be used only if no file operands are specified, or if a file operand is "-". See the INPUT FILES section.
The input files must be text files, except that the sort utility will add a newline character to the end of a file ending with an incomplete last line.
The following environment variables affect the execution of sort:
- LANG
- Provide a default value for the internationalisation variables that are unset or null. If LANG is unset or null, the corresponding value from the implementation-dependent default locale will be used. If any of the internationalisation variables contains an invalid setting, the utility will behave as if none of the variables had been defined.
- LC_ALL
- If set to a non-empty string value, override the values of all the other internationalisation variables.
- LC_COLLATE
- Determine the locale for ordering rules.
- LC_CTYPE
- Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- versus multi-byte characters in arguments and input files) and the behaviour of character classification for the -b, -d, -f, -i and -n options.
- LC_MESSAGES
- Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error.
- LC_NUMERIC
- Determine the locale for the definition of the radix character and thousands separator for the -n option.
- NLSPATH
- Determine the location of message catalogues for the processing of LC_MESSAGES.
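The precedence among LC_ALL, the individual LC_* variables and LANG can be sketched in Python. This is a minimal illustration. The fallback value "POSIX" stands in for the implementation-dependent default locale, and `effective_locale` is a hypothetical name. The rule for invalid settings (behave as if no variable were defined) is not modelled.

```python
from typing import Mapping


def effective_locale(category: str, env: Mapping[str, str], default: str = "POSIX") -> str:
    """Resolve the locale for one category, for example "LC_COLLATE"."""
    # A non-empty LC_ALL overrides everything; then the category's own
    # variable; then LANG; unset or null values are skipped.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return default  # assumed stand-in for the implementation default
```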
Asynchronous events: Default.
Unless the -o or -c options are in effect, the standard output contains the sorted input.
The standard error is used for diagnostic messages. A warning message about correcting an incomplete last line of an input file may be generated, but need not affect the final exit status.
If the -o option is in effect, the sorted input is placed in the file output.
The notation:
    -k field_start[type][,field_end[type]]
defines a key field that begins at field_start and ends at field_end inclusive, unless field_start falls beyond the end of the line or after field_end, in which case the key field is empty. A missing field_end means the last character of the line.
A field comprises a maximal sequence of non-separating characters and, in the absence of option -t, any preceding field separator.
The field_start portion of the keydef option-argument has the form:
field_number[.first_character]
Fields and characters within fields are numbered starting with 1. The field_number and first_character pieces, interpreted as positive decimal integers, specify the first character to be used as part of a sort key. If .first_character is omitted, it refers to the first character of the field.
The field_end portion of the keydef option-argument has the form:
field_number[.last_character]
The field_number is as described above for field_start. The last_character piece, interpreted as a non-negative decimal integer, specifies the last character to be used as part of the sort key. If last_character evaluates to zero or .last_character is omitted, it refers to the last character of the field specified by field_number.
If the -b option or b type modifier is in effect, characters within a field are counted from the first non-blank character in the field. (This applies separately to first_character and last_character.)
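The field_start and field_end rules can be sketched as a key-extraction function in Python. This is a minimal illustration, assuming the POSIX-locale blanks (space and tab). The helper names are hypothetical. Two behaviours the text does not spell out are assumptions here: character positions past a field's end are clamped to that field's end, and a field_end beyond the last field runs to the end of the line.

```python
from typing import List, Optional, Tuple

BLANKS = " \t"  # assumption: POSIX-locale blank characters


def field_spans(line: str, sep: Optional[str] = None) -> List[Tuple[int, int]]:
    """Half-open [start, end) offsets of each field. By default a field keeps
    its preceding blanks; with -t the separator belongs to no field."""
    if sep is not None:
        spans, start = [], 0
        for i, c in enumerate(line):
            if c == sep:
                spans.append((start, i))
                start = i + 1
        spans.append((start, len(line)))
        return spans
    spans, start, i, n = [], 0, 0, len(line)
    while i < n and line[i] in BLANKS:  # leading blanks belong to field 1
        i += 1
    while i < n:
        while i < n and line[i] not in BLANKS:
            i += 1
        if i == n:
            break
        spans.append((start, i))  # blanks after a non-blank start a new field
        start = i
        while i < n and line[i] in BLANKS:
            i += 1
    spans.append((start, n))
    return spans


def extract_key(line: str, start_field: int, first_char: int = 1, start_b: bool = False,
                end_field: Optional[int] = None, last_char: int = 0, end_b: bool = False,
                sep: Optional[str] = None) -> str:
    """Key text for field_start[.first_character][b][,field_end[.last_character][b]].
    Numbers are 1-based; last_char == 0 means the last character of the field."""
    spans = field_spans(line, sep)

    def first_counted(lo: int, hi: int, use_b: bool) -> int:
        # -b / b modifier: count characters from the first non-blank one
        while use_b and lo < hi and line[lo] in BLANKS:
            lo += 1
        return lo

    if start_field > len(spans):
        return ""  # field_start beyond the end of the line: empty key
    lo, hi = spans[start_field - 1]
    begin = min(first_counted(lo, hi, start_b) + first_char - 1, hi)
    if end_field is None or end_field > len(spans):
        finish = len(line)  # missing field_end: last character of the line
    else:
        lo, hi = spans[end_field - 1]
        finish = hi if last_char == 0 else min(first_counted(lo, hi, end_b) + last_char, hi)
    return line[begin:finish] if begin < finish else ""  # start after end: empty
```

For example, `extract_key("a  xyz", 2, 2, True, 2, 2, True)` models -k 2.2b,2.2b. Counting starts at the first non-blank character of the second field, so the key is "y".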
The obsolescent options:
    [+pos1[-pos2]]
provide functionality equivalent to the -k keydef option. For comparison, the full formats of these options are:
    +field0_number[.first0_character][type] [-field0_number[.first0_character][type]]
    -k field_number[.first_character][type] [,field_number[.last_character][type]]
In the obsolescent form, fields (specified by field0_number) and characters within fields (specified by first0_character) are numbered from zero instead of one. The optional type modifiers are the same in both forms. If .first0_character is omitted or first0_character evaluates to zero, it refers to the first character of the field. The -b option does not apply to -pos2.
The fully specified +pos1 -pos2 form with type modifiers T and U:
    +w.xT -y.zU
is equivalent to:
    undefined               (z==0, U contains b, and -t is present)
    -k w+1.x+1T,y.0U        (z==0 otherwise)
    -k w+1.x+1T,y+1.zU      (z > 0)
As with the non-obsolescent forms, implementations support at least nine occurrences of the +pos1 option, which are significant in command line order.
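The +w.xT -y.zU equivalence above can be sketched as a string rewrite in Python. This is a minimal illustration. The function name `obsolescent_to_keydef` and its string-in, string-out interface are hypothetical. The tests reproduce the equivalent pairs given in the examples.

```python
import re
from typing import Optional, Tuple

# Hypothetical converter from the obsolescent +pos1 [-pos2] notation to -k.
_POS = re.compile(r"([0-9]+)(?:\.([0-9]+))?([bdfinr]*)")


def _parse(pos: str, prefix: str) -> Tuple[int, int, str]:
    m = _POS.fullmatch(pos[1:]) if pos.startswith(prefix) else None
    if m is None:
        raise ValueError(f"invalid position: {pos!r}")
    field, char, mods = m.groups()
    return int(field), int(char or 0), mods  # an omitted .char counts as 0


def obsolescent_to_keydef(pos1: str, pos2: Optional[str] = None,
                          t_present: bool = False) -> str:
    w, x, t = _parse(pos1, "+")
    start = f"{w + 1}.{x + 1}{t}"  # zero-based -> one-based
    if pos2 is None:
        return start  # no -pos2: the key runs to the end of the line
    y, z, u = _parse(pos2, "-")
    if z > 0:
        return f"{start},{y + 1}.{z}{u}"
    if "b" in u and t_present:
        raise ValueError("undefined: z == 0, U contains b and -t is present")
    return f"{start},{y}.0{u}"
```

For example, `obsolescent_to_keydef("+1", "-2")` returns "2.1,2.0". That selects the same key as -k 2,2.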
The following exit values are returned:
- 0
- All input files were output successfully, or -c was specified and the input file was correctly sorted.
- 1
- Under the -c option, the file was not ordered as specified, or if the -c and -u options were both specified, two input lines were found with equal keys.
- >1
- An error occurred.
Consequences of errors: Default.
The default value for -t, blank character, has different properties from, for example, -t "<space>". If a line contains:
    <space><space>foo
the following treatment would occur with default separation as opposed to specifically selecting a space character:
    Field   Default              -t "<space>"
    1       <space><space>foo    empty
    2       empty                empty
    3       empty                foo
The leading field separator itself is included in a field when -t is not used. For example, this command returns an exit status of zero, meaning the input was already sorted:
    sort -c -k 2 <<eof
    y<tab>b
    x<space>a
    eof
(assuming that a tab character precedes the space character in the current collating sequence). The field separator is not included in a field when it is explicitly set via -t. This is historical practice and allows usage such as:
    sort -t "|" -k 2n <<eof
    Atlanta|425022|Georgia
    Birmingham|284413|Alabama
    Columbia|100385|South Carolina
    eof
where the second field can be correctly sorted numerically without regard to the non-numeric field separator.
The wording in the OPTIONS section clarifies that the -b, -d, -f, -i, -n and -r options have to come before the first sort key specified if they are intended to apply to all specified keys. The way it is described in this document matches historical practice, not historical documentation. In the non-obsolescent versions, the results are unspecified if these options are specified after a -k option.
The -f option might not work as expected in locales where there is not a one-to-one mapping between an upper- and a lower-case letter.
In the following examples, non-obsolescent and obsolescent ways of specifying sort keys are given as an aid to understanding the relationship between the two forms.
- Either of the following commands sorts the contents of infile with the second field as the sort key:
    sort -k 2,2 infile
    sort +1 -2 infile
- Either of the following commands sorts, in reverse order, the contents of infile1 and infile2, placing the output in outfile and using the second character of the second field as the sort key (assuming that the first character of the second field is the field separator):
    sort -r -o outfile -k 2.2,2.2 infile1 infile2
    sort -r -o outfile +1.1 -1.2 infile1 infile2
- Either of the following commands sorts the contents of infile1 and infile2 using the second non-blank character of the second field as the sort key:
    sort -k 2.2b,2.2b infile1 infile2
    sort +1.1b -1.2b infile1 infile2
- Either of the following commands prints the System V password file (user database) sorted by the numeric user ID (the third colon-separated field):
    sort -t : -k 3,3n /etc/passwd
    sort -t : +2 -3n /etc/passwd
- Either of the following commands prints the lines of the already sorted file infile, suppressing all but one occurrence of lines having the same third field:
    sort -um -k 3.1,3.0 infile
    sort -um +2.0 -3.0 infile
Future directions: None.
See also: comm, join, uniq, and the XSH specification description of toupper().