sed - stream editor
sed [-n] script [file...]
sed [-n] [-e script]... [-f script_file]... [file...]
The sed utility is a stream editor that reads one or more text files, makes editing changes according to a script of editing commands, and writes the results to standard output. The script is obtained from either the script operand string or a combination of the option-arguments from the -e script and -f script_file options.
The sed utility supports the XBD specification, Utility Syntax Guidelines, except that the order of presentation of the -e and -f options is significant.
The following options are supported:
- -e script
- Add the editing commands specified by the script option-argument to the end of the script of editing commands. The script option-argument has the same properties as the script operand, described in the OPERANDS section.
- -f script_file
- Add the editing commands in the file script_file to the end of the script.
- -n
- Suppress the default output (in which each line, after it is examined for editing, is written to standard output). Only lines explicitly selected for output will be written.
Multiple -e and -f options may be specified. All commands are added to the script in the order specified, regardless of their origin.
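The following is a minimal sketch, assuming a POSIX shell with sed on PATH and the common (but non-POSIX) mktemp utility for a throwaway script file. It shows that the same two substitutions give different results depending on whether the -e command or the -f script_file command comes first.

```sh
#!/bin/sh
# Hypothetical script_file holding a single editing command.
script_file=$(mktemp)
printf 's/b/c/\n' > "$script_file"

# -e first, then -f: "a" becomes "b", which then becomes "c".
first=$(echo a | sed -e 's/a/b/' -f "$script_file")
# -f first, then -e: s/b/c/ finds no "b", then "a" becomes "b".
second=$(echo a | sed -f "$script_file" -e 's/a/b/')

rm -f "$script_file"
printf '%s %s\n' "$first" "$second"
```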
The following operands are supported:
- file
- A pathname of a file whose contents will be read and edited. If multiple file operands are specified, the named files will be read in the order specified and the concatenation will be edited. If no file operands are specified, the standard input will be used.
- script
- A string to be used as the script of editing commands. The application must not present a script that violates the restrictions of a text file except that the final character need not be a newline character.
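The sketch below assumes a POSIX shell and mktemp -d for a scratch directory; the file names f1 and f2 are hypothetical. It illustrates that multiple file operands are edited as one concatenated stream: line numbers continue across files, and $ selects the last line of the last file.

```sh
#!/bin/sh
# Two hypothetical input files in a scratch directory.
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/f1"
printf 'three\nfour\n' > "$dir/f2"

# Line 3 of the concatenated input is the first line of f2.
third=$(sed -n '3p' "$dir/f1" "$dir/f2")
# $ addresses the last line of the last file.
last=$(sed -n '$p' "$dir/f1" "$dir/f2")

rm -r "$dir"
printf '%s %s\n' "$third" "$last"
```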
The standard input will be used only if no file operands are specified. See the INPUT FILES section.
The input files must be text files. The script_files named by the -f option will consist of editing commands, one per line.
The following environment variables affect the execution of sed:
- LANG
- Provide a default value for the internationalisation variables that are unset or null. If LANG is unset or null, the corresponding value from the implementation-dependent default locale will be used. If any of the internationalisation variables contains an invalid setting, the utility will behave as if none of the variables had been defined.
- LC_ALL
- If set to a non-empty string value, override the values of all the other internationalisation variables.
- LC_COLLATE
- Determine the locale for the behaviour of ranges, equivalence classes and multi-character collating elements within regular expressions.
- LC_CTYPE
- Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- versus multi-byte characters in arguments and input files), and the behaviour of character classes within regular expressions.
- LC_MESSAGES
- Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error.
- NLSPATH
- Determine the location of message catalogues for the processing of LC_MESSAGES.
Asynchronous events: default.
The input files are written to standard output, with the editing commands specified in the script applied. If the -n option is specified, only those input lines selected by the script will be written to standard output.
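A short sketch, assuming a POSIX shell, contrasting the default output with -n: without -n the p command writes the selected line a second time, while with -n it is the only line written.

```sh
#!/bin/sh
input='1
2
3'
# Default output: every line is written, and p writes line 2 once more.
plain=$(printf '%s\n' "$input" | sed '2p')
# With -n, only the explicitly printed line appears.
quiet=$(printf '%s\n' "$input" | sed -n '2p')
printf '%s\n---\n%s\n' "$plain" "$quiet"
```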
Standard error is used only for diagnostic messages.
The output files are text files whose formats are dependent on the editing commands given.
The script consists of editing commands, one per line, of the following form:
- [address[,address]]command[arguments]
Zero or more blank characters are accepted before the first address and before command.
In default operation, sed cyclically copies a line of input, less its terminating newline character, into a pattern space (unless there is something left after a D command), applies in sequence all commands whose addresses select that pattern space, and at the end of the script copies the pattern space to standard output (except when -n is specified) and deletes the pattern space. Whenever the pattern space is written to standard output or a named file, sed will immediately follow it with a newline character.
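As an illustration of the cycle, assuming a POSIX shell: because the terminating newline is removed before the commands run, $ in the RE anchors at the end of each line's text, and the newline is written back when the pattern space is output.

```sh
#!/bin/sh
# Each line is edited without its newline; sed restores it on output.
out=$(printf 'one\ntwo\n' | sed 's/$/./')
printf '%s\n' "$out"
```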
Some of the commands use a hold space to save all or part of the pattern space for subsequent retrieval. The pattern and hold spaces will each be able to hold at least 8192 bytes.
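The following sketch, assuming a POSIX shell, reverses the order of input lines by accumulating them in the hold space. Because the whole input ends up in the hold space, a strictly portable use is limited to inputs that fit within the 8192-byte minimum stated above.

```sh
#!/bin/sh
# 1!G  on every line except the first, append a newline and the hold space
# h    copy the pattern space into the hold space
# $p   on the last line, write the accumulated (reversed) lines
out=$(printf 'a\nb\nc\n' | sed -n -e '1!G' -e 'h' -e '$p')
printf '%s\n' "$out"
```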
Addresses in sed
An address is either empty, a decimal number that counts input lines cumulatively across files, a "$" character that addresses the last line of input, or a context address (which consists of a regular expression as described in Regular Expressions in sed, preceded and followed by a delimiter, usually a slash).
A command line with no addresses selects every pattern space.
A command line with one address selects each pattern space that matches the address.
A command line with two addresses selects the inclusive range from the first pattern space that matches the first address to the next pattern space that matches the second. (If the second address is a number less than or equal to the line number first selected, only one line will be selected.) Starting at the first line following the selected range, sed looks again for the first address. Thereafter the process is repeated.
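Both range rules can be seen in this sketch (POSIX shell assumed): a context range that opens again after it closes, and a numeric second address that does not exceed the first selected line.

```sh
#!/bin/sh
# After "end" closes the first range, sed looks for "start" again.
# The second range is never closed, so it extends to the end of input.
ranges=$(printf 'x\nstart\ny\nend\nz\nstart\nw\n' | sed -n '/start/,/end/p')
# The second address (1) is less than the first selected line (3),
# so only line 3 is selected.
single=$(printf '1\n2\n3\n4\n' | sed -n '3,1p')
printf '%s\n---\n%s\n' "$ranges" "$single"
```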
Editing commands can be applied only to non-selected pattern spaces by use of the negation command "!" (see Editing Commands in sed).
Regular Expressions in sed
The sed utility supports the basic regular expressions described in the XBD specification, Basic Regular Expressions, with the following additions:
- In a context address, the construction \cREc, where c is any character other than a backslash or newline character, is identical to /RE/. If the character designated by c appears following a backslash, then it is considered to be that literal character, which does not terminate the RE. For example, in the context address \xabc\xdefx, the second x stands for itself, so that the regular expression is abcxdef.
- The escape sequence \n matches a newline character embedded in the pattern space. A literal newline character must not be used in the regular expression of a context address or in the substitute command.
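A brief sketch of the \cREc form, assuming a POSIX shell; the | and # delimiters are arbitrary choices. In the second command the delimiter is escaped, so it becomes a literal part of the RE instead of ending it.

```sh
#!/bin/sh
# \|RE| uses "|" as the delimiter, so the slashes in the RE need no escaping.
paths=$(printf '/usr/bin\n/etc\n' | sed -n '\|^/usr/|p')
# \#a\#b# is the RE a#b: the escaped "#" stands for itself.
hashes=$(printf 'a#b\nab\n' | sed -n '\#a\#b#p')
printf '%s\n%s\n' "$paths" "$hashes"
```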
Editing Commands in sed
In the following list of commands, the maximum number of permissible addresses for each command is indicated by [0addr], [1addr] or [2addr], representing zero, one or two addresses. The argument text consists of one or more lines. Each embedded newline character in the text must be preceded by a backslash. Other backslashes in text are removed and the following character is treated literally.
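To make the text rules concrete, here is a sketch (POSIX shell assumed) using the i\ and a\ commands described below. The backslash before the embedded newline keeps both header lines in one text argument; an unescaped newline ends the text.

```sh
#!/bin/sh
# i\ writes its text immediately; a\ writes its text just before the next
# attempt to fetch input (here, at the end of the last cycle).
out=$(printf 'body\n' | sed '1i\
first header\
second header
$a\
trailer')
printf '%s\n' "$out"
```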
The r and w commands take an optional rfile (or wfile) parameter, separated from the command letter by one or more blank characters; implementations may allow zero separation as an extension.
The argument rfile or the argument wfile terminates the command line. Each wfile will be created before processing begins. Implementations support at least ten wfile arguments in the script; the actual number (greater than or equal to 10) that will be supported by the implementation is unspecified. The use of the wfile parameter causes that file to be created if it does not exist, or its contents to be replaced if it does.
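The sketch below assumes a POSIX shell and mktemp -d for a scratch directory; the file names extra and errs are hypothetical. It uses the w and r commands described below. The scripts are double-quoted so that the shell expands $dir, and each file name runs to the end of its command line.

```sh
#!/bin/sh
dir=$(mktemp -d)
# Hypothetical rfile with one line of content.
printf 'INSERTED\n' > "$dir/extra"

# w: lines containing "err" go to $dir/errs, created (or truncated)
# before processing begins.
printf 'ok\nerr 1\nok\nerr 2\n' | sed -n "/err/w $dir/errs"
errs=$(cat "$dir/errs")

# r: the contents of the rfile are written after line 1, just before
# the next line of input is fetched.
merged=$(printf 'a\nb\n' | sed "1r $dir/extra")

rm -r "$dir"
printf '%s\n---\n%s\n' "$errs" "$merged"
```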
The b, r, s, t, w, y, ! and : commands accept additional arguments. The following synopses indicate which arguments must be separated from the commands by a single space character.
Two of the commands take a command-list, which is a list of sed commands separated by newline characters, as follows:
    {
    command
    command
    . . .
    }
The "{" can be preceded with blank characters and can be followed with white space. The commands can be preceded by white space. The terminating "}" must be preceded by a newline character and then zero or more blank characters.
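A minimal sketch of a command list, assuming a POSIX shell: the substitution and the print apply only to lines matching /an/, and the closing } is on its own line, preceded by a newline as required.

```sh
#!/bin/sh
# For lines matching /an/, run the enclosed commands in order.
out=$(printf 'apple\nbanana\ncherry\n' | sed -n '/an/{
s/an/AN/g
p
}')
printf '%s\n' "$out"
```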
- [2addr] {command-list
- }
- Execute command-list only when the pattern space is selected.
- [1addr]a\
- text
- Write text to standard output just before each attempt to fetch a line of input, whether by executing the N command or by beginning a new cycle.
- [2addr]b [label]
- Branch to the : command bearing the label. If label is not specified, branch to the end of the script. The implementation supports labels recognised as unique up to at least 8 characters; the actual length (greater than or equal to 8) that is supported by the implementation is unspecified. It is unspecified whether exceeding a label length causes an error or a silent truncation.
- [2addr]c\
- text
- Delete the pattern space. With a 0 or 1 address or at the end of a 2-address range, place text on the output.
- [2addr]d
- Delete the pattern space and start the next cycle.
- [2addr]D
- Delete the initial segment of the pattern space up to and including the first newline character and start the next cycle.
- [2addr]g
- Replace the contents of the pattern space by the contents of the hold space.
- [2addr]G
- Append to the pattern space a newline character followed by the contents of the hold space.
- [2addr]h
- Replace the contents of the hold space with the contents of the pattern space.
- [2addr]H
- Append to the hold space a newline character followed by the contents of the pattern space.
- [1addr]i\
- text
- Write text to standard output.
- [2addr]l
- (The letter ell.) Write the pattern space to standard output in a visually unambiguous form. The characters listed in the table in the XBD specification, File Format Notation (\\, \a, \b, \f, \r, \t, \v) will be written as the corresponding escape sequence; the \n in that table is not applicable. Non-printable characters not in that table will be written as one three-digit octal number (with a preceding backslash character) for each byte in the character (most significant byte first). If the size of a byte on the system is greater than nine bits, the format used for non-printable characters is implementation-dependent. Long lines will be folded, with the point of folding indicated by writing a backslash followed by a newline character; the length at which folding occurs is unspecified, but should be appropriate for the output device. The end of each line will be marked with a "$".
- [2addr]n
- Write the pattern space to standard output if the default output has not been suppressed, and replace the pattern space with the next line of input.
- [2addr]N
- Append the next line of input to the pattern space, using an embedded newline character to separate the appended material from the original material. Note that the current line number changes.
- [2addr]p
- Write the pattern space to standard output.
- [2addr]P
- Write the pattern space, up to the first newline character, to standard output.
- [1addr]q
- Branch to the end of the script and quit without starting a new cycle.
- [1addr]r rfile
- Copy the contents of rfile to standard output just before each attempt to fetch a line of input. If rfile does not exist or cannot be read, it is treated as if it were an empty file, causing no error condition.
- [2addr]s/regular expression/replacement/flags
- Substitute the replacement string for instances of the regular expression in the pattern space. Any character other than backslash or newline can be used instead of a slash to delimit the RE and the replacement. Within the RE and the replacement, the RE delimiter itself can be used as a literal character if it is preceded by a backslash. An ampersand (&) appearing in the replacement will be replaced by the string matching the RE. The special meaning of "&" in this context can be suppressed by preceding it by backslash. The characters \n, where n is a digit, will be replaced by the text matched by the corresponding backreference expression. For each backslash (\) encountered in scanning replacement from beginning to end, the following character loses its special meaning (if any). It is unspecified what special meaning is given to any character other than &, \ or digits. A line can be split by substituting a newline character into it. The application must escape the newline character in the replacement by preceding it by backslash. A substitution is considered to have been performed even if the replacement string is identical to the string that it replaces. The value of flags must be zero or more of:
- n
- Substitute for the nth occurrence only of the regular expression found within the pattern space.
- g
- Globally substitute for all non-overlapping instances of the regular expression rather than just the first one. If both g and n are specified, the results are unspecified.
- p
- Write the pattern space to standard output if a replacement was made.
- w wfile
- Write. Append the pattern space to wfile if a replacement was made.
- [2addr]t [label]
- Test. Branch to the : command bearing the label if any substitutions have been made since the most recent reading of an input line or execution of a t. If label is not specified, branch to the end of the script.
- [2addr]w wfile
- Append (write) the pattern space to wfile.
- [2addr]x
- Exchange the contents of the pattern and hold spaces.
- [2addr]y/string1/string2/
- Replace all occurrences of characters in string1 with the corresponding characters in string2. If string1 and string2 do not contain the same number of characters, or if any of the characters in string1 appear more than once, the results are undefined. Any character other than backslash or newline can be used instead of slash to delimit the strings. Within string1 and string2, the delimiter itself can be used as a literal character if it is preceded by a backslash.
- [2addr]!command
- [2addr]!{command-list
- }
- Apply the command or command-list only to the lines that are not selected by the addresses.
- [0addr]:label
- This command does nothing; it bears a label for the b and t commands to branch to.
- [1addr]=
- Write the following to standard output:
"%d\n", <current line number>
- [0addr]
- An empty command is ignored.
- [0addr]#
- The "#" and the remainder of the line are ignored (treated as a comment), with the single exception that if the first two characters in the file are #n, the default output is suppressed; this is the equivalent of specifying -n on the command line.
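The following one-line scripts, assuming a POSIX shell, exercise several of the commands above: s with &, backreferences and flags, y, t with a label, = and l. The expected results in the comments follow from the command descriptions; the label name loop is an arbitrary choice.

```sh
#!/bin/sh
# s: & is the whole match; \1 and \2 are backreferences.
amp=$(echo 'cat' | sed 's/cat/[&]/')                               # [cat]
swap=$(echo 'john smith' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')   # smith john
# s flags: a number selects one occurrence, g selects all of them.
second=$(echo 'a-b-c' | sed 's/-/+/2')                             # a-b+c
every=$(echo 'a-b-c' | sed 's/-/+/g')                              # a+b+c
# y: map each character of string1 to the one at the same position in string2.
upper=$(echo 'hello' | sed 'y/ehlo/EHLO/')                         # HELLO
# t: branch back to :loop as long as the substitution succeeds.
squeezed=$(echo 'aaaa' | sed -e ':loop' -e 's/aa/a/' -e 't loop')  # a
# =: with -n, writing only the last line number yields a line count.
count=$(printf 'x\ny\nz\n' | sed -n '$=')                          # 3
# l: a tab is shown as \t and the end of the line is marked with $.
visible=$(printf 'a\tb\n' | sed -n 'l')                            # a\tb$
printf '%s\n' "$amp" "$swap" "$second" "$every" "$upper" "$squeezed" "$count" "$visible"
```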
The following exit values are returned:
- 0
- Successful completion.
- >0
- An error occurred.
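A sketch of checking the exit status, assuming a POSIX shell; /nonexistent/input-file is a hypothetical path assumed not to exist. The exact nonzero value is not specified here, so the sketch only checks that it is greater than 0.

```sh
#!/bin/sh
# Successful completion: exit status 0.
printf 'x\n' | sed -n 'p' > /dev/null
ok=$?
# Unreadable input: a diagnostic goes to standard error (discarded here)
# and the exit status is greater than 0.
failed=0
sed -n 'p' /nonexistent/input-file 2> /dev/null || failed=$?
printf 'ok=%s failed=%s\n' "$ok" "$failed"
```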
Consequences of errors: default.
Regular expressions match entire strings (the whole pattern space), not just individual lines; a newline character embedded in the pattern space is matched by \n in a sed RE, and a literal newline character is not allowed in an RE. Also note that \n cannot be used to match a newline character at the end of an arbitrary input line; newline characters appear in the pattern space only as a result of editing commands such as N.
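The sketch below (POSIX shell assumed) shows that \n matches only after N has joined two lines, then combines N, P and D in a common idiom that removes adjacent duplicate lines, similar to uniq. The idiom is a widely used pattern, not something this page defines.

```sh
#!/bin/sh
# Without N, each pattern space holds a single line, so \n never matches.
none=$(printf 'a\nb\n' | sed -n '/a\nb/p')
# After N, the two lines are joined by an embedded newline that \n matches.
joined=$(printf 'a\nb\n' | sed -n -e 'N' -e '/a\nb/p')
# Two-line window: $!N appends the next line (except on the last line),
# P writes the first line only if it differs from the second, and D deletes
# that first line and restarts the cycle with whatever remains.
unique=$(printf 'a\na\nb\nb\nc\n' | sed -e '$!N' -e '/^\(.*\)\n\1$/!P' -e 'D')
printf '%s\n---\n%s\n---\n%s\n' "$none" "$joined" "$unique"
```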
This sed script simulates the BSD cat -s command, squeezing excess blank lines from standard input.

    sed -n '
    # Write non-empty lines.
    /./ {
        p
        d
        }
    # Write a single empty line, then look for more empty lines.
    /^$/ p
    # Get next line, discard the held <newline> (empty line),
    # and look for more empty lines.
    :Empty
    /^$/ {
        N
        s/.//
        b Empty
        }
    # Write the non-empty line before going back to search
    # for the first in a set of empty lines.
    p
    '
The IEEE PASC 1003.2 Interpretations Committee has forwarded concerns about parts of this interface definition to the IEEE PASC Shell and Utilities Working Group which is identifying the corrections. A future revision of this specification will align with IEEE Std. 1003.2b when finalised.
See also: awk, ed, grep.