csplit - split files based on context
csplit [-ks][-f prefix][-n number] file arg1 ...argn
The csplit utility reads the file named by the file operand, writes all or part of that file into other files as directed by the arg operands, and writes the sizes of the files.
The csplit utility supports the XBD specification, Utility Syntax Guidelines. The following options are supported:
- -f prefix
- Name the created files prefix00, prefix01, ..., prefixn. The default is xx00 ... xxn. If the prefix argument would create a filename exceeding {NAME_MAX} bytes, an error will result: csplit will exit with a diagnostic message and no files will be created.
- -k
- Leave previously created files intact. By default, csplit will remove created files if an error occurs.
- -n number
- Use number decimal digits to form filenames for the file pieces. The default is 2.
- -s
- Suppress the output of file size messages.
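As a minimal sketch of how these options combine, the commands below split a four-line file before line 3. The scratch directory, the file name sample.txt and the prefix part are illustrative assumptions, not names taken from this specification.

```shell
# Work in a scratch directory so the pieces are easy to inspect and discard.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'a\nb\nc\nd\n' > sample.txt

# -s: print no sizes; -f part: use "part" instead of "xx";
# -n 3: use three digits in each file name.
csplit -s -f part -n 3 sample.txt 3

ls
# part000 holds lines 1-2, part001 holds lines 3-4.
```

Without -f and -n, the same command would name the pieces xx00 and xx01.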
The following operands are supported:
- file
- The pathname of a text file to be split. If file is "-", the standard input will be used.
The operands arg1 ... argn can be a combination of the following:
- /rexp/[offset]
- Create a file using the content of the lines from the current line up to, but not including, the line that results from the evaluation of the regular expression with offset, if any, applied. The regular expression rexp must follow the rules for basic regular expressions described in the XBD specification, Basic Regular Expressions. The optional offset must be a positive or negative integer value representing a number of lines. The integer value must be preceded by "+" or "-". If the selection of lines from an offset expression of this type would create a file with zero lines, or one with greater than the number of lines left in the input file, the results are unspecified. After the section is created, the current line will be set to the line that results from the evaluation of the regular expression with any offset applied. The pattern match of rexp is always applied from the current line to the end of the file.
- %rexp%[offset]
- This operand is the same as /rexp/[offset], except that no file will be created for the selected section of the input file.
- line_no
- Create a file from the current line up to (but not including) the line number line_no. Lines in the file will be numbered starting at one. The current line becomes line_no.
- {num}
- Repeat operand. This operand can follow any of the operands described previously. If it follows a rexp type operand, that operand will be applied num more times. If it follows a line_no operand, the file will be split every line_no lines, num times, from that point.
An error will be reported if an operand does not reference a line between the current position and the end of the file.
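The operand types above can be illustrated with a small sketch. The file data.txt, its contents and the prefix ln are hypothetical. The comments assume, as common implementations do, that each pattern search begins on the line after the current line, so a pattern can be repeated without producing an empty piece.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'header\nBEGIN one\nx\nBEGIN two\ny\nBEGIN three\nz\n' > data.txt

# %BEGIN% skips "header" without creating a file; /BEGIN/ then ends a piece
# just before the next matching line, and {1} applies /BEGIN/ once more.
csplit -s data.txt '%BEGIN%' '/BEGIN/' '{1}'
# xx00: "BEGIN one", "x"   xx01: "BEGIN two", "y"   xx02: "BEGIN three", "z"

# A line number followed by {1}: split before line 3, then before line 6.
csplit -s -f ln data.txt 3 '{1}'
# ln00: lines 1-2   ln01: lines 3-5   ln02: lines 6-7
```

In both commands the lines remaining after the last operand go into one final file.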
The input file, or the standard input when file is "-", must be a text file.
The following environment variables affect the execution of csplit:
- LANG
- Provide a default value for the internationalisation variables that are unset or null. If LANG is unset or null, the corresponding value from the implementation-dependent default locale will be used. If any of the internationalisation variables contains an invalid setting, the utility will behave as if none of the variables had been defined.
- LC_ALL
- If set to a non-empty string value, override the values of all the other internationalisation variables.
- LC_COLLATE
- Determine the locale for the behaviour of ranges, equivalence classes and multi-character collating elements within regular expressions.
- LC_CTYPE
- Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- as opposed to multi-byte characters in arguments and input files) and the behaviour of character classes within regular expressions.
- LC_MESSAGES
- Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error.
- NLSPATH
- Determine the location of message catalogues for the processing of LC_MESSAGES.
If csplit receives a signal and the -k option is specified, created files will be retained. Otherwise, the default action occurs.
Unless the -s option is used, the standard output will consist of one line per file created, with a format as follows:
"%d\n", <file size in bytes>
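A short sketch of this output, using a hypothetical three-line file words.txt: the sizes are byte counts that include each line's trailing newline.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'alpha\nbeta\ngamma\n' > words.txt

# Capture the size report that csplit writes to standard output.
sizes=$(csplit words.txt 2)
printf '%s\n' "$sizes"
# xx00 holds "alpha\n" (6 bytes); xx01 holds "beta\ngamma\n" (11 bytes),
# so the two lines printed are 6 and 11.
```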
The standard error is used only for diagnostic messages.
The output files will contain portions of the original input file, otherwise unchanged.
None.
The following exit values are returned:
- 0
- Successful completion.
- >0
- An error occurred.
By default, created files will be removed if an error occurs. When the -k option is specified, created files will not be removed if an error occurs.
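The difference can be seen with a deliberately invalid operand. In this sketch the file nums.txt is hypothetical and has only five lines, so the line number 9 cannot be reached. The exact diagnostic text and the contents of the retained pieces depend on the implementation.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '1\n2\n3\n4\n5\n' > nums.txt

# Without -k: the operand 9 is out of range, so csplit fails and removes
# the pieces it had already created.
csplit -s nums.txt 2 9 2>/dev/null || echo "csplit reported an error"
leftover=$(ls xx* 2>/dev/null | wc -l)   # number of pieces left behind

# With -k: the same failure occurs, but the pieces are kept.
csplit -s -k nums.txt 2 9 2>/dev/null || echo "csplit reported an error"
ls xx*
```

The first run should leave no xx files behind, while the second keeps whatever was written before the error.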
None.
- This example creates four files, cobol00 ... cobol03:
csplit -f cobol file '/procedure division/' /par5./ /par16./
After editing the split files, they can be recombined as follows:
cat cobol0[0-3] > file
Note that this example overwrites the original file.
- This example splits the file after the first 99 lines and every 100 lines thereafter, up to line 9999. The first piece is one line shorter than the rest because lines in the file are numbered from 1 rather than 0, for historical reasons:
csplit -k file 100 {99}
- Assuming that prog.c follows the C-language coding convention of ending routines with a "}" at the beginning of the line, this example will create a file containing each separate C routine (up to 21) in prog.c:
csplit -k prog.c '%main(%' '/^}/+1' {20}
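The examples above can be tried on a small scale. The following sketch builds a hypothetical three-routine prog.c, splits it after each closing brace with the +1 offset, and checks that the pieces concatenate back to the original. The prefix fn and the file joined.c are illustrative names only. The repeat count is {1} rather than {20} so that no operand runs past the end of the file.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'int a(void)\n{\n}\nint b(void)\n{\n}\nint c(void)\n{\n}\n' > prog.c

# /^}/+1 ends each piece on the line after a closing brace, so the brace
# stays with its routine; {1} repeats the split once more.
csplit -s -f fn prog.c '/^}/+1' '{1}'

# Recombine into a separate file instead of overwriting prog.c.
cat fn0[0-2] > joined.c
cmp prog.c joined.c && echo "pieces concatenate back to the original"
```

Writing the recombined output to a new file avoids the overwrite noted in the first example.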
The IEEE PASC 1003.2 Interpretations Committee has forwarded concerns about parts of this interface definition to the IEEE PASC Shell and Utilities Working Group which is identifying the corrections. A future revision of this specification will align with IEEE Std. 1003.2b when finalised.
See also: sed, split.