The Open Group Base Specifications Issue 8
IEEE Std 1003.1-2024
Copyright © 2001-2024 The IEEE and The Open Group

NAME

m4 — macro processor

SYNOPSIS

m4 [-s] [-D name[=val]]... [-U name]... [file...]

DESCRIPTION

The m4 utility is a macro processor that shall read one or more text files, process them according to their included macro statements, and write the results to standard output.

OPTIONS

The m4 utility shall conform to XBD 12.2 Utility Syntax Guidelines , except that the order of the -D and -U options shall be significant, and options can be interspersed with operands.

The following options shall be supported:

-s
Enable line synchronization output for the c17 preprocessor phase (that is, #line directives).
-D name[=val]

Define name to val or to null if =val is omitted.
-U name
Undefine name.

OPERANDS

The following operand shall be supported:

file
A pathname of a text file to be processed. If no file is given, or if it is '-', the standard input shall be read.

STDIN

The standard input shall be a text file that is used if no file operand is given, or if it is '-'.

INPUT FILES

The input file named by the file operand shall be a text file.

ENVIRONMENT VARIABLES

The following environment variables shall affect the execution of m4:

LANG
Provide a default value for the internationalization variables that are unset or null. (See XBD 8.2 Internationalization Variables for the precedence of internationalization variables used to determine the values of locale categories.)
LC_ALL
If set to a non-empty string value, override the values of all the other internationalization variables.
LC_CTYPE
Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to multi-byte characters in arguments and input files).
LC_MESSAGES

Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error.
NLSPATH
[XSI] [Option Start] Determine the location of messages objects and message catalogs. [Option End]

ASYNCHRONOUS EVENTS

Default.

STDOUT

The standard output shall be the same as the input files, after being processed for macro expansion.

STDERR

The standard error shall be used to display strings with the errprint macro, macro tracing enabled by the traceon macro, the defined text for macros written by the dumpdef macro, or for diagnostic messages.

OUTPUT FILES

None.

EXTENDED DESCRIPTION

The m4 utility shall compare each token from the input against the set of built-in and user-defined macros. If the token matches the name of a macro, then the token shall be replaced by the macro's defining text, if any, then scanning for tokens shall resume at the start of the macro's defining text concatenated with the subsequent input. If a token does not match the name of a macro, it shall be written to standard output. Macros may have arguments, in which case the arguments shall be substituted into the defining text before it is rescanned.

No special meaning shall be given to characters enclosed between matching left and right quoting strings, other than identifying nested quoting while finding the matching right quoting string, but the outermost quoting strings shall themselves be discarded. By default, the left quoting string consists of a grave accent (backquote) and the right quoting string consists of an acute accent (single-quote); see also the changequote macro.

Comments are written but not scanned for matching macro names; by default, the begin-comment string consists of the <number-sign> character and the end-comment string consists of a <newline>. See also the changecom and dnl macros.

Name tokens shall consist of the longest possible sequence of letters, digits, and underscores, where the first character is not a digit. Tokens not of this form shall not be treated as name tokens. A macro call is a name token that matches the name of a built-in or user-defined macro. Macro calls can have either of the following forms, which shall be distinguished by whether or not the macro name is immediately followed by a <left-parenthesis>:

name

name(arg1, arg2, ..., argn)

The application shall ensure that the <left-parenthesis> immediately follows the name of the macro. If a token matching the name of a macro is not followed by a <left-parenthesis>, it shall be handled as a use of that macro without arguments.

If a macro name is followed by a <left-parenthesis>, the subsequent text shall be tokenized and expanded until a token is encountered that is not a quoted string and whose expansion includes a matching unquoted <right-parenthesis>. The expanded text between the <left-parenthesis> and the matching unquoted <right-parenthesis> is the macro's argument text. An unquoted <comma> character within the macro's argument text shall mark the end of one argument and the beginning of the next argument unless the unquoted <comma> is enclosed within a nested unquoted <left-parenthesis>, <right-parenthesis> pair. The unquoted <comma> characters that separate the arguments, and any unquoted white-space characters at the beginning of each argument, shall be discarded. All other characters in the macro's argument text, including any white-space characters at the end of an argument and any nested parenthesized text, shall be retained. The input text containing the macro name, the following <left-parenthesis>, and all tokens up to and including the token whose expansion contained the matching unquoted <right-parenthesis> shall be replaced, and tokenization shall resume on the result of performing argument substition on the macro's defining text followed by any expanded text that followed the matching unquoted <right-parenthesis>. Otherwise, the macro name was not followed by a <left-parenthesis>, and tokenization shall resume on the result of performing argument substitution with zero arguments on the macro's defining text.

During argument substitution, arguments shall be positionally defined and referenced. The string "$1" in the defining text shall be replaced by the first argument. Systems shall support at least nine arguments; only the first nine can be referenced, using the strings "$1" to "$9", inclusive. The string "$0" shall be replaced with the name of the macro. The string "$#" shall be replaced by the number of arguments as a minimal string of decimal digits ('0' if the macro was invoked without being followed by a <left-parenthesis>, otherwise 1 more than the number of unquoted <comma> characters that divided arguments in the macro's argument text). The string "$*" shall be replaced by a list of all of the arguments, separated by <comma> characters. The string "$@" shall be replaced by a list of all of the arguments separated by <comma> characters, and each argument shall be quoted using the current left and right quoting strings. The string "${" produces unspecified behavior.

If fewer arguments are supplied than are in the macro definition, the omitted arguments are taken to be null. It is not an error if more arguments are supplied than are in the macro definition.

The m4 utility shall make available the following built-in macros. They can be redefined, but once this is done the original meaning is lost. Their values shall be null unless otherwise stated. In the descriptions below, the term defining text refers to the value of the macro: the second argument to the define macro, among other things. Except for the first argument to the eval macro, all numeric arguments to built-in macros shall be interpreted as decimal values. The string values produced as the defining text of the decr, divnum, incr, index, len, and sysval built-in macros shall be in the form of a decimal-constant as defined in the C language.

changecom
The changecom macro shall set the begin-comment and end-comment strings. With no arguments, the comment mechanism shall be disabled. With a single non-null argument, that argument shall become the begin-comment and the <newline> shall become the end-comment string. With two non-null arguments, the first argument shall become the begin-comment string and the second argument shall become the end-comment string. The behavior is unspecified if either argument is provided but null, or if either argument includes letters, digits, underscore, or <left-parenthesis>. Systems shall support comment strings of at least five characters.
changequote
The changequote macro shall set the begin-quote and end-quote strings. With no arguments, the quote strings shall be set to the default values (that is, `'). The behavior is unspecified if there is a single argument, or if either argument is null or includes letters, digits, underscore, or <left-parenthesis>. With two non-null arguments, the first argument shall become the begin-quote string and the second argument shall become the end-quote string. Systems shall support quote strings of at least five characters.
decr
The defining text of the decr macro shall be its first argument decremented by 1. It shall be an error to specify an argument containing any non-numeric characters. The behavior is unspecified if decr is not immediately followed by a <left-parenthesis>.
define
The second argument shall become the defining text of the macro whose name is the first argument. It is unspecified whether the define macro deletes all prior definitions of the macro named by its first argument or preserves all but the current definition of the macro. The behavior is unspecified if define is not immediately followed by a <left-parenthesis>.
defn
The defining text of the defn macro shall be the quoted definition (using the current quoting strings) of its arguments. The behavior is unspecified if defn is not immediately followed by a <left-parenthesis>.
divert
The m4 utility maintains nine temporary buffers, numbered 1 to 9, inclusive. When the last of the input has been processed, any output that has been placed in these buffers shall be written to standard output in buffer-numerical order. The divert macro shall divert future output to the buffer specified by its argument. Specifying no argument or an argument of 0 shall resume the normal output process. Output diverted to a stream with a negative number shall be discarded. Behavior is implementation-defined if a stream number larger than 9 is specified. It shall be an error to specify an argument containing any non-numeric characters.
divnum
The defining text of the divnum macro shall be the number of the current output stream as a string.
dnl
The dnl macro shall cause m4 to discard all input characters up to and including the next <newline>.
dumpdef
The dumpdef macro shall write the defined text to standard error for each of the macros specified as arguments, or, if no arguments are specified, for all macros.
errprint
The errprint macro shall write its arguments to standard error. The behavior is unspecified if errprint is not immediately followed by a <left-parenthesis>.
eval
The eval macro shall evaluate its first argument as an arithmetic expression, using signed integer arithmetic with at least 32-bit precision. At least the following C-language operators shall be supported, with precedence, associativity, and behavior as described in 1.1.2.1 Arithmetic Precision and Operations :
()
unary +
unary -
~

! binary * / % binary + binary - << >> < <= > >= == != binary & ^ | && ||

Systems shall support octal and hexadecimal numbers as in the ISO C standard. The second argument, if specified, shall set the radix for the result; if the argument is blank or unspecified, the default is 10. Behavior is unspecified if the radix falls outside the range 2 to 36, inclusive. The third argument, if specified, sets the minimum number of digits in the result. Behavior is unspecified if the third argument is less than zero. It shall be an error to specify the second or third argument containing any non-numeric characters. The behavior is unspecified if eval is not immediately followed by a <left-parenthesis>.

ifdef
If the first argument to the ifdef macro is defined, the defining text shall be the second argument. Otherwise, the defining text shall be the third argument, if specified, or the null string, if not. The behavior is unspecified if ifdef is not immediately followed by a <left-parenthesis>.
ifelse
The ifelse macro takes three or more arguments. If the first two arguments compare as equal strings, the defining text shall be the third argument. If the first two arguments do not compare as equal strings and there are three arguments, the defining text shall be null. If the first two arguments do not compare as equal strings and there are four or five arguments, the defining text shall be the fourth argument. If the first two arguments do not compare as equal strings and there are six or more arguments, the first three arguments shall be discarded and processing shall restart with the remaining arguments. The behavior is unspecified if ifelse is not immediately followed by a <left-parenthesis>.
include
The defining text for the include macro shall be the contents of the file named by the first argument. It shall be an error if the file cannot be read. The behavior is unspecified if include is not immediately followed by a <left-parenthesis>.
incr
The defining text of the incr macro shall be its first argument incremented by 1. It shall be an error to specify an argument containing any non-numeric characters. The behavior is unspecified if incr is not immediately followed by a <left-parenthesis>.
index
The defining text of the index macro shall be the first character position (as a string) in the first argument where a string matching the second argument begins (zero origin), or -1 if the second argument does not occur. The behavior is unspecified if index is not immediately followed by a <left-parenthesis>.
len
The defining text of the len macro shall be the length (as a string) of the first argument. The behavior is unspecified if len is not immediately followed by a <left-parenthesis>.
m4exit
Exit from the m4 utility. If the first argument is specified, it shall be the exit code. If no argument is specified, the exit code shall be zero. It shall be an error to specify an argument containing any non-numeric characters. If the first argument is zero or no argument is specified, and an error has previously occurred (for example, a file operand that could not be opened), the exit status shall be non-zero.
m4wrap
The first argument shall be processed when EOF is reached. If the m4wrap macro is used multiple times, the arguments specified shall be processed in the order in which the m4wrap macros were processed. The behavior is unspecified if m4wrap is not immediately followed by a <left-parenthesis>.
mkstemp
The defining text shall be as if it were the resulting pathname after a successful call to the mkstemp() function defined in the System Interfaces volume of POSIX.1-2024 called with the first argument to the macro invocation. If a file is created, that file shall be closed. If a file could not be created, the m4 utility shall write a diagnostic message to standard error and shall continue processing input but its final exit status shall be non-zero; the defining text of the macro shall be the empty string. The behavior is unspecified if mkstemp is not immediately followed by a <left-parenthesis>.
popdef
The popdef macro shall delete the current definition of its arguments, replacing that definition with the previous one. If there is no previous definition, the macro is undefined. The behavior is unspecified if popdef is not immediately followed by a <left-parenthesis>.
pushdef
The pushdef macro shall be equivalent to the define macro with the exception that it shall preserve any current definition for future retrieval using the popdef macro. The behavior is unspecified if pushdef is not immediately followed by a <left-parenthesis>.
shift
The defining text for the shift macro shall be a comma-separated list of its arguments except the first one. Each argument shall be quoted using the current quoting strings. The behavior is unspecified if shift is not immediately followed by a <left-parenthesis>.
sinclude
The sinclude macro shall be equivalent to the include macro, except that it shall not be an error if the file is inaccessible. The behavior is unspecified if sinclude is not immediately followed by a <left-parenthesis>.
substr
The defining text for the substr macro shall be the substring of the first argument beginning at the zero-offset character position specified by the second argument. The third argument, if specified, shall be the number of characters to select; if not specified, the characters from the starting point to the end of the first argument shall become the defining text. It shall not be an error to specify a starting point beyond the end of the first argument and the defining text shall be null. It shall be an error to specify an argument containing any non-numeric characters. The behavior is unspecified if substr is not immediately followed by a <left-parenthesis>.
syscmd
The syscmd macro shall interpret its first argument as a shell command line. The defining text shall be the string result of that command. The string result shall not be rescanned for macros while setting the defining text. No output redirection shall be performed by the m4 utility. The exit status value from the command can be retrieved using the sysval macro. The behavior is unspecified if syscmd is not immediately followed by a <left-parenthesis>.
sysval
The defining text of the sysval macro shall be the exit value of the utility last invoked by the syscmd macro (as a string).
traceon
The traceon macro shall enable tracing for the macros specified as arguments, or, if no arguments are specified, for all macros. The trace output shall be written to standard error in an unspecified format.
traceoff
The traceoff macro shall disable tracing for the macros specified as arguments, or, if no arguments are specified, for all macros.
translit
The defining text of the translit macro shall be the first argument with every character that occurs in the second argument replaced with the corresponding character from the third argument. If no replacement character is specified for some source character because the second argument is longer than the third argument, that character shall be deleted from the first argument in translit's defining text. The behavior is unspecified if the '-' character appears within the second or third argument anywhere besides the first or last character. The behavior is unspecified if the same character appears more than once in the second argument. The behavior is unspecified if translit is not immediately followed by a <left-parenthesis>.
undefine
The undefine macro shall delete all definitions (including those preserved using the pushdef macro) of the macros named by its arguments. The behavior is unspecified if undefine is not immediately followed by a <left-parenthesis>.
undivert
The undivert macro shall cause immediate output of any text in temporary buffers named as arguments, or all temporary buffers if no arguments are specified. Buffers can be undiverted into other temporary buffers. Undiverting shall discard the contents of the temporary buffer. The behavior is unspecified if an argument contains any non-numeric characters.

EXIT STATUS

The following exit values shall be returned:

 0
Successful completion.
>0
An error occurred

If the m4exit macro is used, the exit value can be specified by the input file.

CONSEQUENCES OF ERRORS

Default.


The following sections are informative.

APPLICATION USAGE

The defn macro is useful for renaming macros, especially built-ins.

Since eval defers to the ISO C standard, some operations have undefined behavior. In some implementations, division or remainder by zero cause a fatal signal, even if the division occurs on the short-circuited branch of "&&" or "||". Any operation that overflows in signed arithmetic produces undefined behavior. Likewise, using the shift operators with a shift amount that is not positive and smaller than the precision is undefined, as is shifting a negative number to the right. Historically, not all implementations obeyed C-language precedence rules: '~' and '!' were lower than '=='; '==' and '!=' were not lower than '<'; and '|' was not lower than '^'; the liberal use of "()" can force the desired precedence even with these non-compliant implementations. Furthermore, some traditional implementations treated '^' as an exponentiation operator, although most implementations now use "**" as an extension for this purpose.

When a macro has been multiply defined via the pushdef macro, it is unspecified whether the define macro will alter only the most recent definition (as though by popdef and pushdef), or replace the entire stack of definitions with a single definition (as though by undefine and pushdef). An application desiring particular behavior for the define macro in this case can redefine it accordingly.

EXAMPLES

If the file m4src contains the lines:

The value of `VER' is "VER".
ifdef(`VER', ``VER'' is defined to be VER., VER is not defined.)
ifelse(VER, 1, ``VER'' is `VER'.)
ifelse(VER, 2, ``VER'' is `VER'., ``VER'' is not 2.)
end

then the command

m4 m4src

or the command:

m4 -U VER m4src

produces the output:

The value of VER is "VER".
VER is not defined.

VER is not 2. end

The command:

m4 -D VER m4src

produces the output:

The value of VER is "".
VER is defined to be .

VER is not 2. end

The command:

m4 -D VER=1 m4src

produces the output:

The value of VER is "1".
VER is defined to be 1.
VER is 1.
VER is not 2.
end

The command:

m4 -D VER=2 m4src

produces the output:

The value of VER is "2".
VER is defined to be 2.

VER is 2. end

In the following six examples, an additional line is evaluated after this prologue of three definitions:

define(`macro', `argument 2 is :`$2':, called with $# arguments')dnl
define(`argumentsa', `Arguments')dnl
define(`a', `.')dnl
  1. The additional line:

    macro`'a
    
    produces:
    argument 2 is ::, called with 0 arguments.
    
    Explanation: macro is called with 0 arguments (as shown by the $# substitution), the substitution of $2 is the empty string, and the empty quoted string after the expansion text prevents concatenation with the subsequent 'a', which in turn lets macro a expand to the final '.'.
  2. The additional line:

    macro()a
    
    produces:
    argument 2 is ::, called with 1 Arguments
    
    Explanation: macro is called with one (empty string) argument; then the defining text ending in "arguments" is concatenated with the subsequent 'a' to form the next macro name argumentsa which is expanded into Arguments before the final output.
  3. The additional line:

    macro(  1, ( ,2,) ,  `3')
    
    produces:
    argument 2 is :( ,2,) :, called with 3 arguments
    
    Explanation: Leading (but not internal or trailing) space is removed before the argument substituted for $2, and the unquoted commas embedded in parentheses do not delineate arguments.
  4. The additional line:

    macro(  `1', `mac2(,`2',)', `3')
    
    produces:
    argument 2 is :mac2(,`2',):, called with 3 arguments
    
    Explanation: Regardless of whether mac2 is a defined macro, quoting in the macro call prevents interpretation of "mac2" during argument collection, and the quoting in the defining text of macro prevents interpretation of "mac2" in the substitution of $2 during rescan of the output of macro.
  5. The additional line:

    undefine(`mac2')macro(  1, mac2(,2,), 3)
    
    produces:
    argument 2 is :mac2(,2,):, called with 3 arguments
    
    Explanation: mac2 is not a macro name when scanned during argument collection, so it and the subsequent parenthesized text are used literally.
  6. The additional line:

    define(`mac2', `hi $@')macro(  1, mac2( ,2,), 3)
    
    produces:
    argument 2 is :hi :, called with 5 arguments
    
    Explanation: mac2 is a macro name, so collecting the arguments to macro requires scanning the output of mac2(,2,) (the text hi `',`2',`' after substitution of $@); this output contains unquoted commas causing additional arguments to be visible to macro.

RATIONALE

Historic System V-based behavior treated "${" in a macro definition as two literal characters. However, this sequence is left unspecified so that implementations may offer extensions such as "${11}" meaning the eleventh argument. Macros can still be defined with appropriate uses of nested quoting to result in a literal "${" in the output after rescanning removes the nested quotes.

In the translit built-in, historic System V-based behavior treated '-' as a literal; GNU behavior treats it as a range. This version of the standard allows either behavior.

Earlier versions of this standard allowed the exit status to be either zero or non-zero when m4exit(0) is called after an error has occurred. Exiting with zero status is now disallowed as this hides the fact that an error occurred from shell scripts that check the exit status of m4.

FUTURE DIRECTIONS

If this utility is directed to display a pathname that contains any bytes that have the encoded value of a <newline> character when <newline> is a terminator or separator in the output format being used, implementations are encouraged to treat this as an error. A future version of this standard may require implementations to treat this as an error.

SEE ALSO

c17

XBD 8. Environment Variables , 12.2 Utility Syntax Guidelines

CHANGE HISTORY

First released in Issue 2.

Issue 5

The phrase "the defined text for macros written by the dumpdef macro" is added to the description of STDERR, and the description of dumpdef is updated to indicate that output is written to standard error. The description of eval is updated to indicate that the list of excluded C operators excludes unary '&' and '.'. In the description of ifdef, the phrase "and it is not defined to be zero" is deleted.

Issue 6

In the EXTENDED DESCRIPTION, the eval text is updated to include a '&' character in the excepted list.

The EXTENDED DESCRIPTION of divert is updated to clarify that there are only nine diversion buffers.

The normative text is reworded to avoid use of the term "must" for application requirements.

The Open Group Base Resolution bwg2000-006 is applied.

IEEE Std 1003.1-2001/Cor 1-2002, item XCU/TC1/D6/31 is applied, replacing the EXAMPLES section.

Issue 7

Austin Group Interpretation 1003.1-2001 #117 is applied, marking the maketemp macro obsolescent and adding a new mkstemp macro.

Austin Group Interpretation 1003.1-2001 #207 is applied, clarifying the handling of white-space characters that precede or trail any macro arguments.

SD5-XCU-ERN-6 is applied, clarifying that Guideline 9 of the Utility Syntax Guidelines does not apply (options can be interspersed with operands).

SD5-XCU-ERN-97 is applied, updating the SYNOPSIS.

SD5-XCU-ERN-99 is applied, clarifying the definition of the divert macro in the EXTENDED DESCRIPTION.

SD5-XCU-ERN-100 is applied, clarifying the definition of the syscmd macro in the EXTENDED DESCRIPTION.

SD5-XCU-ERN-101 is applied, clarifying the definition of the undivert macro in the EXTENDED DESCRIPTION.

SD5-XCU-ERN-111 is applied to the EXTENDED DESCRIPTION, clarifying that the string "${" produces unspecified behavior.

SD5-XCU-ERN-112 is applied, updating the changequote macro.

SD5-XCU-ERN-118 is applied, clarifying the definition of the define macro in the EXTENDED DESCRIPTION and APPLICATION USAGE sections.

SD5-XCU-ERN-119 is applied, clarifying the definition of the translit macro in the EXTENDED DESCRIPTION and RATIONALE sections.

SD5-XCU-ERN-130, SD5-XCU-ERN-131, and SD5-XCU-ERN-137 are applied.

The m4 utility is moved from the XSI option to the Base.

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0117 [241], XCU/TC1-2008/0118 [242,431], XCU/TC1-2008/0119 [242,431], and XCU/TC1-2008/0120 [325,430] are applied.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0117 [964], XCU/TC2-2008/0118 [970], and XCU/TC2-2008/0119 [964] are applied.

Issue 8

Austin Group Defect 251 is applied, encouraging implementations to report an error if a utility is directed to display a pathname that contains any bytes that have the encoded value of a <newline> character when <newline> is a terminator or separator in the output format being used.

Austin Group Defect 984 is applied, requiring that the exit status of m4 is non-zero when m4exit is called with a first argument of zero or with no arguments after an error has occurred.

Austin Group Defect 1072 is applied, clarifying the handling of macro arguments.

Austin Group Defect 1122 is applied, changing the description of NLSPATH .

Austin Group Defect 1330 is applied, removing the obsolescent maketemp macro.

Austin Group Defect 1514 is applied, changing the RATIONALE section to use the same terminology as the normative text to which it refers.

Austin Group Defect 1570 is applied, removing extra spacing in "==".

Austin Group Defect 1658 is applied, changing "whitespace characters" to "white-space characters".

Austin Group Defect 1730 is applied, adding square brackets around file... in the SYNOPSIS.

End of informative text.

 

return to top of page

UNIX® is a registered Trademark of The Open Group.
POSIX™ is a Trademark of The IEEE.
Copyright © 2001-2024 The IEEE and The Open Group, All Rights Reserved
[ Main Index | XBD | XSH | XCU | XRAT ]