iconv

The Open Group Base Specifications Issue 8
IEEE Std 1003.1-2024
Copyright © 2001-2024 The IEEE and The Open Group

NAME

iconv — codeset conversion function

SYNOPSIS

#include <iconv.h>

size_t iconv(iconv_t cd, char **restrict inbuf,
       size_t *restrict inbytesleft, char **restrict outbuf,
       size_t *restrict outbytesleft);

DESCRIPTION

The iconv() function shall convert the sequence of characters from one codeset, in the array specified by inbuf, into a sequence of corresponding characters in another codeset, in the array specified by outbuf. The codesets are those specified in the iconv_open() call that returned the conversion descriptor, cd. The inbuf argument points to a variable that points to the first character in the input buffer and inbytesleft indicates the number of bytes to the end of the buffer to be converted. The outbuf argument points to a variable that points to the first available byte in the output buffer and outbytesleft indicates the number of the available bytes to the end of the buffer.
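The following is a minimal sketch of a single conversion call, not a definitive usage pattern. The helper name convert_once() is hypothetical, and the codeset names a caller passes (for example "UTF-8" and "ISO-8859-1") are assumptions, since the set of supported names is implementation-defined. It shows how the pointer and byte-count variables are set up before the call and read back afterwards.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert in_len bytes of `in` from codeset `from`
 * to codeset `to`, writing at most out_cap bytes to `out`.
 * Returns the number of bytes written, or (size_t)-1 on any failure. */
size_t convert_once(const char *to, const char *from,
                    const char *in, size_t in_len,
                    char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv() updates these four variables in place. The cast is needed
     * only because the parameter type is char **. */
    char *inp = (char *)in;
    size_t inleft = in_len;      /* bytes remaining in the input buffer */
    char *outp = out;
    size_t outleft = out_cap;    /* bytes still free in the output buffer */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;       /* errno describes why conversion stopped */

    /* On success inleft is 0; the bytes produced are out_cap - outleft. */
    return out_cap - outleft;
}
```

A caller might use it as convert_once("UTF-8", "ISO-8859-1", src, src_len, dst, sizeof dst), assuming the implementation recognizes both names.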

For state-dependent encodings, the conversion descriptor cd is placed into its initial shift state by a call for which inbuf is a null pointer, or for which inbuf points to a null pointer. When iconv() is called in this way, and if outbuf is not a null pointer or a pointer to a null pointer, and outbytesleft points to a positive value, iconv() shall place, into the output buffer, the byte sequence to change the output buffer to its initial shift state. If the output buffer is not large enough to hold the entire reset sequence, iconv() shall fail and set errno to [E2BIG]. Subsequent calls with inbuf as other than a null pointer or a pointer to a null pointer cause the conversion to take place from the current state of the conversion descriptor.
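The reset call described above can be sketched as follows. The helper name flush_shift_state() is hypothetical. It passes a null inbuf together with a real output buffer, so that any byte sequence needed to return the output to its initial shift state is written there.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: reset cd to its initial shift state and write the
 * corresponding reset sequence (if any) to `out`.
 * Returns the number of bytes written, or (size_t)-1 on failure; errno is
 * E2BIG when out_cap is too small to hold the whole reset sequence. */
size_t flush_shift_state(iconv_t cd, char *out, size_t out_cap)
{
    assert(out != NULL);

    char *outp = out;
    size_t outleft = out_cap;

    /* A null inbuf requests the reset; inbytesleft is not needed here. */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        return (size_t)-1;

    return out_cap - outleft;
}
```

For a stateless output codeset the helper is expected to write nothing and return 0. For a state-dependent codeset (ISO-2022-JP is a commonly cited example, where supported) it may return a small non-zero count after shifted output has been produced.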

If a sequence of input bytes does not form a valid character or shift sequence in the input codeset:

If the //IGNORE indicator suffix was specified when the conversion descriptor cd was opened and the byte sequence is immediately followed by a valid character or shift sequence, the sequence of bytes shall be discarded and conversion shall continue from the immediately following valid character or shift sequence. This shall not be treated as an error.

If the //IGNORE indicator suffix was not specified when the conversion descriptor cd was opened, conversion shall stop after the previous successfully converted character or shift sequence.

If the input buffer ends with an incomplete character or shift sequence, conversion shall stop after the previous successfully converted bytes. If the output buffer is not large enough to hold the entire converted input, conversion shall stop just prior to the input bytes that would cause the output buffer to overflow. The variable pointed to by inbuf shall be updated to point to the byte following the last byte successfully used in the conversion. The value pointed to by inbytesleft shall be decremented to reflect the number of bytes still not converted in the input buffer. The variable pointed to by outbuf shall be updated to point to the byte following the last byte of converted output data. The value pointed to by outbytesleft shall be decremented to reflect the number of bytes still available in the output buffer. For state-dependent encodings, the conversion descriptor shall be updated to reflect the shift state in effect at the end of the last successfully converted byte sequence.
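Because iconv() leaves all four variables consistent when it stops early, a caller can resume after making more output space available. The sketch below is one possible approach under stated assumptions: the helper name convert_all(), the initial capacity of 16 bytes, and the doubling strategy are illustrative choices, not something this specification prescribes.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: convert all of in[0..in_len) with an already
 * opened descriptor, enlarging a heap buffer whenever iconv() reports
 * E2BIG. On success returns the buffer (caller frees) and stores its
 * length in *out_len. On any other error returns NULL with errno set.
 * *consumed receives the number of input bytes successfully used. */
char *convert_all(iconv_t cd, const char *in, size_t in_len,
                  size_t *out_len, size_t *consumed)
{
    size_t cap = 16;                      /* arbitrary starting size */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    char *inp = (char *)in;
    size_t inleft = in_len;
    char *outp = buf;
    size_t outleft = cap;

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;                        /* whole input converted */

        if (errno != E2BIG) {             /* EILSEQ, EINVAL, ... */
            int saved = errno;
            *consumed = in_len - inleft;
            free(buf);
            errno = saved;
            return NULL;
        }

        /* Output full: inp and outp already mark where to resume. */
        size_t used = (size_t)(outp - buf);
        char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            *consumed = in_len - inleft;
            free(buf);
            errno = ENOMEM;
            return NULL;
        }
        buf = bigger;
        cap *= 2;
        outp = buf + used;                /* re-anchor in the new block */
        outleft = cap - used;
    }

    *out_len = (size_t)(outp - buf);
    *consumed = in_len - inleft;
    return buf;
}
```

When the function fails with errno set to EINVAL, *consumed tells the caller how many bytes were converted, so the incomplete tail can be kept and retried once more input arrives.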

If iconv() encounters a character in the input buffer that is valid, but for which an identical character does not exist in the output codeset:

If either the //IGNORE or the //NON_IDENTICAL_DISCARD indicator suffix was specified when the conversion descriptor cd was opened, the character shall be discarded but shall still be counted in the return value of the iconv() call.

If the //TRANSLIT indicator suffix was specified when the conversion descriptor cd was opened, an implementation-defined transliteration shall be performed, if possible, to convert the character into one or more characters of the output codeset that best resemble the input character. The character shall be counted as one character in the return value of the iconv() call, regardless of the number of output characters.

If no indicator suffix was specified when the conversion descriptor cd was opened, or the //TRANSLIT indicator suffix was specified but no transliteration of the character is possible, iconv() shall perform an implementation-defined conversion on the character and it shall be counted in the return value of the iconv() call.

RETURN VALUE

The iconv() function shall update the variables pointed to by the arguments to reflect the extent of the conversion and shall return the number of input characters that could not be converted to an identical output character. If the entire string in the input buffer is converted, except for any byte sequences discarded as a result of the //IGNORE indicator suffix, the value pointed to by inbytesleft shall be 0. If the input conversion is stopped due to any conditions mentioned above, the value pointed to by inbytesleft shall be non-zero and errno shall be set to indicate the condition. If an error occurs, iconv() shall return (size_t)-1 and set errno to indicate the error.

ERRORS

The iconv() function shall fail if:

[EILSEQ]

Input conversion stopped due to an input byte that does not belong to the input codeset.

[E2BIG]

Input conversion stopped due to lack of space in the output buffer.

[EINVAL]

Input conversion stopped due to an incomplete character or shift sequence at the end of the input buffer.

The iconv() function may fail if:

[EBADF]

The cd argument is not a valid open conversion descriptor.
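As an illustration of how an application might distinguish these conditions, the sketch below wraps one iconv() call and maps the result to a status code. The enumeration and the helper name convert_step() are hypothetical and exist only for this example.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical status codes, one per condition listed in ERRORS. */
enum conv_status {
    CONV_DONE,              /* call succeeded */
    CONV_OUTPUT_FULL,       /* E2BIG: drain or enlarge the output buffer */
    CONV_INPUT_INCOMPLETE,  /* EINVAL: supply the rest of the sequence */
    CONV_INPUT_INVALID,     /* EILSEQ: input is not valid in the codeset */
    CONV_OTHER_ERROR        /* EBADF or anything else */
};

/* Perform one iconv() call and classify the outcome. If `lossy` is not
 * null and the call succeeds, it receives the return value, that is, the
 * number of characters not converted to an identical output character. */
enum conv_status convert_step(iconv_t cd, char **in, size_t *inleft,
                              char **out, size_t *outleft, size_t *lossy)
{
    size_t rc = iconv(cd, in, inleft, out, outleft);
    if (rc != (size_t)-1) {
        if (lossy != NULL)
            *lossy = rc;
        return CONV_DONE;
    }

    switch (errno) {
    case E2BIG:
        return CONV_OUTPUT_FULL;
    case EINVAL:
        return CONV_INPUT_INCOMPLETE;
    case EILSEQ:
        return CONV_INPUT_INVALID;
    default:
        return CONV_OTHER_ERROR;
    }
}
```

In every error case the variables behind in, inleft, out, and outleft still describe how far the conversion progressed, so the caller can act on the status and call again.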

The following sections are informative.

EXAMPLES

None.

APPLICATION USAGE

The inbuf argument indirectly points to the memory area which contains the conversion input data. The outbuf argument indirectly points to the memory area which is to contain the result of the conversion. The objects indirectly pointed to by inbuf and outbuf are not restricted to containing data that is directly representable in the ISO C standard language char data type. The type of inbuf and outbuf, char **, does not imply that the objects pointed to are interpreted as null-terminated C strings or arrays of characters. Any interpretation of a byte sequence that represents a character in a given character set encoding scheme is done internally within the codeset converters. For example, the area pointed to indirectly by inbuf and/or outbuf can contain all zero octets that are not interpreted as string terminators but as coded character data according to the respective codeset encoding scheme. The type of the data (char, short, long, and so on) read or stored in the objects is not specified, but may be inferred for both the input and output data by the converters determined by the fromcode and tocode arguments of iconv_open().

Regardless of the data type inferred by the converter, the size of the remaining space in both input and output objects (the inbytesleft and outbytesleft arguments) is always measured in bytes.
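A small sketch can make the two points above concrete: zero octets are ordinary data, and all sizes are byte counts. The helper name to_utf16le() is hypothetical, and the codeset names "UTF-16LE" and "UTF-8" are assumptions about what the implementation supports.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert in_len bytes of UTF-8 to UTF-16LE.
 * out_bytes is the capacity of `out` in bytes, not in 16-bit units.
 * Returns the number of bytes written, or (size_t)-1 on failure. */
size_t to_utf16le(const char *in, size_t in_len,
                  unsigned char *out, size_t out_bytes)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;
    size_t inleft = in_len;          /* explicit length: no strlen() */
    char *outp = (char *)out;
    size_t outleft = out_bytes;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    return rc == (size_t)-1 ? (size_t)-1 : out_bytes - outleft;
}
```

Passing the three input bytes 'A', 0x00, 'B' with in_len set to 3 would be expected to produce six output bytes, several of them zero. This is why lengths have to be tracked explicitly rather than derived from a terminator.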

For implementations that support the conversion of state-dependent encodings, the conversion descriptor must be able to accurately reflect the shift-state in effect at the end of the last successful conversion. It is not required that the conversion descriptor itself be updated, which would require it to be a pointer type. Thus, implementations are free to implement the descriptor as a handle (other than a pointer type) by which the conversion information can be accessed and updated.

It is the responsibility of the application to ensure that, if the output codeset has a locking-shift encoding, the output buffer is returned to its initial shift state when conversion is completed. This can be accomplished by calling iconv() with inbuf as a null pointer, or with inbuf pointing to a null pointer, before calling iconv_close(). Since the standard does not provide a way to query whether a codeset has a locking-shift encoding, it is recommended that applications always call iconv() in this way before calling iconv_close().

When the //IGNORE indicator suffix was used to open the conversion descriptor, iconv() does not provide any indication of whether any invalid input byte sequences were discarded. Applications which need to detect the discarding of invalid input byte sequences can open the conversion descriptor without using //IGNORE and then call iconv() in a loop such that if it returns an [EILSEQ] error, the application increments the variable pointed to by inbuf and decrements the variable pointed to by inbytesleft before the next call. This technique can also be used by applications which need to use //TRANSLIT but also discard invalid input byte sequences.
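The loop described above might look like the following sketch. The helper name convert_skipping_invalid() and the skipped counter are hypothetical. The descriptor is assumed to have been opened without //IGNORE.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert the input, stepping over one byte each
 * time iconv() fails with EILSEQ, and count the bytes skipped.
 * Returns the number of bytes written to `out`, or (size_t)-1 if
 * conversion stops for any other reason (for example E2BIG or EINVAL). */
size_t convert_skipping_invalid(iconv_t cd, const char *in, size_t in_len,
                                char *out, size_t out_cap, size_t *skipped)
{
    char *inp = (char *)in;
    size_t inleft = in_len;
    char *outp = out;
    size_t outleft = out_cap;

    *skipped = 0;
    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;                   /* remaining input fully converted */

        if (errno != EILSEQ)
            return (size_t)-1;

        /* inp points at the offending byte: discard it and resume. */
        inp++;
        inleft--;
        (*skipped)++;
    }
    return out_cap - outleft;
}
```

A non-zero *skipped on return tells the application that invalid input was discarded. Note that some implementations are known to report EILSEQ for valid characters that cannot be represented in the output codeset as well, in which case this loop would also step over those bytes.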

RATIONALE

None.

FUTURE DIRECTIONS

None.

CHANGE HISTORY

First released in Issue 4. Derived from the HP-UX Manual.

Issue 6

The SYNOPSIS has been corrected to align with the <iconv.h> reference page.

The restrict keyword is added to the iconv() prototype for alignment with the ISO/IEC 9899:1999 standard.

Issue 7

The iconv() function is moved from the XSI option to the Base.

Issue 8

Austin Group Defect 1007 is applied, adding support for indicator suffixes in the tocode argument to iconv_open().

Austin Group Defect 1008 is applied, adding a paragraph about locking-shift encodings to the APPLICATION USAGE section.

Austin Group Defect 1438 is applied, changing "valid character in the specified codeset" to "valid character in the specified input codeset".

End of informative text.
