A socket is an endpoint for communication using the facilities described in this section. A socket is created with a specific socket type, described in Socket Types , and is associated with a specific protocol, detailed in Protocols. A socket is accessed via a file descriptor obtained when the socket is created.
All network protocols are associated with a specific address family. An address family provides basic services to the protocol implementation to allow it to function within a specific network environment. These services may include packet fragmentation and reassembly, routing, addressing, and basic transport. An address family is normally comprised of a number of protocols, one per socket type. Each protocol is characterized by an abstract socket type. It is not required that an address family support all socket types. An address family may contain multiple protocols supporting the same socket abstraction.
Use of Sockets for Local UNIX Connections , Use of Sockets over Internet Protocols Based on IPv4 , and Use of Sockets over Internet Protocols Based on IPv6 , respectively, describe the use of sockets for local UNIX connections, for Internet protocols based on IPv4, and for Internet protocols based on IPv6.
An address family defines the format of a socket address. All network addresses are described using a general structure, called a sockaddr, as defined in the Base Definitions volume of IEEE Std 1003.1-2001, <sys/socket.h>. However, each address family imposes finer and more specific structure, generally defining a structure with fields specific to the address family. The field sa_family in the sockaddr structure contains the address family identifier, specifying the format of the sa_data area. The size of the sa_data area is unspecified.
A protocol supports one of the socket abstractions detailed in Socket Types. Selecting a protocol involves specifying the address family, socket type, and protocol number to the socket() function. Certain semantics of the basic socket abstractions are protocol-specific. All protocols are expected to support the basic model for their particular socket type, but may, in addition, provide non-standard facilities or extensions to a mechanism.
Sockets provides packet routing facilities. A routing information database is maintained, which is used in selecting the appropriate network interface when transmitting packets.
Each network interface in a system corresponds to a path through which messages can be sent and received. A network interface usually has a hardware device associated with it, though certain interfaces such as the loopback interface, do not.
A socket is created with a specific type, which defines the communication semantics and which allows the selection of an appropriate communication protocol. Four types are defined: [RS] SOCK_RAW, SOCK_STREAM, SOCK_SEQPACKET, and SOCK_DGRAM. Implementations may specify additional socket types.
The SOCK_STREAM socket type provides reliable, sequenced, full-duplex octet streams between the socket and a peer to which the socket is connected. A socket of type SOCK_STREAM must be in a connected state before any data may be sent or received. Record boundaries are not maintained; data sent on a stream socket using output operations of one size may be received using input operations of smaller or larger sizes without loss of data. Data may be buffered; successful return from an output function does not imply that the data has been delivered to the peer or even transmitted from the local system. If data cannot be successfully transmitted within a given time then the connection is considered broken, and subsequent operations shall fail. A SIGPIPE signal is raised if a thread sends on a broken stream (one that is no longer connected). Support for an out-of-band data transmission facility is protocol-specific.
The SOCK_SEQPACKET socket type is similar to the SOCK_STREAM type, and is also connection-oriented. The only difference between these types is that record boundaries are maintained using the SOCK_SEQPACKET type. A record can be sent using one or more output operations and received using one or more input operations, but a single operation never transfers parts of more than one record. Record boundaries are visible to the receiver via the MSG_EOR flag in the received message flags returned by the recvmsg() function. It is protocol-specific whether a maximum record size is imposed.
The SOCK_DGRAM socket type supports connectionless data transfer which is not necessarily acknowledged or reliable. Datagrams may be sent to the address specified (possibly multicast or broadcast) in each output operation, and incoming datagrams may be received from multiple sources. The source address of each datagram is available when receiving the datagram. An application may also pre-specify a peer address, in which case calls to output functions shall send to the pre-specified peer. If a peer has been specified, only datagrams from that peer shall be received. A datagram must be sent in a single output operation, and must be received in a single input operation. The maximum size of a datagram is protocol-specific; with some protocols, the limit is implementation-defined. Output datagrams may be buffered within the system; thus, a successful return from an output function does not guarantee that a datagram is actually sent or received. However, implementations should attempt to detect any errors possible before the return of an output function, reporting any error by an unsuccessful return value.
[RS] The SOCK_RAW socket type is similar to the SOCK_DGRAM type. It differs in that it is normally used with communication providers that underlie those used for the other socket types. For this reason, the creation of a socket with type SOCK_RAW shall require appropriate privilege. The format of datagrams sent and received with this socket type generally include specific protocol headers, and the formats are protocol-specific and implementation-defined.
The I/O mode of a socket is described by the O_NONBLOCK file status flag which pertains to the open file description for the socket. This flag is initially off when a socket is created, but may be set and cleared by the use of the F_SETFL command of the fcntl() function.
When the O_NONBLOCK flag is set, functions that would normally block until they are complete shall either return immediately with an error, or shall complete asynchronously to the execution of the calling process. Data transfer operations (the read(), write(), send(), and recv() functions) shall complete immediately, transfer only as much as is available, and then return without blocking, or return an error indicating that no transfer could be made without blocking. The connect() function initiates a connection and shall return without blocking when O_NONBLOCK is set; it shall return the error [EINPROGRESS] to indicate that the connection was initiated successfully, but that it has not yet completed.
The owner of a socket is unset when a socket is created. The owner may be set to a process ID or process group ID using the F_SETOWN command of the fcntl() function.
The transmit and receive queue sizes for a socket are set when the socket is created. The default sizes used are both protocol-specific and implementation-defined. The sizes may be changed using the setsockopt() function.
Errors may occur asynchronously, and be reported to the socket in response to input from the network protocol. The socket stores the pending error to be reported to a user of the socket at the next opportunity. The error is returned in response to a subsequent send(), recv(), or getsockopt() operation on the socket, and the pending error is then cleared.
A socket has a receive queue that buffers data when it is received by the system until it is removed by a receive call. Depending on the type of the socket and the communication provider, the receive queue may also contain ancillary data such as the addressing and other protocol data associated with the normal data in the queue, and may contain out-of-band or expedited data. The limit on the queue size includes any normal, out-of-band data, datagram source addresses, and ancillary data in the queue. The description in this section applies to all sockets, even though some elements cannot be present in some instances.
The contents of a receive buffer are logically structured as a series of data segments with associated ancillary data and other information. A data segment may contain normal data or out-of-band data, but never both. A data segment may complete a record if the protocol supports records (always true for types SOCK_SEQPACKET and SOCK_DGRAM). A record may be stored as more than one segment; the complete record might never be present in the receive buffer at one time, as a portion might already have been returned to the application, and another portion might not yet have been received from the communications provider. A data segment may contain ancillary protocol data, which is logically associated with the segment. Ancillary data is received as if it were queued along with the first normal data octet in the segment (if any). A segment may contain ancillary data only, with no normal or out-of-band data. For the purposes of this section, a datagram is considered to be a data segment that terminates a record, and that includes a source address as a special type of ancillary data. Data segments are placed into the queue as data is delivered to the socket by the protocol. Normal data segments are placed at the end of the queue as they are delivered. If a new segment contains the same type of data as the preceding segment and includes no ancillary data, and if the preceding segment does not terminate a record, the segments are logically merged into a single segment.
The receive queue is logically terminated if an end-of-file indication has been received or a connection has been terminated. A segment shall be considered to be terminated if another segment follows it in the queue, if the segment completes a record, or if an end-of-file or other connection termination has been reported. The last segment in the receive queue shall also be considered to be terminated while the socket has a pending error to be reported.
A receive operation shall never return data or ancillary data from more than one segment.
The handling of received out-of-band data is protocol-specific. Out-of-band data may be placed in the socket receive queue, either at the end of the queue or before all normal data in the queue. In this case, out-of-band data is returned to an application program by a normal receive call. Out-of-band data may also be queued separately rather than being placed in the socket receive queue, in which case it shall be returned only in response to a receive call that requests out-of-band data. It is protocol-specific whether an out-of-band data mark is placed in the receive queue to demarcate data preceding the out-of-band data and following the out-of-band data. An out-of-band data mark is logically an empty data segment that cannot be merged with other segments in the queue. An out-of-band data mark is never returned in response to an input operation. The sockatmark() function can be used to test whether an out-of-band data mark is the first element in the queue. If an out-of-band data mark is the first element in the queue when an input function is called without the MSG_PEEK option, the mark is removed from the queue and the following data (if any) is processed as if the mark had not been present.
Sockets that are used to accept incoming connections maintain a queue of outstanding connection indications. This queue is a list of connections that are awaiting acceptance by the application; see listen().
One category of event at the socket interface is the generation of signals. These signals report protocol events or process errors relating to the state of the socket. The generation or delivery of a signal does not change the state of the socket, although the generation of the signal may have been caused by a state change.
The SIGPIPE signal shall be sent to a thread that attempts to send data on a socket that is no longer able to send. In addition, the send operation fails with the error [EPIPE].
If a socket has an owner, the SIGURG signal is sent to the owner of the socket when it is notified of expedited or out-of-band data. The socket state at this time is protocol-dependent, and the status of the socket is specified in Use of Sockets for Local UNIX Connections , Use of Sockets over Internet Protocols Based on IPv4 , and Use of Sockets over Internet Protocols Based on IPv6. Depending on the protocol, the expedited data may or may not have arrived at the time of signal generation.
If any of the following conditions occur asynchronously for a socket, the corresponding value listed below shall become the pending error for the socket:
There are a number of socket options which either specialize the behavior of a socket or provide useful information. These options may be set at different protocol levels and are always present at the uppermost "socket" level.
Socket options are manipulated by two functions, getsockopt() and setsockopt(). These functions allow an application program to customize the behavior and characteristics of a socket to provide the desired effect.
All of the options have default values. The type and meaning of these values is defined by the protocol level to which they apply. Instead of using the default values, an application program may choose to customize one or more of the options. However, in the bulk of cases, the default values are sufficient for the application.
Some of the options are used to enable or disable certain behavior within the protocol modules (for example, turn on debugging) while others may be used to set protocol-specific information (for example, IP time-to-live on all the application's outgoing packets). As each of the options is introduced, its effect on the underlying protocol modules is described.
Value of Level for Socket Options shows the value for the socket level.
Name |
Description |
---|---|
SOL_SOCKET |
Options are intended for the sockets level. |
Socket-Level Options lists those options present at the socket level; that is, when the level
parameter of the getsockopt() or setsockopt() function is SOL_SOCKET, the types of the option value parameters associated
with each option, and a brief synopsis of the meaning of the option value parameter. Unless otherwise noted, each may be examined
with getsockopt() and set with setsockopt() on all types of socket.
Option |
Parameter Type |
Parameter Meaning |
---|---|---|
SO_BROADCAST |
int |
Non-zero requests permission to transmit broadcast datagrams (SOCK_DGRAM sockets only). |
SO_DEBUG |
int |
Non-zero requests debugging in underlying protocol modules. |
SO_DONTROUTE |
int |
Non-zero requests bypass of normal routing; route based on destination address only. |
SO_ERROR |
int |
Requests and clears pending error information on the socket ( getsockopt() only). |
SO_KEEPALIVE |
int |
Non-zero requests periodic transmission of keepalive messages (protocol-specific). |
SO_LINGER |
struct linger |
Specify actions to be taken for queued, unsent data on close(): linger on/off and linger time in seconds. |
SO_OOBINLINE |
int |
Non-zero requests that out-of-band data be placed into normal data input queue as received. |
SO_RCVBUF |
int |
Size of receive buffer (in bytes). |
SO_RCVLOWAT |
int |
Minimum amount of data to return to application for input operations (in bytes). |
SO_RCVTIMEO |
struct timeval |
Timeout value for a socket receive operation. |
SO_REUSEADDR |
int |
Non-zero requests reuse of local addresses in bind() (protocol-specific). |
SO_SNDBUF |
int |
Size of send buffer (in bytes). |
SO_SNDLOWAT |
int |
Minimum amount of data to send for output operations (in bytes). |
SO_SNDTIMEO |
struct timeval |
Timeout value for a socket send operation. |
SO_TYPE |
int |
Identify socket type ( getsockopt() only). |
The SO_BROADCAST option requests permission to send broadcast datagrams on the socket. Support for SO_BROADCAST is protocol-specific. The default for SO_BROADCAST is that the ability to send broadcast datagrams on a socket is disabled.
The SO_DEBUG option enables debugging in the underlying protocol modules. This can be useful for tracing the behavior of the underlying protocol modules during normal system operation. The semantics of the debug reports are implementation-defined. The default value for SO_DEBUG is for debugging to be turned off.
The SO_DONTROUTE option requests that outgoing messages bypass the standard routing facilities. The destination must be on a directly-connected network, and messages are directed to the appropriate network interface according to the destination address. It is protocol-specific whether this option has any effect and how the outgoing network interface is chosen. Support for this option with each protocol is implementation-defined.
The SO_ERROR option is used only on getsockopt(). When this option is specified, getsockopt() shall return any pending error on the socket and clear the error status. It shall return a value of 0 if there is no pending error. SO_ERROR may be used to check for asynchronous errors on connected connectionless-mode sockets or for other types of asynchronous errors. SO_ERROR has no default value.
The SO_KEEPALIVE option enables the periodic transmission of messages on a connected socket. The behavior of this option is protocol-specific. The default value for SO_KEEPALIVE is zero, specifying that this capability is turned off.
The SO_LINGER option controls the action of the interface when unsent messages are queued on a socket and a close() is performed. The details of this option are protocol-specific. The default value for SO_LINGER is zero, or off, for the l_onoff element of the option value and zero seconds for the linger time specified by the l_linger element.
The SO_OOBINLINE option is valid only on protocols that support out-of-band data. The SO_OOBINLINE option requests that out-of-band data be placed in the normal data input queue as received; it is then accessible using the read() or recv() functions without the MSG_OOB flag set. The default for SO_OOBINLINE is off; that is, for out-of-band data not to be placed in the normal data input queue.
The SO_RCVBUF option requests that the buffer space allocated for receive operations on this socket be set to the value, in bytes, of the option value. Applications may wish to increase buffer size for high volume connections, or may decrease buffer size to limit the possible backlog of incoming data. The default value for the SO_RCVBUF option value is implementation-defined, and may vary by protocol.
The maximum value for the option for a socket may be obtained by the use of the fpathconf() function, using the value _PC_SOCK_MAXBUF.
The SO_RCVLOWAT option sets the minimum number of bytes to process for socket input operations. In general, receive calls block until any (non-zero) amount of data is received, then return the smaller of the amount available or the amount requested. The default value for SO_RCVLOWAT is 1, and does not affect the general case. If SO_RCVLOWAT is set to a larger value, blocking receive calls normally wait until they have received the smaller of the low water mark value or the requested amount. Receive calls may still return less than the low water mark if an error occurs, a signal is caught, or the type of data next in the receive queue is different from that returned (for example, out-of-band data). As mentioned previously, the default value for SO_RCVLOWAT is 1 byte. It is implementation-defined whether the SO_RCVLOWAT option can be set.
The SO_RCVTIMEO option is an option to set a timeout value for input operations. It accepts a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an input operation to complete. If a receive operation has blocked for this much time without receiving additional data, it shall return with a partial count or errno shall be set to [EWOULDBLOCK] if no data were received. The default for this option is the value zero, which indicates that a receive operation will not time out. It is implementation-defined whether the SO_RCVTIMEO option can be set.
The SO_REUSEADDR option indicates that the rules used in validating addresses supplied in a bind() should allow reuse of local addresses. Operation of this option is protocol-specific. The default value for SO_REUSEADDR is off; that is, reuse of local addresses is not permitted.
The SO_SNDBUF option requests that the buffer space allocated for send operations on this socket be set to the value, in bytes, of the option value. The default value for the SO_SNDBUF option value is implementation-defined, and may vary by protocol. The maximum value for the option for a socket may be obtained by the use of the fpathconf() function, using the value _PC_SOCK_MAXBUF.
The SO_SNDLOWAT option sets the minimum number of bytes to process for socket output operations. Most output operations process all of the data supplied by the call, delivering data to the protocol for transmission and blocking as necessary for flow control. Non-blocking output operations process as much data as permitted subject to flow control without blocking, but process no data if flow control does not allow the smaller of the send low water mark value or the entire request to be processed. A select() operation testing the ability to write to a socket shall return true only if the send low water mark could be processed. The default value for SO_SNDLOWAT is implementation-defined and protocol-specific. It is implementation-defined whether the SO_SNDLOWAT option can be set.
The SO_SNDTIMEO option is an option to set a timeout value for the amount of time that an output function shall block because flow control prevents data from being sent. As noted in Socket-Level Options , the option value is a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an output operation to complete. If a send operation has blocked for this much time, it shall return with a partial count or errno set to [EWOULDBLOCK] if no data were sent. The default for this option is the value zero, which indicates that a send operation will not time out. It is implementation-defined whether the SO_SNDTIMEO option can be set.
The SO_TYPE option is used only on getsockopt(). When this option is specified, getsockopt() shall return the type of the socket (for example, SOCK_STREAM). This option is useful to servers that inherit sockets on start-up. SO_TYPE has no default value.
Support for UNIX domain sockets is mandatory.
UNIX domain sockets provide process-to-process communication in a single system.
The symbolic constant AF_UNIX defined in the <sys/socket.h> header is used to identify the UNIX domain address family. The <sys/un.h> header contains other definitions used in connection with UNIX domain sockets. See the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 13, Headers.
The sockaddr_storage structure defined in <sys/socket.h> shall be large enough to accommodate a sockaddr_un structure (see the <sys/un.h> header defined in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 13, Headers) and shall be aligned at an appropriate boundary so that pointers to it can be cast as pointers to sockaddr_un structures and used to access the fields of those structures without alignment problems. When a sockaddr_storage structure is cast as a sockaddr_un structure, the ss_family field maps onto the sun_family field.
When a socket is created in the Internet family with a protocol value of zero, the implementation shall use the protocol listed below for the type of socket created.
[RS] A raw interface to IP is available by creating an Internet socket of type SOCK_RAW. The default protocol for type SOCK_RAW shall be identified in the IP header with the value IPPROTO_RAW. Applications should not use the default protocol when creating a socket with type SOCK_RAW, but should identify a specific protocol by value. The ICMP control protocol is accessible from a raw socket by specifying a value of IPPROTO_ICMP for protocol.
Support for sockets over Internet protocols based on IPv4 is mandatory.
The symbolic constant AF_INET defined in the <sys/socket.h> header is used to identify the IPv4 Internet address family. The <netinet/in.h> header contains other definitions used in connection with IPv4 Internet sockets. See the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 13, Headers.
The sockaddr_storage structure defined in <sys/socket.h> shall be large enough to accommodate a sockaddr_in structure (see the <netinet/in.h> header defined in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 13, Headers) and shall be aligned at an appropriate boundary so that pointers to it can be cast as pointers to sockaddr_in structures and used to access the fields of those structures without alignment problems. When a sockaddr_storage structure is cast as a sockaddr_in structure, the ss_family field maps onto the sin_family field.
[IP6] This section describes extensions to support sockets over Internet protocols based on IPv6. This functionality is dependent on support of the IPV6 option (and the rest of this section is not further marked for this option).
To enable smooth transition from IPv4 to IPv6, the features defined in this section may, in certain circumstances, also be used in connection with IPv4; see Compatibility with IPv4.
IPv6 overcomes the addressing limitations of previous versions by using 128-bit addresses instead of 32-bit addresses. The IPv6 address architecture is described in RFC 2373.
There are three kinds of IPv6 address:
A unicast address can be global, link-local (designed for use on a single link), or site-local (designed for systems not connected to the Internet). Link-local and site-local addresses need not be globally unique.
An anycast address is similar to a unicast address; the nodes to which an anycast address is assigned must be explicitly configured to know that it is an anycast address.
An application can send multicast datagrams by simply specifying an IPv6 multicast address in the address argument of sendto(). To receive multicast datagrams, an application must join the multicast group (using setsockopt() with IPV6_JOIN_GROUP) and must bind to the socket the UDP port on which datagrams will be received. Some applications should also bind the multicast group address to the socket, to prevent other datagrams destined to that port from being delivered to the socket.
A multicast address can be global, node-local, link-local, site-local, or organization-local.
The following special IPv6 addresses are defined:
Two sets of IPv6 addresses are defined to correspond to IPv4 addresses:
Note that the unspecified address and the loopback address must not be treated as IPv4-compatible addresses.
The API provides the ability for IPv6 applications to interoperate with applications using IPv4, by using IPv4-mapped IPv6 addresses. These addresses can be generated automatically by the getaddrinfo() function when the specified host has only IPv4 addresses.
Applications can use AF_INET6 sockets to open TCP connections to IPv4 nodes, or send UDP packets to IPv4 nodes, by simply encoding the destination's IPv4 address as an IPv4-mapped IPv6 address, and passing that address, within a sockaddr_in6 structure, in the connect(), sendto(), or sendmsg() function. When applications use AF_INET6 sockets to accept TCP connections from IPv4 nodes, or receive UDP packets from IPv4 nodes, the system shall return the peer's address to the application in the accept(), recvfrom(), recvmsg(), or getpeername() function using a sockaddr_in6 structure encoded this way. If a node has an IPv4 address, then the implementation shall allow applications to communicate using that address via an AF_INET6 socket. In such a case, the address will be represented at the API by the corresponding IPv4-mapped IPv6 address. Also, the implementation may allow an AF_INET6 socket bound to in6addr_any to receive inbound connections and packets destined to one of the node's IPv4 addresses.
An application can use AF_INET6 sockets to bind to a node's IPv4 address by specifying the address as an IPv4-mapped IPv6 address in a sockaddr_in6 structure in the bind() function. For an AF_INET6 socket bound to a node's IPv4 address, the system shall return the address in the getsockname() function as an IPv4-mapped IPv6 address in a sockaddr_in6 structure.
Each local interface is assigned a unique positive integer as a numeric index. Indexes start at 1; zero is not used. There may be gaps so that there is no current interface for a particular positive index. Each interface also has a unique implementation-defined name.
The following options apply at the IPPROTO_IPV6 level:
An attempt to read this option using getsockopt() shall result in an [EOPNOTSUPP] error.
The parameter type of this option is a pointer to an ipv6_mreq structure.
An attempt to read this option using getsockopt() shall result in an [EOPNOTSUPP] error.
The parameter type of this option is a pointer to an ipv6_mreq structure.
The parameter type of this option is a pointer to an int. (Default value: 1)
The parameter type of this option is a pointer to an unsigned int. (Default value: 0)
The parameter type of this option is a pointer to an unsigned int which is used as a Boolean value. (Default value: 1)
The parameter type of this option is a pointer to an int. (Default value: Unspecified)
The parameter type of this option is a pointer to an int which is used as a Boolean value. (Default value: 0)
An [EOPNOTSUPP] error shall result if IPV6_JOIN_GROUP or IPV6_LEAVE_GROUP is used with getsockopt().
The symbolic constant AF_INET6 is defined in the <sys/socket.h> header to identify the IPv6 Internet address family. See the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 13, Headers.
The sockaddr_storage structure defined in <sys/socket.h> shall be large enough to accommodate a sockaddr_in6 structure (see the <netinet/in.h> header defined in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 13, Headers) and shall be aligned at an appropriate boundary so that pointers to it can be cast as pointers to sockaddr_in6 structures and used to access the fields of those structures without alignment problems. When a sockaddr_storage structure is cast as a sockaddr_in6 structure, the ss_family field maps onto the sin6_family field.
The <netinet/in.h>, <arpa/inet.h>, and <netdb.h> headers contain other definitions used in connection with IPv6 Internet sockets; see the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 13, Headers.