How can I force a socket to send the data in its buffer?
From Richard Stevens (email@example.com):
You can't force it. Period. TCP makes up its own mind as to when it
can send data. Now, normally when you call write() on a TCP socket,
TCP will indeed send a segment, but there's no guarantee and no way to
force this. There are lots of reasons why TCP will not send a
segment: a closed window and the Nagle algorithm are two that come
immediately to mind.
(Snipped suggestion from Andrew Gierth to use TCP_NODELAY)
Setting this only disables one of the many tests, the Nagle algorithm.
But if the original poster's problem is this, then setting this socket
option will help.
A quick glance at tcp_output() shows around 11 tests TCP has to make
as to whether to send a segment or not.
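For reference, setting the option mentioned above looks roughly like
this. It is a minimal sketch, not code from the original posts; the
helper names disable_nagle() and nagle_disabled() are made up for
illustration. As Stevens says, it removes only the Nagle test and
does not force a segment out.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable the Nagle algorithm on a TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int disable_nagle(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}

/* Read the option back: 1 if Nagle is disabled, 0 if it is still
 * enabled, -1 on error. */
int nagle_disabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

The option can be set at any point after socket() succeeds; it applies
to that one socket only.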
Now from Dr. Charles E. Campbell Jr. (firstname.lastname@example.org):
As you've surmised, I've never had any problem with disabling Nagle's
algorithm. It's basically a buffering method; there's a fixed overhead
for all packets, no matter how small. Hence, Nagle's algorithm
collects small packets together (no more than .2sec delay) and thereby
reduces the amount of overhead bytes being transferred. This approach
works well for rcp, for example: the .2 second delay isn't humanly
noticeable, and multiple users have their small packets more
efficiently transferred. Helps in university settings where most
folks using the network are using standard tools such as rcp and ftp,
and programs such as telnet may use it, too.
However, Nagle's algorithm is pure havoc for real-time control and not
much better for keystroke interactive applications (control-C,
anyone?). It has seemed to me that the types of new programs using
sockets that people write usually do have problems with small packet
delays. One way to bypass Nagle's algorithm selectively is to use
"out-of-band" messaging, but that is limited in its content and has
other effects, such as a loss of sequentiality. (By the way,
out-of-band is often used for that ctrl-C, too.)
More from Vic:
So to sum it all up, if you are having trouble and need to flush the
socket, setting the TCP_NODELAY option will usually solve the problem.
If it doesn't, you will have to use out-of-band messaging, but
according to Andrew, "out-of-band data has its own problems, and I
don't think it works well as a solution to buffering delays (haven't
tried it though). It is not 'expedited data' in the sense that exists
in some other protocols; it is transmitted in-stream, but with a
pointer to indicate where it is."
I asked Andrew something to the effect of "What promises does TCP make
about when it will get around to writing data to the network?" I
thought his reply should be put under this question:
Not many promises, but some.
I'll try and quote chapter and verse on this:
RFC 1122, "Requirements for Internet Hosts" (also STD 3)
RFC 793, "Transmission Control Protocol" (also STD 7)
1. The socket interface does not provide access to the TCP PUSH flag.
2. RFC1122 says (4.2.2.2):
A TCP MAY implement PUSH flags on SEND calls. If PUSH flags are
not implemented, then the sending TCP: (1) must not buffer data
indefinitely, and (2) MUST set the PSH bit in the last buffered
segment (i.e., when there is no more queued data to be sent).
3. RFC793 says (2.8):
When a receiving TCP sees the PUSH flag, it must not wait for more
data from the sending TCP before passing the data to the receiving
process.
[RFC1122 supports this statement.]
4. Therefore, data passed to a write() call must be delivered to the
peer within a finite time, unless prevented by protocol
considerations.
5. There are (according to a post from Stevens quoted in the FAQ
[earlier in this answer - Vic]) about 11 tests made which could
delay sending the data. But as I see it, there are only 2 that are
significant, since things like retransmit backoff are a) not under
the programmer's control and b) must either resolve within a finite
time or drop the connection.
The first of the interesting cases is "window closed" (i.e. there is
no buffer space at the receiver; this can delay data indefinitely, but
only if the receiving process is not actually reading the data that is
available).
Vic asks:
OK, it makes sense that if the client isn't reading, the data isn't
going to make it across the connection. I take it this causes the
sender to block after the receive queue is filled?
The sender blocks when the socket send buffer is full, so buffers will
be full at both ends.
While the window is closed, the sending TCP sends window probe
packets. This ensures that when the window finally does open again,
the sending TCP detects the fact. [RFC1122, ss 4.2.2.17]
The second interesting case is "Nagle algorithm" (small segments, e.g.
keystrokes, are delayed to form larger segments if ACKs are expected
from the peer; this is what is disabled with TCP_NODELAY).
Vic asks:
Does this mean that my tcpclient sample should set TCP_NODELAY to
ensure that the end-of-line code is indeed put out onto the network?
No. tcpclient.c is doing the right thing as it stands; trying to write
as much data as possible in as few calls to write() as is feasible.
Since the amount of data is likely to be small relative to the socket
send buffer, then it is likely (since the connection is idle at that
point) that the entire request will require only one call to write(),
and that the TCP layer will immediately dispatch the request as a
single segment (with the PSH flag; see point 2, clause (2), above).
The Nagle algorithm only has an effect when a second write() call is
made while data is still unacknowledged. In the normal case, this data
will be left buffered until either: a) there is no unacknowledged
data; or b) enough data is available to dispatch a full-sized segment.
The delay cannot be indefinite, since condition (a) must become true
within the retransmit timeout or the connection dies.
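The rule in the paragraph above can be written down as a small
predicate. This is only a simplified model for illustration: it is
not the logic of any real tcp_output(), which makes many more tests,
and the function and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the Nagle test: may queued data be sent now?
 * unacked_bytes: data sent earlier but not yet acknowledged
 * queued_bytes:  data waiting in the send buffer
 * mss:           maximum segment size for the connection
 * nodelay:       true if TCP_NODELAY is set on the socket */
bool nagle_allows_send(size_t unacked_bytes, size_t queued_bytes,
                       size_t mss, bool nodelay)
{
    if (queued_bytes == 0)
        return false;             /* nothing to send */
    if (nodelay)
        return true;              /* the Nagle test is skipped */
    if (unacked_bytes == 0)
        return true;              /* (a) no unacknowledged data */
    return queued_bytes >= mss;   /* (b) a full-sized segment is ready */
}
```

With 100 bytes unacknowledged and a 10-byte write pending, the model
holds the data back; once the ACK arrives (unacked_bytes drops to 0)
or the queue reaches the MSS, it lets the data go.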
Since this delay has negative consequences for certain applications,
generally those where a stream of small requests is being sent
without response, e.g. mouse movements, the standards specify that an
option must exist to disable it. [RFC1122, ss 4.2.3.4]
Additional note: RFC1122 also says:
When the PUSH flag is not implemented on SEND calls, i.e., when
the application/TCP interface uses a pure streaming model,
responsibility for aggregating any tiny data fragments to form
reasonable sized segments is partially borne by the application
layer.
So programs should avoid calls to write() with small data lengths
(small relative to the MSS, that is); it's better to build up a
request in a buffer and then do one call to sock_write() or
equivalent.
The other possible sources of delay in the TCP are not really
controllable by the program, but they can only delay the data
temporarily.
Vic asks:
By temporarily, you mean that the data will go as soon as it can, and
I won't get stuck in a position where one side is waiting on a
response, and the other side hasn't received the request? (Or at
least I won't get stuck forever)
You can only deadlock if you somehow manage to fill up all the buffers
in both directions... not easy.
If it is possible to do this (I can't think of a good example, though),
the solution is to use nonblocking mode, especially for writes. Then
you can buffer excess data in the program as necessary.