This explains the vulnerability mentioned previously. The vulnerability is a sending application's uncertainty about whether its messages were received. This is not directly a security vulnerability. It is inherent in the TCP/TLS design, and it normally only becomes an issue with mobile devices.
The real world sequence of events is this:
- Application writes/sends X bytes to the other side of the TCP connection.
- Network is disconnected.
- Application gets a "Reset" or similar error from a read/write/close operation.
Details of the system I/O functions don't matter. In all these cases, the application is left uncertain whether the X bytes were received or not. The "Reset" and related errors only indicate that something went wrong. Messages that earlier calls had accepted without error might still have been in buffers or in transit, and some of them might have been lost. TCP itself cannot know, because the network disconnect may have prevented a TCP acknowledgement from arriving.
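A small sketch of why a successful send says nothing about receipt. It assumes a loopback connection on 127.0.0.1, and the helper name `send_without_reader` is made up for illustration. The peer accepts the connection but never reads, yet the send still reports success because the bytes were only handed to the local TCP stack.

```python
import socket


def send_without_reader(payload):
    """Send to a peer that never calls recv(); return the byte count accepted."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    srv.listen(1)
    cli = socket.create_connection(srv.getsockname())
    conn, _ = srv.accept()          # peer application accepts, then ignores the data
    try:
        # send() returns once the bytes are queued locally. It does not wait
        # for the peer application (or even the peer's TCP stack) to confirm.
        return cli.send(payload)
    finally:
        cli.close()
        conn.close()
        srv.close()


if __name__ == "__main__":
    accepted = send_without_reader(b"x" * 1024)
    print("bytes accepted by the local stack:", accepted)
```

The return value only counts bytes accepted into the local send buffer, which is exactly the gap described above.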
The application must decide whether it cares. Possible situations range from:
- Everything made it just fine and was acknowledged. This is quite common in situations where there was a long delay between the last send and a close(). Odds are that the network disconnect is indirectly related to the close and occurred long after the TCP transmissions were completed.
- Everything in the TCP window could have been lost. This can happen when the network is lost during a large transmission. TCP has all this data buffered and is trying to get it through, using timeouts, retries, etc. If the network failure is only transient, it will succeed. But, in this example, a "reset" indicates that it cannot succeed and has failed. There might be a complete TCP window's worth of buffers that were previously accepted but did not get transmitted.
A variety of intermediate states that the application could determine are also possible.
This is where an application-level ack is sometimes proposed as a solution. It's initially appealing, but it's still imperfect: the ack could be the only part of the transmission that was lost. Application acks do make sense in a number of application contexts, but they don't automatically solve the underlying problem. More protocol design will be needed, with idempotency or transaction rollback dealing with parts of the problem.
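A minimal sketch of how idempotency can cover the lost-ack case. It assumes every message carries a unique ID, and the `Sender` and `Receiver` classes are hypothetical, in-memory stand-ins for the two ends of a real protocol. The sender keeps everything that has not been acked, and the receiver ignores IDs it has already applied, so resending is harmless.

```python
import uuid


class Receiver:
    """Hypothetical service side: applies each message ID at most once."""

    def __init__(self):
        self.seen = set()
        self.applied = []

    def handle(self, msg_id, payload):
        if msg_id not in self.seen:     # duplicates from a resend are skipped
            self.seen.add(msg_id)
            self.applied.append(payload)
        return msg_id                   # the application ack (may be lost in transit)


class Sender:
    """Hypothetical client side: remembers everything not yet acked."""

    def __init__(self):
        self.unacked = {}

    def queue(self, payload):
        msg_id = str(uuid.uuid4())
        self.unacked[msg_id] = payload
        return msg_id

    def on_ack(self, msg_id):
        self.unacked.pop(msg_id, None)

    def pending(self):
        return list(self.unacked.items())


if __name__ == "__main__":
    receiver, sender = Receiver(), Sender()
    msg_id = sender.queue("result-1")
    receiver.handle(msg_id, "result-1")   # delivered, but the ack never arrives

    # After reconnecting, resend everything still unacked.
    for pending_id, payload in sender.pending():
        sender.on_ack(receiver.handle(pending_id, payload))

    print(receiver.applied)
```

The message was delivered twice but applied once. A real design would also have to bound the size of the receiver's `seen` set, which this sketch does not attempt.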
This kind of timing occurs naturally with mobile devices. The form that I see most often is a real-world situation where the user finishes a task with a mobile device, disconnects it, and moves on to the next task. Sometimes the mobile device is still transmitting results over the network when this happens. The disconnect can be a cable disconnect or simply moving the device out of WiFi range.
Asking users to wait does not work. They may understand the issue and try, but in practice it's like asking people not to stub their toe or spill their coffee. They understand and try, but it still happens. It's more realistic to design a device that survives these occasional problems.
So, when the client 'decides' that something bad happened, it should abortively reset the TCP connection. If the network is still intact, this will notify the service. If it is not intact, no harm is done, and anything that is queued can be freed quickly.
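One common way to request an abortive reset from the sockets API is to enable SO_LINGER with a zero timeout before closing. The sketch below assumes a Linux/BSD-style `struct linger` made of two ints (Windows uses a different layout), and the helper name `abortive_close` is made up for illustration.

```python
import socket
import struct


def abortive_close(sock):
    """Close a TCP socket with a reset (RST) instead of the normal FIN sequence."""
    # l_onoff=1, l_linger=0: discard unsent data and reset on close().
    # Assumption: struct linger is two C ints, as on Linux and the BSDs.
    linger = struct.pack("ii", 1, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)
    sock.close()


if __name__ == "__main__":
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    cli = socket.create_connection(srv.getsockname())
    conn, _ = srv.accept()
    conn.settimeout(5)

    abortive_close(cli)
    try:
        conn.recv(1)
        print("peer saw an orderly close")
    except ConnectionResetError:
        print("peer saw a reset")
    finally:
        conn.close()
        srv.close()
```

With the network intact, the service side sees the reset on its next read. With the network gone, the call still returns immediately and the local buffers are released.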
Then, as you indicate, just resend whatever you think didn't make it, which might be everything.
Posted by: John | January 02, 2012 at 08:26 PM