This started as an ATNA-Syslog question, which I will consider in the context of some DICOM history.
Should Syslog have an application-level ack to deal with some known vulnerabilities? I call them vulnerabilities because they are extremely unlikely to occur except when triggered by intentional external actions. These vulnerabilities become apparent with mobile equipment and with skilled attackers.
DICOM has used application acks since its introduction. But this was for application reasons, not to deal with vulnerabilities in the underlying TCP and TLS services.
- In C-MOVE there is a C-MOVE-REQ that says "please accept object X". The corresponding C-MOVE-RSP says either "yes, got it" or "no, and this is why not." Most often the "no" is because the responder is out of storage space, but it can also be due to other problems.
- In C-FIND the pair is more complex. C-FIND-REQ says "please provide information on objects that match these criteria". The C-FIND-RSP normally conveys the information requested. It can also indicate "there is more to come later" or "this is everything". The "no, and this is why not" response deals with invalid requests and other problems.
The use of application acks to deal with network problems as part of failure management was a side effect. Network failures are just another kind of failure to be reported. The application-layer state machine has to deal with them for completeness, not as a reason for the application ack.
DICOM also tolerates a certain level of indeterminacy. The designers basically said "close enough", much like the designers of CRC and FEC codes: cover most of the possible errors, and accept that a few will sneak through and be dealt with elsewhere.
The "elsewhere" in DICOM leads back towards the issues with Syslog and ATNA. DICOM makes transactions idempotent to the maximum extent possible.
The C-MOVE transaction is idempotent. There is no end-state difference between performing "C-MOVE object X" once and performing it one hundred times. The only difference is the time it takes. So when in doubt or uncertain, DICOM applications just do it again. The second attempt will probably reach a determined state. Similarly, C-FIND is idempotent. A look at operational DICOM logs shows that well over 99.9% of the transactions performed are idempotent.
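The "when in doubt, just do it again" strategy can be sketched as a small retry loop. This is an illustration, not a DICOM API: the `transport` callable and its "return None when delivery is indeterminate" convention are assumptions made for the example.

```python
def send_idempotent(request, transport, max_attempts=3):
    """Retry an idempotent request until a determined state is reached.

    Because repeating the request cannot change the end state, the sender
    may simply re-issue it whenever delivery is uncertain.  `transport` is
    a hypothetical callable that returns a status on success or explicit
    failure, and None when the outcome is indeterminate (e.g. a timeout).
    """
    for attempt in range(max_attempts):
        status = transport(request)
        if status is not None:
            return status        # determined: success or explicit failure
        # indeterminate: just do it again
    raise TimeoutError(f"no determined response after {max_attempts} attempts")
```

Note that this loop is only safe because the request is idempotent; for a non-idempotent request like "print", each retry risks a duplicate side effect.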
There are some necessary exceptions, like "print". Sending "print" once is different from sending "print" a hundred times: you get a lot more copies printed. DICOM tries to take the non-idempotent applications and split them into an idempotent part and a non-idempotent part, putting as much of the application into the idempotent part as it can. In the case of "print", everything except the N-ACTION print request is idempotent. This minimizes the window of vulnerability. DICOM then took the attitude: "close enough; maybe there is the occasional duplicate print when something goes wrong. We'll accept that. It won't happen often enough to be a problem."
Syslog and ATNA have the issue that:
- There is no application-level ack, and
- There is delivery uncertainty in the face of some kinds of errors that occur with mobile devices and skilled attackers.
Can idempotency deal with this? It can, if syslog messages are designed properly. This would allow a gradual transition to reliability without requiring changes to the underlying syslog protocols. The application change would be to send extra copies when uncertain about delivery. This also has a much smaller network impact: there is extra traffic only in those error situations, not during normal operation.
This requires messages that are universally unique over all time and sources, and that are idempotent. The idempotency allows sending duplicates; duplicates can be recognized and discarded by recipients. (This also simplifies some of the multiple-database and dispersed-database issues for log processing.)
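The recipient side of "recognize and discard duplicates" reduces to remembering which IDs have been seen. A minimal sketch, assuming each message is a record with a globally unique `id` field (the class and field names are hypothetical):

```python
class DedupLog:
    """Recipient that discards duplicate messages by unique ID."""

    def __init__(self):
        self.seen = set()    # unique IDs already processed
        self.events = []     # accepted messages, one per event

    def receive(self, message):
        """Accept a message once; silently discard any repeat."""
        msg_id = message["id"]
        if msg_id in self.seen:
            return False     # duplicate: safe to drop, sender was just unsure
        self.seen.add(msg_id)
        self.events.append(message)
        return True
```

In a real log collector the `seen` set would need to be bounded or persisted, but the principle is the same: duplicates cost only a set lookup.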
Unique IDs deal with uniqueness. DICOM uses unique IDs for all sorts of things. The hardest problem with unique IDs is persuading all the hotshot programmers that they really should read the recommendations on best practices for creating unique IDs. The home-grown unique ID algorithms all seem to fall into one or another of the well-known traps that result in non-unique IDs.
If the lead-in to every syslog message body includes a unique ID for that message, you may be done. The rest is ensuring that the generating application doesn't generate multiple messages for the same event; that's bad design in any case. You also want to avoid non-idempotent content. This should be easy for ATNA and syslog: the message describes an event, and it shouldn't be hard to make such messages idempotent.
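As an illustration of "unique ID in the lead-in", an RFC 5424 syslog message could carry the ID in a structured-data element ahead of the body. The SD-ID `uniqueID@32473` here is hypothetical (32473 is the enterprise number reserved for documentation examples), not an existing registration:

```python
import uuid
from datetime import datetime, timezone

def format_audit_message(app_name, body, pri=85, hostname="audit-src"):
    """Sketch of an RFC 5424 syslog message whose structured data
    carries a per-message unique ID.  Field layout: PRI, version,
    timestamp, hostname, app-name, procid (-), msgid (-), SD, body.
    """
    msg_id = uuid.uuid4()
    timestamp = datetime.now(timezone.utc).isoformat()
    sd = f'[uniqueID@32473 id="{msg_id}"]'
    return f"<{pri}>1 {timestamp} {hostname} {app_name} - - {sd} {body}"
```

A receiver can then deduplicate on the `id` attribute without parsing the message body at all.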
The one idempotency trap that I've seen with log messages is the use of incremental messages. These need sequence integrity: the idempotence of meaning is lost if the messages are processed in the wrong order. Time tags and other tools can be used to preserve sequence integrity despite repeats and out-of-order arrival. But it helps a lot if each syslog message is designed to be a self-contained, complete description of the event of interest.
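Combining the two tools, a processor can first discard repeats by ID and then restore sequence by time tag. A minimal sketch, assuming each message carries hypothetical `id` and `time` fields:

```python
def reconstruct(messages):
    """Discard repeats by unique ID, then order by time tag so that
    incremental messages are applied in their original sequence."""
    unique = {}
    for m in messages:
        unique.setdefault(m["id"], m)   # first copy wins; repeats discarded
    return sorted(unique.values(), key=lambda m: m["time"])
```

This only works if the time tags have enough resolution to order the increments; fully self-contained messages avoid the problem entirely.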
How well did RFC 3881 and DICOM do when defining audit messages? So-so. The event identification mandates identifying a source, date-time, etc., but it does not require that these be unique per message. The messages are fully self-contained. What they need is a unique ID.
I also checked the MITRE CEE effort. They don't mandate idempotent messages either.
So, should I propose a change to add an optional unique ID? Something to think about.