What is archival in a medical records context? How, if at all, should standards address this issue?
Archival as a term can mean different things to different people, and it is important to understand the motivations for archival. There can be multiple related motivations; in general they fall into overlapping categories:
- Administrative and regulatory demands. For example, the Massachusetts official archivist is responsible for ensuring that legal records are preserved indefinitely. The legislative and property ownership/transfer records presently go back 400 years, and the archivist has been actively considering what should be done now so that new records will remain usable 400 years from now. Medical records are not usually of interest for that long, but there are regulatory requirements for preservation of medical records.
- Cost/Performance considerations. Medical systems, especially imaging systems, are very demanding data consumers. It is normal for a PACS system to utilize a mixture of storage techniques so that desired performance can be delivered at a lower cost. This routinely involves migration of data from more expensive faster storage to less expensive slower storage.
- Business continuity considerations. Medical systems are just as subject to the risks of fire, flood, and other disasters as any other data processing systems. Considerations of off-site storage, storage reliability, etc. apply. Organizational changes, bankruptcies, and other disruptions can destroy data held in trust for patients.
- Data management considerations. Medical records are subject to modification, both desired and undesired. Patient names, addresses, etc. can be incorrectly gathered when the record was collected and need changing. The need to control these changes, approving some and disallowing others, interacts with data storage in a variety of ways.
So when the statement is made "this data has been archived", you must know a lot more before you understand what was meant.
DICOM has so far stayed away from these issues, and merely nibbled at the edges. It provides one important piece of information that has so far been sufficient for operational use, and that has not been too controversial. In DICOM, you can obtain an estimate of how rapidly a document can be provided:
(DICOM Standard PS3.3, Section C.126.96.36.199)
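The piece of information in question is the Instance Availability attribute (0008,0056), which takes one of four defined values. As a rough illustration (the enum below and its delay glosses are my paraphrase, not text from the standard):

```python
from enum import Enum

class InstanceAvailability(Enum):
    # The four values are those DICOM defines for Instance Availability
    # (0008,0056); the descriptions are informal glosses, not normative text.
    ONLINE = "immediately retrievable"
    NEARLINE = "retrievable after a short delay (e.g. fetched from tape)"
    OFFLINE = "retrievable only with manual intervention"
    UNAVAILABLE = "cannot be retrieved"

# A PACS can return this with query results so a workstation knows
# roughly how long a requested study will take to arrive:
print(InstanceAvailability.NEARLINE.value)
```

Note that even this small vocabulary is a graduated scale, not a binary archived/not-archived flag.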
The many different issues that motivate archival may explain DICOM’s choice to avoid attempts to standardize this highly complex area.
Most of the regulatory requirements are phrased quite simply:
This data shall be preserved for X years.
The rest is left as a system design problem. The issues that are typically noted by archivists include:
- Chemical/Internal degradation. Many archivists require acid-free paper and non-corrosive inks. Cheap paper and ink documents will self-destruct in a few decades.
- Environmental degradation. Corrosion, damp, heat, light, and other environmental factors can damage documents. Old magnetic media delaminates as the plasticizers evaporate from the backing. Fungus and insects may damage the media.
- Technology change. The machines needed to process the media may become unavailable. One of the issues that concerned the Massachusetts archivist was transferring legal documents off 8-inch floppies for use on a Wang word processing machine before the last of those machines became unserviceable. NASA has lost old space records because the machines needed to read the 7-track tapes were no longer functional.
- Proprietary formats. This is a variation on the technology change. Even if the media and machines may exist, the old software or old formats might no longer be supported or available.
- Loss of supporting infrastructure. Technologies like DRM introduce necessary supporting infrastructure. If mandatory servers are shut down, or old versions are no longer supported, documents may become unavailable. Adobe is shutting down some of their old DRM alternatives this year; owners of e-book readers may find all their books unavailable when the DRM servers are shut down.
All of these are potential problems for medical records. This is one of the motivations for development of openly available standards. The standards can’t remove the degradation or environmental risks, but they can reduce the technological, proprietary format, and supporting infrastructure risks.
Cost and performance take several dimensions. The most common considerations are:
- Equipment cost vs speed.
- Electrical and operational cost vs capacity.
- Facility cost vs capacity.
The levels of storage used in PACS systems illustrate the trade-offs needed for equipment cost versus speed. The goal is to provide the fast response demanded by radiologists while keeping costs down. The result is a storage hierarchy:
- RAM storage is used for the current and next few studies. There is typically 16-64 GB of RAM set aside for this in a radiology workstation.
- Fast RAID storage is used for current disk storage; a single ordinary disk drive is too slow.
- Local PACS storage is used for data expected to be needed soon.
- A central PACS storage is used for data that can tolerate some delays due to network transfer times.
- A multi-site PACS storage structure may be used and associated with patient movements and physician locations.
- A private cloud storage system may be used for bulk slower storage.
- A robotic tape, DVD, or Blu-ray archive may be used for bulk storage.
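A hierarchy like this can be sketched as a simple "promote on access" tiered store. The class below is an illustrative assumption about how such a hierarchy behaves, not any particular vendor's design: a study is fetched from the fastest tier that holds it and copied up to the fastest tier when accessed.

```python
class TieredStore:
    """Toy model of tiered PACS storage: fastest tier first."""

    def __init__(self, tiers):
        # Each tier maps study_id -> image data; index 0 is the fastest
        # (e.g. RAM), the last index the slowest (e.g. robotic archive).
        self.tiers = [dict(t) for t in tiers]

    def put(self, study_id, data, tier=0):
        self.tiers[tier][study_id] = data

    def get(self, study_id):
        for level, tier in enumerate(self.tiers):
            if study_id in tier:
                data = tier[study_id]
                if level > 0:
                    # Promote to the fastest tier, anticipating re-reads.
                    self.tiers[0][study_id] = data
                return data
        raise KeyError(study_id)

# e.g. RAM, local PACS disk, central archive
store = TieredStore([{}, {}, {}])
store.put("CT-001", b"...pixels...", tier=2)  # starts in the slow archive
img = store.get("CT-001")                     # fetched, then promoted to tier 0
```

Real systems add eviction and prefetching (for example, pulling prior studies when a patient is scheduled), but the cost/speed trade-off is the same.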
As data volumes grow and the tolerance for delays increases, the private cloud and robotic systems come to dominate, and electricity consumption and building space requirements start to matter.
Physical protection is an issue, with floods, fire, and other disasters important considerations. Often people will move to escape a disaster, so their medical records must remain accessible wherever they go. Since site recovery takes much longer than relocating people, multi-location storage for disaster survival becomes important.
Other issues like bankruptcy and mergers also place records at risk. Third party archives are one approach to managing these issues. So not only are there multiple location considerations, there are multiple organization considerations.
With all this data at multiple locations and under the control of multiple organizations, archival also introduces the considerations of data management. How are appropriate corrections made? How are unauthorized modifications prevented? How are incorrect changes corrected? Medical records systems are just beginning to consider these issues.
The data clouds and in particular the open source communities are starting to combine the needs of data management and business continuity to reduce costs and provide better service.
The Linux kernel is maintained on a highly distributed change management system, "git". Linus Torvalds has said that he no longer does backups of any of his work. Instead, when a disk fails, he just goes out to the distributed change management system and recovers from that. Since he commits his changes into the distributed system whenever he finishes each small task, the most he risks losing is the task he was in the middle of when the failure occurred.
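The recovery path can be demonstrated with ordinary git commands. The paths and file names below are illustrative; the point is that any clone holds the full history, so a committed change survives the loss of the original disk:

```shell
set -e
rm -rf /tmp/work /tmp/mirror /tmp/recovered
git init -q /tmp/work
cd /tmp/work
echo "small task done" > notes.txt
git add notes.txt
git -c user.name=dev -c user.email=dev@example.org commit -qm "finish task"
# A collaborator's clone now carries the complete history:
git clone -q /tmp/work /tmp/mirror
cd / && rm -rf /tmp/work                    # simulate the disk failure
git clone -q /tmp/mirror /tmp/recovered     # recover from any surviving clone
cat /tmp/recovered/notes.txt                # prints: small task done
```

Only work committed before the failure is recoverable, which is exactly why the commit-per-small-task habit bounds the loss.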
Less dramatically, the PACS systems supporting the UK NHS use multiple PACS central servers with fully redundant copies of data. If a system goes down, the others continue operation and update the down system when it is restored. The redundancy is not as fine grained as the Linux example. If a central server is completely cut off from all the other central servers, and then it is destroyed (perhaps by fire), the data gathered between the loss of connectivity and the fire would be lost. But that’s a deliberate risk assessment decision. They would rather continue medical operations and accept the risk that a subsequent event destroys the PACS system. It’s a low likelihood risk given the multiple network connectivity provided.
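The "update the down system when it is restored" behavior can be sketched as a set-union reconciliation. This is an illustration under a simplifying assumption (each study is written once and never edited, so there are no conflicts to resolve), not a description of the NHS implementation:

```python
class PacsNode:
    """Toy model of a replicated PACS central server."""

    def __init__(self, name):
        self.name = name
        self.studies = {}   # study_id -> image data
        self.online = True

    def store(self, study_id, data, peers):
        self.studies[study_id] = data
        for peer in peers:
            if peer.online:          # unreachable peers simply miss the update
                peer.studies[study_id] = data

    def resync(self, peers):
        # After restoration, pull everything stored while this node was down.
        self.online = True
        for peer in peers:
            self.studies.update(peer.studies)

a, b = PacsNode("central-1"), PacsNode("central-2")
b.online = False                          # connectivity lost
a.store("MR-007", b"pixels", peers=[b])   # b misses this study
b.resync(peers=[a])                       # restored node catches up
```

The model also shows the accepted risk: anything stored only on a cut-off node, before that node is destroyed, is gone.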
All of the very different meanings discussed above are called archival by different people. Some of those meanings clearly use a graduated scale of archival, not a binary event. Even the very limited operational bit standardized by DICOM had to create four categories of data, rather than a binary archived/not-archived distinction.
Efforts to provide further standardization around archival must be made with consideration of the very different uses of archival by different people. The use cases being standardized and the terminology used need to be very carefully defined.