Fairhaven, The River


IHE Profiles, Actors, Transactions, Options and mixins

Table of Contents

1. Mixins explained
2. An IHE Product Example
3. Issues

The proper interpretation and use of IHE's definitions for actors, options, profiles, and transactions is a regular source of discussion and sometimes confusion. This may come up again at next week's meetings.

IHE Technical Frameworks define transactions, actors, options, and several kinds of profiles. These can be viewed as a kind of object-oriented specification using "mixins" and "inheritance". Because these are interoperability specifications, you don't have the notational and functional specificity that is needed for a programming language. Conceptually, these specifications are like the mixins and inheritance in Flavors and CLOS (for us old farts) or JavaScript, Perl, Ruby, and Python (for the younger crowd).

Each Actor, Transaction, Integration Profile, and Content Profile defines functionality that will be added to a Product, i.e., a "mixin". Rather than have precise executable code, these are requirements that are included by reference. Options add functionality that will be added to an IHE Actor, i.e., "inheritance". Some Actors are found in multiple profiles. This is a combination of common requirements that apply to all profiles, a "mixin", with specific requirements that apply only within one profile, "inheritance".

1. Mixins explained

Some languages (like C++ and Smalltalk) lack mixins. Mixins are independent chunks of functionality that can be added to an object rather than inherited from a parent object. This corresponds to how interchangeable parts work in the real world.
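In a language with multiple inheritance the distinction is easy to sketch. Everything below is hypothetical, purely for illustration:

```python
# Sketch of mixin vs. inheritance, with invented names.
class Device:                        # a conventional parent class
    def describe(self) -> str:
        return "generic device"

class AuditLogMixin:                 # an independent chunk of functionality
    def audit(self, event: str) -> str:
        return f"audit: {event}"

# The workstation inherits from Device AND mixes in audit logging.
class Workstation(AuditLogMixin, Device):
    pass

ws = Workstation()
print(ws.describe())     # from the parent: generic device
print(ws.audit("login")) # from the mixin: audit: login
```

The mixin carries its own capability and is attached wherever it is wanted, without being part of any parent/child chain.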

Consider an automobile maker like Toyota. They have five models of Prius (1-5). They have four engine types. They have two wheel sizes. They have three battery pack types. They have multiple model years. A Prius 1 can only have the small engine. The Prius 2-4 have the medium engine. The Prius 5 has the fancy engine. The Prius 1-4 have the NiMH battery pack. The Prius 5 has the lithium battery pack. The rules go on for all the alternative components. They use the same components for several model years, while changing other aspects of each model.

So Toyota makes five Prius models and has mixins for engine, wheels, battery packs, etc. As you expand to their entire product line you find commonality of wheels, engines, and other parts across a wide range of products.

When you look at vehicles in general you find that mixins extend much further. You can find the same Cummins engine in many products from different companies.

The definition of standard building blocks that can be used as components in different products is also valuable in software. IHE defines components so that the components of products that provide interoperability can have common specifications. This leaves the product creators the flexibility to innovate with product features without losing interoperability.

2. An IHE Product Example

Imagine a Product that is described as conforming to

  1. A Time Client within the Consistent Time Profile
  2. A Document Consumer within the XDS.b Cross Enterprise Document Sharing Profile
  3. A Document Consumer within the Multi-Patient Stored Query Profile, supporting the Asynchronous Web Services Exchange Option.
  4. A Content Consumer within the BPPC Profile

What does that mean? Diagrammatically it can be shown as:

[Figure: Actors]
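One way to picture this Product in code: each actor-within-profile is a mixin of requirements, and the option is inheritance from its actor. All class names below are hypothetical shorthand, not IHE artifacts:

```python
# Invented names; each class stands for a bundle of requirements, not code.
class TimeClient:                       # Consistent Time profile requirements
    pass

class XdsDocumentConsumer:              # XDS.b Document Consumer requirements
    pass

class MpqDocumentConsumer:              # Multi-Patient Stored Query consumer
    pass

class MpqDocumentConsumerAsync(MpqDocumentConsumer):
    """Asynchronous Web Services Exchange Option: inheritance from its actor."""

class BppcContentConsumer:              # BPPC content handling, no transactions
    pass

# The Product "mixes in" all four conformance claims.
class Product(TimeClient, XdsDocumentConsumer,
              MpqDocumentConsumerAsync, BppcContentConsumer):
    pass
```

The option subclasses its actor (inheritance), while the Product simply composes the four independent claims (mixins).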

The Time Client within the Consistent Time Profile refers to the NTP RFCs for 95% of the time functions and transactions. All of these are included by reference. The IHE Time Client adds the requirement to the glue and other logic that whenever current time is used, the time from NTP must be used. It also adds the requirement that the system be configurable to be synchronized to within 1 second (at worst). It limits the requirement to include only those NTP functions that a client will need. The Time Client need not act as an NTP server. Implementing this requirement is usually no harder than turning on and configuring the inherent system NTP services.
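In practice that usually means pointing the system's NTP daemon at the enterprise time source. A minimal ntpd-style sketch, where time.example.org is a placeholder for the site's actual server:

```
# /etc/ntp.conf (sketch; time.example.org is a placeholder)
server time.example.org iburst
driftfile /var/lib/ntp/ntp.drift
```

No server-side directives are needed, since the Time Client does not serve time to anyone else.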

The Document Consumer part is more complex. All Document Consumers must meet a variety of functional requirements for dealing with documents. These include functions around privacy and security, as well as interfaces to a variety of communications, local storage, and system services. The Product features usually depend upon the use of these services.

The Document Consumer within the XDS.b Cross Enterprise Document Sharing Profile has a long list of functional and transaction requirements. These define the external behavior of the product exposed by the document consumer.

The Document Consumer within the Multi-Patient Stored Query Profile adds some query capabilities and some privacy and security requirements. In theory a Product could exist that simply does these queries without ever retrieving the documents using the other Document Consumer actor, but that is unlikely. The profile assumes that the appropriate product glue exists to use the query results to drive document retrieval. IHE does not specify how the two Document Consumer actors communicate, nor does the Product design need to implement them as separate actors in the code.

This Document Consumer also supports the Asynchronous Web Services Exchange Option. This makes it a kind of Document Consumer within the Multi-Patient Stored Query Profile. It complies with all those rules. In addition, it complies with the rules of the Asynchronous Web Services Exchange option. This corresponds to an "inheritance" relationship in object-oriented terms. Note that in this artificial example I did not include the Asynchronous option on the other Document Consumer. That Document Consumer does not have the extra capabilities. This is not a realistic choice by a Product designer. A real product designer would be more consistent and add that option to all of the Document Consumer actors in their product.

The Content Consumer within BPPC Profile adds internal functions without adding any external transactions. It is assumed that there exists glue to attach it to some sort of communications transactions, and in this case those communications functions are those provided by the two Document Consumer mixins. There are generic Content Consumer functions dealing with access control, internal storage, etc. These enable whatever the other Product features are.

The "within the BPPC Profile" adds the ability to process documents of a particular format. In this case it is the Basic Privacy Consent document. The profile also includes functional requirements that other parts of the Product will comply with the restrictions found in the consent documents. (That generates a lot of product specific requirements on the glue logic and product features.) There is also the requirement that a Content Consumer in the BPPC profile shall be grouped with one of three different IHE actors. In this example the Document Consumer within the XDS profile is used. This requires that at a minimum, the Product can use the Document Consumer functions to obtain BPPC format documents.

3. Issues

Many people are sloppy with their terminology when speaking or writing. They assume a context rather than using the sometimes very long phrases needed to be precise. They may say that they are making a Document Consumer actor. This will cause confusion in the absence of a mutually agreed context. They could mean that they are developing a library or set of mixins for a particular language and interface. This library will be incorporated into some other Product to provide the Document Consumer functions in conformance with the rules of some integration profile. Or, they could mean that they are developing a complete product, and that this product will incorporate the Document Consumer functions and support the Document Consumer transactions in the context of some integration profile.

When the context is mutually understood you know which of these alternatives is meant, what profiles are involved, options involved, etc. Without a shared understanding of context, confusion results. This kind of confusion is unfortunately common in IHE discussions and documents. It’s easy to pick up a document and not realize the context.

Another problem is that we have not re-factored the system in many years. Some original understandings were wrong. New capabilities have been added. As with all long lived software systems the result is occasional awkward and confusing constructions.

For example, knowing what we know now, I would argue to re-factor the Document Consumer, Document Recipient, and Media Importer. It would be much clearer to have a basic set of core functions, which would in turn have mixins for XDS transactions, XDR transactions, and XDM transactions. It’s now clear that there is a common set of document management functional requirements that is independent of the communications transactions. These are presently in the Document Consumer, Document Recipient, and Media Importer. They would be moved to a new generic core actor. The transaction specific mixins would be defined. That would make addition of FHIR and other future transaction specific functionality much tidier. We would be able to specify "use the <new generic actor> plus XDS, XDR, and FHIR mixins" for example.
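The proposed refactoring can be sketched the same way (hypothetical names again):

```python
# Invented names; one generic core actor plus per-transaction-family mixins.
class DocumentHandlingCore:     # the shared document-management requirements now
    pass                        # repeated across Consumer, Recipient, Importer

class XdsTransactions:          # XDS transaction requirements
    pass

class XdrTransactions:          # XDR transaction requirements
    pass

class FhirTransactions:         # future FHIR-based transactions slot in cleanly
    pass

# "Use the <new generic actor> plus XDS, XDR, and FHIR mixins":
class RefactoredConsumer(DocumentHandlingCore,
                         XdsTransactions, XdrTransactions, FhirTransactions):
    pass
```

Adding a new transport then means writing one new mixin, rather than revising three overlapping actor definitions.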

That might happen some day. Anyone who has been through the re-factoring of a large system knows that this is a lot of work, and it’s not likely to be undertaken until the pain of dealing with the present awkwardness becomes substantial.

July 18, 2014 in Healthcare, Standards

FHIR and DICOM

Ewout Kramer's presentation on DICOM from a FHIR perspective is a good presentation of first impression issues to consider when dealing with images and FHIR.  Most important is his recognition of the differences between a normalized and composite (document) view of the world.

At SIIM the perspective will be that the PACS is at the center of the universe.  The more general DICOM perspective is:

  • The PACS is crucially important, but there will be more than one PACS involved.  The PACS centric view dominates the acquisition and early hours of activity, but then things change.
  • The PACS will share data with other PACS systems by network.  This is sometimes a partnership of equals, and sometimes a federation.  But it does mean that data exchange takes place between autonomous partners.
  • The PACS may share data with other PACS systems by media (e.g., DVD).  This is often overlooked in network centric discussions.  Over 300 million patient studies are exchanged annually on CD and DVD.  This may be after a significant time delay when the patient brings old studies to a new provider.  Media transfers are a significant use of DICOM. The effect on tertiary providers has been about a 20% reduction in procedures performed and reimbursements.
  • External workstations, outside reviewers, and other systems are a much lower volume, but exchanges with them are a crucial part of DICOM workflows.

The normalized versus composite (document) organization of data is a very important difference and it deserves all the emphasis it is given in the lecture.  It will need considerable discussion and thought.

DICOM has both normalized and composite forms.  The decision to use composite for image objects is an old decision that was driven by many of the same considerations that led to the document model in CDA.

  • It matches the medical workflows that were well established by 100 years of medical practice in radiology at the time DICOM was introduced.  Matching the system design to the user workflow is a good idea.
  • It meets the need for autonomy of operation between independent PACS, workstation, and other systems.  Managing one normalized view over many independent systems is very hard.
  • It meets the clinical need to present the state of knowledge at the time of capture.  Medical decisions need access to both the state of knowledge at the time of examination and the current state of knowledge.  You can't compare an old with a new study without having both.
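The capture-time-snapshot point can be made concrete with a toy example. The data is invented; the key move is that a composite study copies the patient state instead of referencing the live record:

```python
# Normalized view: one live record per entity; readers always see current state.
patient_db = {"pid-1": {"name": "Doe^Jane", "weight_kg": 62.0}}

# Composite (document) view: each study carries a snapshot of the state of
# knowledge at capture time, so an old study can be compared with a new one.
def make_study(pid: str, description: str) -> dict:
    return {"description": description,
            "patient_snapshot": dict(patient_db[pid])}   # copied, not referenced

study_2012 = make_study("pid-1", "CT chest")
patient_db["pid-1"]["weight_kg"] = 70.0                  # demographics change later
study_2014 = make_study("pid-1", "CT chest")

print(study_2012["patient_snapshot"]["weight_kg"])       # 62.0: capture-time state
print(study_2014["patient_snapshot"]["weight_kg"])       # 70.0: newer capture
```

The normalized record has lost the 2012 weight; only the composite snapshot preserves the state of knowledge at the time of the old examination.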

Missing from the video, and likely missing from the SIIM hackathon, are the implications of sheer size.  Medical images are more than just "large".  You cannot just say "use a faster machine and network" when a single complex medical study can exceed 1 gigabyte in size and a collection of studies must be usable within the short time limits allowed by current medical practice workflows.  PACS systems expect to deal with data volumes measured in terabytes per day.  This must be designed in.
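The arithmetic is worth doing once. These are illustrative round numbers, not measurements:

```python
# Back-of-envelope: why "use a faster network" is not enough at imaging volumes.
study_bytes = 1 * 10**9            # a single complex study can exceed 1 GB
link_bits_per_s = 1 * 10**9        # assume a fully dedicated 1 Gb/s link

seconds_per_study = study_bytes * 8 / link_bits_per_s
print(seconds_per_study)           # 8.0 seconds just to move one study

daily_bytes = 2 * 10**12           # assume a modest 2 TB/day PACS load
hours_of_saturated_link = daily_bytes * 8 / link_bits_per_s / 3600
print(round(hours_of_saturated_link, 1))   # ~4.4 hours of saturated link per day
```

Eight seconds per study is already marginal for a radiologist flipping between priors, and the daily load leaves little headroom for anything else on the link.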

June 04, 2014 in Healthcare, Standards

Analysis of Heartbleed and IHE ATNA effectiveness

Table of Contents

1. Nature of the Attack
2. The Risks
3. Responses
4. Commentary

1. Nature of the Attack

Heartbleed is a high profile flaw and attack on the OpenSSL TLS implementation. This post analyzes how well IHE ATNA rules mitigated the flaw, what should change, and contemplates future sensitivities. The flaw permitted exposure of the in-memory contents of client and server to a malicious counterparty on an established connection. The exposure could then potentially expose current, past, and perhaps future encrypted traffic.
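The shape of the bug is easy to show in a sketch. This is not OpenSSL code; it just isolates the missing bounds check:

```python
# Illustrative sketch of the Heartbleed pattern. The reply length came from the
# attacker-controlled length field, not from the actual payload, so the echo
# over-read adjacent process memory.
process_memory = bytearray(b"PING" + b"...private-key-and-old-traffic...")

def heartbeat_vulnerable(claimed_len: int) -> bytes:
    # No bounds check: echoes claimed_len bytes starting at the 4-byte payload.
    return bytes(process_memory[:claimed_len])

def heartbeat_fixed(payload: bytes, claimed_len: int) -> bytes:
    if claimed_len > len(payload):   # the missing check: drop malformed requests
        return b""
    return payload[:claimed_len]

print(heartbeat_vulnerable(30))      # leaks bytes well beyond the payload
print(heartbeat_fixed(b"PING", 30))  # b'' -- request silently discarded
```

Whatever happened to sit next to the payload in memory, keys, passwords, or buffered traffic, is what leaked.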

The following analysis applies only to systems that use or used the vulnerable versions of software.  Systems that used different software, or non-vulnerable versions, will not be directly affected by this bug.

The assets that ATNA needs to protect are:

  • The encrypted traffic. This could be past, present, and future traffic. Most especially, this is the documents being exchanged.
  • Private authentication data. This could be private certificates, passwords, etc.

Both may be exposed while memory resident in the server, so both are at risk.  The probability of exposure depends upon both static hardware and software characteristics and on the dynamic history of activity.  It is data that was in memory that is at risk.  That data could expose either.
 
Significant correction.  Examination of the attack/test code shows that OpenSSL does not require an initial negotiation before turning on the heartbeat support.  So the IHE ATNA bi-directional authentication does not provide protection.

2. The Risks

The risks to ATNA protected systems are the same as for any server.  The analysis below is wrong.  It was based on incorrect information that initial TLS negotiation had to succeed.

ATNA requires the use of bi-directional authentication. So ATNA connections are exposed to this attack when one or both sides of a connection are malicious. The organization need not be malicious, but the secure node must have been penetrated and made malicious.

Other, non-ATNA, connections to the same server also expose ATNA connections to this attack. ATNA requires "appropriate" security measures for secure nodes and for secure applications. This is open to interpretation by those deploying systems. If the same secure node was used for both ATNA connections and non-ATNA connections, then those other connections may have exposed private data.

Most public servers use only server-authentication. They do not authenticate the client. They will accept connections from any client. It is a near certainty that a public server will be penetrated by Heartbleed given the number of potential malicious clients.

A server that requires bi-directional authentication for all connections has a lower probability of penetration. It drops to the probability that one of the known systems is penetrated and malicious. ATNA assumes that validation of acceptability for authentication is based on a review of security practices, so these systems are better protected than the typical system.

This makes the odds of penetration by Heartbleed lower. For a system with only a few well protected partners, it is quite low. For a system supporting a large number of partners, it is higher, but probably still much lower than for a public server.

Significant risk factors external to ATNA are:

  • Was the system used exclusively for bi-directionally authenticated transactions? Even one open public https port could expose the system to attack. ATNA services on a shared server were probably exposed.  (Details of memory access are very implementation dependent, so exposure is hard to predict.)
  • Are other connections protected by other means from general public access? VPNs and VLANs are common alternative protections.
  • Was traffic recordable for later attack? This is highly dependent upon implementation details. It changes the scope of the assets at risk in the event that the system was penetrated.

3. Responses

Some responses are obvious:

  • Update to remove the bug. (Don’t waste time reading this. Do that now.) Get patches distributed and installed.
  • Consider replacing passwords, revoking and replacing certificates. The urgency of this is very much dependent upon the number and type of potential communication partners. A system with public access was almost certainly penetrated and information that was available in memory is very much at risk. A system with just a few known well protected partners is at much lower risk.
  • Consider negotiating Forward Secrecy. This was always permitted by ATNA as part of TLS negotiation, but support is not required. Some systems were doing this already, because most of the libraries that support ATNA also support Forward Secrecy. If offered and supported, it would be used. Forward Secrecy did not protect against this bug. It just reduces the amount of past and future network traffic that was exposed.
  • Consider partitioning systems so that public facing systems are fully separated (at the hardware level) from internal facing systems. In this context, public facing means systems that accept connections from any client. Many organizations use VPNs, TLS, SSH, etc. with configurations that require bi-directional authentication at all times. That is not public facing. Those systems deny network connections to unknown clients. (This authentication must be at the connection level, not as a later password or token interaction.)
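For the forward secrecy and connection-level authentication points, a TLS endpoint configuration might look like the following nginx-style sketch. The cipher list and file paths are illustrative choices, not ATNA requirements:

```
# ECDHE-only ciphers negotiate forward secrecy; ssl_verify_client enforces
# bi-directional authentication at the connection level, so unknown clients
# never reach the application.
ssl_protocols TLSv1.2;
ssl_ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers on;
ssl_verify_client on;
ssl_client_certificate /etc/nginx/tls/partner-ca.pem;   # placeholder path
```

A server configured this way is not "public facing" in the sense used above: it denies the TLS handshake to any client without a certificate from the partner CA.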

Internal facing does not ensure safety. Depending upon the nature of your internal systems the risk of internal malicious systems ranges from low to high. There are many ways that malicious software gets into internal systems. The difference between internal and public facing is probability. It is certain that public facing systems are subject to constant attack from thousands of systems using a wide variety of methods. Internal facing systems are subject to attack from a much smaller number of systems using a smaller variety of methods.

4. Commentary

Some thoughts on IHE response:

  • Perhaps IHE should explicitly call out the permissibility of Forward Secrecy. I suspect that many readers don't realize that it is available as an option, since it is not listed. ATNA only lists the minimum necessary, not all the possible options.
  • Perhaps make some stronger statement, such as identifying Forward Secrecy as an IHE option. This doesn't protect against penetration, but it does reduce the exposure from a penetration.

Some thoughts on public security perceptions:

  • Five years ago it was hard to get the public interested in TLS protection, and tools like "HTTPS Everywhere" were limited to the techno-geeks. Now, a widespread flaw in TLS is major news. That’s quite a change for five years.
  • I expect some changes to authentication technologies:

    • New approaches that incorporate bi-directional authentication in ordinary consumer transactions will spread. Right now they are very rare and often poorly implemented. Banks are starting to use them. But corporate VPNs are pushing the technology and education into the general public. The effect of Heartbleed would have been somewhat smaller if these were in use. Instead of having a certainty of penetration for all public facing servers, it would be the likelihood that one of the server's customer/client systems had been penetrated. It would still be a major penetration, but that is a smaller risk.
    • One time password and related ID systems will spread. I rather like the Yubikey system. There are various others like it with different hardware and software requirements. They vary quite a bit at the moment, ranging from expensive smartcard ID systems like that used in some US government systems, to very simple systems like the Yubikey.
    • I expect bio-metric IDs to flower and die for public authentication. The problem is that bio-metric IDs can be stolen. (I’ve done device drivers for fingerprint scanners. I know how to steal and copy a fingerprint. It’s harder than stealing and copying a certificate, but it can be done. Unlike a certificate, you can’t revoke a fingerprint.)
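The one-time password idea is small enough to sketch. This is the generic TOTP construction from RFC 6238 (HMAC-SHA-1 over a time counter, used by many authenticator systems), not Yubico's proprietary scheme, and it is checked below against the RFC's own test vector:

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA-1 of the counter, dynamically truncated."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)

def totp(secret: bytes, at_time: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP applied to a 30-second time counter."""
    return hotp(secret, int(at_time) // step, digits)

# RFC 6238 Appendix B test vector (SHA-1 secret, 8 digits, T = 59 s):
print(totp(b"12345678901234567890", 59, digits=8))   # 94287082
```

Each code is valid for one time step, so a code captured in transit (or leaked from server memory) is useless moments later, unlike a static password.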

April 13, 2014 in Current Affairs, Healthcare, Standards

It's not a greenfield

At this week's WG-06 meeting we have one item that reflects the need to think about installed base and compatibility.  We've got a problem with an old basic component of almost every DICOM SOP class.  There is a fundamental element for encoding coded terminology.  Twenty years ago, when this element was defined, nobody expected these opaque code values to be longer than 16 characters.  These are code values like ICD-9, SNOMED, and LOINC.  Now, we've got coding systems using OIDs, UUIDs, and URIs for codes.  The latest version of SNOMED uses codes that are 18 characters long, and longer for local extensions.  So DICOM needs to find a fix.
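For concreteness: the classic Code Value element (0008,0100) uses the SH (Short String) value representation with its 16-character ceiling. A trivial check, with example codes chosen purely for illustration:

```python
# DICOM SH (Short String) value representation: maximum 16 characters.
SH_MAX_LENGTH = 16

def fits_classic_code_value(code: str) -> bool:
    """Can this opaque code ride in the classic Code Value (0008,0100)?"""
    return len(code) <= SH_MAX_LENGTH

print(fits_classic_code_value("T-28000"))                        # True
print(fits_classic_code_value("urn:oid:2.16.840.1.113883.6.1"))  # False
```

Every existing implementation was built against that 16-character assumption, which is exactly why each candidate fix has a failure mode somewhere.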

There are several alternatives.  Each is an easy software change. Each has problems with failure modes.  We can't find a fix without failure modes.  Software that is unprepared for longer code values will have some form of failure when given an object with longer code values.

We spent about 45 minutes discussing this yesterday.  We spent the whole time looking at failure modes.  What will happen to existing implementations that receive new objects.  What will be all the failure modes.  What will be the recovery modes?  How will the errors propagate?  How will this affect archived objects?

We've picked an approach and will be writing up the recommended change so that toolkit implementors and others can evaluate the ramifications and comment on what we missed.  Reviewing the final form will take another 15 minutes.

This little change consumed an hour this week and will consume much more.  We dealt with the complete review of a new RESTful service for obtaining capabilities of archive services in less time.

Nobody thinks this is unreasonable.  DICOM is not a greenfield.  You need to spend the time that it takes to avoid introducing failures.

April 03, 2014 in Healthcare, Standards

Medical Archival

What is Archival


Table of Contents

1. Overview
2. Administrative and Regulatory Archival
3. Cost and Performance considerations
4. Business continuity
5. Data Management
6. Conclusion

1. Overview

What is archival in a medical records context? How, if at all, should standards address this issue?

Archival as a term can mean different things to different people, and it is important to understand the motivations for archival. There can be multiple related motivations. In general they fall into overlapping categories:

  • Administrative and regulatory demands. For example, the Massachusetts official archivist is responsible for ensuring that legal records are preserved indefinitely. The legislative and property ownership/transfer records presently go back 400 years, and the archivist has been actively considering what should be done now so that new records will remain usable 400 years from now. Medical records are not usually of interest for that long, but there are regulatory requirements for preservation of medical records.
  • Cost/Performance considerations. Medical systems, especially imaging systems, are very demanding data consumers. It is normal for a PACS system to utilize a mixture of storage techniques so that desired performance can be delivered at a lower cost. This routinely involves migration of data from more expensive faster storage to less expensive slower storage.
  • Business continuity considerations. Medical systems are just as subject to the risks of fire, flood, and other disasters as any other data processing systems. Considerations of off-site storage, storage reliability, etc. apply. Organizational changes, bankruptcies, and other disruptions can destroy data held in trust for patients.
  • Data management considerations. Medical records are subject to modification, both desired and undesired. Patient names, addresses, etc. can be incorrectly gathered when the record was collected and need changing. The need to control these changes, approving some and disallowing others, interacts with data storage in a variety of ways.

So when the statement is made "this data has been archived", you must know a lot more before you understand what was meant.

DICOM has so far stayed away from these issues, and merely nibbled at the edges. It provides one important piece of information that has so far been sufficient for operational use, and that has not been too controversial. In DICOM, you can obtain an estimate of how rapidly a document can be provided:

 
  • “ONLINE” means the instances are immediately available from the Retrieve AE Title (0008,0054), and if a C-MOVE were to be requested, it would succeed in a reasonably short time
  • “NEARLINE” means the instances need to be retrieved from relatively slow media such as optical disk or tape, and if a C-MOVE were to be requested from the Retrieve AE Title (0008,0054), it would succeed, but may take a considerable time
  • “OFFLINE” means that a manual intervention is needed before the instances may be retrieved, and if a C-MOVE were to be requested from the Retrieve AE Title (0008,0054), it would fail (e.g., by timeout) without such manual intervention.
  • “UNAVAILABLE” means the instances cannot be retrieved from the Retrieve AE Title (0008,0054), and if a C-MOVE were to be requested, it would fail. Note that SOP Instances that are unavailable from this AE may be available from other AEs, or may have an alternate representation that is available from this AE.
 
  -- DICOM Standard PS3.3 Section C.4.23.1.1
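For a consuming application, those four values reduce to a small decision table. A convenience sketch, not part of the standard:

```python
from enum import Enum

# The quoted Instance Availability values, with a rough "would a C-MOVE from
# this AE succeed?" reading (None = only after manual intervention).
class InstanceAvailability(Enum):
    ONLINE = "ONLINE"             # immediate retrieval
    NEARLINE = "NEARLINE"         # slow media; succeeds, but may take a while
    OFFLINE = "OFFLINE"           # needs manual intervention first
    UNAVAILABLE = "UNAVAILABLE"   # not retrievable from this AE at all

def c_move_would_succeed(availability: InstanceAvailability):
    if availability in (InstanceAvailability.ONLINE,
                        InstanceAvailability.NEARLINE):
        return True
    if availability is InstanceAvailability.OFFLINE:
        return None               # succeeds only after manual intervention
    return False
```

Note the graduated scale: even this minimal standardized bit needed four categories, not a binary archived/not-archived flag.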

The many different issues that motivate archival may explain DICOM’s choice to avoid attempts to standardize this highly complex area.

2. Administrative and Regulatory Archival

Most of the regulatory requirements are phrased quite simply:

This data shall be preserved for X years.

The rest is left as a system design problem. The issues that are typically noted by archivists include:

  • Chemical/Internal degradation. Many archivists require acid-free paper and non-corrosive inks. Cheap paper and ink documents will self-destruct in a few decades.
  • Environmental degradation. Corrosion, damp, heat, light, and other environmental factors can damage documents. Old magnetic media delaminates as the plasticizers evaporate from the backing. Fungus and insects may damage the media.
  • Technology change. The machines needed to process the media may become unavailable. One of the issues that concerned the Massachusetts archivist was transferring legal documents off 8-inch floppies for use on a Wang word processing machine before the last of those machines became unserviceable. NASA has lost old space records because the machines needed to read the 7-track tapes were no longer functional.
  • Proprietary formats. This is a variation on the technology change. Even if the media and machines may exist, the old software or old formats might no longer be supported or available.
  • Loss of supporting infrastructure. Technologies like DRM introduce necessary supporting infrastructure. If mandatory servers are shut down, or old versions not supported, documents may become unavailable. Adobe is shutting down some of their old DRM alternatives this year. Owners of book readers may find all their books unavailable when the DRM servers are shut down.

All of these are potential problems for medical records. This is one of the motivations for development of openly available standards. The standards can’t remove the degradation or environmental risks, but they can reduce the technological, proprietary format, and supporting infrastructure risks.

3. Cost and Performance considerations

Cost and performance take several dimensions. The most common considerations are:

  • Equipment cost vs speed.
  • Electrical and operational cost vs capacity.
  • Facility cost vs capacity

The levels of storage used in PACS systems illustrate the kind of trade-offs needed for equipment cost versus speed. The goal here is to provide the fast response demanded by radiologists, while keeping costs down. A storage hierarchy results:

  • RAM storage is used for the current and next few studies. There is typically 16-64 GB of RAM set aside for this in a radiology workstation.
  • FAST RAID storage is used for current disk storage. A single ordinary disk drive is too slow.
  • Local PACS storage is used for data expected to be needed soon.
  • A central PACS storage is used for data that can tolerate some delays due to network transfer times.
  • A multi-site PACS storage structure may be used and associated with patient movements and physician locations.
  • A private cloud storage system may be used for bulk slower storage.
  • A robotic tape, DVD, or Blu-ray archive may be used for bulk storage.

As the data volumes increase and the tolerance for delays increases, the private cloud and robotic systems come to dominate. Electricity consumption and building volume requirements then start to become significant issues.

4. Business continuity

Physical protection is an issue; consideration of floods, fires, and other disasters is quite important. Multi-location storage for disaster survival becomes a consideration. Often the people will move to escape the disaster, raising the issue that their medical records must still be accessible. Since site recovery takes much longer than people relocation, having multiple storage locations becomes important.

Other issues like bankruptcy and mergers also place records at risk. Third party archives are one approach to managing these issues. So not only are there multiple location considerations, there are multiple organization considerations.

5. Data Management

With all this data at multiple locations and under the control of multiple organizations, archival also introduces the considerations of data management. How are appropriate corrections made? How are unauthorized modifications prevented? How are incorrect changes corrected? Medical records systems are just beginning to consider these issues.

The data clouds and in particular the open source communities are starting to combine the needs of data management and business continuity to reduce costs and provide better service.

The Linux system is maintained on a highly distributed change management system using "git". Linus Torvalds has said that he no longer does backups of any of his work. Instead, when a disk fails, he just goes out to the distributed change management system and recovers from that. Since he commits his changes into the distributed system whenever he finishes each small task, the most he risks losing is the task that he was in the middle of doing when the failure occurred.
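That recovery-by-clone workflow can be demonstrated locally in a few lines of shell; the bare repository here is a throwaway stand-in for the shared remote:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare mirror.git              # stands in for the shared remote
git clone -q mirror.git work && cd work    # the normal working copy
git config user.email dev@example.org
git config user.name Dev
echo "small task done" > notes.txt
git add notes.txt && git commit -qm "finish small task"
git push -q origin HEAD                    # push after every finished task
cd "$tmp" && rm -rf work                   # simulate the disk failure
git clone -q mirror.git recovered          # "restore" is just another clone
cat recovered/notes.txt                    # small task done
```

Nothing committed and pushed before the failure is lost; the only exposure is the task in flight, exactly as described above.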

Less dramatically, the PACS systems supporting the UK NHS use multiple PACS central servers with fully redundant copies of data. If a system goes down, the others continue operation and update the down system when it is restored. The redundancy is not as fine grained as the Linux example. If a central server is completely cut off from all the other central servers, and then it is destroyed (perhaps by fire), the data gathered between the loss of connectivity and the fire would be lost. But that's a deliberate risk assessment decision. They would rather continue medical operations and accept the risk that a subsequent event destroys the PACS system. It's a low likelihood risk given the multiple network connections provided.

6. Conclusion

All of the very different meanings discussed above are called archival by different people. Some of those meanings clearly use a graduated scale of archival, not a binary event. Even the very limited operational bit standardized by DICOM had to create four categories of data, not the binary archived vs not archived.

Efforts to provide further standardization around archival must consider the very different uses for archival by different people. The use cases being standardized and the terminology used need to be very carefully defined.

February 09, 2014 in Healthcare, Standards | Permalink | Comments (0) | TrackBack (0)

Asciidoc - some meta-meta-analysis

The previous post on Confidentiality Codes is my first use of asciidoc to prepare multi-targeted material.   I think it came out rather nicely. 

I prepared it in asciidoc format, and then processed it into XHTML.  I then pasted it into the edit body for posting.  The first attempt included some title and change tracking sections that did not look good for a blog.  They were just a few <div> sections in the HTML.  I removed them and like the result.

I processed it into PDF.  Again, I think the title and change tracking sections look awful.  The table of contents is OK but clunky.  So I need to look in more detail at sections and templates for asciidoc to figure out how to remove them.  Unlike HTML, it's not easy to just fix the resulting PDF.

I processed it into an EPUB.  This looks pretty good using both calibre reader (on Linux and on Windows), and with the Nook reader on Android.  The title and change tracking are not bad in those displays.

There is no way to create MS-Word documents directly, but that's no surprise.  Open source folks are hostile towards Microsoft, and the MS-Word formats are both proprietary and protected by patents.

I have confirmed that the asciidoc toolset works on both my Linux and Windows systems.  It might work on Mac OS.  The hard part will be the dependencies on Python and LaTeX.  Setting those up on a Mac might be hard.  On Windows I used cygwin and everything just worked.

December 06, 2013 in Standards, Web/Tech | Permalink | Comments (0) | TrackBack (0)

Confidentiality Code Use Cases

Confidentiality Use Cases


Table of Contents

1. The problem
1.1. The Use Cases
2. Use Case A, New Mexico destination
2.1. What information needs to be exchanged?
2.2. Receiving system behavior
2.3. Issue
3. Use Case B - A Massachusetts destination
3.1. Expected system processing
4. Use Case C - A Toronto destination
5. Use Case D - A Delft destination

1. The problem

There is presently work underway defining methodology for conveying confidentiality codes and obligations electronically for medical records. These use cases illustrate real world situations that motivate this work.

The distinction between confidentiality requirements and obligations is acknowledged to be vague and somewhat arbitrary. For the purpose of the use cases all potential categories will be called confidentiality. A typical problem with categorization is the Massachusetts law stating that psychiatric records may not be transferred without a specific written authorization signed by the patient or guardian. That’s a direct legal requirement. Does that result in those records being given a confidentiality class or obligation class?

All these requirements also apply to paper records and must be handled by manual efforts. So in theory, electronic transfer could simply replace the paper shipment and all of the manual confidentiality efforts continue.

This is not practical. The manual efforts are so burdensome and insecure that most practitioners choose not to exchange records. Using a plain electronic transfer would greatly increase the insecurity; it's much easier to copy and send electronic records than paper records. Without electronic assistance managing the confidentiality process, practitioners would be justified in using ethical and legal arguments to refuse all electronic transfers of records.

These use cases are informed by current practices and law suits over failures of the current manual systems. For a useful current reference see http://www.ncbi.nlm.nih.gov/books/NBK19829/.

1.1. The Use Cases

In all of the use cases there is a common problem to be solved. A psychiatrist not in a Federal agency wishes to send medical records about a patient to another psychiatrist not in a Federal agency. The legal requirements when Federal agencies are involved are presently in dispute and the court cases are active. These use cases intentionally avoid Federal agencies solely to avoid that uncertainty in legal requirements.

The present paper situation requires the recipient practitioner to figure out what confidentiality requirements apply to the received documents. This is a significant natural language processing and legal evaluation problem. The human psychiatric staff has the training and skills to do this. It is not practical to expect the receiving computer system to have that level of natural language processing and AI legal skills.

The transmitting computer system normally has internal confidentiality codes for all documents. This is a required component for medical systems. It does not require either natural language processing or AI skills for the sending system to convey the local confidentiality codes. The use cases show how the combination of the sending system providing its local codes and reasonable processing on the receiving system can solve the receiver's confidentiality problems without needing natural language processing or legal AI systems.

These use cases all start with a New Mexico psychiatrist sending electronic psychiatric records to another psychiatrist. The four use cases differ in the destination for the medical records:

Use Case A
A New Mexico destination. This is the most common case and illustrates a situation where both the sender and receiver have the same confidentiality regulations.
Use Case B
A Massachusetts destination. This is less common and illustrates crossing some, but not all, jurisdictional boundaries. In this case the state regulations are different in New Mexico and Massachusetts, but the Federal regulations are the same.
Use Case C
A Toronto, Ontario, Canada destination. In this case all jurisdictional boundaries have been crossed. But there is enough cross border activity that it is likely that the receiving system is aware of the US Federal regulations, although likely unaware of the state regulations.
Use Case D
A Delft, Netherlands destination. In this case all jurisdictional boundaries have been crossed, and the receiving system is unlikely to understand any of the sending system’s requirements.

2. Use Case A, New Mexico destination

This has the simplest legal situation. This transfer is subject to the following requirements:

45CFR164

This is also known as HIPAA. It applies to all medical records.

45CFR164.501

There are special HIPAA requirements that apply to only psychiatric records.

42CFR2

This regulation applies to substance abuse records. Substance abuse is a common co-morbidity with psychiatric issues, so this often applies.

32A-6A-24(H)NMSA1978

New Mexico has a law that applies to all psychiatric and substance abuse related records. This is the resulting regulation.

In addition, 42 USC 290dd-2 applies at least in part. This US law states that when state regulations are stricter than the Federal HIPAA regulations, the state regulations shall override the Federal regulation. This leads to legal complexities and problems with interpretation, which is why these use cases avoid involving US Federal agencies. There are active court cases ongoing to deal with these legal issues. See http://www.phiprivacy.net/papen-and-morales-call-for-patient-information-security-after-behavioral-health-audit/.

2.1. What information needs to be exchanged?

As mentioned above, the sending side could indicate nothing and make this a substantial natural language processing and legal AI problem.

I suggest that the sending system would attach the list of applicable regulations as metadata about the records. It is reasonable for it to attach all the regulations that the sending system knows apply. So the sending system would attach metadata indicating that the following apply:

  1. 45CFR164
  2. 45CFR164.501
  3. 42CFR2
  4. 32A-6A-24(H)NMSA1978

This list includes both 45CFR164 and 45CFR164.501 because this avoids an inference problem for the receiving side. Only an AI-class system with complete awareness of the regulations could know that 45CFR164.501 implies that 45CFR164 also applies; there is no simple general rule for regulatory implications. It's easy for the sending system to include both.
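A minimal Python sketch of that sending-side expansion, assuming a hypothetical implication table (none of these function or table names come from any real system):

```python
# Hypothetical sketch: the sender attaches every regulation it knows
# applies, including implied broader codes, so the receiver never has
# to infer that 45CFR164.501 implies 45CFR164.

# Illustrative map from a specific code to the broader codes it implies.
IMPLIED_CODES = {
    "45CFR164.501": ["45CFR164"],
}

def attach_confidentiality_metadata(local_codes):
    """Expand the sender's local codes with everything they imply."""
    tags = set(local_codes)
    for code in local_codes:
        tags.update(IMPLIED_CODES.get(code, []))
    return sorted(tags)

codes = attach_confidentiality_metadata(
    ["45CFR164.501", "42CFR2", "32A-6A-24(H)NMSA1978"]
)
# → ['32A-6A-24(H)NMSA1978', '42CFR2', '45CFR164', '45CFR164.501']
```

The implication table is maintained once, on the sending side, where the regulatory knowledge already exists.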

2.2. Receiving system behavior

This situation is easy. The receiving system is also in New Mexico and it will recognize all of the confidentiality codes. It can simply apply the appropriate internal tags to manage these records.

2.3. Issue

The New Mexico regulation is stricter than 45 CFR 164.501 regarding authorization and transmission requirements. The code 32A-6A-24(H)NMSA1978 has two merged meanings. It means both that the sending system asserts it has met the legal requirements for authorizing the transmission, and that the receiving system must attach those requirements to these documents.

I think that this is reasonable, since a properly performing sending system will not send any 32A-6A-24(H)NMSA1978 records without proper authorization. Splitting this information into two codes has no apparent value.

3. Use Case B - A Massachusetts destination

When the recipient is a Massachusetts psychiatrist the following laws and regulations show up:

Mass. G.L. chap. 214 sec 1B
This Massachusetts law applies to all business and social interactions. It turns a privacy violation that causes harm into a tort. There are no implementing regulations; it is handled only through tort claims and trials. See for example: http://privacylaw.proskauer.com/2013/06/articles/electronic-communications/massachusetts-jury-finds-violation-of-stored-communications-act-and-massachusetts-privacy-laws/
Mass. 201 CMR 17.00
This regulation implements Massachusetts General Law 93H. It imposes a variety of regulations on all business relationships involving personal information. Medical records are subject to this rule, and this rule is stricter than HIPAA.
Mass. G.L. chap 112 sec 129A
This law and its implementing regulations apply to all psychiatric records. It requires that every transfer have a specific written authorization signed by either the patient or a guardian. It has no exceptions for treatment, etc., so it is stricter than 45 CFR 164.501.

Dealing with incoming psychiatric records can inspire software engineers to run down a rathole of expensive and complicated software solutions. The psychiatrists typically take a simpler, less costly approach:

  • All direct electronic transmissions are prohibited, and prevented or rejected.
  • All records are transferred by courier on media. The media might be paper, USB, or CD. The media must be accompanied by the signed written authorization document. If this document is missing, the media will not be processed and a breach report is usually generated. (In the case of a New Mexico source, a human judgement will likely be made about whether a breach report is appropriate.) Similarly, MA 201 CMR 17.00 requires that electronic media be encrypted. If it is not encrypted, a breach report is generated.

3.1. Expected system processing

I expect the following processing steps by a receiving system:

  1. There will be a user interaction with the operator to ask for the decryption information (e.g., password) and to ask whether there is a psychiatric records transfer authorization along with the media. It’s possible that there might be only non-psychiatric records on media, so the answer might be that there is no psychiatric authorization attached.
  2. It will start processing the metadata for the records.

    • It sees the 45CFR164 tag, understands it, and tags the records internally to reflect this. 45CFR164 applies and is understood in Massachusetts. It also adds the tag MA201CMR17.00, because that automatically applies.
    • It sees the 45CFR164.501 tag, understands it, and checks whether the operator said that a psychiatric authorization accompanied the media. If there was no authorization, alarms go off. I don’t know what the processes will be for out of state transfers without authorization. The receiving psychiatrist will have procedures for what to do in this case. It also adds the tag MA.GL112s129A, because that applies.
    • If 42CFR2 is present, it understands the tag, and tags the records internally to reflect this.
    • It sees 32A-6A-24(H)NMSA1978 and a different alarm goes off. A MA system will probably ask the operator: "what should be done with 32A-6A-24(H)NMSA1978?" At least the following possibilities exist:

      1. It’s OK, just copy that tag into the internal system
      2. It’s redundant, remove the tag
      3. Replace it with this other code that a Massachusetts system will understand.
      4. Keep it and add this other code.
      5. Keep it and mark this record for later review. A human will have to research the tag and decide what fix is appropriate.
      6. Reject this document. A human will have to deal with the issue.
      7. Reject this entire transmission. A human will have to deal with this issue.
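The processing steps above could be sketched as the following Python fragment (the tag tables and function names are invented for illustration; the authorization check and the operator dialogs for unknown tags are omitted):

```python
# Illustrative sketch of a Massachusetts receiving system: known tags
# are mapped to internal tags plus locally implied ones, and unknown
# tags are queued for human review rather than silently dropped.

LOCAL_IMPLICATIONS = {             # tags a MA system adds automatically
    "45CFR164": ["MA201CMR17.00"],
    "45CFR164.501": ["MA.GL112s129A"],
}
KNOWN_TAGS = {"45CFR164", "45CFR164.501", "42CFR2"}

def process_incoming_tags(tags):
    """Return (internal tags to apply, tags needing human review)."""
    internal, review_queue = set(), []
    for tag in tags:
        if tag in KNOWN_TAGS:
            internal.add(tag)
            internal.update(LOCAL_IMPLICATIONS.get(tag, []))
        else:
            review_queue.append(tag)   # e.g. 32A-6A-24(H)NMSA1978
    return sorted(internal), review_queue
```

The review queue is where the seven operator choices listed above would be presented.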

4. Use Case C - A Toronto destination

The recipient in Toronto, ON, Canada will have a process similar to that of the Massachusetts recipient. In this use case it’s assumed that the Toronto system has enough interactions with US patients to be aware of the US Federal regulations. It’s much less likely to be aware of New Mexico state regulations.

So the process is like the Massachusetts process with the exception of the processing of the US Federal regulation tags. The Toronto system might copy them along for informational purposes, but in addition it will add the codes for Ontario provincial and Canadian national regulations that apply to medical records and psychiatric records. It probably recognizes the CFR tags and can do this automatically.

5. Use Case D - A Delft destination

A psychiatrist in Delft, Netherlands might get the psychiatric records from a New Mexico psychiatrist, but this will be a rare event. The Delft system is unlikely to recognize any of the confidentiality codes automatically.

This leads to the conclusion that in addition to the very specific regulatory codes the generic codes from HL7 (or some other international source) should be attached. An HL7 code of PSY conveys much less information about expectations than 45CFR164.501, 32A-6A-24(H)NMSA1978, MA201CMR17.00, or MA.GL112s129A. But it provides a good hint to the human who is dealing with the issue. Google searches will quickly reveal the regulatory intent.

The laws of the US, Massachusetts, and New Mexico do not apply in Delft, but there are Dutch laws that are applicable to psychiatric records. The generic tag can inform a human where to look for more details. They may look up the other tags to see whether they provide more information, and they may have to read the medical records to determine whether there are other Dutch regulations that will apply to the records.

At worst, without the generic tag, the problem is the same as the present situation when a paper document arrives. A human has to deal with the natural language and legal expertise problem. The generic tag and sending system tags can simplify the problem for the human who has to deal with this situation.
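The generic-plus-specific tagging could be sketched as follows (the mapping is illustrative only; PSY and ETH are HL7-style sensitivity hints, and a real system would draw on the official vocabulary rather than this hand-built table):

```python
# Hypothetical sketch: alongside the specific regulatory codes, the
# sender attaches generic sensitivity hints so a receiver that
# recognizes none of the specific codes still gets a usable starting
# point for a human to research.

GENERIC_HINTS = {                  # illustrative mapping, not normative
    "45CFR164.501": "PSY",         # psychiatric records
    "MA.GL112s129A": "PSY",
    "42CFR2": "ETH",               # substance abuse records
}

def add_generic_hints(specific_codes):
    """Return the specific codes plus any generic hints they imply."""
    hints = {GENERIC_HINTS[c] for c in specific_codes if c in GENERIC_HINTS}
    return sorted(specific_codes) + sorted(hints)
```

A Delft system seeing PSY knows immediately to route the records to its own psychiatric-records handling, even before anyone researches the US codes.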

 

Edit: Fixed typo in MA law for psychiatric records.

December 06, 2013 in Healthcare, Standards | Permalink | Comments (0) | TrackBack (0)

SDO Scope Problem

I spent some time on Monday watching another SDO group struggle with determining scope for a project.  I have some doubts about the outcome, but getting this right is both very important and very hard.  It's something that takes training, experience, and the willingness to learn from other SDO efforts.  The training I mean is training in systems engineering, not training in the processes of this or that SDO.  Almost all the scope problems can be traced in part to failures of system engineering.

There is one rule of scope that is crucial and very hard to accomplish:

The scope of the standard should be the smallest possible scope that still solves the problem.  If this means splitting into multiple standards, split the problem and solve each one independently.  This means eliminating all, really all, of the desirable goals.  Include only the absolutely necessary.  But don't leave out anything crucial.

To determine this scope you need to understand the problem and the context.  You don't design in isolation.  The awareness of the context and all its problems presents a powerful motivation to expand the scope of the standard. It's very hard to see obvious problems crying out for solutions and cut them out of the scope. 

Systems engineering training can help.  Systems engineering is the branch of engineering that concentrates on taking complex systems, dividing them into pieces, and understanding the relationships between those pieces.  This is crucial to finding the dividing lines that allow the standard to both solve one problem completely and fit into a system with many other solutions for other problems.

One example of a success in controlling scope is the 10baseT ethernet standard.  It defined everything from the connector through electrical signals and timings for 10 Mbit/s ethernet.  Then it stopped.  It did not cover any other aspect of networking, neither the wiring nor the software. This allowed 10baseT to evolve and co-exist with other standards successfully.  The router, hub, and other wiring could evolve with those technologies.  The networking could be SNA, OSI, TCP/IP, PPPOE, and other networking methods.  The host computer could be a mini-computer, and introduction of PC desktops, laptops, etc. did not cause problems.  The care in designing 10baseT allowed ethernet evolution to add 100baseT and 1000baseT without forcing mass replacements.

That's because 10BaseT chose its scope well.  It did the ethernet job without failure or compromise.  It did not interfere with other solutions to other parts of the problem by attempting to solve even a small part of that other problem.

An example of a failure is the RS-449 specification, with its RS-422/423 electrical signalling.  Few people have heard of it because it was a failure.  It was to be a replacement for the RS-232 serial interface between computer and modem.  It was to solve problems with cable length restrictions, signalling speed, and new modem features.

RS-449 introduced a huge 37-pin connector, with the implied 37-wire cable between modem and computer.  It had pins and wires defined for controlling all the new modem features.  It had new electrical signalling methods for better noise rejection and longer cables.  It could run up to 2 Mbit/s, instead of quitting at 64 Kbit/s.  It had massive government support.  For a while, the US government was mandating RS-449 connections on all purchased modems.  (It was like a mini-ONC for computers and modems.)

It was a total failure.

The computer industry invented its own, unofficial, non-SDO standard.  The signalling part of RS-449 was good, because it solved the speed and distance problem.  The 37-pin D connector was much too big and 37-wire cables way too difficult.  One problem with RS-232 connectors was that they were already too large, not too small.  The industry considered the core problem and picked a 9-pin connector.  That was enough for all the data and data control signals.  All those other nice-to-have wires for this and that modem feature were eliminated.  Modem controls moved out of hardware and into the data stream, with controls like the "Hayes" modem commands.  The 9-pin serial connector could be found on minicomputers, desktop PCs, and laptops for a couple of decades.  It has faded away along with the modem.

The SDO problem was scope.  All those nice-to-have modem controls should not have been in scope.  They were not crucial to solving the core problem of moving data.  But the SDO team could not abandon the important problem of controlling the modem.  They could not accept that some other standard should control the modem.  They were the modem experts.  They did an excellent job, but having chosen the wrong scope they ended up with a failure.  The good quality of their work shows in the decision by industry to steal all of the electrical signalling work and all of the work on the data connections.  Industry stripped RS-449 down to the smaller scope and simpler standard that it should have been.

July 25, 2012 in Standards | Permalink | Comments (0) | TrackBack (0)

The question "document or not" considered harmful

The question "Is this data a document" is unfortunately a harmful question.  It implies that the question can be answered, and that getting that answer will be beneficial.  I think that it cannot be answered, and that a search for an answer as phrased will in fact be harmful.

The proper question is "what is the best form for my intended use of this data, and how do I get it into this form without destroying its utility for other purposes?"  This does not mean that data can be in only one form.  In fact, efforts to do that are destined to fail.  Data will be in many forms for many purposes, and there will be extensive metadata usage to bridge these differences.  This metadata exists to allow the data formats to be tailored to particular purposes without destroying utility for other purposes.

I'll explain by examples from my experiences with environmental work.  Weather data is captured by data loggers, such as stripcharts, and by observers in logbooks.  The instant continuous readings are displayed and may be used in real time, but then they are lost.  Only the strip charts and logbooks survive.  Of course there are also calibration records, perhaps in the same logbooks, perhaps in different logbooks.  There are attendance records for observers.  There are credit card records for travel.  (Why are credit card records relevant?  There is falsification of logbooks, often discovered by a suspicious data analyst checking observer attendance and travel records.  It was a common practice to take unauthorized absences and fill in the log books retrospectively to conceal this.)

Are all these documents?  Yes, or maybe not.

There are also petabytes, exabytes, and whatever is bigger than that of sounding balloon data, radar data, microwave soundings, radio limb sounding records, satellite images, etc.

Are all these documents?  Yes, or maybe not.

For ground water pollution you must add field notebooks, photographs, samples, cores, photomicrographs, lab notebooks, equipment calibration records, purchase records for supplies, standards traceability records, ......  

Are all these documents?  Yes, or maybe not.

The weather data must be gridded for analysis, as must geological data.  Suddenly all those documents are abstracted into gridded data.  Is this another document?  You also must document the gridding process.  Gridding will reveal questionable data (like falsified observer records).  This must be examined. Perhaps the original logbooks must be examined.  When done, you've got corrected data, audit reports, correction reports, traceability. 

You now wonder about the sensitivity of the data analysis to those errors.  So you've got eigenfunction analyses.  While you're at it, you perform eigenvector analysis of the data.  This can reveal both more erroneous data and give insights into the physical processes being observed.  For operational work you must grid and analyze at regular intervals.  Most of this error analysis becomes metadata that is incorporated into the routine gridding and analysis process as procedural changes.

Are all these documents?  Yes, or maybe not.

A gridded temperature dataset can't have more than just a number, like 18, in a grid cell.  That's all the computer models can deal with.  If you make that data any more complex, even just by adding some ancillary pointers, the computer models will be unable to run fast enough to deliver useful answers when they are needed.  So all of the relationships between the grid and the tens of thousands of source elements have to live in metadata.  If you do not define that metadata separately, the computer models will not be able to run fast enough.  If you do not capture that metadata, you lose the ability to analyze for errors or to make any process improvements or to make any retrospective analyses for other purposes.  You need to maintain the gridded data, the metadata, the original records, intermediate derived records, audit records, and calibration records.  But these must be maintained in the appropriate forms and appropriate locations for their intended uses. 
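The split between bare cell values and separate provenance metadata can be sketched in a toy Python example (every structure here is invented for illustration; real gridding uses interpolation schemes, not simple cell averaging):

```python
from collections import defaultdict

def grid_observations(observations, cell_size=1.0):
    """Average point observations onto a grid; keep provenance separately.

    observations: list of (obs_id, lat, lon, temperature) tuples.
    Returns (grid, provenance): the grid holds only a number per cell,
    while provenance is the metadata linking each cell to its sources.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    provenance = defaultdict(list)      # metadata lives outside the grid
    for obs_id, lat, lon, temp in observations:
        cell = (int(lat // cell_size), int(lon // cell_size))
        sums[cell] += temp
        counts[cell] += 1
        provenance[cell].append(obs_id)
    grid = {cell: sums[cell] / counts[cell] for cell in sums}
    return grid, provenance

grid, meta = grid_observations([
    ("stn-1", 51.99, 4.36, 18.0),
    ("stn-2", 51.01, 4.90, 20.0),
])
# grid[(51, 4)] is just the number 19.0;
# meta[(51, 4)] records which stations it came from.
```

The model consumes only `grid`; error analysis and audits consume `meta`. Keeping them separate is what lets the model run fast while the traceability survives.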

Are all these documents?  Yes, or maybe not.

The operational gridding may need to be performed every 5 minutes, or hourly, or daily, etc.  The processes are updated at a much lower rate.  Errors will be corrected at a slower rate and incorporated into the operational process.  Researchers may need to perform retrospective analysis to incorporate error correction into their research databases, but the original analyses must be preserved so that post-facto operational reviews can be based on the data that was available to the people at the time they made decisions.

The question of document is irrelevant and distracts from the real issue.

The proper question is "what is the best form for my intended use of this data, and how do I get it in this form without destroying its utility for other purposes?".  Asking whether this is a document takes you down the wrong path, into the wrong questions, and the wrong arguments.

I've used examples from environmental work, but all of these issues apply equally well to medical records.

July 12, 2012 in Healthcare, Standards | Permalink | Comments (0) | TrackBack (0)

On LinkedIn Passwords

The LinkedIn password disclosure might not have also released account names.  We went over it at a security lunch today.  If they used a system similar to Radius servers, there are two separate databases, one that maps username to account number and one that maps account number to password hash.  It is plausible that LinkedIn used this structure for the same reasons that Radius does.  It improves performance in some respects and reduces the harm from partial breaches of security.

I had not considered it likely that LinkedIn would do this, given their silence on security methods and the available information on their database breach.  But copying the Radius approach (or perhaps using a Radius or Radius-derived system) is plausible to me.
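A toy Python sketch of that two-database structure (entirely hypothetical, and using salted hashes as good practice, regardless of what LinkedIn actually did): a breach of the credentials table alone exposes hashes but no usernames.

```python
import hashlib
import os

accounts = {}        # database 1: username -> account number
credentials = {}     # database 2: account number -> (salt, password hash)

def register(username, password):
    """Create an account; usernames and hashes never share a table."""
    account = len(accounts) + 1
    accounts[username] = account
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    credentials[account] = (salt, digest)

def verify(username, password):
    """Look up the account number first, then check the stored hash."""
    account = accounts.get(username)
    if account is None:
        return False
    salt, digest = credentials[account]
    return hashlib.sha256(salt + password.encode()).hexdigest() == digest
```

The performance benefit comes from the hot verification path touching only the small credentials table; the security benefit is that stealing either table alone yields less than stealing a combined one.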

June 12, 2012 in Current Affairs, Standards, Web/Tech | Permalink | Comments (0) | TrackBack (0)
