Summary: The Dublin Core work leaves out the importance of establishing an intended use as context for metadata. Having this context makes the levels of interoperability and some of the issues around metadata storage much clearer.
Dublin Core leaves out the importance of intended use when discussing metadata. It may be too obvious to those close to the problem. Their definition
"Metadata is data about data"
while correct, is insufficient. All data is metadata from some context. A clearer definition is:
"Metadata is data about data, that is useful in a specific context of intended use."
John Moehrke's post gives good examples of the kinds of intended use that are important for medical records.
It makes sense to say that PatientID is metadata about a document in different contexts:
- It could mean that "This document is about PatientID"
- It could mean that "This document references PatientID", e.g., a document about a child references the mother.
You need the context of a use to understand metadata.
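As a rough sketch of the PatientID example, the context of use can be carried as an explicit field next to the identifier. The names here (PatientLink, role, documents_for, and the sample IDs) are made up for illustration and do not come from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientLink:
    """One PatientID metadata entry together with its context of use."""
    document_id: str
    patient_id: str
    # "subject": the document is about this patient.
    # "reference": the document only mentions this patient.
    role: str


# Hypothetical index: doc-1 is a child's record that references the mother.
INDEX = [
    PatientLink("doc-1", "child-001", "subject"),
    PatientLink("doc-1", "mother-001", "reference"),
    PatientLink("doc-2", "mother-001", "subject"),
]


def documents_for(patient_id, role, index=INDEX):
    """Return the IDs of documents linked to patient_id in the given role."""
    return [
        link.document_id
        for link in index
        if link.patient_id == patient_id and link.role == role
    ]


about_mother = documents_for("mother-001", "subject")
mentioning_mother = documents_for("mother-001", "reference")
```

Without the role field, both queries would return the same mixed list, and the caller would have no way to tell the mother's own record from the child's.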
The context of use also explains the levels of interoperability that are otherwise left dangling by the Dublin Core. The degree of interoperability is in the context of the intended use. An example of the lowest level of interoperability might be a piece of metadata called "license".
At the lowest level, that word "license" is all you know about the metadata. You can only guess about possible meanings. You don't know the format of "license". Maybe it is a text blob that contains legal language. Maybe it's a URL to a document in an unknown format. Maybe it's a UUID. This is the lowest level of interoperability and it makes automated processing nearly impossible. But, it's an important improvement over having nothing. There are many situations where this vague hint is sufficient information for a person to figure out what to do.
At the highest level, you find something like "diagnosticCode", with a specification that it is to be encoded as an HL7 CWE, with a value selected from the 2011 XYZ profile value set. Now you have the semantic meaning, the format, the vocabulary, and complete version information, and you can perform extensive automatic processing.
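The gap between the two levels shows up quickly in code. This is only a sketch: CodedValue is a simplified stand-in and not the real HL7 CWE layout, and the codes in XYZ_2011_VALUE_SET are invented placeholders for the "2011 XYZ profile" value set.

```python
from dataclasses import dataclass

# Lowest level: only the name "license" is agreed on. The value is opaque.
low_level = {"license": "see attached terms"}


def describe_license(metadata):
    """With a name-only agreement, software can only hand the value to a person."""
    return f"license (uninterpreted): {metadata['license']!r}"


@dataclass(frozen=True)
class CodedValue:
    """Simplified coded element (assumption: not the actual HL7 CWE structure)."""
    code: str
    coding_system: str
    version: str


# Hypothetical codes standing in for the agreed value set.
XYZ_2011_VALUE_SET = {"A01", "B02", "C03"}


def is_valid_diagnostic_code(value):
    """Check coding system, version, and membership in the agreed value set."""
    return (
        value.coding_system == "XYZ"
        and value.version == "2011"
        and value.code in XYZ_2011_VALUE_SET
    )
```

The low-level function can do nothing but display the value, while the high-level one can accept or reject a value without a person in the loop.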
It's important to keep metadata, intended use, and the degree of interoperability needed separate in the early discussions that define metadata. They are different concepts.
Another issue that is not mentioned in Dublin Core is the decision of how metadata is stored and conveyed. This is an interface and exchange problem only. Within any processing system you don't need agreement with others about how any data is stored or conveyed. But metadata discussions do need to understand that when exchanging metadata there are three possible situations:
- The metadata may be embedded in the document, and not otherwise exposed. This means that it is only accessible to systems and people that understand the document format. An example of this could be "patient's mother" or "KVP setting". These are metadata for some rather specialized uses in genomics and procedure analysis. An indexing registry for medical records is unlikely to maintain these as a separately stored metadata index.
- The metadata might only be available as a separate item. The hash value for a document is almost never stored as part of the document. Its use is as a separate piece of metadata used by the privacy, security, and integrity systems.
- The metadata might be stored both as part of the document and as a separate item. PatientID is often stored both ways. When using patientID as part of finding and selecting documents, it is appropriate to have separate indices for many reasons. But when processing those documents, it is necessary to have that patientID information in context within the document. This does lead to some considerations about consistency rules when defining how the metadata is to be used, and that is normal.
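The three situations, and the consistency rule that the last one calls for, might look like this in a toy setting. The "key=value;" document format, the registry_entry layout, and the function names are all assumptions made for the example. Only hashlib.sha256 is a real library call.

```python
import hashlib

# Hypothetical document in a made-up "key=value;" format.
document_bytes = b"patientID=P-123;mother=P-456;note=routine visit"


def parse_embedded(raw):
    """Toy parser: only code that knows the format can see embedded metadata."""
    return dict(field.split("=", 1) for field in raw.decode("utf-8").split(";"))


# Hypothetical registry entry kept outside the document.
registry_entry = {
    # Stored both in the document and separately.
    "patientID": "P-123",
    # Stored only separately: the hash is never part of the document itself.
    "sha256": hashlib.sha256(document_bytes).hexdigest(),
    # "mother" is embedded only, so the registry does not index it.
}


def check_entry(raw, entry):
    """Return a list of consistency problems between a document and its entry."""
    problems = []
    if hashlib.sha256(raw).hexdigest() != entry["sha256"]:
        problems.append("hash mismatch")
    if parse_embedded(raw).get("patientID") != entry["patientID"]:
        problems.append("patientID mismatch")
    return problems
```

Calling check_entry(document_bytes, registry_entry) returns an empty list. If the embedded patientID is edited without updating the registry, both checks report a problem, which is the kind of rule that has to be agreed on when metadata is stored in two places.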