Fairhaven, The River

Aviation News

Today's Aviation Week had three relevant articles.

  1. Another article on upgrading aircraft to use digital telemetry, GPS, etc.  This time it was a discussion of the steps that Delta is taking with some of their aircraft.  Nothing new or radical is reported.  It's just another article on how to upgrade at reduced cost.  Long term (assuming the FAA mis-managers don't completely ruin things) the digital upgrades should reduce fuel use, air pollution, noise pollution, and flight times.  Switching from voice to text messaging makes sense for all the routine flight control.  There are two pilots and messaging doesn't interfere with flying the way it does with driving.  It's faster and avoids the confusion over exact numbers that sometimes affects the voice controls.  It also carries a lot more messages per second over the limited radio channels.
  2. A report on a prototype effort by British Airways, as part of a joint effort with Solena Fuels and GreenSky London, to build a $500 million plant to convert waste biomass into fuels.  It's to be built in Tilbury and go operational in 2016.  The plant will take 565,000 metric tons of sorted municipal waste and generate 50,000 tons of jet fuel, 50,000 tons of diesel, 20,000 tons of naphtha, and 50 megawatts of excess electrical power.

    The feedstock is dry sorted municipal waste. That means metals, glass, and other recyclables have been removed.  (Technically it's called refuse-derived fuel, RDF.)  Part of the attraction is using a feedstock that somebody else collects and pays you to take.  Another part is the airlines' commitments to reduce carbon impact.  These fuels count as bio-fuels with no carbon impact.

    It uses plasma torch gasification, so plastic, tires, etc. can be processed.  It uses the usual syngas F-T processes, with the latest chemical reactor and catalyst designs.

  3. An article about the complaints over the latest idiocy on the carbon tax for aviation.  The European Parliament has clearly said that they will collect tax on aviation travel outside Europe, has created a bureaucratic monster, and all the non-EU countries (US, Russia, China, etc.) have reacted with immense hostility. This choice abrogates promises to use ICAO international processes for aviation CO2 controls. One of the absurdities is that a charter airline that flies one 747 per week is considered a de minimis carbon contributor and avoids most of the paperwork.  A business jet operator that flies one business trip per month is not considered de minimis and must follow the full bureaucratic procedures.

    The annual paperwork cost (filings, people, etc.) is estimated at $100K/yr.  For an airline this is just another piece of the regulatory burden.  For business jet operators this is a big extra cost.  There is a series of "free" allowances.  Again, even for small airlines the cost of filing and qualifying is justified by the value of the allowances.  For business jet operators, the filing cost exceeds the value of the allowances.

    The law's justification was CO2 emissions, but clearly the regulations are designed to eliminate private aviation and business jet operations in favor of commercial airlines.  Extending the reach outside Europe is a simple power and money grab by the EU Parliament.
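
As a rough check on the plant figures quoted in item 2, the liquid fuel yield can be worked out directly.  This is only a sketch: it assumes the product tonnages are metric tons like the feedstock, and it ignores the 50 megawatts of electrical output.  The names are mine, not from the article.

```python
# Figures quoted in item 2 above. Assumption: product tons are metric tons.
FEEDSTOCK_TONS = 565_000
PRODUCTS_TONS = {"jet fuel": 50_000, "diesel": 50_000, "naphtha": 20_000}


def liquid_yield(feedstock_tons, products_tons):
    """Return total liquid product mass as a fraction of feedstock mass."""
    return sum(products_tons.values()) / feedstock_tons


if __name__ == "__main__":
    total = sum(PRODUCTS_TONS.values())
    fraction = liquid_yield(FEEDSTOCK_TONS, PRODUCTS_TONS)
    print(f"liquids: {total:,} tons, yield: {fraction:.1%}")
```

So roughly a fifth of the incoming mass leaves as liquid fuel.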

May 22, 2013 in Current Affairs, Eco-policy, Travel | Permalink | Comments (0) | TrackBack (0)

Software Risk (and the end of the world)

When I was in college I made some book shelves from boards and cinder blocks rescued from a dumpster.  In my first job after college, I worked with a guy who made custom crafted bookshelves that sold for tens of thousands of dollars.  We would occasionally discuss the difficulty that software has with user recognition of the difference between home built and custom crafted, and the difficulty that users have in deciding what level is appropriate for the job.

Now we have discussion of how bad Excel spreadsheets made the financial crisis much worse, how this is due in part to home built vs custom crafted, and how custom crafted is no assurance of quality.

Everyone is still struggling with the problem.

February 26, 2013 in Current Affairs, Web/Tech | Permalink | Comments (0) | TrackBack (0)

Innumeracy

The New York Times illustrates the ongoing innumeracy of politicians, press and public.  They quote:

Income for the top fifth of American households rose by 1.6 percent last year, driven by even larger increases for the top 5 percent of households, said David Johnson, the Census Bureau official who presented the findings.

Much more accurate would have been the statement:

Every household with an income below $180,000 (the 95th percentile of income) saw their income decline.  All of the income increases were for that 5 percent above $180,000.  They saw a total increase of about 6.5%.  The Census data does not provide further detail for the income distribution within the top 5%.

This information is not in the press release, but it's immediately obvious from the data in Table A-2 in the report.  I'm not sure why there is a political motivation to obscure this fact.  These official quotes are reviewed and approved carefully.  The Census Bureau staffers clearly understand statistics, so they know that the statement is misleading.  But the Times reporter didn't bother to even scan the report.  I guess difficult words like "Income Dispersion" are too complex for the innumerate.  It also calls for some basic grasp of statistical terminology to realize that if the top quintile had a total income increase of 1.6 percent and all levels from the 95th percentile down showed a decrease in income, then all of the income increase had to be in the top 5%.
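
The arithmetic behind that last point can be sketched in a few lines.  The top quintile's growth is a weighted average of the growth of its two parts (the 80th to 95th percentile band and the top 5%), weighted by each part's prior-year share of the quintile's income.  The function name and the share and growth values in the example are hypothetical illustrations, not numbers from Table A-2.

```python
def implied_top_growth(total_growth, top_share, rest_growth):
    """Growth the top 5% must have had, given the top quintile's total growth.

    total_growth: growth of the whole top quintile's income (e.g. 0.016)
    top_share:    prior-year fraction of the quintile's income held by the top 5%
    rest_growth:  growth for the 80th-95th percentile band
    Solves total_growth = top_share * g_top + (1 - top_share) * rest_growth.
    """
    if not 0 < top_share <= 1:
        raise ValueError("top_share must be in (0, 1]")
    return (total_growth - (1 - top_share) * rest_growth) / top_share


if __name__ == "__main__":
    # Hypothetical inputs: the top 5% hold 40% of the quintile's income,
    # and the rest of the quintile lost 1%.
    print(f"{implied_top_growth(0.016, 0.40, -0.01):.1%}")
```

Whenever the lower band's growth is negative, the implied growth for the top 5% comes out larger than the 1.6 percent reported for the quintile as a whole.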

I don't see a strong editorial reason to conceal the actual dollar value for the income levels, or to imply that the top fifth saw increases.  I think that the political press is just incapable of even the most basic statistical understandings.

The table also reveals something much more interesting and unexplained.  For a couple decades before 1998 you see increases in most or all of the median income levels.  Suddenly, from 1998 onward, all of the median income levels from 95% down show steady or declining income.  Something happened in 1998, but what?

September 16, 2012 in Current Affairs | Permalink | Comments (0) | TrackBack (0)

Stories and Fables - MIT Conference Video - a review

This is a 90 minute panel discussion on the use of story and fables.  I found it more interesting than most.  The items that I found particularly interesting were:

Keith Oatley's discussion of different aspects of stories:

  • they are simulations of the real world that the reader/listener can easily understand
  • the most effective are about human agency
  • they are much more effective at persuasion than the rhetorical approach of thesis and supporting evidence, with the good (or bad) effect that the results of persuasion vary with each reader and reflect the simulation much more strongly than the evidence.
  • the evidence is that extensive reading of a variety of fiction leads to a much better understanding of social behavior.  The bookworm is learning more than the social butterfly. 
  • "artistic merit" is correlated with effectiveness at persuasion and with social understanding.

Laura Amico's approach to HomicideWatch takes an interesting and fundamentally different approach to editorial selections.  Rather than pick the good stories, they cover every homicide without exception, gathering all the available facts and stories for all of them.  The editorial choice is selectivity about comments and contributions to eliminate the malicious and non-factual.  This universality and uniformity has contributed to an interesting and different form of news reporting.

The advocacy and critique of Sam Gregory of Witness was thoughtful.  This was more informative at the meta level of understanding political advocacy than the polemics that raged around the original video.

August 14, 2012 in Current Affairs | Permalink | Comments (0) | TrackBack (0)

On LinkedIn Passwords

The LinkedIn password disclosure might not have also released account names.  We went over it at a security lunch today.  If they used a system similar to Radius servers, there are two separate databases, one that maps username to account number and one that maps account number to password hash.  It is plausible that LinkedIn used this structure for the same reasons that Radius does.  It improves performance in some respects and reduces the harm from partial breaches of security.
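
A minimal sketch of that two-database structure, to show why leaking one table alone does not tie a hash to a name.  Everything here is illustrative: the table names, the numeric account IDs, and the salted PBKDF2 hashing are my assumptions, not a description of what LinkedIn or any Radius server actually does.

```python
import hashlib
import hmac
import itertools
import os

# Two separate stores (hypothetical names). In a real deployment these would
# live in different databases with different access controls.
accounts_by_username = {}   # username -> account number
hash_by_account = {}        # account number -> (salt, password hash)
_next_id = itertools.count(1)


def _hash(password, salt):
    # Assumption: salted PBKDF2-SHA256, chosen only for the example.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(username, password):
    """Create an account; the hash table never sees the username."""
    account_id = next(_next_id)
    salt = os.urandom(16)
    accounts_by_username[username] = account_id
    hash_by_account[account_id] = (salt, _hash(password, salt))
    return account_id


def verify(username, password):
    """Look up the account number first, then check the hash by number."""
    account_id = accounts_by_username.get(username)
    if account_id is None:
        return False
    salt, stored = hash_by_account[account_id]
    return hmac.compare_digest(stored, _hash(password, salt))
```

A breach that exposes only `hash_by_account` yields hashes keyed by opaque numbers.  An attacker still needs the second table to attach them to usernames.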

I had not considered it likely that LinkedIn would do this, given their silence on security methods and the available information on their database breach.  But copying the Radius approach (or perhaps using a Radius or Radius derived system) is plausible to me.

June 12, 2012 in Current Affairs, Standards, Web/Tech | Permalink | Comments (0) | TrackBack (0)

Audit and accountability (HIPAA) lecture

Anupam Datta of CMU gave a good lecture at Microsoft Research, available here.  Healthcare privacy (HIPAA), financial regulations (Sarbox), and others are shifting to the use of audit and accountability rather than access control methodology.  His lecture discusses this from the perspective of the external control system.

First, he agrees that this is the right way to handle the increasingly complex issues that must be accommodated.  Access control methodologies will founder and flail, either not delivering the desired privacy or not delivering the desired services.

He then discusses the theoretical basis for analyzing and using audit as a control method.  The really hard problem is dealing with the real world data gaps, and with the inherent uncertainty of interpreting events.  In his terms, you need an "oracle" to take the ambiguous situations and decide whether these events did or did not meet the regulatory requirements.  The dominant need for the "oracle" is to answer questions around purpose, intentions, and plans.  He explains the semantic issues involved.

This need for an "oracle" to answer otherwise unanswerable questions is one reason that access control methods will fail, while audit control can succeed.  In the post-facto audit analysis you can find ways to deal with the "oracle" problem.  In the real time access control situation, the lack of an oracle results in failures.

Then he gives a high level overview of how audit logs would be analyzed.  This is at the level of discussion of first order logic, not programming requirements.  He's the first person I've heard complimenting HIPAA in a long time.  He found the highly operational nature of HIPAA led to a clean analysis solution, other than the inherent need for an oracle.  He also gives a high level first order logic equation describing HIPAA.  It's just a short simple equation.  He's right, HIPAA is a clean set of requirements when viewed properly.
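
To make the role of the "oracle" concrete, here is a toy sketch of a post-facto audit pass.  It is not the formalism from the lecture, and the rules are invented placeholders rather than actual HIPAA clauses.  The point is only the shape: rules that the log alone can decide are applied mechanically, and everything that turns on purpose is deferred to an oracle (in practice, a human reviewer).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class AccessEvent:
    user: str
    patient: str
    is_treating_clinician: bool
    stated_purpose: Optional[str] = None  # often missing from real logs


def mechanical_verdict(event: AccessEvent) -> Optional[bool]:
    """Return True/False when the log alone decides, None when it cannot."""
    if event.is_treating_clinician:
        return True   # hypothetical rule: treatment access is permitted
    if event.stated_purpose == "marketing":
        return False  # hypothetical rule: never permitted in this toy policy
    return None       # depends on purpose or intent: needs the oracle


def audit(events: Iterable[AccessEvent],
          oracle: Callable[[AccessEvent], bool]) -> List[AccessEvent]:
    """Return the events judged to be violations."""
    violations = []
    for event in events:
        verdict = mechanical_verdict(event)
        if verdict is None:
            verdict = oracle(event)  # available after the fact, not in real time
        if not verdict:
            violations.append(event)
    return violations
```

In a real-time access control system the `oracle` call has nowhere to go, which is the failure described above.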

 

May 30, 2012 in Current Affairs, Healthcare | Permalink | Comments (0) | TrackBack (0)

Defining Metadata

Summary: The Dublin Core work leaves out the importance of establishing an intended use as context for metadata.  Having this context then makes their level of interoperability and some of the issues around metadata storage much clearer.

Dublin Core leaves out the importance of intended use when discussing metadata.  It may be too obvious to those close to the problem. Their definition
      "Metadata is data about data"
while correct, is insufficient.  All data is metadata from some context.  A clearer definition is:
      "Metadata is data about data, that is useful in a specific context of intended use."

John Moehrke's post gives good examples of the kinds of intended use that are important for medical records.

It makes sense to say that PatientID is metadata about a document in different contexts:

  • It could mean that "This document is about PatientID"
  • It could mean that "This document references PatientID", e.g., a document about a child references the mother.

You need the context of a use to understand metadata.

The context of use also explains the levels of interoperability that are otherwise left dangling by the Dublin Core.  The degree of interoperability is in the context of the intended use.  An example of the lowest level of interoperability might be a piece of metadata called "license".

At the lowest level, that word "license" is all you know about the metadata.  You can only guess about possible meanings.  You don't know the format of "license".  Maybe it is a text blob that contains legal language.  Maybe it's a URL to a document in an unknown format.  Maybe it's a UUID.  This is the lowest level of interoperability and it makes automated processing nearly impossible. But, it's an important improvement over having nothing.  There are many situations where this vague hint is sufficient information for a person to figure out what to do.

At the highest level, you find something like "diagnosticCode", with a specification that it is to be encoded as an HL7 CWE, with a value selected from the 2011 XYZ profile value set.  Now I have the semantic meaning, the format, the vocabulary, complete version information, and can perform extensive automatic processing.

It's important to separate the discussion of metadata, intended use, and degree of interoperability needed in early discussions defining metadata.  They are different concepts.

Another issue that is not mentioned in Dublin Core is the decision of how metadata is stored and conveyed.  This is an interface and exchange problem only.  Within any processing system you don't need agreement with others about how any data is stored or conveyed.  But metadata discussions do need to understand that when exchanging metadata there are three possible situations:

  • The metadata may be embedded in the document, and not otherwise exposed.  This means that it is only accessible to systems and people that understand the document format.  An example of this could be "patient's mother" or "KVP setting".  These are metadata for some rather specialized uses in genomics and procedure analysis.  An indexing registry for medical records is unlikely to maintain these as a separately stored metadata index.
  • The metadata might only be available as a separate item.  The hash value for a document is almost never stored as part of the document.  Its use is as a separate piece of metadata used by the privacy, security, and integrity systems.
  • The metadata might be stored both as part of the document and as a separate item.  PatientID is often stored both ways.  When using patientID as part of finding and selecting documents, it is appropriate to have separate indices for many reasons.  But when processing those documents, it is necessary to have that patientID information in context within the document.  This does lead to some considerations about consistency rules when defining how the metadata is to be used, and that is normal.
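
The three situations above can be sketched with a toy registry entry.  The JSON document format, the field names, and the function names are assumptions made for illustration.  Real medical documents and registries use their own formats.

```python
import hashlib
import json


def build_index_entry(document_bytes):
    """Build the separately stored metadata for one document."""
    doc = json.loads(document_bytes)
    return {
        # Stored both ways: inside the document and in the index.
        "patient_id": doc["patient_id"],
        # Separate only: the hash is computed over the document, not kept in it.
        "sha256": hashlib.sha256(document_bytes).hexdigest(),
        # Embedded only: fields like "kvp_setting" are deliberately not indexed.
    }


def check_entry(document_bytes, entry):
    """Consistency rule: the index must agree with the document it describes."""
    doc = json.loads(document_bytes)
    same_patient = doc["patient_id"] == entry["patient_id"]
    same_hash = hashlib.sha256(document_bytes).hexdigest() == entry["sha256"]
    return same_patient and same_hash


if __name__ == "__main__":
    document = json.dumps(
        {"patient_id": "P123", "kvp_setting": 80, "body": "..."}
    ).encode("utf-8")
    entry = build_index_entry(document)
    print(sorted(entry), check_entry(document, entry))
```

The consistency check is where the "stored both ways" case earns its extra rules: if the document and the index disagree about the patientID, something has gone wrong.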

 

May 17, 2012 in Current Affairs, Healthcare, Standards | Permalink | Comments (0) | TrackBack (0)

Definitions matter (median income statistics)

Summary: definitions matter much more than I expected.

There have been lots of public opinions about the change in median income in the US, and what it means for policies.  It turns out that the definition of median income matters much more than I expected.

This table shows the increase in percentage from 1979 to 2007, for those who want the answer up front:

Income Included                     Tax Unit   Household   Size Adjusted Tax Unit   Size Adjusted Household
pre tax, pre-transfer                    3.2        12.5                     14.5                      20.6
pre tax, post-transfer                   6.0        15.2                     17.0                      23.6
post tax, post-transfer                  9.5        20.2                     25.0                      29.3
post both, plus health insurance        18.2        27.3                     22.0                      36.7

The widely reported figure is the 3.2.  This is used to argue that there has been no improvement.  All the gains have gone to the top 1%.  The middle class is being hollowed out.

The different definitions make for a more nuanced answer, and reflect difficulties in getting data.

The different terms are:

  • Tax Unit is the tax filing unit.  This is what the IRS tax statistics report.
  •  Household is what you would expect.  It's all the people in the house.  So everyone in the household is combined.  This captures the effects of grandparents, parents, and children all being potential earners and sharing income and expenses.  It also captures unmarried couples, shared custody, etc.  The IRS statistics don't capture this, but the monthly Census survey does.
  • Size adjustment modifies the income using the same adjustment as is used for cost of living.   A family of four needs more income than a single person, but not four times more.
  • The kinds of income reflect regular wages/dividends, transfer payments like social security or food stamps, and finally health insurance benefits.  These variations also reflect data gathering.  The IRS can measure some transfer income, like the EITC, but not other transfer income, like food stamps.  EITC and food stamps are two very large social welfare programs in the US.
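
A minimal sketch of a size adjustment, assuming the common square-root equivalence scale (income divided by the square root of household size).  The post does not say which scale is used, so treat the scale and the function name as assumptions.

```python
import math


def size_adjusted_income(household_income, household_size):
    """Adjust income for household size using a square-root equivalence scale.

    Assumption: the square-root scale. Other scales exist and give
    somewhat different numbers.
    """
    if household_size < 1:
        raise ValueError("household_size must be at least 1")
    return household_income / math.sqrt(household_size)


if __name__ == "__main__":
    # Under this scale a family of four needs twice the income of a
    # single person, not four times, to reach the same adjusted level.
    print(size_adjusted_income(80_000, 4), size_adjusted_income(40_000, 1))
```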

A recent paper is interesting in that it works from the census bureau data rather than the tax data.  This lets it measure households, transfer payments, and health insurance.  The tax information can only measure tax units.  They compared their results with the tax data and confirmed that they matched when measuring the categories that the IRS can measure.

My Conclusions:

  • There is no "right" number.  The proper issue is which question you are trying to answer.  The shifts in households, with grandparents and adult children moving back together with parents, may be a compensation for economic hard times.  These numbers show that it works and has more than compensated for income loss.  Health insurance costs have gone up dramatically, as these numbers show.  Transfer payments and a progressive tax rate do appear to have a significant effect.
  • The "middle class is vanishing" is at best misleading. 

Paper is at http://www.nber.org/papers/w17164

There is some more data on trends in household sizes, etc.  There is also a breakdown of quintiles.  For the all included household category, the bottom quintile saw 26.4% growth and the top quintile saw 52.6% growth.  The top 5% saw 63% growth.  There is no data for the top 1% because privacy related data blinding was applied by the census bureau, and only larger aggregates are reported.

So you can argue that all parts of the population saw significant improvement, or that the rich saw a larger improvement, or that the middle class is suffering.  The data shows that the progressive tax rate (EITC included) does have an effect, transfer payments and the social programs do make a difference, and that healthcare benefits do make a difference.

 

May 14, 2012 in Current Affairs, Politics | Permalink | Comments (0) | TrackBack (0)

News has a problem with economic reporting

News, whether web or paper, needs a story.  It's extremely hard to transform very slow processes into an interesting story, and even harder to explain complex slow processes.  Unemployment is an example.

All of the economic reporting that I see on unemployment tries to bring some excitement to the story.  Big news.  Sudden change. Get people scared or excited.  They present as complicated a diagram as possible, or find some chart that looks really bad or suddenly good.  But there is rarely any effort to take apart the complex system and show the separate parts in a way that can be understood.

This diagram illustrates the problem with using a clearer presentation.

[Graph: Percent Job Losses During Recessions]
It separates out one important component in employment from the others. This shows how many people are working, and in a way that lets you compare it with other recessions.

There's no exciting story here.  It shows that this is a really severe recession, much worse than anything since World War Two.  It also shows that all the fuss and excitement over this plan or that change has made no difference.  You don't see changes due to elections either.  The one and only government driven change is the employment bubble from the 2010 census.

You also see an interesting evolution in the nature of recessions.  The last three have been very smooth and without the sudden jumps of previous ones.  They are also lasting much longer.

There are probably some very important lessons to be learned from this that would help in making decisions.  But there is no story.  There is no cause for sudden joy or sorrow.  There is no reason for panic and fear.  It's a long slow process that needs understanding.

As a result, it does not make the general news.  They need big excitement like "Unemployment reaches X".  A difficult to explain slowly changing increase in the number of actual jobs is not "news".

Graph from Calculated Risk.

February 04, 2012 in Current Affairs | Permalink | Comments (2) | TrackBack (0)

Modern Human, Neanderthal, and Denisovan cross-breeding

I went to a talk by David Reich, Harvard Med School, last Thursday on genetic evidence measuring interbreeding among modern humans, Neanderthals, and Denisovans.  His talk also included significant background information.

First, he covered the techniques and difficulty of getting DNA samples from old bones.  Typically, the body has been completely contaminated with bacteria, fungi, plants, and insects.  Even after all their efforts at selecting protected interior portions of protected bones, less than 3% of the recovered DNA is mammalian.  This also explains the early emphasis on mitochondrial DNA.  There are thousands of mitochondria per copy of cell DNA.  This makes it much easier to get an acceptable sample of mitochondrial DNA.

Then he explained significant differences between Neanderthals and Denisovans.  For those who missed the recent news, the Denisovans are another form of human.  Remains were found in Denisova Cave in southern Siberia in 2010.  Based on DNA clustering, the Denisovans, Neanderthals, and Modern Humans are each distinct. Within each of these three the various DNA samples cluster tightly and overlap in variations.  There is a substantial and statistically significant separation between the three.

There are multiple samples of DNA for all these human variations.  These are sufficient to obtain reasonably complete genomes, despite the limited samples.

Neanderthal range is Europe and to an unknown extent eastward into Central and Southern Asia.  So far, the only Denisovan source is the caves in southern Siberia.

He then explained his terminology.  When comparing different animals, like man vs chimpanzees, they look at gene overlap.  When measuring interbreeding, they look at base-pair matching.  The basic measure is to find single base-pair changes in a gene, then determine how often that change is found in Modern vs Neanderthal vs Denisovan.  This is used to derive percent sources.

Results:

  • Africans have the largest genetic diversity of modern humans and no measurable contribution of DNA from Neanderthal or Denisovan.
  • Non-Africans have about 2.5% Neanderthal DNA.  This level is relatively uniform across all non-Africans.
  • There is a cluster of Denisovan contribution in Papua New Guinea, Australia, and neighboring islands.  This level is not uniform.  Outside this region there is no Denisovan contribution.  Within these clusters there is one group at about 2.5% Denisovan, plus two other groups with different substitutions but both at about 5% Denisovan.

Based on change rates for base-pair substitutions, these interbreedings took place about 50K ya for both Neanderthals and Denisovans.

 Speculations

The simplest explanation for the Neanderthal mix is the "out of Africa" theory, with the interbreeding taking place in the Levant.  There is paleontological evidence for both Neanderthal and Modern humans living in the same hills at the same time, about 35-50K ya.  This is consistent with migration from Africa.  The lack of any Neanderthal contribution to Africans makes other mixing unlikely.

More speculatively, it also supports the hypothesis that the southern route (Arabia, India, Southeast Asia, to Australia) was first for modern human expansion.  This would explain multiple interbreeding events with Denisovans that affect only Southeast Asia and Australia.  There was a second wave of modern humans from China later.  This shows up in the genetics, and shows that this was a separate event from the earlier wave.

Unrelated Comment

There is increasing evidence that interbreeding events are the norm, not unusual.  Genetics show events where Europeans substantially contributed to India.  There is presently a major interbreeding event between Africans and Amerindians in South America.  This is a change from previous theories that interbreeding events were rare.

Audience  Questions:

What percent of interbreeding couples does this indicate?  At most 2%, probably less than 1%, of children would be from interbreeding.  The genetic match is close enough that this must have been a social effect, not a viability effect.  There should have been no biological problems with interbreeding.  If the percentage were above 2%, the genetic overlap would be higher.

What about the "hobbits" of Indonesia?  No usable DNA could be recovered.  Hot climates degrade DNA too fast.  There are no usable DNA samples from any prehistoric bones of humans in hot climates.

September 20, 2011 in Current Affairs | Permalink | Comments (0) | TrackBack (0)
