Todd Park’s Talk on Unleashing the Power of Open Data and Innovation for Health Care

Marc Whitlow — Thu, 27 Sep 2012 15:17:49 +0000

I had the chance to hear the rebroadcast of a talk by Todd Park, the U.S. Chief Technology Officer, given on June 18, 2012 at The Commonwealth Club of California, on Unleashing the Power of Open Data and Innovation for Health Care. He is a leading proponent of open health data in the U.S. health system. His team at Health and Human Services, in collaboration with the Institute of Medicine, launched the Health Data Initiative in 2009. This initiative follows in the footsteps of two other successful U.S. government open data projects. Forty years ago, the National Oceanic and Atmospheric Administration (NOAA) made weather data available for free public download. In the 1980s, the U.S. government began making the Global Positioning System (GPS) available to the public.

Health Care Cost Reduction through Open Data

The goal of the Health Data Initiative is to spur health care innovation that will drive down the cost of health care by giving everyone access to health care information while protecting privacy and maintaining the confidentiality of that information. Mr. Park presented an example of how quickly such innovation can come about. In February of 2011, Georgetown hosted a hackathon. In 8 hours, a group from Mia in Pittsburgh with no health care background, but with expertise in supply chain management, built a working prototype of Food Oasis, an app to address the food desert problem. If you live in a food desert, you do not have access to healthy food. With Food Oasis, you text a message indicating the food you would like to purchase. The message is sent to a web site where farmers’ markets and food co-ops can view the orders. These suppliers then aggregate the data, find the orders they can fulfill, and text back to consumers when and where their orders can be picked up. Because of the low overhead, this turns out to be a cost-effective way of addressing the problem.

You Have the Right to Your Medical Records

You absolutely have the right to your medical records. In fact, the head of the Department of Health and Human Services has published an open letter stating that you have the right to your own medical records.

Information at HealthData.gov

At healthdata.gov you can find information on:

Administrative	Data on administering health care delivery, enrollment into health insurance plans and appeals.
Biomedical Research	Authoritative, up-to-date medical and scientific information resources for patients, families, health care providers and researchers.
Children’s Health	Information on children’s health and health-related services for researchers, policymakers, patients and families.
Epidemiology	Public health databases and registries regarding births, deaths, disease incidence, health event case reports, demographics, community health.
Health Care Cost	Includes National Health Expenditure Accounts (NHEA), the official estimates of total health care spending in the United States.
Health Care Providers	Freedom of Information Act disclosable health care provider data for providers.
Medicaid	General information on eligibility and claims data developed to support research and policy analysis initiatives for Medicaid recipients and other low-income populations.
Medicare	Cost report data from annual reports filed by hospitals, home health agencies, and other facilities; claim-level public use files for all major types of care.
Population Statistics	Metrics on community health, health care system, and determinants-of-health performance at national, state or county levels.
Quality Measurement	Quality and patient satisfaction data available via Application Programming Interfaces (APIs) for nursing homes, hospitals, home health agencies, and dialysis centers.
Safety	Includes all company-issued recalls for drugs, food, products from 2009 to the present; hazardous substances and environmental and public health maps.
Treatments	Information and databases about marketed drugs, including downloadable resources on medication content and labeling, text messaging libraries and product listing directories.

I would encourage you to listen to the recording of Mr. Park’s talk on The Commonwealth Club of California web site. He covered many more health care and open data topics during the hour he talked and answered audience questions.

Creation of Test Patients Table in Microsoft Excel

Marc Whitlow — Tue, 18 Oct 2011 18:28:58 +0000

We are helping Deb Zajchowski at The Clearity Foundation on their patient database. The Clearity Foundation is a non-profit organization dedicated to "improving treatment options for ovarian cancer patients." To improve treatment they take a personalized medicine approach. They have a privacy-ensured database in which they collect information on the patient’s clinical history, including physician’s diagnosis, diagnostic procedures, treatments and the results from tumor molecular profiling analyses. The database also records the drugs that are likely to have clinical benefit based on the profile of the patient’s cancer.

Our goal is to enhance the Clearity database’s ability to track patient outcomes in a retrospective analysis. The current database was developed by Michael L. Petka. It works well for the profiling and reporting needs of the organization. However, Mike is currently occupied with other aspects of database enhancement.

In order to ensure the privacy of the patient, we needed a test database that did not contain any actual patient information. In this blog I will describe how we created a test Patients table in Excel.

The Patients table was created by Michael L. Petka, and is made up of the following columns:

Patients Table Columns
Column Name	Description
PatientID	Patient Identification Key
PatientLastName	Last Name
PatientFirstName	First Name
PatientMiddleInitial	Middle Initial
PatAddress	Street Address
PatCity	City
PatState	State
PatZip	5-digit ZIP Code
SSN	Social Security Number
DOB	Date of Birth
Sex	Sex
PatientTelephoneNumber	Telephone Number
PatientEmail	Patient Email

Patient Identification Key

The patient identification key (PatientID) is a sequential numbering of the patient rows.

Patient Name

We were fortunate to find a table of 10,000 random names at The ColdFusion Open Source Software Blog. In addition to the columns we needed, this database has complete name strings, both first name first, and last name first. The first name, last name and middle initial columns were cut and pasted into our test Patients table in Excel.

Patient Sex

The sex of the random names was determined in Excel by comparing the first name of a random name with a list of girls’ names from RandomNames.com. If the name matched, F (female) was assigned, otherwise M (male) was assigned, using the function below.

Sex Determination Based on Name

=IF(EXACT(A5,LOOKUP(A5,I$2:I$895)),"F","M")

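where column A contains the random first name (cell A5 is being examined in the example above), and the range I2 to I895 contains the girls’ first names. Because LOOKUP performs an approximate match, the list of girls’ names must be sorted in ascending order; EXACT then confirms that the name LOOKUP returned is a true match.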
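As a rough cross-check outside Excel, the same rule can be sketched in Python. The names in `GIRLS_NAMES` are hypothetical stand-ins for the 894-name list, and `assign_sex` is an illustrative helper, not part of the original workbook. A set lookup is used here instead of LOOKUP, so no sorting is needed.

```python
# Hypothetical stand-in for the girls' names list (range I2:I895 in the worksheet).
GIRLS_NAMES = {"Mary", "Linda", "Susan"}


def assign_sex(first_name, girls_names=GIRLS_NAMES):
    """Return "F" when the first name is in the girls' list, otherwise "M".

    The comparison is case-sensitive, like Excel's EXACT function.
    """
    return "F" if first_name in girls_names else "M"


print(assign_sex("Mary"))
print(assign_sex("John"))
```

As in the Excel version, any name missing from the girls' list defaults to M, so the quality of the result depends entirely on how complete that list is.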

Street Address

We found 261 of the most popular neighborhood street names at Living Places. We generated random house numbers and random street names, then concatenated them together in Excel.

Random House Number

=INT(11000*RAND())

Random Street Name

=INDIRECT("StreetNames!B"&RANDBETWEEN(2,262))

where array B2 to B262 in the StreetNames worksheet contains the most-popular neighborhood street names.

Street Address Concatenation

=CONCATENATE(A5," ", B5)

In the example above we are working on row 5, where column A contains the random house number and column B contains the random street name.
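For readers who prefer a script, the three address formulas above might be approximated in Python as follows. This is only a sketch: `STREET_NAMES` is a hypothetical three-item sample rather than the 261 names from Living Places, and `random_street_address` is an illustrative name.

```python
import random

# Hypothetical sample; the post used 261 street names from Living Places.
STREET_NAMES = ["Main Street", "Oak Avenue", "Maple Drive"]


def random_street_address(rng, street_names=STREET_NAMES):
    """Build "<house number> <street name>" from two random draws."""
    # Mirrors =INT(11000*RAND()), which yields integers from 0 to 10999.
    house_number = int(11000 * rng.random())
    # Mirrors picking a random row from the StreetNames worksheet.
    street = rng.choice(street_names)
    # Mirrors =CONCATENATE(A5," ",B5).
    return f"{house_number} {street}"


rng = random.Random(42)  # seeded so the same addresses come out on every run
address = random_street_address(rng)
print(address)
```

Note that, like the Excel formula, this can occasionally produce a house number of 0.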

City, State and ZIP code

A table containing city, state and ZIP codes of 80,810 places was downloaded from A Free Zip Code Database as the Excel file free-zipcode-database.xlsx. This file has more than just city, state and ZIP codes. It also includes type (STANDARD, PO BOX ONLY, MILITARY, etc.), county name, latitude, longitude, population, land area and water area. In Excel, the table was sorted on type, because we only wanted to use types STANDARD and PO BOX ONLY. After removing everything else, the list was reduced from 80,180 to 73,756 entries. We then generated 10,000 random ZIP keys (row numbers in the reduced list) to match up with our 10,000 random names in Excel using RANDBETWEEN(2,73757). All of this was stored in the City worksheet in Excel. In the Patients table (worksheet), the city, state and ZIP codes were added using a random ZIP code key. The city name was capitalized using the PROPER function; see below.

City, State and ZIP Code

=PROPER(INDIRECT("City!D"&City!M5))

=INDIRECT("City!E"&City!M5)

=INDIRECT("City!A"&City!M5)

where “City!D”, “City!E” and “City!A” are the city, state and ZIP code columns, respectively, in the City worksheet, and column M of the City worksheet holds the random ZIP code key.
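The key-based lookup can be sketched in Python as well. The rows in `CITY_ROWS` are a small illustrative sample standing in for the filtered City worksheet, and `random_place` is a hypothetical helper. Here `str.title()` plays the role of PROPER, and ZIP codes are kept as strings so that leading zeros survive.

```python
import random

# Illustrative sample standing in for the City worksheet: (ZIP, city, state).
CITY_ROWS = [
    ("94105", "SAN FRANCISCO", "CA"),
    ("15213", "PITTSBURGH", "PA"),
    ("20057", "WASHINGTON", "DC"),
]


def random_place(rng, rows=CITY_ROWS):
    """Pick one row at random so city, state and ZIP stay consistent."""
    zip_code, city, state = rng.choice(rows)
    # title() capitalizes each word, approximating Excel's PROPER.
    return city.title(), state, zip_code


city, state, zip_code = random_place(random.Random(7))
print(city, state, zip_code)
```

Drawing one row and reading all three fields from it is what keeps each fictitious patient's city, state and ZIP code mutually consistent, just as the shared key does in the worksheet.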

Social Security Number

A random social security number (SSN) was generated for each fictitious patient by concatenating three RANDBETWEEN operations together; see below.

Social Security Number

=CONCATENATE(RANDBETWEEN(100,999),"-",RANDBETWEEN(10,99),"-",MID((RANDBETWEEN(10000,19999)),2,4))
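The MID trick in the last term deserves a note: drawing from 10000 to 19999 and keeping characters 2 through 5 gives a four-digit string from 0000 to 9999 with its leading zeros intact. A minimal Python sketch of the same idea, using zero-padded formatting instead of MID, might look like this (`random_ssn` is an illustrative name):

```python
import random


def random_ssn(rng):
    """Return a fictitious SSN-shaped string such as "123-45-0678"."""
    area = rng.randint(100, 999)    # mirrors RANDBETWEEN(100,999)
    group = rng.randint(10, 99)     # mirrors RANDBETWEEN(10,99)
    serial = rng.randint(0, 9999)   # same range the MID trick produces
    # :04d pads the serial with leading zeros to four digits.
    return f"{area}-{group}-{serial:04d}"


ssn = random_ssn(random.Random(2011))
print(ssn)
```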

Date of Birth

We wanted a normal distribution of birthdays, so we passed a random value between 0 and 1 from RAND as the probability argument of the NORMINV function. We used a mean of 21,000 (the Excel date serial number for June 29, 1957) and a standard deviation of 5,000 days (13.7 years) to get an acceptable distribution; see below.

Normally Distributed Date of Birth

=NORMINV(RAND(),21000,5000)
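To see how the serial number maps to a calendar date, here is a small Python sketch. It assumes Excel's default 1900 date system, in which serial numbers for modern dates count days from December 30, 1899. It also swaps the inverse-CDF approach of NORMINV(RAND(), ...) for a direct normal draw with `random.gauss`, which gives the same distribution. The function names are illustrative.

```python
import random
from datetime import date, timedelta

# Day zero for modern dates in Excel's 1900 date system (an assumption
# about the workbook's settings).
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_date(serial):
    """Convert an Excel date serial number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=int(serial))


def random_dob(rng, mean=21000, sd=5000):
    """Draw a normally distributed birth date around the given serial."""
    return serial_to_date(rng.gauss(mean, sd))


print(serial_to_date(21000))           # the mean used in the post
print(random_dob(random.Random(1957)))
```

With a standard deviation of 5,000 days, about 95% of the generated birth dates fall within roughly 27 years either side of the mean.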

Telephone Number

Random telephone numbers were generated in the same manner as the social security numbers, by concatenation of three RANDBETWEEN operations; see below.

Telephone Number

=CONCATENATE(INT(RANDBETWEEN(100,999)),"-",INT(RANDBETWEEN(100,999)),"-",MID((RANDBETWEEN(10000,19999)),2,4))

Email Address

Fictitious patient email addresses were constructed from each fictitious patient’s first name, last name and the domain name “@example.com”; see below.

Email Address

=CONCATENATE(C5,".",B5,"@example.com")

where columns B and C contain the last and first names, respectively. In the example above we are working on row 5.

Stability of the Excel Random Cells

We found that the randomly generated cells in Excel were not stable. Whenever a cell was edited, Excel recalculated the worksheet and regenerated the random values, affecting all the cells that depended on them. Therefore, the final step was to write out the Patients worksheet as a comma-separated values (CSV) file, and then to read the CSV back into Excel. In this way we were able to create a stable test Patients table in Excel.
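The same freeze-the-values idea can be sketched in Python: generate the random data once, serialize it to CSV text, and from then on read only the stored text. The helpers below are hypothetical, and an in-memory buffer stands in for the CSV file to keep the example self-contained.

```python
import csv
import io
import random


def generate_rows(rng, n):
    """Create n rows of [PatientID, random value]."""
    return [[i + 1, rng.randint(100, 999)] for i in range(n)]


def freeze_to_csv(rows):
    """Serialize the rows to CSV text, fixing the random values in place."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def load_csv(text):
    """Read the frozen rows back; every value comes back as a string."""
    return list(csv.reader(io.StringIO(text)))


frozen = freeze_to_csv(generate_rows(random.Random(2011), 3))
print(load_csv(frozen))
```

Once the values live in the CSV text, reloading them never triggers a new random draw, which is the property the round trip through a CSV file provides in Excel.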
