ICEDM: Migrating Historical Data

We're just wrapping up a big historical data migration into an EQuIS database. The data are needed to support a site remedial investigation. This migration had an interesting twist I thought I'd share with you. First a little background on data migration.

In the worst cases historical data comes from a mish-mash of hard-copy reports, EDDs in various formats, and Excel tables. Crunching through these reports and figuring out how the results tie into various location name spellings is tedious and error-prone. For example, are "S1-MW01" and "SMW1" the same location? Conversely, the best cases provide you with an Access database that's somewhat normalized with unique lists of valid values and obvious relationships between key data tables. Between these two extremes lies the task of crunching a pile of cross-tab Excel tables.

The intrepid (curious and determined) data cruncher can take an Excel table and "unwind" it into a columnar database-friendly form using a macro that reads through the table using the same loop within a loop procedural logic I presented here. The columnar output can be easily rearranged into an ESBASIC EDD using Access. I like Access for this task rather than Excel. In Excel I find I spend a lot of time running the lookup function and creating key columns with formulas that have to be pasted down. In Access I set up a ESBASIC template and then build queries that move the "unwound" columnar data from a table in Access to the ESBASIC EDD. I'll even link my queries to reference tables to get the proper valid values applied to the EDD. Then, I export the data from the ESBASIC template and load it into EQuIS. If the EDD has errors and won't load, I can make an a adjustment or two in the Access query and give it another try. This allows quick, successive iterations of the migration actions to ESBASIC and saves a ton of time. Conversely, it's easy to make a little error when using Excel to make the transition to ESBASIC that forces you to start your multi-step process all over.

The unusual data migration we're finishing now began as an 800+ page pdf!! No kidding. The cool part was that all the data was collected by one consultant and the tables had clearly been created from a well organized database. No, we couldn't get the database- wouldn't that have been nice? We tried reading the data into a text file with optical character recognition (OCR), but the tables had these alternating light gray bands that messed with the software's ability to read the characters accurately.

So, we hand entered it all. Really. For tables like VOCs with lot's of NDs we created a template and then copy-pasted each new table. The new tables values were compared to the pdf and updated in cases where the detection limit varied or a spurious detect occurred. Once the tables were all created I used the Excel unwinding procedure I described above to create columnar data.

Tables with metals and wet-chem data that contained lot's of detects were a little harder. To handle these I created an Excel data entry form that builds the EDD as the data is entered. The tab order for the each record was set up to match the order the chemicals appear in the pdf tables.

As usual, we used Access to create the ESBASIC EDDs. We loaded the all EDDs into EQuIS, then created crosstab data tables and hand-checked EVERY record against the source pdfs.

I'm very pleased because the whole process went rather quickly considering the scale of the migration. A couple of years ago I would have bid this much higher that I did. End the end we came out on budget and have nice database that the project team can count on to start their remedial investigation.

ICEDM

Tuesday, January 18, 2011

Migrating Historical Data

No comments:

Post a Comment