Friday, November 12, 2010

EQuIS Enterprise Comes to AMEC

Greetings:

In the last month we set up an EQuIS server at AMEC's data center in Atlanta in preparation for EQuIS Enterprise. We're really excited about Enterprise because it will allow project team members across the globe to access their data via the web. As Portland continues to provide environmental data management support to more and more AMEC offices, the biggest request we get is for direct access to current data. As I've written before, we provide weekly tables for some projects, and recently have been providing Excel workbook front-ends (which have been very popular). However, Enterprise will be light years ahead of those solutions because team members will have access to all sorts of reports and can even drill down into the data.

We're also hoping that the ability for labs to email EDDs directly to Enterprise for automated loading will pan out. As has often been written, a good, mature technology is one you don't notice. With that in mind, we use many different labs with widely varying abilities to create good EDDs. I'm anticipating that uploading success will range from seamless to "Holy Cow!! GET it together!". I know of one really good lab that I'm confident will be able to make it work. We NEVER have problems loading their EDDs into EQuIS Professional. I'll definitely shout out kudos to them once I confirm their mad skills uploading EDDs to Enterprise. Conversely, we work with another well-known lab that CANNOT get it right, even when our lab contract includes financial penalties for delays caused by bad EDDs.

Even without Enterprise set up yet, the move to Atlanta has significantly sped up EQuIS access for the off-site AMEC offices we support.  User access to the old server here in Portland was throttled by our relatively limited bandwidth as compared to the giant data pipes channeled into the data center.

Enhanced data security is another upside to moving the databases to Atlanta. Here in Portland I'd set up an automated daily backup script that created a backup file for each database on the server. However, the backups were stored locally on the server. Because the server didn't have its own AMEC account, I was unable to set up an automatic method to transfer the files to another hard drive. So, every Monday morning Medora moved the files manually to another computer in case the server crashed and burned. The local backup files on the server were copied to a tape drive every night as well. Our databases in Atlanta now get a full backup four times a day, and the backups are stored at a third-party data center in another state. I feel pretty good about that! Four times a day definitely exceeds our needs.

Back to Enterprise: as of today I've got a virtual server set up for the Enterprise install. The virtual server will serve as a client to the data server. I've got all the EQuIS databases upgraded to v5.5 and am ready to dive in with the Enterprise setup. I'll keep you posted.

Dan

Thursday, November 4, 2010

Use VBA to Bold Detects by Looping through an Excel Cross-tab

The EQuIS cross-tab report is super handy and reliably spits out nicely organized cross-tab tables from EQuIS queries (reports). EarthSoft even worked out some cool conditional formatting, such as underlines and italics, to show where reported concentrations exceed screening levels. This in-app formatting is handy for screening, but looks ugly in a final report table.

For most of AMEC's projects we're asked by the project teams to format the data in Excel using a combination of BOLD for detects and some cell-shading color scheme to call out results equal to or above action levels, such as risk screening levels or regulatory levels like MCLs. Colored tables cost more for report production and, of course, you lose the color format's value when the table is photocopied in black and white. If there's only one action level, you're golden: just use BOLD for detects and a light shade of yellow for an exceedance, which shows up as light gray in a photocopy but still catches your eye on the monitor.

AMEC's data management team has used Visual Basic for Applications (VBA) to work up all sorts of Excel-based formatting apps for post-processing EQuIS reports. The most complex apps set up page headers and footers with table names, project numbers, and draft stamps, as well as insert table-specific notes in the appropriate location depending on a landscape or portrait table format and layout borders and page breaks.
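Those apps are project-specific, but just to illustrate the idea, here's a bare-bones sketch of the kind of header/footer stamping they do. The title, project number, and draft label below are made-up placeholders, not values from any of our templates:

'Sketch only: stamp a title, project number, and draft label on the active sheet
Sub Stamp_Table_Header_Footer()
    With ActiveSheet.PageSetup
        .Orientation = xlLandscape                                  'or xlPortrait, depending on the table layout
        .CenterHeader = "Table 1 - Groundwater Analytical Results"  'placeholder table title
        .LeftFooter = "Project No. 0000000000"                      'placeholder project number
        .RightFooter = "DRAFT"                                      'draft stamp
        .CenterFooter = "Page &P of &N"                             'Excel's built-in page-number codes
    End With
End Sub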

At the heart of all our VBA-controlled formats is logic to correctly BOLD detects and manage shading of cells containing data with action level exceedances. Explaining all the logic for the complex formatting described above is beyond the scope of a blog post, but I will show you how to get started using VBA to loop through a cross-tabbed data set to locate and format detects. In the next post we'll get into shading results that exceed screening criteria.

To get a handle on looping through data you'll need to set up a way to use VBA to move from cell to cell in a predictable manner. Picture an automated way to move through your table with the up-down and side-to-side arrow keys. I always loop through the data going downward from row to row, beginning in the first data column. I'm consistent in my approach, so it's become easy for me to recreate a loop without much thought (there was a LOT of head scratching at first!). The idea is to begin at the first column and move downward row by row until there's no more data, then move back to the first row, shift over to the next column, and so on.

To do this you'll need to write two nested loops in VBA, one to move through the rows and one to move across the columns. The code below depends on the absence of a "U" qualifier to determine whether the cell value is a detect. The first example below (Example 1) handles the case where the qualifiers are in the same cell as the result value.






In Example 2 I'll show you how to handle the case where the qualifiers are in their own column next to the result value.

To try this out just create a new Excel VBA module in the workbook with your data and drop the code into it. Hopefully my notes in the code will explain how it works...

The code could be modified slightly to handle a "<" symbol if that's how you identify non-detects (see the sketch after Example 2 below).

Feel free to drop me an email or call if you have problems or need more clarification.

I'm thinking of putting on a one-day Excel VBA for Environmental Data course in AMEC's training center here in Portland. Please let me know if you're interested.

Dan


'EXAMPLE 1
 'Sub to Bold detects where result and qualifier are in the same cell
Sub Loop_thru_cells1()
Dim j As Integer, i As Integer  'declare variable j for columns and i for rows
Dim sVal As String    'declare sVal as a variable to hold the result and qualifier text string

Cells.Font.Bold = False 'clears all bolds in the table
For j = 5 To 100 ' Starting column = 5; last column with data is 100; change to fit your data
   For i = 4 To 88  'Starting row is 4; last row is 88; change to fit your data
        sVal = Cells(i, j)  'set variable sVal = to the cell contents
        If InStr(sVal, "U") = 0 Then 'search the string for a "U"; this also covers "UJ"
             Cells(i, j).Font.Bold = True 'no "U" was found, so InStr() returned 0 and the detected result gets bolded
         End If
   Next i  'next row; when i = 88 this loop ends and the column (j) loop increases by 1
Next j  'next column

End Sub



'EXAMPLE 2
'Sub to bold detects where result and qualifier are in adjacent cells
Sub Loop_thru_cells2()
Dim j As Integer, i As Integer  'declare variable j for columns and i for rows
Dim sVal As String    'declare sVal as a variable to hold the result and qualifier text string

Cells.Font.Bold = False   'clears all bolds in table
For j = 5 To 100 Step 2 ' Step 2 because each result column has its qualifier column immediately to the right
   For i = 4 To 88  'Starting row is 4; last row is 88; change to fit your data
        sVal = Cells(i, j + 1) 'sets the string variable = to the cell adjacent to the result
        If InStr(sVal, "U") = 0 Then 'search the string for presence of a "U"; this covers "UJ";
            Range(Cells(i, j), Cells(i, j + 1)).Font.Bold = True 'sets both the result and qualifier cells as bold
        End If
        Cells(i, j).Select
   Next i  'next row; when i = 88 this loop ends and the column (j) loop increases by 1
Next j  'next column

End Sub
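
As promised above, here's the "<" variation. It's just Example 1 with the InStr() test pointed at "<" instead of "U"; the row and column ranges are still placeholders you'd change to fit your data.

'"<" VARIATION (a sketch, not pulled from any project workbook)
'Sub to bold detects where non-detects carry a "<" in the result cell
Sub Loop_thru_cells_LessThan()
Dim j As Integer, i As Integer  'declare variable j for columns and i for rows
Dim sVal As String    'declare sVal as a variable to hold the result text string

Cells.Font.Bold = False 'clears all bolds in the table
For j = 5 To 100 ' Starting column = 5; last column with data is 100; change to fit your data
   For i = 4 To 88  'Starting row is 4; last row is 88; change to fit your data
        sVal = Cells(i, j)  'set variable sVal = to the cell contents
        If InStr(sVal, "<") = 0 And Len(Trim(sVal)) > 0 Then 'no "<" and not a blank cell, so treat it as a detect
            Cells(i, j).Font.Bold = True
        End If
   Next i  'next row
Next j  'next column

End Sub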

Friday, August 20, 2010

Connect to EQuIS SQL-Server databases with Excel

At AMEC we're always looking for ways to give our project teams direct access to their data. We have many staff who like to drill down into their data and dig around, tallying detects, evaluating trends, and calculating risk values. For staff who want to muck around and don't have a specific query in mind for the data management team to run, we'll "dump" a subset of data into Excel so they can filter it, crunch it, and crush it.

In the past I've used open database connectivity (ODBC) to connect end users to our EQuIS databases through SQL Server views. ODBC is a little clunky and needs to be configured on each computer that connects to the database. I've just discovered that the "Get External Data" features in Excel 2007 are a dead easy way to grab EQuIS data from SQL Server.

To try it out I created a detects-only groundwater VOC view using Management Studio. Then I clicked on Get External Data>From Other Sources>From SQL Server. In the little dialog box that popped up I entered my server name and clicked “next”.



Then I picked the database name with the view.



Finally, I picked the view I wanted to display, made sure to pick cell "$A$1" so the data would be placed in the upper left-hand corner of the destination worksheet, and clicked "finish". The view data appeared in my worksheet.
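By the way, if you'd rather script the connection than click through the wizard, a QueryTable in VBA can do the same pull. This is only a sketch; the server, database, and view names below are placeholders, and it assumes Windows (trusted) authentication:

'Sketch only: pulls a SQL Server view into the active sheet starting at A1
'MYSERVER, MYEQUISDB, and dbo.vw_GW_VOC_detects are placeholders for your own names
Sub Pull_EQuIS_View()
    Dim qt As QueryTable
    Set qt = ActiveSheet.QueryTables.Add( _
        Connection:="OLEDB;Provider=SQLOLEDB;Data Source=MYSERVER;" & _
                    "Initial Catalog=MYEQUISDB;Integrated Security=SSPI;", _
        Destination:=ActiveSheet.Range("$A$1"))
    qt.CommandType = xlCmdSql
    qt.CommandText = "SELECT * FROM dbo.vw_GW_VOC_detects"
    qt.RefreshOnFileOpen = True        'refresh automatically when the workbook opens
    qt.Refresh BackgroundQuery:=False  'pull the data now
End Sub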





If I save the Excel workbook and re-open it at any time, the data automatically refreshes. I can even refresh the data while the workbook is open.
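If you want every connection in a workbook refreshed the moment it opens, a couple of lines in the ThisWorkbook module will also do it. This is optional; the connection's own refresh-on-open setting accomplishes the same thing:

'Sketch only: goes in the ThisWorkbook module
Private Sub Workbook_Open()
    'refresh every external data connection when the workbook opens
    ThisWorkbook.RefreshAll
End Sub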

I envision laying out a workbook with a tab for each analyte group. Then the project team could open the workbook during an active investigation, when we're loading lots of data, and get an up-to-date view of their project's analytical results.

Other, more advanced uses I see for Excel's "Get External Data" features include adding query input controls and building a dashboard with constantly updating time-series charts.

Let me know how you find ways to use Excel as a database front end.

Monday, June 28, 2010

Add Screening Levels to EQuIS-generated Excel charts

The AMEC data management team is often asked to add a horizontal line representing a screening level, such as a prediction limit, regulatory limit, or risk screening level, to time-concentration plots. The screening level is a handy reference for the viewer to compare against concentrations, but manually adding screening levels to dozens of charts after I've whipped them out with EQuIS is enough to make me cry.

The easiest way to get a screening level line onto an Excel chart is to add a new series composed of two end points (min date and screening level, max date and screening level). Adding a new series works well because it moves with the data should you readjust the x and y axes.

A comparison of  my dummy data set below to the RSL in red shows we've got a little regulatory problem here....




On a recent project I posted 20 single-parameter time-series charts representing upgradient-downgradient well clusters on one sheet, with the upgradient well cluster on the left, the four other clusters in succession to the right, and the most downgradient cluster on the far right. Each well cluster consists of four wells, each screened in a different water-bearing zone. The chart representing the shallowest zone was at the top of the sheet and each deeper zone was beneath it. I used EQuIS to generate about 30 Excel workbooks containing the 20 charts, one book for each parameter. Then I wrote a couple of macros that looped through each workbook, formatting and posting each set of charts on one sheet. (Data and identifying info have been removed.)


The project team liked the layout and formatting, but then asked me to add risk-screening levels... yikes!! I schemed for a day trying to think about how I could get EQuIS to generate a series of screening levels with the chemical data. My first thought was to modify EQuIS' analytical results stored procedure to update each result set with the risk data, then use the EQuIS chart utility to create each chart with the two series, my target constituent and the risk data. However, when I thought about what a hassle it would be to figure out min and max dates for each result set to apply to the RSL series, I came up with the Excel VBA solution posted below. My chart-grouping macro calls this sub procedure prior to adding the charts to one sheet. It prompts the user to enter the screening level into an input box, then adds the series to each chart in the workbook. Quick, simple, and I can re-run it for a variety of screening levels without having to regen the whole chart set. I almost broke my arm patting myself on the back....

Let me know if you have any questions.

'adds Risk Screening Level chart series to existing charts
Private Sub add_RSLs()


Dim lowDate As Date, highDate As Date
Dim SL As Variant        'screening level entered by the user
Dim ChartName As String
Dim cht As Chart         'loop variable for the workbook's chart sheets
Dim sName As String      'holds the selected series name


'removes old RSL data if it exists
On Error Resume Next
Application.DisplayAlerts = False
Sheets("RSL").Delete
Application.DisplayAlerts = True
On Error GoTo 0

'picks the date ranges for existing data to use with the soon-to-be created RSL chart series data
lowDate = ActiveWorkbook.Charts(1).Axes(xlCategory).MinimumScale
highDate = ActiveWorkbook.Charts(1).Axes(xlCategory).MaximumScale


'user enters the RSL
SL = InputBox("Enter Screening Level")


If SL = "" Or Not IsNumeric(SL) Then  'user cancelled or typed something that isn't a number
  MsgBox "No SL Entered"
  Exit Sub
End If


'adds a new worksheet along with the data to be used for the RSL chart series
Sheets.Add before:=Sheets(1)
ActiveSheet.Name = "RSL"
Sheets("RSL").Cells(1, 1) = lowDate
Sheets("RSL").Cells(2, 1) = highDate
Sheets("RSL").Cells(1, 2) = SL
Sheets("RSL").Cells(2, 2) = SL


'Assumes every chart in workbook is the same compound and will have the RSL series added to it
'Loops through every chart in the workbook
For Each cht In ActiveWorkbook.Charts
  cht.Activate
  ChartName = ActiveChart.Name

  Charts(ChartName).Activate   'you'd think it'd already  be activated, but I must have wanted to be sure


'deletes the old RSL series if it exists
  On Error Resume Next
  ActiveChart.SeriesCollection("RSL").Delete
  On Error GoTo 0


'adds the new RSL chart series to each chart
  With ActiveChart.SeriesCollection.NewSeries
    .Name = "RSL"
    .Values = Sheets("RSL").Range("b1:b2")
    .XValues = Sheets("RSL").Range("a1:a2")
  End With


'formats the new series
  ActiveChart.SeriesCollection("RSL").Select
  Selection.Border.Weight = xlThin
  Selection.Border.LineStyle = xlDash

  sName = Selection.Name
  Selection.MarkerStyle = xlNone
  'Selection.MarkerForegroundColorIndex = 3
  'Selection.MarkerBackgroundColorIndex = 3
  Selection.MarkerSize = 2
  Selection.Border.ColorIndex = 3


Next cht
End Sub

Tuesday, June 15, 2010

Using Excel in place of ArcGIS. Really....

Who needs ArcGIS when you have MS Excel... (?)

Here's a new approach to map making.

Tuesday, June 1, 2010

T-SQL Reference Books

I've found the following book titles to be invaluable in modifying stored procedures and functions in SQL Server. I've used info from the books to help me analyze data too. Neither one is really a newbie how-to, but if you know just a few basics (SELECT, FROM, WHERE) and you're curious, you'll be golden.

1. SQL Pocket Guide by Jonathan Gennick (O'Reilly). This one's a great reference when you want to know something quickly, like a date format code or what the dang in-string function is (in VBA it's InStr(); in T-SQL it's CHARINDEX).

2. SQL Cookbook by Anthony Molinaro (O'Reilly). This one helps you dig into T-SQL. It explains complex steps in a simple manner and gives the rationale behind the author's approach. I've learned a lot from this book.

Nondetects and Data Analysis

This is from the PracticalStats.com newsletter, by Dennis Helsel.

"Nondetects And Data Analysis" (both the course and textbook) spend considerable time on survival analysis methods applied to censored environmental data. In spite of the many documented cases of errors arising from substituting values for nondetects (see
http://dx.doi.org/10.1093/annhyg/mep092 for a new explanation of the dangers), substitution remains popular because it is so easy. Yet there are two other easy solutions that do not introduce the errors inherent in substitution.

-- For computing descriptive statistics
What is a measure of the center for the following dataset?
<5 <5 8 15 19 24 27 33 41
To compute the mean we would need to use a survival analysis procedure like maximum likelihood estimation (MLE) to avoid the 'invasive data' problem of substitution. But there are two easier solutions.
a) use the median. The median of these 9 values is the 5th observation from the bottom, or 19.
b) use the percent of data above the detection limit. There are 7/9, or 78%, of the data above 5.

And for this second dataset with two detection limits,
<5 <5 8 15 <20 24 27 33 41
the two options remain the same. The median is <20. And there are 4/9 or 44% of the observations above 20.

-- In general, we can generalize these two procedures to other statistical applications. First, we can use methods based on percentiles, or ranks. The rank i divided by (n+1) is the position for the percentile of that observation. The 6th ranked observation (24) above, for example, is at (6/10) or the 60th percentile. Methods based on ranks, such as nonparametric hypothesis tests, are procedures analyzing the percentiles of the data.

Second, we can treat the data as being either above or below the (highest) detection limit, and interpret the proportion of data falling above the limit. Binomial procedures allow us to discern changes in such proportions.

-- For hypothesis testing, nonparametric hypothesis tests are based on percentiles, or ranks. To compare two sets of data, the Mann-Whitney (also called rank-sum) test can always be used without requiring substitution. The test determines whether the cumulative distribution frequencies (the set of percentiles) in the two data sets are similar, or different. If multiple detection limits are present, all data below the highest limit are coded as being tied in order to use the simple version of this test. For example, the two data sets:
<5 <5 8 15 <20 24 27 33 41
and
<5 <5 <5 <5 6 9 10 12 16 21

are re-coded to
<20 <20 <20 <20 <20 24 27 33 41
and
<20 <20 <20 <20 <20 <20 <20 <20 <20 21

and their ranks are:
7.5 7.5 7.5 7.5 7.5 16 17 18 19
and
7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 15

The identical procedure would precede the Kruskal-Wallis test when there are three or more groups. This simple method is far superior to substitution because no artificial pattern alien to the original data is placed into the data, as it is with substitution. Unlike substitution followed by t-tests or ANOVA, if differences are found by the Mann-Whitney or Kruskal-Wallis tests, they can be believed. If patterns are obscured by re-censoring at the highest limit, more complicated survival analysis methods are available. But we're trying here to keep it simple.

The second test procedure is the binomial-based "test of proportions" or contingency table test. Here the proportions of data above the (highest) detection limit are tested for similarity or difference. Data are coded into two groups, above and below the highest detection limit. For the above data, the first group has 4/9 or 44% of values above 20, while the second group has only 1/10 or 10% above the value of 20. The test determines whether these two percentages are significantly different.

-- For correlation and regression
Nonparametric correlation coefficients Kendall's tau and Spearman's rho may be computed on data with nondetects, without substitution. Kendall's tau easily handles data with multiple detection limits, though the software is not usually written to do so. The nonparametric Theil-Sen line (used for the Mann-Kendall trend test, for example) may end up with a ...

One approach to perform regression with a binary Y variable is called logistic regression. Here the probability of being in the higher category, say the probability of recording a detected value, is predicted from one or more explanatory variables. Interpretation of the results is very similar to ordinary regression.

So all in all, these simple methods do not require substitution, can be computed with standard statistical software, and avoid the pitfalls of "invasive data" that result from fabricating data by substitution. If you can't justify going to the more complicated procedures in Nondetects And Data Analysis that handle nondetects at multiple levels, these simpler methods should meet your requirements.

Monday, May 31, 2010

NW EDMC photos

Thanks again to everyone for a great conference. Here are some photos and a video from our Environmental Data Management conference, May 20 and 21, 2010: