The files presented here contain historical estimates of hydrologic and 
meteorologic variables as computed by personnel at NOAA's Great Lakes
Environmental Research Laboratory in Ann Arbor, Michigan. Similar data
in Excel spreadsheet format have been published since the mid-1990s, but 
we are now changing over to simpler text-based formats for a variety
of maintenance and usability reasons.  While this process is underway you
will see a mix of Excel-format and CSV-format (text) files.  The exact format
of the text files will vary. 

Data presented here will generally be monthly estimates of an aggregated
nature (i.e. mean over the entire lake, entire land surface, or entire 
lake basin [lake+land]). These estimates will be the result of either a
physical process model or a prescribed technique to utilize point observations.

Please note in ALL cases that these values represent just one possible
estimate, and that every data set is sensitive to variations in model forcing 
data and/or observational data.  Historical data used to derive these estimates
are generally obtained from federal data agencies in both the U.S. and Canada,
and are not a product of observing systems/programs run by GLERL. 

Over time we have observed that the data available from these various data 
collection sources is updated and revised, which propagates into changes in 
the estimates that we compute. Additionally, we have often noticed issues in
the raw data that made it past the agencies' standard QA/QC procedures. Note
that these agencies deal with MASSIVE quantities of data, and have only a few
personnel assigned to the task, so it is not surprising that some erroneous
data makes it past their automated processes. In order to deal with the possibility of
bad raw data, we have implemented our own filtering processes that attempt to
identify and remove any obvious problem data values. In some cases where the
remaining data is too sparse for our models, we then are forced to replace 
those missing values with some "reasonable" estimate prior to using the 
dataset for model computations.
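
As a rough illustration of the kind of filtering described above, here is a
minimal Python sketch of a simple range check. The variable names and the
limits in LIMITS are hypothetical placeholders chosen for the example; they
are NOT the criteria actually used at GLERL, which are not listed in this
file.

```python
from typing import Optional

# Hypothetical plausibility limits (assumption for illustration only).
LIMITS = {
    "air_temp_c": (-60.0, 50.0),
    "precip_mm": (0.0, 500.0),
}


def range_check(variable: str, value: Optional[float]) -> Optional[float]:
    """Return the value if it lies within the limits, otherwise None (missing)."""
    if value is None:
        return None
    low, high = LIMITS[variable]
    return value if low <= value <= high else None


# -999.0 stands in for a typical "bad" sentinel; 75.0 is out of range.
raw = [12.5, -999.0, 18.1, None, 75.0]
filtered = [range_check("air_temp_c", v) for v in raw]
print(filtered)  # [12.5, None, 18.1, None, None]
```

Values that fail the check are simply marked as missing here. Any later
replacement of missing values with an estimate would be a separate step.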

One frequently asked question regarding these data is, "Why are the values
I see in the file today different from what was there last year for the same
month?  Shouldn't the value for MM,YYYY (e.g. June, 1985) be settled by now?"  
This is a reasonable question; it is natural to expect that historical values
would not undergo significant revision. You might expect them to change by a
very small amount, but sometimes the differences in our files can be quite
large.  So what's going on?  There can actually be
several factors in play:
  1. The models we use for estimating many of these quantities can change.
     These may be methodology changes (typically minor) or recalibrations of
     the model parameters.  It is also possible, particularly in the future, that
     we will use a different model. All of these model changes will, obviously,
     result in changes to the output variables such as evaporation.
  2. Similarly, the methods used to aggregate station data into areal averages
     can change. We typically use a Thiessen polygon method for aggregating
     station meteorological data. If/when the underlying map used for this
     process is updated, the results will change.
  3. As noted above, the data agencies revise the underlying station/gauge
     data. Each time we revise our monthly
     data sets, we pull a fresh copy of the station data to be aggregated and
     used in the models. When the data agency has identified an issue with the 
     older data and revised it, the station data I use today will reflect that 
     revised (or removed) value, and the aggregated values will change 
     accordingly.
  4. We will occasionally note persistent errors with the raw data from a 
     station.  When that happens, we will typically just remove it from
     consideration to "be on the safe side". These stations may only be 
     exhibiting the obvious problems for the most recent few years, but
     we will eliminate that station from use for the entire period of
     record because it would be much more complicated for our procedures
     to try to only eliminate the recent years.
  5. The criteria used for filtering erroneous input data values have been
     modified several times over the years. These filters started out as 
     just a few very simple range checks, but have expanded over time as 
     we discovered more issues that slipped past the filters.      
  6. We have, on occasion, uncovered errors in our processing software.  When
     that happens, we have to fix the problem and then create new files. As
     you would expect, that will result in revised data.
  7. Similarly, with the old Excel files, there was the potential for a
     copy/paste mistake.  They were built and updated manually by copying in
     tables of data in text format. On a few occasions I copied the wrong
     set of data (e.g. the table for Lake Erie instead of Lake Ontario or
     the data for 1980-1990 instead of 1981-1991). 
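
To make factors 2 and 4 more concrete, the following Python sketch
approximates Thiessen polygon weights by assigning each cell of a regular grid
to its nearest station. The rectangular domain, the station names and the
coordinates are all assumptions made for the example; the real procedure uses
a map of the lake or basin, and this is not the GLERL software.

```python
import math


def thiessen_weights(stations, xmin, xmax, ymin, ymax, n=100):
    """Approximate Thiessen area fractions on an n x n grid of cell centers.

    stations maps a station name to its (x, y) location. Each grid cell is
    credited to the nearest station, so a station's weight is the fraction
    of the rectangle that is closer to it than to any other station.
    """
    counts = {name: 0 for name in stations}
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    for i in range(n):
        for j in range(n):
            x = xmin + (i + 0.5) * dx
            y = ymin + (j + 0.5) * dy
            nearest = min(
                stations,
                key=lambda s: math.hypot(x - stations[s][0], y - stations[s][1]),
            )
            counts[nearest] += 1
    return {name: c / (n * n) for name, c in counts.items()}


def areal_mean(stations, values, bounds, n=100):
    """Thiessen-weighted mean using only stations that reported a value."""
    reporting = {s: xy for s, xy in stations.items() if values.get(s) is not None}
    weights = thiessen_weights(reporting, *bounds, n=n)
    return sum(values[s] * w for s, w in weights.items())


# Two hypothetical stations in a unit square.
stations = {"A": (0.25, 0.5), "B": (0.75, 0.5)}
bounds = (0.0, 1.0, 0.0, 1.0)
both = areal_mean(stations, {"A": 10.0, "B": 20.0}, bounds)
only_a = areal_mean(stations, {"A": 10.0, "B": None}, bounds)
print(both, only_a)  # 15.0 10.0
```

Removing station B changes the areal average from 15.0 to 10.0 even though
the value at station A is unchanged, which is the kind of shift described in
the list above.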
     

Monthly files currently available:

runoff_<lake>_arm.csv     These files are an aggregation of estimated streamflow
                          into each lake from the land surface. The estimates
                          are derived by extrapolating streamflow observations
                          from a selected set of individual gauges, using a
                          fairly simple Area-Ratio Method (ARM).  For more 
                          information, please see the publication "Development 
                          and application of a North American Great Lakes 
                          hydrometeorological database  - Part I: Precipitation, 
                          evaporation, runoff, and air temperature".
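
The basic idea behind an area-ratio extrapolation can be sketched in a few
lines of Python. This is a simplified, single-step version written for
illustration: the function name, units and numbers are assumptions, and the
method described in the publication above has more detail than is shown here.

```python
def area_ratio_runoff(gauged_flows_cms, gauged_areas_km2, total_area_km2):
    """Scale total gauged flow by the ratio of total area to gauged area."""
    gauged_area = sum(gauged_areas_km2)
    if gauged_area <= 0:
        raise ValueError("at least one gauge with a positive area is required")
    return sum(gauged_flows_cms) * total_area_km2 / gauged_area


# Two hypothetical gauges covering 4000 of 8000 km^2 of land surface.
runoff = area_ratio_runoff([30.0, 50.0], [1000.0, 3000.0], 8000.0)
print(runoff)  # 160.0
```

Because the gauged flow is scaled up to the ungauged area, adding, removing
or revising a single gauge can change the estimate for the whole land surface.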

                          
prc_<lake>_<loc>_mon.csv  These files contain aggregated precipitation for each
                          of the specified areas. The value of <loc> will be either
                          "lake", "land" or "basn", indicating overlake, overland
                          or overbasin (land + lake).
                          
                          
Because it is often requested, I am now adding some daily datasets that will be
located in a "daily" subdirectory.

subdata_<lake>*.csv       These daily files contain meteorological variables for either
                          a single "subbasin" or some aggregation of subbasins. The
                          subdata_???00.csv files are for the overlake area.  (Our
                          internal software system uses 0 as the subbasin number for
                          overlake.) The subdata_???_land.csv and subdata_???_basn.csv 
                          files contain overland and overbasin values, as you would expect.  
                          These daily files are the source used when computing monthly
                          values that get posted.
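
For readers who want to reproduce a monthly value from daily data, here is a
minimal Python sketch that groups daily rows by month and averages them. The
column names ("date", "air_temp_c") and the sample rows are hypothetical; the
actual layout of the daily files may differ, so check the header of the file
you are using. Whether a mean or a sum is appropriate depends on the variable.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical daily data in a simple CSV layout (assumption for illustration).
SAMPLE = """date,air_temp_c
1985-06-01,14.0
1985-06-02,16.0
1985-07-01,20.0
"""


def monthly_means(text, column):
    """Group daily values by YYYY-MM and return the mean for each month."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        value = row[column].strip()
        if value == "":
            continue  # skip missing daily values
        groups[row["date"][:7]].append(float(value))
    return {month: mean(vals) for month, vals in sorted(groups.items())}


result = monthly_means(SAMPLE, "air_temp_c")
print(result)  # {'1985-06': 15.0, '1985-07': 20.0}
```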


References:

HUNTER, T.S., A.H. CLITES, A.D. GRONEWOLD, and K.B. CAMPBELL. Development and 
application of a North American Great Lakes hydrometeorological database  - 
Part I: Precipitation, evaporation, runoff, and air temperature. Journal of 
Great Lakes Research 41(1):65-77 (DOI:10.1016/j.jglr.2014.12.006) (2015). 
https://www.glerl.noaa.gov/pubs/fulltext/2015/20150006.pdf