Description of how the Data Assimilation scripting works.

James McCreight of NCAR supplied me with scripts developed by Zhengtao Cui (at NCAR?) for automated retrieval of streamflow gauge data from USGS and conversion into the standard netCDF format used by the National Water Model. These source files are also available on github.com under the NCAR domain, project name wrf_hydro_nwm_NWIS; I believe they match what I received, although I did not verify that. I used the scripts from James as a template for doing the same thing with Canadian streamflow data. All scripts are written in Python 2.7 and are intended for use in a Linux environment.

The basic download workflow is to start a master process that runs in an infinite loop. There appear to be limits on process length in the environment for which this was developed, so the master process is set up to die and restart on a schedule. Regardless of that mechanism, the underlying design is a never-ending loop that checks the server at a prescribed interval. It compares the timestamp of each remote streamflow gauge data file with the timestamp of the corresponding file on the local file system; when the remote file is newer than the local file, it has been updated and needs to be downloaded. That is done, and the downloaded data is then translated into an appropriate netCDF file. Conversion to netCDF format is done as a separate process, and the code for that process is also supplied. After all files have been checked, the process sleeps until the next prescribed update check. A minimal sketch of this loop appears after the tables below.

USGS scripts received from James McCreight:

name                                  purpose
------------------------------------  ------------------------------------------------------------------
run_usgs.sh                           bash script used to start the main infinite loop process
parallel_download_master.py           infinite loop process that runs the check/download routine
usgs_iv_retrieval.py                  control script to get all stations updated in the last N minutes
find_changed_site_for_huc.py          determine which stations have been updated, and make a list
fetch_sites.py                        download the stations identified in a list
make_time_slice_from_usgs_waterml.py  convert downloaded station data from USGS format to NWM netCDF
TimeSlice.py                          build the netCDF timeslice files
USGS_Observation.py                   read USGS format

Scripts developed by Tim Hunter for Canadian flow data:

name                            purpose
------------------------------  ------------------------------------------------------------------
run_canflow.sh                  bash script used to start the main infinite loop process
parallel_dm_can.py              infinite loop process that runs the check/download routine
canadian_flow_retrieval.py      control script to identify and download all stations updated in the last N minutes
make_time_slice_from_canada.py  convert downloaded station data from WSC format to NWM netCDF
TimeSliceC.py                   build the netCDF timeslice files
WSC_Observation.py              read WSC (Water Survey of Canada) format
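To make the loop concrete, here is a minimal sketch of the kind of check/sleep cycle that parallel_download_master.py and parallel_dm_can.py implement. The check interval, URLs, paths, and helper names below are placeholders of mine, not the actual script contents; the point is only to illustrate the timestamp-comparison logic described above.

    # Illustrative sketch only: interval, URLs, paths, and helper names
    # are my placeholders, not the actual NCAR/WSC script contents.
    from __future__ import print_function
    import calendar
    import email.utils
    import os
    import time

    try:                                    # Python 2.7 (what the scripts target)
        from urllib2 import urlopen
    except ImportError:                     # also works under Python 3
        from urllib.request import urlopen

    CHECK_INTERVAL = 15 * 60                # seconds between server checks (assumed)

    REMOTE_FILES = [
        # (remote URL, local file) pairs -- hypothetical examples
        ('https://example.com/hydrometric/02GA010_hourly.csv',
         '/data/canflow/02GA010_hourly.csv'),
    ]

    def remote_mtime(url):
        """Modification time of the remote file, from the HTTP Last-Modified header."""
        stamp = urlopen(url).info().get('Last-Modified')
        return calendar.timegm(email.utils.parsedate(stamp))

    def check_and_download(url, local_path):
        """Fetch url if the remote copy is newer than the local copy (or none exists)."""
        rtime = remote_mtime(url)
        ltime = os.path.getmtime(local_path) if os.path.exists(local_path) else 0
        if rtime <= ltime:
            return False                    # local copy is current; nothing to do
        data = urlopen(url).read()
        with open(local_path, 'wb') as f:
            f.write(data)
        os.utime(local_path, (rtime, rtime))  # stamp local file with the remote time
        return True

    def main():
        while True:                         # the never-ending master loop
            for url, local_path in REMOTE_FILES:
                try:
                    if check_and_download(url, local_path):
                        print('updated:', local_path)
                        # conversion to NWM netCDF is handled by a separate script
                except Exception as exc:    # keep polling even if one file fails
                    print('skipping %s: %s' % (url, exc))
            time.sleep(CHECK_INTERVAL)      # sleep until the next prescribed check

    if __name__ == '__main__':
        main()

Note the os.utime() call: setting the local file's modification time to the remote timestamp is what makes the newer-than comparison work correctly on the next pass.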
Note that I kept the script organization and naming fairly consistent with what Zhengtao developed, but I did simplify and combine the Canadian scripts somewhat. It should be straightforward for whoever implements this on WCOSS or elsewhere to adapt the code and run the new Canadian scripts as a separate standalone process. If the goal is to build a combined process that handles both the US and Canadian files together, that will likely be more complicated, but my hope is that keeping things parallel in structure and naming will facilitate that work (if needed).
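For orientation, here is the general shape of the netCDF write that TimeSlice.py and TimeSliceC.py perform, using the netCDF4 Python module. The dimension and variable names below are my illustrative guesses, not the verified NWM TimeSlice schema; the actual scripts are the authoritative reference for the real layout.

    # Sketch of a "timeslice" netCDF write.  Dimension and variable names
    # are illustrative guesses, NOT the verified NWM TimeSlice schema.
    import netCDF4
    import numpy as np

    ID_LEN = 15     # assumed fixed width for station ID strings
    TIME_LEN = 19   # 'YYYY-MM-DD_HH:MM:SS'

    def as_char_array(strings, width):
        """Pad/truncate strings to a fixed width and split into single chars."""
        return np.array([list(s.ljust(width)[:width]) for s in strings], dtype='S1')

    def write_time_slice(filename, slice_time, station_ids, discharge_cms):
        """Write one slice: observed discharge for a set of stations at one time."""
        nc = netCDF4.Dataset(filename, 'w')
        nc.createDimension('stationIdInd', len(station_ids))
        nc.createDimension('stationIdStrLen', ID_LEN)
        nc.createDimension('timeStrLen', TIME_LEN)

        sid = nc.createVariable('stationId', 'S1', ('stationIdInd', 'stationIdStrLen'))
        tim = nc.createVariable('time', 'S1', ('stationIdInd', 'timeStrLen'))
        dis = nc.createVariable('discharge', 'f4', ('stationIdInd',))
        dis.units = 'm^3/s'

        sid[:] = as_char_array(station_ids, ID_LEN)
        tim[:] = as_char_array([slice_time] * len(station_ids), TIME_LEN)
        dis[:] = discharge_cms
        nc.close()

    # Hypothetical usage for two WSC gauges:
    # write_time_slice('2018-06-01_12:00:00.15min.wscTimeSlice.ncdf',
    #                  '2018-06-01_12:00:00', ['02GA010', '02KF005'], [123.4, 56.7])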