DBNet File Scanning


The SCANNER module is used by DBNet to ingest files when the generating program cannot call the ALERT binding itself. The SCANNER reads the configuration file table/scanlist to determine which files to process. When the SCANNER finds a new file matching a scanlist entry, it calls the ALERT binding for that file. The four parameters required by the ALERT binding are defined in the scanlist. The SCANNER then stores the file's name, size and timestamp in the file tmp/completelist, indicating that the file has been processed. After the scanlist has been processed, the SCANNER checks the completelist for files that no longer exist and deletes those entries from the list. The SCANNER runs every $scan_time minutes; the default is 5.
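
The cleanup of the completelist described above can be sketched in Python as follows. This is only an illustration, not the SCANNER's actual code: the function name prune_completelist and the in-memory layout (a dictionary mapping each file path to its recorded size and timestamp) are assumptions, since the on-disk format of tmp/completelist is not described here.

```python
import os


def prune_completelist(completed):
    """Return a copy of `completed` without entries for vanished files.

    `completed` is a hypothetical in-memory form of tmp/completelist:
    a dict mapping a file path to its recorded (size, timestamp) pair.
    """
    return {
        path: info
        for path, info in completed.items()
        if os.path.exists(path)  # keep only entries whose file still exists
    }
```

Dropping these entries keeps the completelist from growing without bound as scanned files are removed from disk.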

A few further details govern when a file is processed. The existence of a file is not enough for the SCANNER to process it. The file must exist and keep the same size and timestamp for two consecutive scans. This delay allows the generating process time to finish writing the file before it is ingested into DBNet. The delay is necessary in some cases and unnecessary in others, but the SCANNER has no way to distinguish the two. The scanlist also allows an age limit to be placed on scanned files. The last field on each line in the scanlist is the maximum age (in minutes) of a valid file. This field is used to prevent old data from being retransmitted in the event the tmp/completelist gets deleted or new data is added to the scanlist.
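
The two conditions above (unchanged size and timestamp across two consecutive scans, and an age no greater than the limit) might be combined as in the following Python sketch. The function name is_ready and its arguments are hypothetical, and the sketch assumes that a file's age is measured from its modification timestamp, which the description above does not state explicitly.

```python
def is_ready(previous, current, now, max_age_minutes):
    """Decide whether a file may be handed to the ALERT binding.

    `previous` and `current` are (size, timestamp) pairs observed on the
    last scan and on this scan, or None if the file was not seen.
    `now` and the timestamps are in seconds since the epoch.
    """
    if previous is None or current is None:
        return False  # the file must be seen on two consecutive scans
    if previous != current:
        return False  # size or timestamp changed: possibly still being written
    _size, timestamp = current
    age_minutes = (now - timestamp) / 60.0
    return age_minutes <= max_age_minutes  # reject files older than the limit
```

With the default scan interval of 5 minutes, a check like this means a file is ingested no sooner than the second scan after it stops changing.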

Each line in the scanlist consists of six fields separated by white space. The first field is the name of the directory that holds the files being scanned for. This field can contain shell meta characters. The second field is the file name pattern, a perl regular expression that is applied to all the files found in the directory specified in the first field. The third, fourth and fifth fields are used to call the ALERT binding. They are the TYPE, SUB-TYPE and job name respectively. The last field is the maximum age, in minutes, for a file to be processed. Here are some sample scanlist entries:

/pub/data1/meta/model/ghm ghm_.* META ghm SCANNER 180
/pub/data1/model/grid/ghm ghm_.* GEMPAK ghm SCANNER 180
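
As an illustration of the line format, the following Python sketch splits a scanlist line into its fields and tests a file name against the pattern. The names ScanEntry, parse_scanlist_line and matches are hypothetical. Python's re module stands in for perl regular expressions here; the two syntaxes agree for simple patterns such as ghm_.* but are not identical in general. The sketch also assumes the pattern is applied unanchored, as an ordinary perl match would be.

```python
import re
from typing import NamedTuple


class ScanEntry(NamedTuple):
    directory: str        # may contain shell meta characters
    pattern: str          # regular expression applied to file names
    data_type: str        # TYPE passed to the ALERT binding
    sub_type: str         # SUB-TYPE passed to the ALERT binding
    job_name: str         # job name passed to the ALERT binding
    max_age_minutes: int  # maximum age of a valid file


def parse_scanlist_line(line):
    """Split one whitespace-separated scanlist line into a ScanEntry."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    directory, pattern, data_type, sub_type, job_name, max_age = fields
    return ScanEntry(directory, pattern, data_type, sub_type,
                     job_name, int(max_age))


def matches(entry, filename):
    """Return True if the entry's pattern matches the file name."""
    return re.search(entry.pattern, filename) is not None
```

For the first sample entry, parse_scanlist_line yields the directory /pub/data1/meta/model/ghm, the pattern ghm_.*, the TYPE META, the SUB-TYPE ghm, the job name SCANNER and a maximum age of 180 minutes.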

See Also