NCEP Home > NCO Home > Systems Integration Branch > Decoders > BUFRLIB > BUFRLIB Table of Contents > Introduction
Printer Friendly Version
Introduction
Throughout the discussion of the BUFRLIB software, it will be helpful
for the reader to visualize a BUFR message as containing one or more BUFR
data subsets, each containing the data for a single report from a particular
observing site at a particular time and location.
In turn, each data subset (i.e. report) will typically contain, in addition
to time and location information, data values for items such as e.g. pressure,
temperature, wind direction and speed, humidity, etc. for that particular
observation. Finally, BUFR messages themselves are typically stored in files
containing many other BUFR messages of similar content. Therefore, if we were
to summarize in a top-down fashion, we could say:
"A BUFR file contains one or more BUFR messages, each containing one
or more BUFR data subsets, each containing one or more BUFR data values."
If nothing else, remembering this hierarchy will at least make the user
interface to the BUFRLIB software more intuitive, and for that reason
it is worth keeping in mind.
So, without further ado, let's dive in!
As a whole, the BUFRLIB software is a library containing more than 250 different
subprograms and functions; however, a typical user will never directly call more
than 10-20 of them, and the rest are lower-level routines that the software uses
to accomplish various underlying tasks and which can therefore be safely treated
as "black box" from a user perspective. Do note, however, that most of the software
is written in FORTRAN and is intended to be used as a library rather than as a
"stand-alone" program; therefore, the user must possess at least a rudimentary
knowledge of FORTRAN in order to be able to construct an application program which
calls the proper BUFRLIB subroutines in the proper order. Several example application
programs are provided later on within the documentation and can be used as references;
however, these obviously only begin to touch on the myriad number of applications in
which the software can be used.
A typical user's first encounter with the BUFRLIB software will most likely be
as follows:
CALL OPENBF ( LUBFR, CIO, LUNDX )
Input arguments:
LUBFR INTEGER Logical unit for BUFR file
CIO CHAR*(*) 'IN' or 'OUT' (or 'APX', 'NUL', 'NODX',
'SEC3' or 'QUIET')
LUNDX INTEGER Logical unit for BUFR tables
This subroutine identifies to the BUFRLIB software a BUFR file that is connected
to logical unit LUBFR. The argument CIO
is a character string describing how the file will be used, e.g. 'IN' is used to access
an existing file of BUFR messages for input (i.e. reading/decoding BUFR), and 'OUT'
is used to access a new file for output (i.e. writing/encoding BUFR). An
option 'APX' is also available which behaves like 'OUT', except that output
is then appended to an existing BUFR file rather than creating a new one from scratch,
and there are also some additional options 'NUL' and 'NODX' which can likewise be used
instead of 'OUT' for some very special cases as described later on in the documentation.
There is also an option 'SEC3' which can be used in place of 'IN' for certain cases when
the user is attempting to read BUFR messages whose content and descriptor layout are
unknown in advance.
However, all of these additional options each behave enough like 'IN' and 'OUT' that,
except where otherwise noted, it will be sufficient to further consider only the 'IN'
and 'OUT' cases for the purposes of this introductory discussion. The only other
possible option is 'QUIET', which is a special case not involving the connection of
any actual logical unit to the BUFRLIB software and which will be covered later.
The third and final call argument LUNDX identifies the logical
unit which contains the definition of the DX BUFR tables to be associated with unit
LUBFR. Except when CIO='SEC3', every BUFR
file that is presented to the BUFRLIB software must have a DX BUFR tables file associated with it,
and these tables may be defined within a separate ASCII text file
(see Description and Format of DX BUFR Tables
for more info.) or, in the case of an existing BUFR file, may be embedded within the first
few BUFR messages of the file itself, and in which case the user can denote this fact to the
subroutine by setting LUNDX to the same value as LUBFR.
In any case, note that LUBFR and LUNDX
are logical unit numbers; therefore, the user within his or her application program must have
already associated these logical unit numbers with actual filenames on the
local system, typically via a FORTRAN "OPEN" statement.
When an existing BUFR file is accessed for input (CIO='IN'), the associated
DX BUFR tables are stored internally within the BUFRLIB software and are
referenced during all subsequent processing of that file. Likewise, when a
file is accessed for output (CIO='OUT'), the associated DX BUFR tables are still
stored internally for subsequent reference; however, the file itself is also
initialized by writing the BUFR table information (as one or more BUFR
messages) to the beginning of the file, except when CIO='NODX',
and in which case the writing of these additional messages is suppressed.
At this point, a brief mention of the CIO='SEC3' option is in order.
As noted above, this is the only value of CIO (other than 'QUIET') where
it is not necessary to
provide pre-defined DX BUFR tables via LUNDX. Instead, this option
instructs the BUFRLIB software to unpack the data description section (Section 3) from each BUFR
message it reads and then decode the contents accordingly. In this case, it is necessary to provide
a set of BUFR master tables containing listings of all possible BUFR descriptors (see
Description and Format of BUFR Master Tables for more info.),
but otherwise no prior knowledge is required of the contents of the messages to be decoded.
Otherwise, whenever CIO='QUIET' is specified, this is a special case
which does not involve the actual connection of any logical unit to the BUFRLIB software. In this
case, the value of the input argument LUBFR is ignored, and the value of
LUNDX is an integer value from the below list indicating the level of
verbosity of error messages and diagnostics to be written by the BUFRLIB. The default value is 0,
but this can be modified as shown to suit the user's preference and then remains in effect for all
logical units for the life of the application program, or until a subsequent additional call is
made to this subroutine with CIO='QUIET' and a new specified verbosity
level in LUNDX:
-1 No printout except for catastrophic messages
0 Limited printout (default)
1 Warning messages are printed
2 Warning and informational messages are printed
In any case, messages are normally printed to standard output unless the user provides a special
inline version of subroutine ERRWRT, as described later on within the
documentation.
Currently, as many as 32 BUFR files can be simultaneously connected to the BUFRLIB
software for processing. Of course, each one must have a unique
LUBFR number and be defined to the software via a separate call
to subroutine OPENBF.
Since OPENBF is used to initiate access to a
BUFR file, it stands to reason that CLOSBF would
be used to terminate this access:
CALL CLOSBF ( LUBFR )
Input argument:
LUBFR INTEGER Logical unit for BUFR file
This subroutine severs the connection between logical unit LUBFR
and the BUFRLIB software. It is a good idea to call CLOSBF for
every LUBFR that was identified via OPENBF;
however, it is especially important when writing/encoding a BUFR file in order to
ensure that all output is properly flushed to LUBFR. It is also worth
noting that CLOSBF will, before returning, actually execute
a FORTRAN "CLOSE" on logical unit LUBFR, whereas it was
previously noted that subroutine OPENBF did not itself handle the
FORTRAN "OPEN" of the same LUBFR.
Now that we have covered the library routines that operate on the BUFR file level,
and recalling the hierarchy structure that was previously discussed,
it is now time to continue on to the BUFR message level:
When LUBFR points to a BUFR file for input:
CALL READMG ( LUBFR, CSUBSET, IDATE, IRET )
IRET = IREADMG ( LUBFR, CSUBSET, IDATE )
Input argument:
LUBFR INTEGER Logical unit for BUFR file
Output arguments:
CSUBSET CHAR*(*) Table A mnemonic for BUFR message
IDATE INTEGER Section 1 date-time for BUFR messsage
IRET INTEGER Return code:
0 = normal return
-1 = no more BUFR messages in LUBFR
Subroutine READMG reads the next BUFR message from the given
BUFR file pointed to by LUBFR. The associated function
IREADMG does the same thing, but returns IRET as its
function value which can then, e.g. be directly utilized as the target variable
in an iterative program loop. The choice of which to use is merely one of
programming preference and/or personal style, as both have the same net effect.
In either case, the next BUFR message is read into internal arrays within the
BUFRLIB software (from where it can be easily manipulated or further parsed)
rather than passed back to the application program directly. If the return code
IRET contains the value -1, then this indicates that there are no
more BUFR messages (i.e. end-of-file) within the given BUFR file, and in which
case the file itself will have been automatically disconnected from the BUFRLIB
software via an internal call to subroutine CLOSBF. Otherwise, if
IRET returns with the value 0, then the character argument
CSUBSET will contain the Table A mnemonic corresponding
to the type of message that has just been read
(see Description and Format of DX BUFR Tables
for further information about Table A mnemonics), and the integer argument
IDATE will contain the date-time, in format YYMMDDHH, that was contained
within Section 1 of the message (although it is also possible to have IDATE
returned in format YYYYMMDDHH; this is accomplished via a preceding call to
subroutine DATELEN, as shown within some of the example programs).
Alternatively, when LUBFR points to a BUFR file that has been
opened for output, the following message-level subroutines are most commonly used:
CALL OPENMG ( LUBFR, CSUBSET, IDATE )
CALL OPENMB ( LUBFR, CSUBSET, IDATE )
Input arguments:
LUBFR INTEGER Logical unit for BUFR file
CSUBSET CHAR*(*) Table A mnemonic for type of BUFR
message to be opened
IDATE INTEGER Date-time to be stored within
Section 1 of BUFR messsage
Both of these subroutines are similar in that they open and initialize, within
internal arrays, a new BUFR message for eventual output to LUBFR, using the
arguments CSUBSET and IDATE to indicate the type
of message to be opened. The difference is that subroutine OPENMG will
always open and initialize a new internal message, even if the CSUBSET and
IDATE arguments have not changed since the previous call to
OPENMG, whereas OPENMB will only open a new message
if either CSUBSET or IDATE has changed, and otherwise
will simply return while leaving the existing internal message unchanged, so
that subsequent data subsets can be stored within the same internal message.
For this reason, OPENMB is much more widely used, since it allows
for the storage of an increased number of data subsets within each BUFR message
and therefore improves overall encoding efficiency. Regardless, in the case of
either subroutine, whenever a new BUFR message is opened and initialized, the
existing internal BUFR message (if any) will be automatically closed and written
to output via an internal call to the following subroutine:
CALL CLOSMG ( LUBFR )
Input arguments:
LUBFR INTEGER Logical unit for BUFR file
thereby alleviating the user from having to directly do so within his or
her application program. Furthermore, since, in the case of a BUFR file that
was opened for input, each subsequent call to subroutine READMG will
likewise automatically clear an existing message from the internal arrays
before reading in the new one, it is rare to ever see subroutine CLOSMG
called directly from within an application program!
Now, continuing on within our top-down hierarchy structure to the
BUFR data subset (i.e. report) level, things now begin to get a little more
complicated, because the order in which routines at this level are called
with respect to routines at the data values level depends on whether the
underlying BUFR file was opened for input or output access.
More specifically, if the BUFR file was opened for input access (and, of course,
a successful call was subsequently made to subroutine READMG (or function
IREADMG) in order to read a BUFR message into the internal arrays!),
then the next step is to do the following in order to read a subset from that
internal message:
CALL READSB ( LUBFR, IRET )
IRET = IREADSB ( LUBFR )
Input argument:
LUBFR INTEGER Logical unit for BUFR file
Output arguments:
IRET INTEGER Return code:
0 = normal return
-1 = no more BUFR data subsets in
current BUFR message
As was the case previously with READMG, subroutine READSB
has its own functional equivalent IREADSB which returns IRET
as its functional value. Either way, a return code value of -1 within IRET
indicates that there are no more data subsets within the given BUFR message
and that, therefore, a new call to READMG (or IREADMG)
is required in order to read the next BUFR message from the associated BUFR
file before another subset can be read. At any rate, once a subset has been
successfully read (as before, this reading is done into internal arrays!),
then we are ready to call the values-level subroutines in order to retrieve
actual data values from this subset.
If, on the other hand, the BUFR file was opened for output access, then the
appropriate values-level subroutines must be called before calling the
relevant subset-level routine WRITSB, which makes sense once
we recognize that the function of this routine is to encode all of the data
values that have been stored for the current subset and then pack that entire
subset into the current message within the internal arrays. Put another way,
we must store the data values to be contained within a subset before we can
store the subset itself! Here is the routine:
CALL WRITSB ( LUBFR )
Input argument:
LUBFR INTEGER Logical unit for BUFR file
Again, this subroutine is called to indicate to the BUFRLIB software that all
necessary data values for this subset have been stored and thus that the subset
is ready to be encoded and packed into the current message for the BUFR file
associated with logical unit LUBFR. However, we should note
here that the BUFRLIB software will not allow any single BUFR message to grow
larger than a certain size (usually 10000 bytes, although this can be increased
via a call to subroutine MAXOUT as described later on in the
documentation); therefore, it can happen that an attempt to pack a subset within
the current message will not be possible due to a lack of remaining available space!
If this occurs, then WRITSB will automatically flush the current
message to logical unit LUBFR, open and initialize a new message
using the same CSUBSET and IDATE values as were specified
in the previous call to OPENMG or OPENMB, and then
encode and pack the subset into that new message, all without any additional
effort or worry on the part of the user's application program!
(As a side note, the default BUFR message size limit of 10000 bytes within the
BUFRLIB software is a practical one based upon the current specifications of
certain meteorological telecommunications networks and is not a limit
imposed by the BUFR code form itself. Theoretically, at least according to the official
WMO Manual #306,
the only limit to the size of a BUFR
message is the constraint that such a size (in bytes) must be representable as an
integer of 24 bits or less so that it can be encoded within bytes 5-7 of Section 0
of the message. Since many applications may prefer (or even require?) a BUFR
output message size larger than 10000 bytes, subroutine MAXOUT can be
used in such cases, as shown later on in the documentation.)
At last, we have reached the proverbial "meat and potatoes" part of the
discussion, where we now discuss the subroutines that are used to write/read
actual data values to/from a data subset:
CALL UFBINT ( LUBFR, R8ARR, MXMN, MXLV, NLV, CMNSTR )
CALL UFBREP ( LUBFR, R8ARR, MXMN, MXLV, NLV, CMNSTR )
CALL UFBSEQ ( LUBFR, R8ARR, MXMN, MXLV, NLV, CMNSTR )
Input arguments:
LUBFR INTEGER Logical unit for BUFR file
CMNSTR CHAR*(*) String of blank-separated mnemonics
associated with R8ARR
MXMN INTEGER Size of first dimension of R8ARR
MXLV INTEGER Size of second dimension of R8ARR
OR number of levels of data values
to be written to data subset
Input or output argument (depending on context of LUBFR):
R8ARR(*,*) REAL*8 Data values written/read to/from
data subset
Output argument:
NLV INTEGER Number of levels of data values
written/read to/from data subset
All three of these routines are similar, but there are some important
distinctions. We'll focus first on the similarities, the most significant
of which is basic functionality, in that each routine writes or reads
specified values to or from the current BUFR data subset within the
internal arrays, with the direction of the data transfer being determined by
the context of LUBFR, i.e. if LUBFR points to a
BUFR file that is open for input, then data values are read from the internal
data subset; otherwise, data values are written to the internal data subset.
The actual data transfer occurs through the use of the two-dimensional
REAL*8 array R8ARR, which must be declared and dimensioned
by the user within his or her application program, and whose actual first dimension
MXMN must always be passed in as a call argument to each of the
above routines. The call argument MXLV, on the other hand,
contains the actual second dimension of R8ARR only when
LUBFR points to a BUFR file that is open for input (i.e.
reading/decoding BUFR); otherwise, whenever LUBFR points to
a BUFR file that is open for output (i.e. writing/encoding BUFR), MXLV
instead contains the actual number of levels of data values that are to
be written to the data subset (and where this number obviously must be less
than or equal to the actual second dimension of R8ARR!).
In either case, the input character string CMNSTR always contains
a blank-separated list of "mnemonics"
(see Description and Format of DX BUFR Tables)
which correspond to the REAL*8 values contained within the first dimension of
R8ARR, and the output argument NLV always
denotes the actual number of levels of those values that were written/read to/from
the second dimension of R8ARR, where each such level represents
a repetition of the mnemonics within CMNSTR. Note that, when
LUBFR points to a BUFR file that is open for output (i.e.
writing/encoding BUFR), we would certainly expect that the output value NLV
is equal to the value of MXLV that was input, and indeed this is
the case unless some type of error occurred in storing one or more of the data levels.
At this point we should mention that, except in the case of subroutine UFBSEQ,
the correspondence between CMNSTR and the REAL*8 values listed
within the first dimension of R8ARR is one-to-one, meaning
that the mnemonics listed within CMNSTR are Table B mnemonics
and correspond positionally to the values in the first dimension of R8ARR.
UFBSEQ, on the other hand, may contain a Table A or Table D
sequence mnemonic within CMNSTR, in which case the values in
R8ARR then correspond to the sequence of Table B mnemonics
which constitute that Table A or Table D mnemonic.
One important thing to note about all three of the above subroutines is that
all data transfer is done via the use of the REAL*8 array R8ARR.
Therefore, any data that are desired to be encoded into BUFR as character values
(or, more officially, "CCITT IA5", which is basically just a fancy name for
ASCII) must be converted from character into REAL*8 within the application
program before storing such values into array R8ARR.
Conversely, when LUBFR points to an input file, any data values
read from R8ARR which correspond to character data must be
converted by the application program from REAL*8 back into character format.
In either direction, the conversion between REAL*8 and character (i.e. CCITT IA5)
values is most easily accomplished in FORTRAN via an EQUIVALENCE between an array
of each type, as shown within some of the example programs provided with the
BUFRLIB documentation.
Another important thing to note is that all numeric (i.e. non-character) data
values within R8ARR are in the exact units specified for the
corresponding mnemonic within the appropriate BUFR table, without any scale or
reference values applied. Specifically, this means that, when writing/encoding
data values into a BUFR subset, the user needs only to store each respective value
into R8ARR using the units specified within the BUFR table, and the BUFRLIB
software itself will take care of any necessary scaling or referencing of the
value before it is actually stored within the subset. Conversely, when
reading data values from a BUFR input subset, the values returned in R8ARR
are already de-scaled and de-referenced and, thus, are already in the exact units
that were defined for the corresponding mnemonics within the relevant BUFR table.
However, when a returned data value within R8ARR contains the value
10.0E10 (= 10.0 X 10**10 = 1.0 X 10**11 ), this indicates that the value for the
corresponding mnemonic was "missing" (i.e. all bits set to 1) within the BUFR subset.
Now that we've covered the similarities between the above three subroutines,
let's now take note of the differences, which relate mainly to the situational
context within which each one may be used. Specifically, UFBINT is
used for writing/reading data values corresponding to mnemonics which are part
of a delayed-replication sequence, or for which there is no replication at all.
As such, it is the most commonly-used of the three subroutines and is sufficient
in-and-of-itself for many basic applications. UFBREP, on the other
hand, must be used for mnemonics which are part of a regular (i.e. non-delayed)
replication sequence or for those which are replicated via being directly listed
more than once within an overall subset definition rather than by being included
within a replication sequence. For example, consider the following cases, where
the notation used is formally explained within
Description and Format of DX BUFR Tables
but will be covered in enough detail here in order to illustrate the concept:
To begin with, suppose that the BUFR tables file for a particular type of data
contains the following definitions:
| WHTSEQ | 303011 | WINDS-BY-HEIGHT SEQUENCE |
| GPOT | 007003 | GEOPOTENTIAL |
| VSIG | 008001 | VERTICAL SOUNDING SIGNIFICANCE |
| WDIR | 011001 | WIND DIRECTION |
| WSPD | 011002 | WIND SPEED |
| WHTSEQ | GPOT VSIG WDIR WSPD |
The above defines a Table D sequence mnemonic with name "WHTSEQ"
and which is composed of the Table B mnemonics "GPOT",
"VSIG", "WDIR", and "WSPD" (in that order!).
One can imagine such a sequence being utilized, for example, in the
representation of winds-by-height within the subset definition for a
rawinsonde report, where these four Table B mnemonics are all replicated at some
number of sounding levels within the atmosphere above a particular reporting
site. In determining which of the above three values-level subroutines to use,
the key is in seeing how exactly the replication is defined within the actual
subset definition! For example, if the subset definition contained:
{WHTSEQ}
then this would indicate that the replication is being done via 8-bit delayed
replication, in which case we could, assuming that we were reading/decoding
such data from an input BUFR file, use subroutine UFBINT as follows:
CALL UFBINT ( LUBFR, R8ARR, MXMN, MXLV, NLV, 'GPOT VSIG WDIR WSPD' )
and in which case the return value NLV would indicate how
many such sounding levels were available, and these would themselves be
returned one per row within the array R8ARR, while the
first four columns of R8ARR would themselves correspond
to the values of the four Table B mnemonics, respectively, at each sounding level!
Suppose, on the other hand, that the winds-by-height had been included in the
subset definition as:
"WHTSEQ"100
which would indicate that the replication is being done via regular (i.e. non-delayed)
replication with a fixed replication factor of 100. In that case, we would have to use:
CALL UFBREP ( LUBFR, R8ARR, MXMN, MXLV, NLV, 'GPOT VSIG WDIR WSPD' )
in order to read the data, and in this case we would always get a return value of
NLV = 100, even if there were not 100 actual sounding levels
worth of data available for this particular reporting site (but in which case the
unavailable levels would be filled out with the aforementioned "missing" value of 10.0E10).
Now, suppose further that, at a later point in this subset definition for a
rawinsonde report, the mnemonic "GPOT" was re-used in
order to redefine the geopotential level to that of, say, the lowest cloud seen,
followed by that of the highest cloud seen. This might look like:
{WHTSEQ} ... GPOT (low_cloud_information) ... GPOT (high_cloud_information) ...
where ... represents any collection of zero or more intervening
mnemonics. In this example, we are back to using delayed replication for the
winds-by-height sequence itself; however, we now have the mnemonic "GPOT"
being further replicated by being directly listed outside of a replication sequence.
Therefore, the use of UFBREP is also required in this case in
order to be able to retrieve all of the "GPOT" values
within the data subset. The return value NLV would still give a
count of the total number of rows that were filled within R8ARR,
but this count would now be two higher than it was previously, and the two
additional "GPOT" values (i.e. for the low cloud
and high cloud information, respectively) would be returned within the last
two rows of R8ARR, since that is the order in which they were
listed with respect to the other "GPOT" values
(i.e. the ones occurring as part of the replication of "WHTSEQ") within
the overall subset definition.
As for UFBSEQ, we have already touched on the use of this
subroutine, but we will do a quick review here by noting that we could have replaced:
CALL UFBINT ( LUBFR, R8ARR, MXMN, MXLV, NLV, 'GPOT VSIG WDIR WSPD' )
with:
CALL UFBSEQ ( LUBFR, R8ARR, MXMN, MXLV, NLV, 'WHTSEQ' )
in the previous example and gotten the exact same output in return, since UFBSEQ
will itself determine which Table B mnemonics constitute the Table D sequence
mnemonic "WHTSEQ" and then return all of the corresponding
values within separate columns of R8ARR! Of course, in either case
(and in the case of UFBREP as well!), the user must be certain that
R8ARR is dimensioned large enough within his or her application
program in order to be able to hold all of the values that can possibly be
returned.
|