CSV was introduced as a transmission format for data only as an addition to SDMX 2.1 and was subsequently updated with the release of SDMX 3.0.
Like JSON, there are two distinct versions of the data message:
In this unit we’ll look at the general principles of the SDMX CSV data messages and note the differences between the two versions.
The SDMX-CSV v1 data message specification was added to SDMX 2.1 to provide a more convenient way to work with SDMX datasets using standard software tools like Excel, ‘R’, Tableau, Power BI and other business intelligence and statistics packages.
The CSV follows a standard 2-dimensional table layout with fixed columns and one row per observation.
There must be one column for the dataflow, one column per dimension, one column for the primary measure and one column per attribute. All dimensions defined in the Dataflow’s Data Structure Definition (DSD) must be included.
A header row defines the meaning of each column.
In the simple case the value of each component is just the Id:
DATAFLOW, REF_AREA, INDICATOR, SUB_INDICATOR, FREQ, TIME_PERIOD, OBS_VALUE
WB:GCI(1.0), GHA, GCI, RANK, A, 2008, 102
WB:GCI(1.0), GHA, GCI, RANK, A, 2009, 114
WB:GCI(1.0), GHA, GCI, RANK, A, 2010, 114
WB:GCI(1.0), GHA, GCI, RANK, A, 2011, 114
WB:GCI(1.0), GHA, GCI, RANK, A, 2012, 103
WB:GCI(1.0), GHA, GCI, RANK, A, 2013, 114
WB:GCI(1.0), GHA, GCI, RANK, A, 2014, 111
Component Names (sometimes called Labels) can also be included by encoding each value as Id : Name
as follows:
DATAFLOW, REF_AREA:Reference Area, INDICATOR:Indicator, SUB_INDICATOR:Sub Indicator, FREQ:Frequency, TIME_PERIOD:Time period, OBS_VALUE:Observation
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2008, 102
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2009, 114
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2010, 114
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2011, 114
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2012, 103
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2013, 114
WB:GCI(1.0): Global Competitiveness Index, GHA: Ghana, GCI: Global Competitiveness Index, RANK: Rank, A: Annual, 2014, 111
The SDMX-CSV v2 data message follows the same general principle of one observation per row.
V2 was introduced with SDMX 3.0 and includes support for the additional features added to that version of the standard, in particular:
Datasets are also no longer restricted to just Dataflows and can instead reference a Dataflow, Data Structure Definition or a Provision Agreement. SDMX-CSV v2 supports this by changing the first two columns to describe the structure type (STRUCTURE : dataflow, datastructure or dataprovision) and its Id (STRUCTURE_ID)
In addition, the third column now specifies the action type (ACTION) bringing CSV into line with the XML and JSON formats that already provide for this, and better supporting the data exchange use case. The SDMX technical specifications define four action types which are encoded in a v2 CSV message using single characters:
I
- Information - the data is intended for informational purposes only, but treated as Append by the receiverA
- Append - instructs incremental update of existing observations by the receivedR
- Replace - instructs the receiver that existing observations should be replaced or appended if they don’t already existD
- Delete - instructs the receiver to delete the referenced dataThere can be a mix of action types within a single data message.
The following simple example message has three observations for the dataflow ESTAT:CENS_91SMSTA(1.0)
with an action type of I
indicating that the data is for information purposes only. Note that OBS_STATUS
is an optional observation level attribute.
STRUCTURE,STRUCTURE_ID,ACTION,FREQ,MARSTA,AGE,SEX,UNIT,GEO,TIME_PERIOD,OBS_VALUE,TIME_FORMAT,OBS_STATUS
dataflow,ESTAT:CENS_91SMSTA(1.0),I,A,TOTAL,Y25-29,F,PER,FR,1991,2162398,P1Y,
dataflow,ESTAT:CENS_91SMSTA(1.0),I,A,TOTAL,Y25-29,M,PER,FR,1991,2147582,P1Y,
dataflow,ESTAT:CENS_91SMSTA(1.0),I,A,TOTAL,Y25-29,T,PER,FR,1991,4309980,P1Y,
Values that contain commas must be enclosed in double-quotes
Some implementations support delimiters other than comma, semi-colon (;
) for instance.
In the next unit we’ll look at how to interactively convert SDMX datasets between the various data transmission formats using the FMR web user interface