In this unit, you’ll be presented with a review of the essential SDMX concepts related to identifying and describing statistical data.
Review of essential SDMX terminology
The SDMX Information Model, which is used to describe statistics, consists of many concepts and artefacts. The majority of SDMX structural models can be described using a small subset of the Information Model. This subset will be the
focus of this introductory module.
Advanced courses and other learning resources will be made available to address specialised and more complex structural modelling requirements.
About SDMX and the SDMX Information Model
So, what is SDMX and the SDMX Information Model?
Here’s a quick refresher.
Statistical Data and Metadata eXchange (SDMX)
Does not introduce any new concepts for statisticians – it simply provides a framework for what statisticians already do.
Can be used to describe any multi-dimensional dataset regardless of statistical domain.
Provides a way to describe the structure of data.
SDMX Information Model
Forms the core of SDMX.
Describes statistics in a standard way.
Identifies objects and their relationships.
Allows central management and standard access.
Statistical data, metadata, and data exchange processes may be modelled.
SDMX concepts for data
A statistical observation is the actual data point or number being measured and can be described using three concepts: dimension, attribute, and measure.
A (statistical) series, the measurement of some phenomenon over time, is specified by a grouping of dimensions.
Select each concept to learn more.
Dimension
Dimension
A statistical concept used to both identify and describe the statistical observation.
Is necessary to understand the meaning of the data.
Taken together, the dimensions uniquely identify a statistical observation.
Is always mandatory.
Is always categorical.
The value set is always a codelist except in the case of the time dimension, which is a special case.
The value set is either ordered or unordered.
Is attached to a group/series, except for the time dimension and the measure dimension.
Time-series data Time dimension: attached to the observation. Measure dimension: attached to the group/series.
Cross-sectional data Time dimension: attached to the group/series. Measure dimension: attached to the observation.
Attribute
Attribute
A statistical concept used to describe a statistical observation. An attribute provides additional qualitative information about a statistical observation.
Is either coded or uncoded.
If coded, the value set for an attribute is either ordered or unordered.
Is either mandatory or optional.
Is attached to an observation, dimension, group/series, or dataset.
Note: Cell-level footnotes and dataset-level footnotes are defined in SDMX models as attributes. The same is true for observation status codes (confidentiality, provisional data, estimated data, etc.), this
concept is represented as an attribute and attached to observations.
Measure
Measure
Describes (quantifies) the actual values.
Is numeric.
Is either discrete or continuous.
The primary measure is usually referred to as “observation value”.
SDMX terminology and datasets
Dataset
A dataset is an organized collection of statistical observations.
Statistical observations are described using dimensions, attributes, and measure.
A statistical observation in a dataset is unambiguously identified by dimensions.
The dimensions, attributes and measure used to describe statistical observations in one dataset may be used to also describe statistical observations in other datasets.
In a statistical system:
There will be many datasets.
Datasets will be received from data providers, changed by internal processes, published in online portals, and reported to other organisations.
There will be many dimensions, attributes, and measures.
Some dimensions, attributes, and measures will be reused in multiple datasets.
There will be many codelists.
Some codelists will be reused in multiple dimensions and coded attributes. For example, sex, age, reference area, frequency, unit of measure.
Managing these structural metadata, maximizing coherence and consistency, both within an individual organisation as well as throughout the entire data lifecycle, including with external data partners, is an important aspect of
assuring quality in the statistical system.
There are numerous SDMX cross-domain codelists that are recommended for use to assist with these efforts.
What do you know?
Now that you’ve completed our review of Essential SDMX Concepts, try the three questions that follow.
A statistical observation can be described using three concepts: dimension, attribute, and measure.
Which of the following best describe a dimension?
Select all that apply and then select Submit.
That's right.
Dimension:
A statistical concept used to both identify and describe the statistical observation.
Is necessary to understand the meaning of the data.
Taken together, the dimensions uniquely identify a statistical observation.
Is always mandatory.
Is always categorical.
The value set is always a codelist except in the case of the time dimension, which is a special case.
The value set is either ordered or unordered.
Is attached to a group/series, except for time dimension and measure dimension.
That's incorrect. The correct answers are options 1, 2, 4 and 5.
Dimension:
A statistical concept used to both identify and describe the statistical observation.
Is necessary to understand the meaning of the data.
Taken together, the dimensions uniquely identify a statistical observation.
Is always mandatory.
Is always categorical.
The value set is always a codelist except in the case of the time dimension, which is a special case.
The value set is either ordered or unordered.
Is attached to a group/series, except for time dimension and measure dimension.
Not quite. The correct answers are options 1, 2, 4 and 5.
Dimension:
A statistical concept used to both identify and describe the statistical observation.
Is necessary to understand the meaning of the data.
Taken together, the dimensions uniquely identify a statistical observation.
Is always mandatory.
Is always categorical.
The value set is always a codelist except in the case of the time dimension, which is a special case.
The value set is either ordered or unordered.
Is attached to a group/series, except for time dimension and measure dimension.
What do you know?
Now see if you can identify which of the following best describe an attribute?
Select all that apply and then select Submit.
That's right.
Attribute:
Provides additional qualitative information about a statistical observation.
Is either coded or uncoded.
If coded, the value set for an attribute is either ordered or unordered.
Is either mandatory or optional.
Is attached to an observation, dimension, group/series, or dataset.
That's incorrect. The correct answers are options 1, 3 and 5.
Attribute:
Provides additional qualitative information about a statistical observation.
Is either coded or uncoded.
If coded, the value set for an attribute is either ordered or unordered.
Is either mandatory or optional.
Is attached to an observation, dimension, group/series, or dataset.
Not quite. The correct answers are options 1, 3 and 5.
Attribute:
Provides additional qualitative information about a statistical observation.
Is either coded or uncoded.
If coded, the value set for an attribute is either ordered or unordered.
Is either mandatory or optional.
Is attached to an observation, dimension, group/series, or dataset.
What do you know?
Finally, a measure is numeric and describes actual values, but how do we usually refer to the value of the primary measure?
Select your answer and then select Submit.
That's right.
Measure:
Describes (quantifies) the actual values.
Is numeric.
Is either discrete or continuous.
The primary measure is usually referred to as “observation value”.
Incorrect.
Measure:
Describes (quantifies) the actual values.
Is numeric.
Is either discrete or continuous.
The primary measure is usually referred to as “observation value”.
Coming next …
The SDMX concepts you covered in this opening unit provide a common vocabulary for statisticians to identify and describe statistical data.
These concepts will be used extensively throughout the remainder of this module, along with those for structural modelling covered in the next unit.