Data Catalog Records
Each Data Provider maintains a set of one or more metadata files, each of which can describe one or more distinct sources of data. These descriptions serve several purposes:
- They drive discovery descriptions are ingested into our search system and made available to a Data Consumer searching for particular kinds of data.
- They inform consumption of that data, providing information on:
- The API required to access the data source
- Any access constraints which may need to be satisfied
- Licenses for any accessed data
- Representation and internal semantics of expressions of the data
Note: This document uses US English. To align with W3C and other prevalent standards, IB1 uses US English in its technical specifications and technical documentation.
RDF Prefixes
This specification uses the following prefixes:
dcat: http://www.w3.org/ns/dcat#
dcterms: http://purl.org/dc/terms/
ib1: https://registry.trust.ib1.org/ns/1.0#
Dataset or Data Service?
A Dataset:
- is provided as one or more downloadable files,
- may be published as part of series of Datasets covering the same source of data over different time periods, and
- should maintain historical access to previous periods.
A Data Service:
- is an API to query some data which uses parameters to specify a subset of data, including time period,
- is specified formally by a machine readable API description, and
- may require consent from a data subject external to the Trust Framework.
They are described by slightly different information in metadata files.
Scheme-conforming
A Scheme-conforming Dataset or Data Service meets the data format and meaning requirements of the Scheme, along with any required access and license conditions.
These requirements are published by the Scheme Registry as machine readable Scheme Catalog Requirements Documents, and metadata files link to them to show their conformance.
Most Datasets and Data Services are Scheme-conforming. A Data Provider may publish data which is not Scheme-conforming to:
- use Scheme licenses and roles to share ad-hoc Shared Data with Scheme participants (where the Scheme doesn't expressly disallow this), or
- use the Catalog to include Open Data in a public index.
Metadata File Structure
The metadata is a standard DCAT RDF file representing one or more sources of data.
NOTE: The examples below use the Turtle format for compactness and increased readability. Data providers may present this information in Turtle, RDF/XML, JSON-LD or N3 formats.
Datasets are represented as Dataset DCAT objects with one or more Distributions. If the data measures the same thing over periods of time, then these must be linked together with a Data Series object. The format of the data is described by JSON Schema, XSD 1.1 or CSVW schemas.
Data Services are represented as Data Service DCAT objects, with OpenAPI specifications of the API and the format of the data in the responses.
The URL of the DCAT object inside the RDF representation is the stable identifier of the Dataset or Data Service. This must remain constant each time the metadata file is fetched and over updates to the metadata.
Mandatory metadata fields
The following fields must be included in every DCAT object. Metadata will be visible to all participants in the Trust Framework, and may be visible to anyone on the open web without authentication in an open index.
- dcterms:title
- Short title for this dataset.
- dcterms:publisher
- The URL of the Data Provider's record in the Scheme Directory.
- dcterms:license
- The URL of an
ib1:License
. All use of this data source is subject to this License. Where a data source is Scheme-conforming, the URL will be registered in the Registry. ib1:trustFramework
- The URL of the Trust Framework the dataset is published within.
ib1:scheme
- The URL of the Scheme the dataset is published within, which must be a Scheme within the specified
ib1:trustFramework
. ib1:datasetAssurance
- The assurance level for this dataset.
ib1:sensitivityClass
- The URL describing the sensitivity of the data, which may be a generic sensitivity class defined by the IB1 RDF Schema, if the Generic Sensitivity Classes specification is used by the Scheme, or a Registry URL otherwise. (Previous versions of this standard used literal strings to identify the classes.)
If the sensitivity class definition requires the use of FAPI, a FAPI compliant API must be used to serve the dataset.
If the sensitivity class definition is marked as requiring the end user's Permission (consent), the
ib1:oauthIssuer
term must be present.
More information about publishing assured data within a Trust Framework is available on the How to become an assured data publisher section of the Icebreaker One website.
Additional fields may be made mandatory for Scheme-conforming data sources by the Scheme Catalog Requirements Document.
Conformance metadata fields
dcterms:conformsTo
- The URL of a Scheme Catalog Requirements Document in the Scheme Registry. Most metadata files will include this field.
Access control metadata fields
The catalog entries must specify the rules for accessing the dataset. This specification does not specify a method of access control. Each Scheme will choose a method of access control and declare it in the Scheme definition in the Registry, for example, the standard Role-based Access Control specification.
Data Service metadata fields
Data Services are represented by dcat:DataService
objects with the common mandatory fields and Data Service specific fields.
dcat:endpointDescription
- The URL of an OpenAPI file, which fully documents the request parameters and responses. Responses must use XML or JSON. To allow the OpenAPI file to be used by multiple Data Providers, the file may only contain a single Server object, where the
url
is"{endpointURL}"
, andvariables
sets the default to"https://endpointurl-not-specified.ib1.org"
. dcat:endpointURL
- The URL of this specific instance of the API. It is interpolated into the
url
specified in the OpenAPI file using theendpointURL
variable. ib1:oauthIssuer
- Where access to data requires end user consent or selection of an account at the provider, the URL of the OAuth Issuer which is used to authenticate before accessing this Data Service. This field is required for data with a
ib1:sensitivityClass
which is marked as requiring the end user's Permission (consent), and may be used for other classes. ib1:heartbeatDescription
- An optional URL of an OpenAPI file (with Server specified as
dcat:endpointDescription
), which contains a single Path with a 200 response defined. This term will typically be the URL of one of a small number of standard OpenAPI files published in the Registry.
Any additional metadata defined by published Standards may be added.
Dataset metadata fields
Datasets are represented by dcat:Dataset
objects with the common mandatory fields and Dataset specific fields.
As Datasets will be discovered by browsing an index, they need additional descriptive metadata for discovery. The following fields are mandatory:
- dcterms:description
- Longer form description of this dataset. This is used in combination with the title and tags when people search for datasets, so aim to include probable search words in the description.
- dcat:distribution
- URL of a
dcat:Distribution
for a downloadable file, see below for mandatory fields. Multiple Distributions may be defined for the same data in different formats, taking into account any requirements and restrictions for Scheme-conforming datasets.
The following fields are mandatory when the dataset is part of a series of periodic datasets:
- dcat:inSeries
- The URL of a
dcat:DataSeries
which associates this Dataset with the overall series. The DataSeries is created by the publisher and contains their data only.
The following fields are optional:
- dcat:version
- Version number of the dataset, this should preferably follow semantic versioning if possible. Versioning of the dataset should be used to indicate changes in delivery mechanism, or in representation, rather than for changes in the underlying data. For example, this should not be used to differentiate between datasets from different years, rather it should be used to indicate whether a potential data consumer might need to alter how it processes any returned data.
- dcat:versionNotes
- Notes used to explain any changes to this version.
Any additional metadata defined by published Standards may be added.
Distribution metadata fields
To specify how the data may be downloaded, one or more associated dcat:Distribution
objects must be included which contain:
- dcat:downloadURL
- A stable URL for download of the dataset, subject to access controls specified in the Dataset object. Liveness of the server will be tested by making a HEAD request to this URL.
- dcat:media_type
- The MIME type of the download file.
The following fields are optional, but encouraged. They are mandatory for higher assurance and Scheme-conforming data sources.
ib1:dataSchema
- The URL of a schema file specifying the format of the downloadable file. The type of schema depends on the
dcat:mediaType
:application/json
are documented by JSON Schema files,application/xml
by XSD 1.1 files, andtext/csv
by CSVW files.
Additional metadata for Datasets and Data Services
The fields marked as mandatory are the minimum needed to ensure that a data source can be used by the Trust Framework participants, and is visible in the Open Net Zero search system. There are, however, other properties of a dataset which may be useful to potential data consumers. Where such information can be provided, it should be provided in as standard a form as possible - in practice this translates to making use of existing ontologies such as DCAT and Dublin Core by preference, then shared, industry-specific, ontologies, and only using internal or custom representation when absolutely necessary.
Of particular note, and something we would like to ultimately expose in the Open Net Zero search interface, is information about the geospatial and temporal ranges of entries within a dataset. This is a complex subject, but one that has already been handled by DCAT. If you need to express this kind of information, please do so according to the standards laid out here.
We encourage use of the dcat:keyword
list for datasets. These translate to “tags” in Open Net Zero's web interface and are useful to group datasets around specific topics.
Scheme Catalog Requirements
A Scheme-conforming data source meets a common standard across a Scheme. These standards are followed by the Data Providers so that the same kind of data is published across the Scheme using the same APIs, formats and meaning of data. A Scheme Catalog Requirements Document specifies the metadata that a DCAT Catalog entry must contain for it to be Scheme-conforming, and which roles may publish a conforming data source.
It is an RDF document published in the Registry, and the metadata links to machine readable License, API and format specifications which are also published in the Registry.
Example
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ib1: <https://registry.core.trust.ib1.org/ns/1.0#> .
<https://registry.core.trust.ib1.org/scheme/electricity/standard/supply-voltage>
a ib1:SchemeCatalogRequirements ;
dcterms:title "Supply Voltage API Requirements" ;
ib1:requiredType dcat:DataService ;
ib1:roleRequiredToPublish <https://registry.core.trust.ib1.org/scheme/electricity/role/generator> ;
ib1:requiredMetadata [ a ib1:RequiredMetadata ;
dcat:endpointDescription <https://registry.core.trust.ib1.org/scheme/electricity/api/voltage> ;
ib1:heartbeatDescription <https://registry.core.trust.ib1.org/api/heartbeat-simple/1.0> ;
ib1:sensitivityClass ib1:IB1-SA ;
ib1:roleRequiredToAccess <https://registry.core.trust.ib1.org/scheme/electricity/role/network-operator> ;
ib1:roleRequiredToAccess <https://registry.core.trust.ib1.org/scheme/electricity/role/report-provider> ;
dcterms:license <https://registry.core.trust.ib1.org/scheme/electricity/license/voltage-reporting/1.4> ;
];
ib1:requireAllAndAllowAdditional ib1:roleRequiredToAccess ;
ib1:requireAbsenceOf ib1:oauthIssuer ;
.
This example defines a standard Supply Voltage API that is provided by multiple providers in a Trust Framework. It specifies the API in detail with the dcat:endpointDescription
referring to an OpenAPI specification hosted by the Registry. It uses a standard ib1:heartbeatDescription
to check for liveness, using a standard heartbeat request defined in an OpenAPI specification hosted by the Registry.
It sets the ib1:sensitivityClass
to shared data. Because the API does not require end user consent, ib1:requireAbsenceOf
is used to prohibit the use of an OAuth Issuer.
It uses the standard Role-based Access Control specification, setting the roles of who can use the API with ib1:roleRequiredToAccess
. Because ib1:requireAllAndAllowAdditional
is used for the access rules, it allows the publisher to widen access to additional roles, as long as the roles in this document are included. ib1:roleRequiredToPublish
specifies which roles can publish implementations of this API for discovery in the data catalog.
Object specification
An ib1:SchemeCatalogRequirements
RDF object must contain these fields:
ib1:requiredType
- The type of the DCAT Catalog entry which describes this data source.
ib1:requiredMetadata
- A bnode which contains the metadata required to be Scheme-conforming. This bnode may contain any fields and metadata, and a conforming catalogue entry must contain it all, subject to the term modifiers.
Access control rules must be defined for publishing conforming datasets and for accessing those datasets, for example, with the standard Role-based Access Control specification.
Term modifiers
The requirements for terms in the ib1:requiredMetadata
are modified by terms in the top level object.
- (no modifier)
- All the values in the requirements must be included for a term which does not have a modifier. No additional values for that term are allowed.
ib1:requireAllAndAllowAdditional <term>
- All the values in the requirements must be included for this term, but additional values are allowed.
ib1:requireAnyOneOf <term>
- Exactly one of the values in the requirements must be included for this term. No other values are allowed.
ib1:requireAnyValue <term>
- The term must be present, with any valid value.
ib1:requireAbsenceOf <term>
- The term must not be present.
Full Example
Data Service
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ib1: <https://registry.core.trust.ib1.org/ns/1.0#> .
<https://example.com/supply-voltage/v0>
a dcat:DataService ;
dcterms:title "Electricity Generation Voltage"@en ;
dcterms:description "API to query generation supply voltage"@en ;
dcterms:publisher <https://directory.core.trust.ib1.org/member/827252> ;
dcterms:conformsTo <https://registry.core.trust.ib1.org/scheme/electricity/standard/supply-voltage> ;
dcat:endpointDescription <https://registry.core.trust.ib1.org/scheme/electricity/api/voltage> ;
ib1:heartbeatDescription <https://registry.core.trust.ib1.org/api/heartbeat-simple/1.0> ;
dcat:endpointURL <https://grid03.api.example.com/generation-voltage/v0> ;
ib1:trustFramework <https://registry.core.trust.ib1.org/trust-framework> ;
ib1:scheme <https://registry.core.trust.ib1.org/scheme/electricty> ;
ib1:datasetAssurance ib1:AssuranceLevel1 ;
ib1:sensitivityClass ib1:IB1-SA ;
ib1:roleRequiredToAccess <https://registry.core.trust.ib1.org/scheme/electricity/role/network-operator> ;
ib1:roleRequiredToAccess <https://registry.core.trust.ib1.org/scheme/electricity/role/report-provider> ;
dcterms:license <https://registry.core.trust.ib1.org/scheme/electricity/license/voltage-reporting/1.4> ;
.
Dataset with Distributions and Data Series
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ib1: <https://registry.core.trust.ib1.org/ns/1.0#> .
<https://data.example.com/generation-report/oct2024>
a dcat:Dataset ;
dcterms:title "Generation Report Oct 2024"@en ;
dcterms:description "Data report on generation"@en ;
dcterms:publisher <https://directory.core.trust.ib1.org/member/827252> ;
dcterms:conformsTo <https://registry.core.trust.ib1.org/scheme/electricity/standard/generation-report> ;
dcat:version "0.1.2" ;
dcat:inSeries <https://data.example.com/generation-report>;
dcat:distribution <https://data.example.com/generation-report/oct2024/csv> ;
dcat:keyword "solar"@en,
"electricity"@en,
"retrofit"@en ;
ib1:trustFramework <https://registry.core.trust.ib1.org/trust-framework> ;
ib1:scheme <https://registry.core.trust.ib1.org/scheme/electricty> ;
ib1:datasetAssurance ib1:AssuranceLevel1 ;
ib1:sensitivityClass ib1:IB1-SA ;
ib1:roleRequiredToAccess <https://registry.core.trust.ib1.org/scheme/electricity/role/network-operator> ;
ib1:roleRequiredToAccess <https://registry.core.trust.ib1.org/scheme/electricity/role/report-provider> ;
dcterms:license <https://registry.core.trust.ib1.org/scheme/electricity/license/generation-reporting/2.1> ;
.
<https://data.example.com/generation-report/oct2024/download>
a dcat:Distribution ;
dcterms:description "CSV"@en ;
dcat:downloadURL <https://data.example.com/generation-report/oct2024/csv> ;
dcat:mediaType "text/csv"@en ;
ib1:dataSchema <https://registry.core.trust.ib1.org/scheme/electricity/format/generation-report/2.0> ;
.
<https://data.example.com/generation-report>
a dcat:DatasetSeries ;
dcterms:title "Generation Reports from My Energy Company"@en ;
.