
Data Catalog Records

Each Data Provider maintains a set of one or more metadata files, each of which can describe one or more distinct sources of data. These descriptions serve several purposes:

  1. They drive discovery: descriptions are ingested into our search system and made available to Data Consumers searching for particular kinds of data.
  2. They inform consumption of that data, providing information on:
    1. The API required to access the data source
    2. Any access constraints which may need to be satisfied
    3. Licenses for any accessed data
    4. The representation and meaning of the data

Note: This document uses US English. To align with W3C and other prevalent standards, IB1 uses US English in its technical specifications and technical documentation.

RDF Prefixes

This specification uses the following prefixes:

  dcat:    http://www.w3.org/ns/dcat#
  dcterms: http://purl.org/dc/terms/
  ib1:     https://registry.trust.ib1.org/ns/1.0#

Dataset or Data Service?

A Dataset:

  • is provided as one or more downloadable files,
  • may be published as part of a series of Datasets covering the same source of data over different time periods, and
  • should maintain historical access to previous periods.

A Data Service:

  • is an API for querying data, with parameters that select a subset of the data (for example, by time period),
  • is specified formally by a machine-readable API description, and
  • may require consent from a data subject external to the Trust Framework.

The two are described by slightly different fields in the metadata files.

Scheme-conforming

A Scheme-conforming Dataset or Data Service meets the data format and meaning requirements of the Scheme, along with any required access and license conditions.

These requirements are published by the Scheme Registry as machine-readable Scheme Catalog Requirements Documents, and metadata files link to them to show their conformance.

Most Datasets and Data Services are Scheme-conforming. A Data Provider may publish data which is not Scheme-conforming to:

  • use Scheme licenses and roles to share ad-hoc Shared Data with Scheme participants (where the Scheme doesn't expressly disallow this), or
  • use the Catalog to include Open Data in a public index.

Metadata File Structure

The metadata is a standard DCAT RDF file representing one or more sources of data.

NOTE: The examples below use the Turtle format for compactness and increased readability. Data providers may present this information in Turtle, RDF/XML, JSON-LD or N3 formats.

Datasets are represented as Dataset DCAT objects with one or more Distributions. If the Datasets measure the same thing over successive periods of time, they must be linked together with a Dataset Series (dcat:DatasetSeries) object. The format of the data is described by JSON Schema, XSD 1.1 or CSVW schemas.

Data Services are represented as Data Service DCAT objects, with OpenAPI specifications of the API and the format of the data in the responses.

The URL of the DCAT object inside the RDF representation is the stable identifier of the Dataset or Data Service. This must remain constant each time the metadata file is fetched and over updates to the metadata.

Mandatory metadata fields

The following fields must be included in every DCAT object. Metadata will be visible to all participants in the Trust Framework, and may be visible to anyone on the open web without authentication in an open index.

dcterms:title
Short title for this dataset.
dcterms:publisher
The URL of the Data Provider's record in the Scheme Directory.
dcterms:license
The URL of an ib1:License. All use of this data source is subject to this License. Where a data source is Scheme-conforming, the URL will be registered in the Registry.
ib1:trustFramework
The URL of the Trust Framework the dataset is published within.
ib1:scheme
The URL of the Scheme the dataset is published within, which must be a Scheme within the specified ib1:trustFramework.
ib1:datasetAssurance
The assurance level for this dataset.
ib1:sensitivityClass
The URL describing the sensitivity of the data. This may be a generic sensitivity class defined by the IB1 RDF Schema, if the Scheme uses the Generic Sensitivity Classes specification, or a Registry URL otherwise. (Previous versions of this standard used literal strings to identify the classes.) If the sensitivity class definition requires the use of FAPI, the dataset must be served by a FAPI-compliant API. If the sensitivity class definition is marked as requiring the end user's Permission (consent), the ib1:oauthIssuer term must be present.
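As an illustration, publishing tooling might check an entry for the mandatory fields before it is published. The sketch below is a minimal, hypothetical example: it assumes the entry has already been parsed from RDF into a Python dict mapping prefixed terms to lists of values, and that whether the sensitivity class requires Permission has been looked up separately. The function name missing_mandatory_terms is invented for this example, and it does not check the FAPI requirement or any Scheme-specific fields.

```python
# Hypothetical representation: a catalog entry is a dict mapping a prefixed
# term (e.g. "dcterms:title") to the list of values it has in the RDF.
MANDATORY_TERMS = (
    "dcterms:title",
    "dcterms:publisher",
    "dcterms:license",
    "ib1:trustFramework",
    "ib1:scheme",
    "ib1:datasetAssurance",
    "ib1:sensitivityClass",
)


def missing_mandatory_terms(entry: dict[str, list[str]],
                            sensitivity_requires_permission: bool) -> list[str]:
    """Return the mandatory terms that are absent from a catalog entry.

    sensitivity_requires_permission is an assumed input: whether the
    definition of the entry's ib1:sensitivityClass is marked as requiring
    the end user's Permission (consent).
    """
    missing = [term for term in MANDATORY_TERMS if not entry.get(term)]
    # Consent-requiring sensitivity classes also need an OAuth Issuer.
    if sensitivity_requires_permission and not entry.get("ib1:oauthIssuer"):
        missing.append("ib1:oauthIssuer")
    return missing
```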

More information about publishing assured data within a Trust Framework is available in the "How to become an assured data publisher" section of the Icebreaker One website.

Additional fields may be made mandatory for Scheme-conforming data sources by the Scheme Catalog Requirements Document.

Conformance metadata fields

dcterms:conformsTo
The URL of a Scheme Catalog Requirements Document in the Scheme Registry. Most metadata files will include this field.

Access control metadata fields

The catalog entries must specify the rules for accessing the dataset. This specification does not mandate a method of access control. Each Scheme chooses a method of access control, for example the standard Role-based Access Control specification, and declares it in the Scheme definition in the Registry.

Data Service metadata fields

Data Services are represented by dcat:DataService objects with the common mandatory fields and Data Service specific fields.

dcat:endpointDescription
The URL of an OpenAPI file which fully documents the request parameters and responses. Responses must use XML or JSON. To allow the OpenAPI file to be used by multiple Data Providers, the file may contain only a single Server object, whose url is "{endpointURL}" and whose variables object sets the default value of endpointURL to "https://endpointurl-not-specified.ib1.org".
dcat:endpointURL
The URL of this specific instance of the API. It is interpolated into the url specified in the OpenAPI file using the endpointURL variable.
ib1:oauthIssuer
Where access to data requires end user consent or selection of an account at the provider, the URL of the OAuth Issuer which is used to authenticate before accessing this Data Service. This field is required for data with an ib1:sensitivityClass which is marked as requiring the end user's Permission (consent), and may be used for other classes.
ib1:heartbeatDescription
An optional URL of an OpenAPI file (whose Server object is specified in the same way as for dcat:endpointDescription) which contains a single Path with a 200 response defined. This term will typically be the URL of one of a small number of standard OpenAPI files published in the Registry.
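A minimal sketch of how a client might interpolate dcat:endpointURL into a shared OpenAPI file and derive the heartbeat URL, assuming both OpenAPI files have already been parsed into dicts (for example with json.load). The function names are hypothetical, and accepting any operation on the single Path that defines a 200 response is one reading of the heartbeat rule above, not a normative algorithm.

```python
def resolve_server_url(openapi: dict, endpoint_url: str) -> str:
    """Interpolate a dcat:endpointURL into the single OpenAPI Server url."""
    servers = openapi.get("servers", [])
    if len(servers) != 1 or servers[0].get("url") != "{endpointURL}":
        raise ValueError('expected exactly one Server with url "{endpointURL}"')
    # The default in the file is only a placeholder; the catalog entry's
    # dcat:endpointURL replaces the variable.
    return servers[0]["url"].replace("{endpointURL}", endpoint_url.rstrip("/"))


def heartbeat_url(heartbeat_openapi: dict, endpoint_url: str) -> str:
    """Return the URL to request for an ib1:heartbeatDescription check."""
    paths = heartbeat_openapi.get("paths", {})
    if len(paths) != 1:
        raise ValueError("heartbeat description must contain a single Path")
    path, path_item = next(iter(paths.items()))
    # Accept any operation on the Path that defines a 200 response; entries
    # such as "summary" or "parameters" are not dicts and are skipped.
    has_200 = any(
        "200" in operation.get("responses", {})
        for operation in path_item.values()
        if isinstance(operation, dict)
    )
    if not has_200:
        raise ValueError("heartbeat Path must define a 200 response")
    return resolve_server_url(heartbeat_openapi, endpoint_url) + path
```

For example, with a hypothetical heartbeat path of "/heartbeat" and an endpoint URL of https://api.example.com/v0, heartbeat_url returns https://api.example.com/v0/heartbeat; the real path comes from the heartbeat OpenAPI file in the Registry.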

Any additional metadata defined by published Standards may be added.

Dataset metadata fields

Datasets are represented by dcat:Dataset objects with the common mandatory fields and Dataset specific fields.

As Datasets will be discovered by browsing an index, they need additional descriptive metadata for discovery. The following fields are mandatory:

dcterms:description
A longer-form description of this dataset. It is used, together with the title and tags, when people search for datasets, so include likely search terms in the description.
dcat:distribution
URL of a dcat:Distribution for a downloadable file, see below for mandatory fields. Multiple Distributions may be defined for the same data in different formats, taking into account any requirements and restrictions for Scheme-conforming datasets.

The following fields are mandatory when the dataset is part of a series of periodic datasets:

dcat:inSeries
The URL of a dcat:DatasetSeries which associates this Dataset with the overall series. The Dataset Series is created by the publisher and contains only their data.

The following fields are optional:

dcat:version
The version number of the dataset, which should follow semantic versioning where possible. Versioning should indicate changes in delivery mechanism or representation, not changes in the underlying data. For example, do not use it to distinguish datasets covering different years; use it to indicate whether a potential data consumer might need to alter how it processes the returned data.
dcat:versionNotes
Notes used to explain any changes to this version.
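The sketch below shows how a consumer might use dcat:version, assuming the publisher follows semantic versioning, under which a change of major version signals an incompatible change. The function name is hypothetical, and non-semantic version strings are conservatively treated as potentially breaking whenever they differ.

```python
import re

# MAJOR.MINOR.PATCH, optionally followed by a pre-release or build suffix.
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:[-+].*)?$")


def needs_processing_review(old_version: str, new_version: str) -> bool:
    """Return True if a consumer should review how it processes the data."""
    old, new = SEMVER.match(old_version), SEMVER.match(new_version)
    if old is None or new is None:
        # Not semantic versions: treat any change as potentially breaking.
        return old_version != new_version
    old_major, new_major = int(old.group(1)), int(new.group(1))
    if old_major == 0 or new_major == 0:
        # Under semver, 0.y.z versions may change incompatibly at any time.
        return old_version != new_version
    return old_major != new_major
```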

Any additional metadata defined by published Standards may be added.

Distribution metadata fields

To specify how the data may be downloaded, one or more associated dcat:Distribution objects must be included which contain:

dcat:downloadURL
A stable URL for download of the dataset, subject to access controls specified in the Dataset object. Liveness of the server will be tested by making a HEAD request to this URL.
dcat:mediaType
The MIME type of the download file.
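A minimal sketch of the HEAD-based liveness test using only the Python standard library. It assumes that a 2xx or 3xx response counts as live, and it does not attempt to satisfy the Dataset's access controls, so a protected URL may be reported as not live; the function name is hypothetical.

```python
import urllib.request


def is_download_live(download_url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request to a dcat:downloadURL and report liveness.

    Assumption: only 2xx and 3xx responses count as live. Credentials
    required by the Dataset's access controls are not handled here.
    """
    request = urllib.request.Request(download_url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # HTTPError (4xx/5xx), URLError (DNS failure, refused connection)
        # and timeouts are all subclasses of OSError.
        return False
```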

The following fields are optional, but encouraged. They are mandatory for higher assurance and Scheme-conforming data sources.

ib1:dataSchema
The URL of a schema file specifying the format of the downloadable file. The type of schema depends on the dcat:mediaType: application/json downloads are described by JSON Schema, application/xml downloads by XSD 1.1, and text/csv downloads by CSVW.
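The mapping from media type to schema language can be expressed as a simple lookup. This is a hypothetical helper for illustration; stripping media type parameters such as charset before the lookup is an assumption, not a requirement of this specification.

```python
# Schema language expected for each dcat:mediaType listed above.
SCHEMA_LANGUAGE_BY_MEDIA_TYPE = {
    "application/json": "JSON Schema",
    "application/xml": "XSD 1.1",
    "text/csv": "CSVW",
}


def expected_schema_language(media_type: str) -> str:
    """Return the schema language an ib1:dataSchema file should use."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    essence = media_type.split(";", 1)[0].strip().lower()
    try:
        return SCHEMA_LANGUAGE_BY_MEDIA_TYPE[essence]
    except KeyError:
        raise ValueError(f"no schema language defined for {media_type!r}") from None
```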

Additional metadata for Datasets and Data Services

The fields marked as mandatory are the minimum needed to ensure that a data source can be used by Trust Framework participants and is visible in the Open Net Zero search system. Other properties of a dataset may also be useful to potential data consumers. Where such information can be provided, it should be provided in as standard a form as possible: in practice, prefer existing ontologies such as DCAT and Dublin Core, then shared industry-specific ontologies, and use internal or custom representations only when absolutely necessary.

Of particular note, and something we would ultimately like to expose in the Open Net Zero search interface, is information about the geospatial and temporal coverage of entries within a dataset. This is a complex subject, but DCAT already handles it. If you need to express this kind of information, please follow the DCAT specification (for example, its use of the dcterms:spatial and dcterms:temporal properties).

We encourage use of the dcat:keyword list for datasets. These translate to “tags” in Open Net Zero's web interface and are useful to group datasets around specific topics.

Scheme Catalog Requirements

A Scheme-conforming data source meets a common standard across a Scheme. These standards are followed by the Data Providers so that the same kind of data is published across the Scheme using the same APIs, formats and meaning of data. A Scheme Catalog Requirements Document specifies the metadata that a DCAT Catalog entry must contain for it to be Scheme-conforming, and which roles may publish a conforming data source.

It is an RDF document published in the Registry, and the metadata links to machine-readable License, API and format specifications which are also published in the Registry.

Example

@prefix dcat: <http://www.w3.org/ns/dcat#> . 
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ib1: <https://registry.core.trust.ib1.org/ns/1.0#> .

<https://registry.core.trust.ib1.org/scheme/electricity/standard/supply-voltage>
    a ib1:SchemeCatalogRequirements ;
    dcterms:title "Supply Voltage API Requirements" ;
    ib1:requiredType dcat:DataService ;
    ib1:roleRequiredToPublish <https://registry.core.trust.ib1.org/scheme/electricity/role/generator> ;
    ib1:requiredMetadata [ a ib1:RequiredMetadata ;
        dcat:endpointDescription <https://registry.core.trust.ib1.org/scheme/electricity/api/voltage> ;
        ib1:heartbeatDescription <https://registry.core.trust.ib1.org/api/heartbeat-simple/1.0> ;
        ib1:sensitivityClass ib1:IB1-SA ;
        ib1:roleRequiredToAccess <https://registry.core.trust.ib1.org/scheme/electricity/role/network-operator> ;
        ib1:roleRequiredToAccess <https://registry.core.trust.ib1.org/scheme/electricity/role/report-provider> ;
        dcterms:license <https://registry.core.trust.ib1.org/scheme/electricity/license/voltage-reporting/1.4> ;
    ];
    ib1:requireAllAndAllowAdditional ib1:roleRequiredToAccess ;
    ib1:requireAbsenceOf ib1:oauthIssuer ;
.

This example defines a standard Supply Voltage API that is provided by multiple Data Providers in a Trust Framework. It specifies the API in detail with dcat:endpointDescription, which refers to an OpenAPI specification hosted by the Registry. It checks liveness with a standard ib1:heartbeatDescription, a heartbeat request defined in another OpenAPI specification hosted by the Registry.

It sets the ib1:sensitivityClass to Shared Data (ib1:IB1-SA). Because the API does not require end user consent, ib1:requireAbsenceOf is used to prohibit the use of an OAuth Issuer.

It uses the standard Role-based Access Control specification, setting the roles of who can use the API with ib1:roleRequiredToAccess. Because ib1:requireAllAndAllowAdditional is used for the access rules, it allows the publisher to widen access to additional roles, as long as the roles in this document are included. ib1:roleRequiredToPublish specifies which roles can publish implementations of this API for discovery in the data catalog.

Object specification

An ib1:SchemeCatalogRequirements RDF object must contain these fields:

ib1:requiredType
The type of the DCAT Catalog entry which describes this data source.
ib1:requiredMetadata
A blank node (bnode) containing the metadata required to be Scheme-conforming. This blank node may contain any fields and metadata, and a conforming catalog entry must contain all of it, subject to the term modifiers.

Access control rules must be defined for publishing conforming datasets and for accessing those datasets, for example, with the standard Role-based Access Control specification.

Term modifiers

The requirements for terms in the ib1:requiredMetadata are modified by terms in the top level object.

(no modifier)
All the values in the requirements must be included for a term which does not have a modifier. No additional values for that term are allowed.
ib1:requireAllAndAllowAdditional <term>
All the values in the requirements must be included for this term, but additional values are allowed.
ib1:requireAnyOneOf <term>
Exactly one of the values in the requirements must be included for this term. No other values are allowed.
ib1:requireAnyValue <term>
The term must be present, with any valid value.
ib1:requireAbsenceOf <term>
The term must not be present.
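The term modifiers can be read as set comparisons between the values listed in ib1:requiredMetadata and the values in a catalog entry. The sketch below is a minimal, hypothetical checker for a single term: it assumes the values have already been extracted from the RDF as strings, identifies each modifier by its local name, and does not check that a value is "valid" for ib1:requireAnyValue.

```python
from typing import Optional


def term_conforms(modifier: Optional[str],
                  required: set[str],
                  actual: set[str]) -> bool:
    """Check one term of a catalog entry against ib1:requiredMetadata.

    modifier: local name of the term modifier applied to this term in the
        top-level object, or None when the term has no modifier.
    required: values listed for the term in ib1:requiredMetadata.
    actual: values the catalog entry has for the term.
    """
    if modifier is None:
        # All required values, and no others.
        return actual == required
    if modifier == "requireAllAndAllowAdditional":
        return required <= actual
    if modifier == "requireAnyOneOf":
        # Exactly one value, drawn from the required values.
        return len(actual) == 1 and actual <= required
    if modifier == "requireAnyValue":
        return len(actual) > 0
    if modifier == "requireAbsenceOf":
        return len(actual) == 0
    raise ValueError(f"unknown term modifier: {modifier}")
```

Applied to the Supply Voltage example, ib1:roleRequiredToAccess would be checked with "requireAllAndAllowAdditional" against the two required roles, and ib1:oauthIssuer with "requireAbsenceOf"; every other term in the bnode has no modifier and must match exactly.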

Full Example

Data Service

@prefix dcat: <http://www.w3.org/ns/dcat#> . 
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ib1: <https://registry.core.trust.ib1.org/ns/1.0#> .

<https://example.com/supply-voltage/v0>
    a dcat:DataService ;
    dcterms:title "Electricity Generation Voltage"@en ;
    dcterms:description "API to query generation supply voltage"@en ;
    dcterms:publisher <https://directory.core.trust.ib1.org/member/827252> ;
    dcterms:conformsTo <https://registry.core.trust.ib1.org/scheme/electricity/standard/supply-voltage> ; 
    dcat:endpointDescription <https://registry.core.trust.ib1.org/scheme/electricity/api/voltage> ;
    ib1:heartbeatDescription <https://registry.core.trust.ib1.org/api/heartbeat-simple/1.0> ;
    dcat:endpointURL <https://grid03.api.example.com/generation-voltage/v0> ;
    ib1:trustFramework <https://registry.core.trust.ib1.org/trust-framework> ;
    ib1:scheme <https://registry.core.trust.ib1.org/scheme/electricity> ;
    ib1:datasetAssurance ib1:AssuranceLevel1 ;
    ib1:sensitivityClass ib1:IB1-SA ;
    ib1:roleRequiredToAccess <https://registry.core.trust.ib1.org/scheme/electricity/role/network-operator> ;
    ib1:roleRequiredToAccess <https://registry.core.trust.ib1.org/scheme/electricity/role/report-provider> ;
    dcterms:license <https://registry.core.trust.ib1.org/scheme/electricity/license/voltage-reporting/1.4> ;
.

Dataset with Distributions and Data Series

@prefix dcat: <http://www.w3.org/ns/dcat#> . 
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ib1: <https://registry.core.trust.ib1.org/ns/1.0#> .

<https://data.example.com/generation-report/oct2024>
    a dcat:Dataset ;
    dcterms:title "Generation Report Oct 2024"@en ;
    dcterms:description "Data report on generation"@en ;
    dcterms:publisher <https://directory.core.trust.ib1.org/member/827252> ;
    dcterms:conformsTo <https://registry.core.trust.ib1.org/scheme/electricity/standard/generation-report> ; 
    dcat:version "0.1.2" ;
    dcat:inSeries <https://data.example.com/generation-report>;
    dcat:distribution <https://data.example.com/generation-report/oct2024/download> ;
    dcat:keyword "solar"@en,
        "electricity"@en,
        "retrofit"@en ;
    ib1:trustFramework <https://registry.core.trust.ib1.org/trust-framework> ;
    ib1:scheme <https://registry.core.trust.ib1.org/scheme/electricity> ;
    ib1:datasetAssurance ib1:AssuranceLevel1 ;
    ib1:sensitivityClass ib1:IB1-SA ;
    ib1:roleRequiredToAccess <https://registry.core.trust.ib1.org/scheme/electricity/role/network-operator> ;
    ib1:roleRequiredToAccess <https://registry.core.trust.ib1.org/scheme/electricity/role/report-provider> ;
    dcterms:license <https://registry.core.trust.ib1.org/scheme/electricity/license/generation-reporting/2.1> ;
.

<https://data.example.com/generation-report/oct2024/download>
    a dcat:Distribution ;
    dcterms:description "CSV"@en ;
    dcat:downloadURL <https://data.example.com/generation-report/oct2024/csv> ;
    dcat:mediaType "text/csv" ;
    ib1:dataSchema <https://registry.core.trust.ib1.org/scheme/electricity/format/generation-report/2.0> ;
.

<https://data.example.com/generation-report>
    a dcat:DatasetSeries ;
    dcterms:title "Generation Reports from My Energy Company"@en ;
.