RDS/WIP Domains: Proposal

A proposal for how to address and identify data in the RDS/WIP on the network.

Background

Triplestores, Datasets, Graphs and Models

The RDS/WIP is backed by a single triplestore, but this triplestore can be divided into multiple datasets. Each dataset has a default graph, plus other optional named graphs. Each graph is in essence a standalone RDF model that can be manipulated independently of other graphs. Similarly, datasets can be completely independent of one another, or can overlap by sharing graphs (models).

SPARQL Endpoints

Data in the RDS/WIP is exposed on the network through SPARQL endpoints. A SPARQL endpoint generally maps to either a single dataset or a single model. Optionally, a SPARQL endpoint may allow access to multiple datasets by name. In RDS/WIP usage, though, it's likely that a SPARQL endpoint will map to a single dataset with a default model.
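As an illustration of how a client might address such an endpoint, the following is a minimal sketch that builds a standard SPARQL Protocol GET request URL. The endpoint URI is borrowed from the naming pattern proposed later in this document and is an assumption, as are the function name and the sample query; nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_query_url(endpoint_uri, sparql_query):
    """Build a SPARQL Protocol GET URL for the given endpoint and query."""
    # Assumes endpoint_uri carries no query string of its own.
    return endpoint_uri + "?" + urlencode({"query": sparql_query})


# Hypothetical endpoint, following the pattern proposed in this document.
ENDPOINT = "http://rdl.rdlfacade.org/data"

# With no FROM clause, the query runs against the dataset's default graph.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Because the endpoint maps to a single dataset with a default model, the client does not need to name a graph in the request.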

Domain Names

SPARQL endpoints need not share the same domain name - this means that different models can be exposed on completely different URIs with no shared component (apart from the http scheme).

Content

Some endpoints will contain highly compliant and standardized ISO 15926 content; others will contain compliant but non-standardized content; still others will contain P7L content loosely adapted to RDF; and finally some endpoints will contain content that bears no relationship to ISO 15926 at all.

Opportunity

This feature of the technology - the ability to use the same installation to publish data appearing to reside at many different URIs - gives IDS-ADI an opportunity to consider what sort of policies and approaches we want to take to publishing data and how we want to support the service.

Proposal

The proposal here outlines operational responsibility, copyright policy, identifier policy, reference policy and how all of these relate to endpoint domain naming, which is seen as the key "bargaining chip" of the service.

Operational Responsibility

Operation is currently split between DNV and NRX via their respective membership in the parent bodies of the IDS-ADI project: PCA and FIATECH. This arrangement makes the RDS/WIP squarely the responsibility of the IDS-ADI project as a whole. Responsibility for operations similarly confers some rights on these two organizations: they should be free to publish endpoints under their own domain names using the service.

Copyright Policy

IDS-ADI needs some formal right to present submitted data, without encroaching on the ownership of that data. Similarly, the public needs some formal right to use such presented data, without encroaching on IDS-ADI or the owner.

A simple mechanism to enable this is a so-called "open license" similar to that used by open source software and open license documents. Simply speaking, these licenses allow the owner to retain the copyright, but provide a free license in perpetuity to anyone who wants to reproduce or alter the copyright materials, provided the original attribution is retained.

Submitters retain copyright over their submissions; however, they irrevocably release the content under an open license, such as the 2-clause BSD license or something similar in broad intent. That is to say, they forever allow other people to use their definitions without constraint apart from attribution. Those attributions should simply be part of the RDF data.

Identifier Policy

Conceptually, it is desirable that an identifier resolves to a definition about the identified thing. In RDF, the obvious way of achieving this is to store statements about the thing in an endpoint derivable from the identifier (a URI). So all endpoints should be identified by a URI that does not end in a slash; and all resources should be identified by a fragment URI relative to an endpoint URI. Moreover, all submitted resources are public, and therefore must use the HTTP address protocol. So public identifiers for IDS-ADI controlled RDS/WIP endpoints should have the following form:

<resource-uri> = <endpoint-uri>#<fragment-id>

<fragment-id> = XML NS Name

<endpoint-uri> = http://<domain>/<path>

<path> = <path-part> | <path>/<path-part>

<path-part> = HTTP URI Path Component Character NOT '/'
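The grammar above can be checked mechanically. The following is a minimal sketch, assuming that "XML NS Name" means an XML NCName and using a simplified ASCII-only pattern for it; the function name and the regular expression are illustrative and not part of the proposal.

```python
import re
from urllib.parse import urlsplit

# Simplified, ASCII-only approximation of an XML NCName (an assumption;
# the XML Namespaces specification allows a wider Unicode range).
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def split_resource_uri(resource_uri):
    """Return (endpoint_uri, fragment_id), or raise ValueError."""
    endpoint_uri, sep, fragment_id = resource_uri.partition("#")
    if not sep:
        raise ValueError("no '#' separating endpoint and fragment")
    parts = urlsplit(endpoint_uri)
    if parts.scheme != "http":
        raise ValueError("endpoint URI must use the http scheme")
    if not parts.netloc:
        raise ValueError("endpoint URI has no domain")
    if parts.query:
        raise ValueError("endpoint URI must not carry a query string")
    # "/data" -> ["data"]; "" -> []; "/data/" -> ["data", ""]
    path_parts = parts.path.split("/")[1:]
    if not path_parts or "" in path_parts:
        raise ValueError("path must be non-empty and must not end in '/'")
    if not _NCNAME.fullmatch(fragment_id):
        raise ValueError("fragment is not a valid name")
    return endpoint_uri, fragment_id


print(split_resource_uri("http://dm.rdlfacade.org/data#ClassOfProperty"))
# → ('http://dm.rdlfacade.org/data', 'ClassOfProperty')
```

Splitting on the first '#' is all that is needed to recover the endpoint, which is the property the policy is designed to guarantee.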

Caveats and Clarifications on Identifier Policy

Note that in relation to ISO 15926 façades, identifiers are not so tightly restricted, and it is possible to have a SPARQL endpoint whose URI bears either no relation to the identifiers of the contained resources, or a different relationship, for example, using / rather than #.

This approach is flatly proscribed for IDS-ADI endpoints because it would create identifiers for which there is no deterministic means of extracting the endpoint without testing for it, in addition to exposing bugs in many RDF toolsets.

Nevertheless, the RDS/WIP will contain items with alternative identifiers that are:

  • not URIs,
  • URIs that are not resolvable at all,
  • URIs that are not resolvable without extra information,
  • URIs that are only resolvable with a live testing algorithm for the endpoint, and
  • URIs that do not meet IDS-ADI's own strictures.

These realities must be accommodated by all implementations.
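One way an implementation might accommodate such identifiers is to sort them into coarse categories before deciding how to handle them. The sketch below is only an assumption about how that could look: the category labels and the function name are hypothetical, and the check is purely syntactic, so it cannot tell whether a URI actually resolves.

```python
from urllib.parse import urlsplit


def classify_identifier(identifier):
    """Return a coarse, purely syntactic label for an identifier."""
    try:
        parts = urlsplit(identifier)
    except ValueError:
        # Malformed input that urlsplit refuses to parse.
        return "not-a-uri"
    if not parts.scheme:
        return "not-a-uri"
    if parts.scheme not in ("http", "https"):
        return "non-http-uri"
    # Hash URI on an http endpoint whose path does not end in a slash:
    # the endpoint can be derived by cutting at '#'.
    if (parts.scheme == "http" and parts.fragment
            and parts.path and not parts.path.endswith("/")):
        return "ids-adi-hash-uri"
    # Everything else needs extra information or live testing to resolve.
    return "http-other"


for example in ("R1234567890",
                "urn:iso:std:iso:15926",
                "http://example.org/rdl/R1234567890",
                "http://rdl.rdlfacade.org/data#R1234567890"):
    print(example, "->", classify_identifier(example))
```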

Reference Policy

Generally, one endpoint should not copy a definition from another endpoint on the service, but should simply reference that definition. Any endpoint may provide alternate identifiers, including for definitions on another endpoint.

This makes provenance much easier to implement automatically - provenance otherwise becomes very difficult to police and maintain, and hence, will probably be handled poorly.
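To illustrate referencing instead of copying, the following minimal sketch emits a single N-Triples statement that links an alternate identifier on one endpoint to the definition held on another. The choice of owl:sameAs as the linking property, the function name, and the "Pump" fragment are assumptions made for illustration; an implementation might prefer owl:equivalentClass for classes.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Characters that N-Triples does not allow unescaped inside <...>.
_FORBIDDEN = '<>"{}|^`\\'


def alias_triple(alternate_uri, canonical_uri, predicate=OWL_SAME_AS):
    """Return one N-Triples line stating alternate_uri refers to canonical_uri."""
    for uri in (alternate_uri, predicate, canonical_uri):
        if any(ch <= " " or ch in _FORBIDDEN for ch in uri):
            raise ValueError("URI contains a character not allowed here: " + uri)
    return "<%s> <%s> <%s> ." % (alternate_uri, predicate, canonical_uri)


# Hypothetical alternate identifier on a hosted endpoint, pointing at the
# definition that stays on the class endpoint.
print(alias_triple("http://cobie.rdlfacade.org/data#Pump",
                   "http://rdl.rdlfacade.org/data#R1234567890"))
```

The referencing endpoint stores only this link, so the definition itself has a single home and its origin can be read directly from the identifier.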

Endpoint Domains

It's reasonable to use different endpoints to differentiate between kinds of content and purposes of content. Endpoints can be differentiated by path, domain or both, but generally, domain is probably the wisest choice because it allows hosting to be changed easily. For example:

Domain Type   Description                               Example
General       General RDS/WIP domain name               wip.rdlfacade.org
Hosted        Special purpose IDS-ADI sub-domain name   cobie.rdlfacade.org
Branded       External organization's own domain name   rds.posccaesar.org

Proposed Endpoints

For the main RDS/WIP purpose, that is, hosting ISO 15926 OWL/RDF definitions at SPARQL endpoints, the broadest (though not unanimous) agreement has been reached around these:

Domain Name          Contents in External OWL        Example
dm.rdlfacade.org     Part 2 Definitions              http://dm.rdlfacade.org/data#ClassOfProperty
rdl.rdlfacade.org    Definitions of Classes          http://rdl.rdlfacade.org/data#R1234567890
tpl.rdlfacade.org    Definitions of Templates        http://tpl.rdlfacade.org/data#R1234567890
oim.rdlfacade.org    Definitions of OIM Properties   http://oim.rdlfacade.org/data#R1234567890
astm.rdlfacade.org   Definitions from ASTM           http://astm.rdlfacade.org/data#??????

Limitations and Use

The above proposed endpoints:

  • are not intended to satisfy SC4 requirements.
  • may be included in the informative part of standards if SC4 desires.
  • are intended to be permanently resolvable.
  • are intended to form enduring identifiers.
  • are intended to be simple to allocate.
  • are not intended to be the only identifiers for the definitions they address.
  • will include OWL equivalence relations for all brutus IDs (URI to be selected by PCA).
  • will include OWL equivalence relations for any SC4 IDs.