CSV imports and mapping to RDF using SPARQL CONSTRUCT

A data import is a combination of 3 resources:

  • an uploaded file holding the data to be converted to RDF and imported, such as a CSV or XML file
  • the converter that produces RDF; in the case of CSV, it is a SPARQL CONSTRUCT query
  • the target container, to which the converted items are POSTed, against which they are skolemized, and whose children they become

The import process runs in the background, i.e. the import item is created before the process completes. Currently the only way to determine when it has completed is to refresh the import item and check the import status (completed/failed). Upon successful completion, metadata such as the number of imported RDF triples is attached to the import item.
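The refresh-and-check loop described above can be automated by a client. The following is a minimal sketch, not part of LinkedDataHub: fetch_status is a hypothetical caller-supplied function that re-reads the import item and returns its status as a string, and the "running" value used in the simulation is an assumption (only completed and failed are named in this document).

```python
import time
from typing import Callable

# Terminal states named in the text; any other value is treated as "not finished yet".
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_import(fetch_status: Callable[[], str],
                    interval: float = 2.0,
                    timeout: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch_status() until it returns a terminal status.

    fetch_status is a hypothetical function that refreshes the import item
    and returns its current status. Raises TimeoutError if no terminal
    status is seen after roughly `timeout` seconds of waiting.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"import not finished after {waited:.0f}s")
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    # Simulated import that finishes on the third refresh.
    responses = iter(["running", "running", "completed"])
    print(wait_for_import(lambda: next(responses), interval=0.0))
```

In a real client, fetch_status would request the import item again and read its status from the returned RDF; how that status is represented is not specified here.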

The converted RDF is validated against constraints before import. Constraint violations, if any, are attached to the import item.

Import CSV

CSV is a plain-text format for tabular data. CSV import in LinkedDataHub consists of 2 steps:

  1. generic conversion
  2. vocabulary conversion

We use a running example of CSV data, whose RDF conversion is shown in the following sections:

countryCode,latitude,longitude,name
AD,42.5,1.6,Andorra
AE,23.4,53.8,"United Arab Emirates"
AF,33.9,67.7,Afghanistan

Generic conversion

The data table is converted to a graph by treating rows as resources, columns as predicates, and cells as xsd:string literals. The approach is the same as CSV on the Web minimal mode.

@base <https://linkeddatahub.com/demo/city-graph/> .

# each row becomes one blank node; the labels are arbitrary
_:row1
  <#countryCode> "AD" ;
  <#latitude> "42.5" ;
  <#longitude> "1.6" ;
  <#name> "Andorra" .

_:row2
  <#countryCode> "AE" ;
  <#latitude> "23.4" ;
  <#longitude> "53.8" ;
  <#name> "United Arab Emirates" .

_:row3
  <#countryCode> "AF" ;
  <#latitude> "33.9" ;
  <#longitude> "67.7" ;
  <#name> "Afghanistan" .
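The row/column/cell rule can be illustrated outside of LinkedDataHub. The sketch below is not the actual converter: it assumes the first CSV line holds the column names, uses made-up blank node labels, and skips empty cells (an assumption borrowed from how CSV on the Web treats empty values).

```python
import csv
import io
from urllib.parse import urljoin

BASE = "https://linkeddatahub.com/demo/city-graph/"

SAMPLE = (
    "countryCode,latitude,longitude,name\n"
    "AD,42.5,1.6,Andorra\n"
    'AE,23.4,53.8,"United Arab Emirates"\n'
    "AF,33.9,67.7,Afghanistan\n"
)


def generic_conversion(csv_text: str, base: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate URI, string literal) triples, one per non-empty cell."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        subject = f"_:row{index}"  # one blank node per row (label is arbitrary)
        for column, cell in row.items():
            # Skip surplus cells (column is None) and missing or empty values.
            if column is None or cell is None or cell == "":
                continue
            # The column name becomes a fragment URI resolved against the base.
            predicate = urljoin(base, "#" + column)
            triples.append((subject, predicate, cell))  # the cell stays a plain string
    return triples


if __name__ == "__main__":
    for s, p, o in generic_conversion(SAMPLE, BASE):
        print(f'{s} <{p}> "{o}" .')
```

No datatypes are inferred at this stage: "42.5" remains a string, and casting is left to the vocabulary conversion step.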

Vocabulary conversion

This step provides a semantic "lift" for the generic RDF output of the previous step by mapping it to classes and properties from specific vocabularies. The mapping is a user-defined SPARQL CONSTRUCT query which transforms one row at a time. In this case we produce a SKOS concept paired with its item (document) for each country:

PREFIX  def:  <ns/default#>
PREFIX  ns:   <ns#>
PREFIX  apl:  <https://w3id.org/atomgraph/linkeddatahub/domain#>
PREFIX  geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX  dh:   <https://www.w3.org/ns/ldt/document-hierarchy/domain#>
PREFIX  dct:  <http://purl.org/dc/terms/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
PREFIX  sioc: <http://rdfs.org/sioc/ns#>

CONSTRUCT
{
    # the item (document) and the country it describes
    ?item a def:Item ;
        sioc:has_container ?this ;
        dct:title ?name ;
        dh:slug ?countryCode ;
        foaf:primaryTopic ?country .
    ?country a ns:Country ;
        foaf:isPrimaryTopicOf ?item ;
        dct:identifier ?countryCode ;
        geo:lat ?lat ;
        geo:long ?long ;
        dct:title ?name .
}
WHERE
{
    # a fresh blank node for the item of each row
    BIND(bnode() AS ?item)
    # match one row of the generic conversion output
    ?country  <#countryCode>  ?countryCode ;
              <#latitude>     ?latString ;
              <#longitude>    ?longString ;
              <#name>         ?name .
    # cast the string cells to numeric literals
    BIND(xsd:float(?latString) AS ?lat)
    BIND(xsd:float(?longString) AS ?long)
}
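To make the effect of the query on a single row more concrete, here is a rough Python emulation. It is only an illustrative sketch, not how LinkedDataHub executes mappings: map_row, its parameters, and the blank node labels are hypothetical, and Python's float() only approximates the xsd:float cast. It mirrors two standard SPARQL behaviours: a row that lacks any of the four required cells does not match and yields nothing, and a template triple whose variable is unbound (a failed cast, or no value for ?this) is left out of the result.

```python
from typing import Optional

REQUIRED = ("countryCode", "latitude", "longitude", "name")


def to_float(value: str) -> Optional[float]:
    """Approximate xsd:float(); None plays the role of an unbound variable."""
    try:
        return float(value)
    except ValueError:
        return None


def map_row(row: dict[str, str], row_id: int,
            container: Optional[str] = None) -> list[tuple]:
    """Emulate the CONSTRUCT mapping for one row of the generic output."""
    # All four patterns are required (no OPTIONAL), so a partial row yields nothing.
    if any(key not in row for key in REQUIRED):
        return []
    item = f"_:item{row_id}"        # corresponds to BIND(bnode() AS ?item)
    country = f"_:country{row_id}"  # the row resource itself
    code, name = row["countryCode"], row["name"]
    candidates = [
        (item, "rdf:type", "def:Item"),
        (item, "sioc:has_container", container),
        (item, "dct:title", name),
        (item, "dh:slug", code),
        (item, "foaf:primaryTopic", country),
        (country, "rdf:type", "ns:Country"),
        (country, "foaf:isPrimaryTopicOf", item),
        (country, "dct:identifier", code),
        (country, "geo:lat", to_float(row["latitude"])),
        (country, "geo:long", to_float(row["longitude"])),
        (country, "dct:title", name),
    ]
    # Triples with an unbound (None) object are dropped, as CONSTRUCT does.
    return [t for t in candidates if t[2] is not None]


if __name__ == "__main__":
    andorra = {"countryCode": "AD", "latitude": "42.5",
               "longitude": "1.6", "name": "Andorra"}
    for triple in map_row(andorra, 1):
        print(triple)
```

The same country code feeds both dh:slug on the item and dct:identifier on the country, and the name is used as the title of both resources.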

These are the rules that hold for mapping queries:

  • BASE is the request URI, predicate URIs are relative to it
  • produce items (documents) and pair them with topic resources using foaf:primaryTopic/foaf:isPrimaryTopicOf properties
  • the result resources should be blank nodes and not URIs, as they will be skolemized depending on their RDF types
  • ?base binding is provided with the base URI of the current application
  • use OPTIONAL for optional cell values
  • use BIND() to introduce new values and/or cast literals to the appropriate result datatype or URI

We are planning to provide a UI-based mapping tool in the future.

Applying the query above to the example data with the binding (?base, <https://linkeddatahub.com/demo/city-graph/>) gives the following result (only the first resource is shown):

_:item a <https://linkeddatahub.com/demo/city-graph/ns/default#Item> ;
    dct:title "Andorra" ;
    dh:slug "AD" ;
    foaf:primaryTopic _:country .

_:country a <https://linkeddatahub.com/demo/city-graph/ns#Country> ;
    foaf:isPrimaryTopicOf _:item ;
    dct:identifier "AD" ;
    geo:lat "42.5"^^xsd:float ;
    geo:long "1.6"^^xsd:float ;
    dct:title "Andorra" .

If you are ready to import some CSV, see our step-by-step tutorial on creating a CSV import.