Data Loader Formats
Data loaders load data and metadata from different sources and provide them in a specified data format that conforms to the Input Data Model.
Loading Entities
File Type Tag: entity/list, entity/stream, entity/numeric, entity/vector…
Entities can be serialized in two different formats (JSON or CSV).
The names of attributes must be unique for all attributes from the same data loader.
Note
The plugin runner provides utilities to read and write entities in the file formats specified here.
The utilities can be found in the module qhana_plugin_runner.plugin_utils.entity_marshalling.
The examples here use the following entities:
| ID | href | color |
|---|---|---|
| paintA | example.com/paints/paintA | #8a2be2 |
| paintB | example.com/paints/paintB | #e9322d |
Entities (text/csv)
The first column must be the ID column (named ID).
If the entity has a href attribute then it must be the second column.
All other columns are entity attributes.
The CSV file must contain a header row with all attribute names. The attribute names (except href and ID) can then be used to look up the attribute metadata.
Example:
ID,href,color
paintA,example.com/paints/paintA,#8a2be2
paintB,example.com/paints/paintB,#e9322d
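The column rules above can be sketched with Python's standard csv module. This is not the API of qhana_plugin_runner.plugin_utils.entity_marshalling; the helpers entity_columns, check_header, write_entities_csv and read_entities_csv are hypothetical, and the sketch assumes each entity is a dict of string values.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List


def entity_columns(attributes: Iterable[str]) -> List[str]:
    """Order columns as required: ID first, href second (if present), then the rest."""
    attrs = list(attributes)
    rest = [a for a in attrs if a not in ("ID", "href")]
    return ["ID"] + (["href"] if "href" in attrs else []) + rest


def check_header(header: List[str]) -> None:
    """Raise ValueError if the header row violates the column rules."""
    if not header or header[0] != "ID":
        raise ValueError("first column must be named ID")
    if "href" in header and header.index("href") != 1:
        raise ValueError("href must be the second column")
    if len(set(header)) != len(header):
        raise ValueError("attribute names must be unique")


def write_entities_csv(entities: List[Dict[str, str]], out) -> None:
    # collect all attribute names over all entities, keeping first-seen order
    names: Dict[str, None] = {}
    for entity in entities:
        names.update(dict.fromkeys(entity))
    writer = csv.DictWriter(out, fieldnames=entity_columns(names), lineterminator="\n")
    writer.writeheader()
    writer.writerows(entities)


def read_entities_csv(source) -> Iterator[Dict[str, str]]:
    # the header is validated lazily, when the first entity is requested
    reader = csv.DictReader(source)
    check_header(reader.fieldnames or [])
    yield from reader
```

Entities whose attributes are given in a different order are still written with ID and href as the first two columns; missing attributes are written as empty cells.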
Entities (application/json or application/X-lines+json)
Entities serialized as JSON do not need a specific order for their attributes. To support streaming parsing of files containing many entities, the entities should be serialized as JSON objects with one object per line. Files with one JSON object per line should use the application/X-lines+json mimetype. Files using the application/json mimetype must contain exactly one valid JSON value (e.g. a list or an object).
Example application/json:
[
{"ID": "paintA","href": "example.com/paints/paintA","color": "#8a2be2"},
{"ID": "paintB","href": "example.com/paints/paintB","color": "#e9322d"}
]
Example application/X-lines+json:
{"ID": "paintA","href": "example.com/paints/paintA","color": "#8a2be2"}
{"ID": "paintB","href": "example.com/paints/paintB","color": "#e9322d"}
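The practical difference between the two mimetypes is that application/X-lines+json can be parsed one line at a time, while application/json must be parsed as a whole. A minimal sketch with Python's json module follows; the function names are hypothetical, and read_entities_json only handles the list form of application/json.

```python
import io
import json
from typing import Dict, Iterable, Iterator, TextIO


def read_entities_json(source: TextIO) -> Iterator[Dict]:
    """application/json: the whole file is one JSON value (assumed to be a list here)."""
    data = json.load(source)  # the complete file is loaded into memory
    if not isinstance(data, list):
        raise ValueError("this sketch only handles a list of entities")
    yield from data


def read_entities_lines_json(source: TextIO) -> Iterator[Dict]:
    """application/X-lines+json: one JSON object per line, parsed lazily."""
    for line in source:
        line = line.strip()
        if line:  # skip blank lines, e.g. after a trailing newline
            yield json.loads(line)


def write_entities_lines_json(entities: Iterable[Dict], out: TextIO) -> None:
    # json.dumps never emits raw newlines, so each entity stays on one line
    for entity in entities:
        out.write(json.dumps(entity) + "\n")
```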
Attribute Metadata
File Type Tag: entity/attribute-metadata
Note
The plugin runner provides utilities to read attribute metadata and use it to serialize/de-serialize entity attributes.
The utilities can be found in the module qhana_plugin_runner.plugin_utils.attributes.
The attributes of entities (and relations) can be described by attribute metadata. The metadata of an attribute is expressed as an entity with the following attributes:
- ID
The name of the attribute (as used in the entity serializations)
- title
A human-readable title for the attribute
- description
A human-readable description of the attribute
- type
The type of the scalar values of the attribute (e.g. one of null, boolean, integer, number, string, url, ref or a user-defined type)
- multiple
True if the attribute contains more than one scalar value (default is False)
- ordered
True if the order of the values is important (default is False)
- separator
A character sequence that separates the scalar values (only used in serialization formats that do not natively support lists for attributes, e.g. CSV)
- refTarget
A filename that contains the entities referenced in this attribute (when type is ref). If empty, all available files must be searched for the referenced entities.
- schema
A URL to a schema that can be used to validate the scalar values (e.g. a JSON schema)
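The metadata attributes listed above map naturally onto a small record type. The sketch below uses a Python dataclass and shows how multiple and separator could be used to split a CSV cell. It is not the qhana_plugin_runner.plugin_utils.attributes API; AttributeMetadata, split_values and the defaults type="string" and separator=";" are hypothetical choices.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class AttributeMetadata:
    # field names mirror the metadata attributes listed above
    ID: str
    title: str = ""
    description: str = ""
    type: str = "string"  # assumed default
    multiple: bool = False
    ordered: bool = False
    separator: str = ";"  # assumed default
    refTarget: Optional[str] = None
    schema: Optional[str] = None


def split_values(cell: str, meta: AttributeMetadata) -> Union[str, List[str]]:
    """Split a CSV cell into its scalar values using the attribute's separator."""
    if not meta.multiple:
        return cell  # single scalar value, returned unchanged
    # an empty cell is read as "no values"; the original order is preserved,
    # which only carries meaning when meta.ordered is True
    return cell.split(meta.separator) if cell else []
```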
For attributes with boolean values, the following values are allowed (if the serialization does not support booleans natively):
| value | serialization |
|---|---|
| true | |
| false | |
Case and surrounding whitespace must be ignored for boolean attributes.
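Ignoring case and surrounding whitespace amounts to normalizing the input before comparing it with the allowed serializations. In the minimal sketch below, parse_bool is a hypothetical helper, and the allowed serializations are passed in as parameters rather than hard-coded; the "true"/"false" values used in the checks are only assumed examples.

```python
from typing import Iterable


def parse_bool(raw: str, true_values: Iterable[str], false_values: Iterable[str]) -> bool:
    """Parse a serialized boolean, ignoring case and surrounding whitespace."""
    value = raw.strip().lower()
    # normalize the allowed serializations the same way as the input
    if value in {v.strip().lower() for v in true_values}:
        return True
    if value in {v.strip().lower() for v in false_values}:
        return False
    raise ValueError(f"not a valid boolean serialization: {raw!r}")
```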
A data loader should produce one file containing the attribute metadata for all attributes in the data source. Attributes already specified in the Input Data Model may be omitted from the attribute metadata.
Attribute Metadata (text/csv)
The attribute metadata for the example entities:
ID,title,description,type
ID,Entity ID,the unique id of the entity,ref
href,Entity Link,link to the entity in the original data source,url
color,Color,the color of the paint,string
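Because the metadata is itself an entity list keyed by ID, the column names of an entity CSV file can be used directly to look up their metadata. A minimal sketch using the example above and Python's csv module follows; load_attribute_metadata is a hypothetical helper.

```python
import csv
import io
from typing import Dict

METADATA_CSV = """ID,title,description,type
ID,Entity ID,the unique id of the entity,ref
href,Entity Link,link to the entity in the original data source,url
color,Color,the color of the paint,string
"""


def load_attribute_metadata(source) -> Dict[str, Dict[str, str]]:
    """Index the attribute metadata entities by their ID (the attribute name)."""
    return {row["ID"]: row for row in csv.DictReader(source)}


metadata = load_attribute_metadata(io.StringIO(METADATA_CSV))
entity_header = ["ID", "href", "color"]
# look up the metadata type for every column except ID and href
types = {name: metadata[name]["type"] for name in entity_header if name not in ("ID", "href")}
```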
Graphs
File Type Tag: graph/*
Taxonomies and other entity structures that form a graph should be serialized as a graph. Formats like CSV are unsuitable for serializing graphs because they would require at least two files: one for the entities and one for the relations.
Graph (text/json)
{
"GRAPH_ID": "graphA",
"type": "tree",
"ref-target": "example-entities.csv",
"entities": [
"paintA",
"paintB"
],
"relations": [
{"source": "paintA", "target": "paintB"}
]
}
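A graph document like the one above can be turned into an adjacency list with a few lines of Python. In this sketch, load_graph is hypothetical, and the check that every relation endpoint appears in the entities list is an assumption that the format description does not state.

```python
import json
from typing import Dict, List

GRAPH_JSON = """
{
  "GRAPH_ID": "graphA",
  "type": "tree",
  "ref-target": "example-entities.csv",
  "entities": ["paintA", "paintB"],
  "relations": [{"source": "paintA", "target": "paintB"}]
}
"""


def load_graph(text: str) -> Dict[str, List[str]]:
    """Parse a graph document into an adjacency list (source -> targets)."""
    graph = json.loads(text)
    nodes = set(graph["entities"])
    adjacency: Dict[str, List[str]] = {node: [] for node in graph["entities"]}
    for relation in graph["relations"]:
        source, target = relation["source"], relation["target"]
        # assumption: relations may only connect entities listed in "entities"
        if source not in nodes or target not in nodes:
            raise ValueError(f"relation references unknown entity: {relation}")
        adjacency[source].append(target)
    return adjacency
```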