O2A GeoCSV (.sdi.csv)
| Author | Peter Konopatzky |
|---|---|
| Technical Contacts | Peter Konopatzky, Andreas Walter |
| Version | 1.1 |
Basics
This specification extends the commonly known GeoCSV specification with support for basic relational structure. It also adds requirements to enable data to be used as exchange format in automated O2A SDI dataflows (see SOPs).
Its main purpose are single-source WMS/WFS layers where data is coming in a harmonised table structure and vocabulary.
File Types
This specification comprises two main file types: Layer Definition Files and Data Files.
The Layer Definition Files define the target structure of the database table that is or will be fueling a WMS/WFS layer. Data Files contain the actual data of that table.
To facilitate maintenance and extension of your data two special data file types can be used: Join Files and Chunk Files. Both are optional concepts. Join Files are enable you to reduce data redundancy. Chunk Files help you organise your data.
Technicalities
For all of these file types – with _layer_def.csvt being the exception – the CSV and GeoCSV specifications (giswiki.ch) apply. However, a common set of additional or more specific requirements is outlined below. Make sure to read the according subsections, including the Examples section.
- delimiter: tab
INFO
Why tab separation? Compared to commas or semicolons, tabs rarely occur within strings. This minimises the demand for quoting values or strings.
Why not
.sdi.tsv? The CSVT extension only works for files named*.csv. - decimal separator: dot (
.) - encoding: UTF-8
- file name restrictions:
<basename>and<chunk>: alphanumeric, no special characters except underscore (_), dash (-), dot (.), hash symbol (#)
- column name restrictions:
- adherence to column name conventions for basic metadata (see Metadata Vocabulary)
- allowed characters: alphanumeric (
a-zA-Z0-9), underscore (_), dot (.)- recommendation: lower case alphanumeric and underscores only
- needs to be compliant with Postgres identifier naming constraints
Metadata Vocabulary
Within O2A and especially the O2A SDI ecosystem specific vocabulary is used for specific metadata. This is a list of compulsory and encouraged vocabulary to use for column headers if columns with corresponding meaning are used.
WARNING
Compulsory vocabulary does not mean that the column itself is compulsory. It means that if a column with identical meaning is present, it should be named like this.
INFO
Some vocabulary is encouraged to foster consistency among O2A-hosted OWS, but may be deviated from (e.g. basis instead of platform).
Space/Time
Acquisition
References
Layer Definition Files
To define the desired (layer) table structure, including column types, two specific files need to be provided: _layer_def.csv and _layer_def.csvt. They define the structure of the database table.
A bunch of examples for layer definition files can be found in the examples section! For even more information, see GDAL's CSV driver Which is being used to process O2A GeoCSV files.
Column Names
The column names of the table are defined by the _layer_def.csv file. It is an empty CSV file, consisting of exactly one line with exactly all column headers the target layer should contain, separated by tabs. Consider the section on metadata vocabulary within the O2A SDI when naming columns.
Three columns are mandatory-ish:
Column Types
The column types of the table are defined by the _layer_def.csvt file. It is a CSVT file, consisting of exactly one line with column types for exactly all columns specified in the _layer_def.csv file, separated by commas (,). Supported column types are documented in the following table.
Basic Data Files
Basic O2A GeoCSV data files are GeoCSV files, adhering to the O2A-specific technicalities. They look much like the _layer_def.csv file, but with actual data in it. There can be an arbitrary amount of data files contributing to the table defined by the layer definition files.
Not every data file needs to contain all columns (except for those mandatory) of the target layer. Columns defined in the layer definition but left out in data files will hold null values in the according rows.
See examples for inspiration, especially the minimal examples #1 and #2.
Columns
Use these column names/headers in your data files
| column header | column data type | column+value mandatory? | description | example values |
|---|---|---|---|---|
csv_join | none | Used to hold join keys for the use of join files. ⚠️ Note: Must not appear in layer definition files! |
Optional Data Files
Join Files
Basic relations between files are supported. For each data file (<basename>.sdi.csv) a second CSV (<basename>.sdi.join.csv) file can be joined using the column csv_join to reduce redundancy. Both files will be joined during database ingest using the following statement. Join files can hold arbitrary data and metadata columns (but not the geometry column). For details, see the according example.
Both data file and join file need to share the same <basename> but the join file has to have the file extension .sdi.join.csv
- data file:
<basename>.sdi.csv - join file:
<basename>.sdi.join.csv
Chunk Files
When multiple data files should share one join file, they can be chunked. The data files can have different table structures as long as they adhere to the specifications of basic data files. Join files cannot get chunked.
- data files:
<basename>@<chunk>.sdi.csv - join file:
<basename>.sdi.join.csv(see Join Files)
For details, see the pure chunking example and the complex example.
Examples
In this section, a bunch of examples will showcase how this specification works.
Minimal #1
This example consists of one set of layer definition files and two data files. Both data files contain data for every column.
tsv
date_time_start geometry temperature_degccsv
DateTime,WKT,Realtsv
date_time_start geometry temperature_degc
2023-06-28T00:00:00+00 POINT(8.5 53.5) 28.76
2023-06-28T01:00:00+00 POINT(8.5 53.5) 27.65tsv
date_time_start geometry temperature_degc
2023-06-28T02:00:00+00 POINT(8.5 53.5) 26.54
2023-06-28T03:00:00+00 POINT(8.5 53.5) 25.43Using the input files above results in the following table.
Minimal #2
This example consists of one set of layer definition files and two data files. The data files contain data in different columns. They also have different column order.
tsv
date_time_start geometry temperature_degc salinity_psu platformcsv
DateTime,WKT,Real,Real,Stringtsv
date_time_start geometry platform temperature_degc
2023-06-28T00:00:00+00 POINT(8.5 53.5) 28.76
2023-06-28T01:00:00+00 POINT(8.5 53.5) Black Pearl 27.65tsv
date_time_start geometry salinity_psu platform
2023-06-28T02:00:00+00 POINT(8.5 53.5) 26.54 Flying Dutchman
2023-06-28T03:00:00+00 POINT(8.5 53.5) 25.43Using the input files above results in the following table. It can be observed, that the final column order is determined by the layer definition files and varying column order in data does not matter. Also, it gets clear how missing values get treated.
Joining
This example consists of one set of layer definition files, one data file and one join file.
The data file contains data on people, the join file contains data on institutions. Both files share the column csv_join which will be used to append institution data to the people table.
tsv
date_time_start geometry name institute department division group phonecsv
DateTime,WKT,String,String,String,String,String,Stringtsv
date_time_start geometry name phone csv_join
2015-01-01T01:00:00+00 POINT(8.5 53.5) Andreas -1744 awi-se
2019-08-01T01:00:00+00 POINT(8.5 53.5) Robin awi-rz-se
2019-11-01T00:00:00+00 POINT(8.5 53.5) Kono -2362 awi-se
2020-01-01T00:00:00+00 POINT(8.5 53.5) Christopher awi-se
2020-01-01T00:00:00+00 POINT(8.5 53.5) Max -2561 awi-dls
2021-01-01T00:00:00+00 POINT(10.12 54.2) Felix geomar
2000-01-01T00:00:00+00 POINT(8.5 53.5) Antje awi-dirtsv
csv_join institute department division group
awi-se AWI Computing & Data Centre DATA Software Engineering
awi-dls AWI Computing & Data Centre DATA Data Logistics Support
geomar GEOMAR
awi-dir AWI Board of DirectorsUsing the input files above results in the following table.
Chunking
This example consists of one set of layer definition files and two data files. It is much like the minimal example #1](#minimal-1) but both files share the <basename> and their filename differs only in the @<chunk> part.
Data for this example is taken from wetterkontor.de.
tsv
date_time_start date_time_end geometry precipitation_mmcsv
DateTime,DateTime,WKT,Realtsv
date_time_start date_time_end geometry precipitation_mm
2022-01-01T00:00:00+00 2022-02-01T00:00:00+00 POINT(8.57 53.54) 54.9
2022-02-01T00:00:00+00 2022-03-01T00:00:00+00 POINT(8.57 53.54) 129.5
2022-03-01T00:00:00+00 2022-04-01T00:00:00+00 POINT(8.57 53.54) 24.2
2022-04-01T00:00:00+00 2022-05-01T00:00:00+00 POINT(8.57 53.54) 57.7
2022-05-01T00:00:00+00 2022-06-01T00:00:00+00 POINT(8.57 53.54) 76.1
2022-06-01T00:00:00+00 2022-07-01T00:00:00+00 POINT(8.57 53.54) 78.4
2022-07-01T00:00:00+00 2022-08-01T00:00:00+00 POINT(8.57 53.54) 60.1
2022-08-01T00:00:00+00 2022-09-01T00:00:00+00 POINT(8.57 53.54) 21.0
2022-09-01T00:00:00+00 2022-10-01T00:00:00+00 POINT(8.57 53.54) 170.9
2022-10-01T00:00:00+00 2022-11-01T00:00:00+00 POINT(8.57 53.54) 24.6
2022-11-01T00:00:00+00 2022-12-01T00:00:00+00 POINT(8.57 53.54) 47.8
2022-12-01T00:00:00+00 2023-01-01T00:00:00+00 POINT(8.57 53.54) 67.1tsv
date_time_start date_time_end geometry precipitation_mm
2023-01-01T00:00:00+00 2023-02-01T00:00:00+00 POINT(8.57 53.54) 93.0
2023-02-01T00:00:00+00 2023-03-01T00:00:00+00 POINT(8.57 53.54) 41.9
2023-03-01T00:00:00+00 2023-04-01T00:00:00+00 POINT(8.57 53.54) 96.1
2023-04-01T00:00:00+00 2023-05-01T00:00:00+00 POINT(8.57 53.54) 71.0
2023-05-01T00:00:00+00 2023-06-01T00:00:00+00 POINT(8.57 53.54) 14.8
2023-06-01T00:00:00+00 2023-07-01T00:00:00+00 POINT(8.57 53.54) 48.4Using the input files above results in the following table.
The same result could have been achieved without chunking but simply using different <basename> values. However, this serves to purely demonstrate the chunking principle. To actually see what this would be useful for, checkout the complex example which combines joining and chunking.
Complex
This example consists of one set of layer definition files, one unchunked data file with a join file and one set of two chunked data files with another join file.
tsv
date_time_start date_time_end geometry state city area_km² population unemployment_%csv
DateTime,DateTime,WKT,String,String,Real,Real,Realtsv
date_time_start date_time_end geometry population csv_join
2000-01-01T00:00:00+00 2001-01-01T00:00:00+00 POINT(8.54 53.54) 120822 bhv
2001-01-01T00:00:00+00 2002-01-01T00:00:00+00 POINT(8.54 53.54) 118701 bhv
2002-01-01T00:00:00+00 2003-01-01T00:00:00+00 POINT(8.54 53.54) 119111 bhv
2003-01-01T00:00:00+00 2004-01-01T00:00:00+00 POINT(8.54 53.54) 118276 bhv
2004-01-01T00:00:00+00 2005-01-01T00:00:00+00 POINT(8.54 53.54) 117281 bhvtsv
date_time_start date_time_end geometry unemployment_% csv_join
2005-01-01T00:00:00+00 2006-01-01T00:00:00+00 POINT(8.54 53.54) 23.7 bhv
2006-01-01T00:00:00+00 2007-01-01T00:00:00+00 POINT(8.54 53.54) 20.7 bhv
2007-01-01T00:00:00+00 2008-01-01T00:00:00+00 POINT(8.54 53.54) 18.5 bhv
2008-01-01T00:00:00+00 2009-01-01T00:00:00+00 POINT(8.54 53.54) 16.7 bhv
2009-01-01T00:00:00+00 2010-01-01T00:00:00+00 POINT(8.54 53.54) 15.4 bhvtsv
date_time_start date_time_end geometry population unemployment_% csv_join comment
2000-01-01T00:00:00+00 2001-01-01T00:00:00+00 POINT(7.10 51.51) 278695 ge unemployment data missing
2001-01-01T00:00:00+00 2002-01-01T00:00:00+00 POINT(7.10 51.51) 275835 15.3 ge
2002-01-01T00:00:00+00 2003-01-01T00:00:00+00 POINT(7.10 51.51) 274926 16.0 ge
2003-01-01T00:00:00+00 2004-01-01T00:00:00+00 POINT(7.10 51.51) 273782 17.0 ge
2004-01-01T00:00:00+00 2005-01-01T00:00:00+00 POINT(7.10 51.51) 270109 18.0 ge
2005-01-01T00:00:00+00 2006-01-01T00:00:00+00 POINT(7.10 51.51) 268102 23.4 ge
2006-01-01T00:00:00+00 2007-01-01T00:00:00+00 POINT(7.10 51.51) 266772 20.1 ge
2007-01-01T00:00:00+00 2008-01-01T00:00:00+00 POINT(7.10 51.51) 264765 16.7 ge
2008-01-01T00:00:00+00 2009-01-01T00:00:00+00 POINT(7.10 51.51) 262063 15.2 ge
2009-01-01T00:00:00+00 2010-01-01T00:00:00+00 POINT(7.10 51.51) 259744 15.1 ge
2000-01-01T00:00:00+00 2001-01-01T00:00:00+00 POINT(7.63 51.96) 265609 ms unemployment data missing
2001-01-01T00:00:00+00 2002-01-01T00:00:00+00 POINT(7.63 51.96) 267197 6.7 ms
2002-01-01T00:00:00+00 2003-01-01T00:00:00+00 POINT(7.63 51.96) 268945 7.3 ms
2003-01-01T00:00:00+00 2004-01-01T00:00:00+00 POINT(7.63 51.96) 269579 7.8 ms
2004-01-01T00:00:00+00 2005-01-01T00:00:00+00 POINT(7.63 51.96) 270038 8.3 ms
2005-01-01T00:00:00+00 2006-01-01T00:00:00+00 POINT(7.63 51.96) 270868 9.1 ms
2006-01-01T00:00:00+00 2007-01-01T00:00:00+00 POINT(7.63 51.96) 272106 8.4 ms
2007-01-01T00:00:00+00 2008-01-01T00:00:00+00 POINT(7.63 51.96) 272951 7.1 ms
2008-01-01T00:00:00+00 2009-01-01T00:00:00+00 POINT(7.63 51.96) 273875 6.4 ms
2009-01-01T00:00:00+00 2010-01-01T00:00:00+00 POINT(7.63 51.96) 275543 6.4 mstsv
csv_join city state area_km²
bhv Bremerhaven Bremen 93.8tsv
csv_join city state area_km²
ge Gelsenkirchen NRW 302.9
ms Münster NRW 104.8Using the input files above results in the following table.
Lists and Dictionaries
tsv
date_time_start geometry people_dict people_listcsv
DateTime,WKT,String,Stringtsv
date_time_start geometry people_dict people_list
2023-06-28T00:00:00+00 POINT(8.5 53.5) {"Alice": {"age": 18, "favorite_colour": "blue"}, "Bob": {"age": 19, "favorite_colour": "pink"}}
2023-06-28T01:00:00+00 POINT(8.5 53.5) ["Alice", "Bob", "Charlie"]Using the input files above results in the following table.
Unit Representation
There is no one way to represent units with O2A GeoCSV files. All of the following are valid. However, be constistent!
- column naming: e.g.
temperature_degc - additional unit column: e.g.
temperatureandtemperature_unit - string values
- e.g.
"28.76 °C"instead of"28.76" - will prohibit range filtering
- e.g.
- add unit information to layer abstract (see Data Product Configuration specification)