O2A GeoCSV (`.sdi.csv`)

Author	Peter Konopatzky
Technical Contacts	Peter Konopatzky, Andreas Walter
Version	1.1

Basics

This specification extends the commonly known GeoCSV specification with support for basic relational structure. It also adds requirements to enable data to be used as exchange format in automated O2A SDI dataflows (see SOPs).

Its main purpose are single-source WMS/WFS layers where data is coming in a harmonised table structure and vocabulary.

File Types

This specification comprises two main file types: Layer Definition Files and Data Files.

The Layer Definition Files define the target structure of the database table that is or will be fueling a WMS/WFS layer. Data Files contain the actual data of that table.

To facilitate maintenance and extension of your data two special data file types can be used: Join Files and Chunk Files. Both are optional concepts. Join Files are enable you to reduce data redundancy. Chunk Files help you organise your data.

Technicalities

For all of these file types – with _layer_def.csvt being the exception – the CSV and GeoCSV specifications (giswiki.ch) apply. However, a common set of additional or more specific requirements is outlined below. Make sure to read the according subsections, including the Examples section.

delimiter: tab
INFO
Why tab separation? Compared to commas or semicolons, tabs rarely occur within strings. This minimises the demand for quoting values or strings.
Why not .sdi.tsv? The CSVT extension only works for files named *.csv.
decimal separator: dot (.)
encoding: UTF-8
file name restrictions:
- <basename> and <chunk>: alphanumeric, no special characters except underscore (_), dash (-), dot (.), hash symbol (#)
column name restrictions:
- adherence to column name conventions for basic metadata (see Metadata Vocabulary)
- allowed characters: alphanumeric (a-zA-Z0-9), underscore (_), dot (.)
  - recommendation: lower case alphanumeric and underscores only
  - needs to be compliant with Postgres identifier naming constraints

Metadata Vocabulary

Within O2A and especially the O2A SDI ecosystem specific vocabulary is used for specific metadata. This is a list of compulsory and encouraged vocabulary to use for column headers if columns with corresponding meaning are used.

WARNING

Compulsory vocabulary does not mean that the column itself is compulsory. It means that if a column with identical meaning is present, it should be named like this.

INFO

Some vocabulary is encouraged to foster consistency among O2A-hosted OWS, but may be deviated from (e.g. basis instead of platform).

Space/Time

Acquisition

References

Layer Definition Files

To define the desired (layer) table structure, including column types, two specific files need to be provided: _layer_def.csv and _layer_def.csvt. They define the structure of the database table.

A bunch of examples for layer definition files can be found in the examples section! For even more information, see GDAL's CSV driver Which is being used to process O2A GeoCSV files.

Column Names

The column names of the table are defined by the _layer_def.csv file. It is an empty CSV file, consisting of exactly one line with exactly all column headers the target layer should contain, separated by tabs. Consider the section on metadata vocabulary within the O2A SDI when naming columns.

Three columns are mandatory-ish:

Column Types

The column types of the table are defined by the _layer_def.csvt file. It is a CSVT file, consisting of exactly one line with column types for exactly all columns specified in the _layer_def.csv file, separated by commas (,). Supported column types are documented in the following table.

Basic Data Files

Basic O2A GeoCSV data files are GeoCSV files, adhering to the O2A-specific technicalities. They look much like the _layer_def.csv file, but with actual data in it. There can be an arbitrary amount of data files contributing to the table defined by the layer definition files.

Not every data file needs to contain all columns (except for those mandatory) of the target layer. Columns defined in the layer definition but left out in data files will hold null values in the according rows.

See examples for inspiration, especially the minimal examples #1 and #2.

Columns

Use these column names/headers in your data files

column header	column data type	column+value mandatory?	description	example values
`csv_join`	none		Used to hold join keys for the use of join files. ⚠️ Note: Must not appear in layer definition files!

Optional Data Files

Join Files

Basic relations between files are supported. For each data file (<basename>.sdi.csv) a second CSV (<basename>.sdi.join.csv) file can be joined using the column csv_join to reduce redundancy. Both files will be joined during database ingest using the following statement. Join files can hold arbitrary data and metadata columns (but not the geometry column). For details, see the according example.

Both data file and join file need to share the same <basename> but the join file has to have the file extension .sdi.join.csv

data file: <basename>.sdi.csv
join file: <basename>.sdi.join.csv

Chunk Files

When multiple data files should share one join file, they can be chunked. The data files can have different table structures as long as they adhere to the specifications of basic data files. Join files cannot get chunked.

data files: <basename>@<chunk>.sdi.csv
join file: <basename>.sdi.join.csv (see Join Files)

For details, see the pure chunking example and the complex example.

Examples

In this section, a bunch of examples will showcase how this specification works.

Minimal #1

This example consists of one set of layer definition files and two data files. Both data files contain data for every column.

_layer_def.csv_layer_def.csvt

tsv

date_time_start	geometry	temperature_degc

csv

DateTime,WKT,Real

data-0.sdi.csvdata-1.sdi.csv

tsv

date_time_start	geometry	temperature_degc
2023-06-28T00:00:00+00	POINT(8.5 53.5)	28.76
2023-06-28T01:00:00+00	POINT(8.5 53.5)	27.65

tsv

date_time_start	geometry	temperature_degc
2023-06-28T02:00:00+00	POINT(8.5 53.5)	26.54
2023-06-28T03:00:00+00	POINT(8.5 53.5)	25.43

Using the input files above results in the following table.

Minimal #2

This example consists of one set of layer definition files and two data files. The data files contain data in different columns. They also have different column order.

_layer_def.csv_layer_def.csvt

tsv

date_time_start	geometry	temperature_degc	salinity_psu	platform

csv

DateTime,WKT,Real,Real,String

data-0.sdi.csvdata-1.sdi.csv

tsv

date_time_start	geometry	platform	temperature_degc
2023-06-28T00:00:00+00	POINT(8.5 53.5)		28.76
2023-06-28T01:00:00+00	POINT(8.5 53.5)	Black Pearl	27.65

tsv

date_time_start	geometry	salinity_psu	platform
2023-06-28T02:00:00+00	POINT(8.5 53.5)	26.54	Flying Dutchman
2023-06-28T03:00:00+00	POINT(8.5 53.5)	25.43

Using the input files above results in the following table. It can be observed, that the final column order is determined by the layer definition files and varying column order in data does not matter. Also, it gets clear how missing values get treated.

Joining

This example consists of one set of layer definition files, one data file and one join file.

The data file contains data on people, the join file contains data on institutions. Both files share the column csv_join which will be used to append institution data to the people table.

layer_def.csv_layer_def.csvt

tsv

date_time_start	geometry	name	institute	department	division	group	phone

csv

DateTime,WKT,String,String,String,String,String,String

people.sdi.csvpeople.sdi.join.csv

tsv

date_time_start	geometry	name	phone	csv_join
2015-01-01T01:00:00+00	POINT(8.5 53.5)	Andreas	-1744	awi-se
2019-08-01T01:00:00+00	POINT(8.5 53.5)	Robin		awi-rz-se
2019-11-01T00:00:00+00	POINT(8.5 53.5)	Kono	-2362	awi-se
2020-01-01T00:00:00+00	POINT(8.5 53.5)	Christopher		awi-se
2020-01-01T00:00:00+00	POINT(8.5 53.5)	Max	-2561	awi-dls
2021-01-01T00:00:00+00	POINT(10.12 54.2)	Felix		geomar
2000-01-01T00:00:00+00	POINT(8.5 53.5)	Antje		awi-dir

tsv

csv_join	institute	department	division	group
awi-se	AWI	Computing & Data Centre	DATA	Software Engineering
awi-dls	AWI	Computing & Data Centre	DATA	Data Logistics Support
geomar	GEOMAR
awi-dir	AWI	Board of Directors

Using the input files above results in the following table.

Chunking

This example consists of one set of layer definition files and two data files. It is much like the minimal example #1](#minimal-1) but both files share the <basename> and their filename differs only in the @<chunk> part.

Data for this example is taken from wetterkontor.de.

_layer_def.csv_layer_def.csvt

tsv

date_time_start	date_time_end	geometry	precipitation_mm

csv

DateTime,DateTime,WKT,Real

bremerhaven@2022.sdi.csvbremerhaven@2023.sdi.csv

tsv

date_time_start	date_time_end	geometry	precipitation_mm
2022-01-01T00:00:00+00	2022-02-01T00:00:00+00	POINT(8.57 53.54)	54.9
2022-02-01T00:00:00+00	2022-03-01T00:00:00+00	POINT(8.57 53.54)	129.5
2022-03-01T00:00:00+00	2022-04-01T00:00:00+00	POINT(8.57 53.54)	24.2
2022-04-01T00:00:00+00	2022-05-01T00:00:00+00	POINT(8.57 53.54)	57.7
2022-05-01T00:00:00+00	2022-06-01T00:00:00+00	POINT(8.57 53.54)	76.1
2022-06-01T00:00:00+00	2022-07-01T00:00:00+00	POINT(8.57 53.54)	78.4
2022-07-01T00:00:00+00	2022-08-01T00:00:00+00	POINT(8.57 53.54)	60.1
2022-08-01T00:00:00+00	2022-09-01T00:00:00+00	POINT(8.57 53.54)	21.0
2022-09-01T00:00:00+00	2022-10-01T00:00:00+00	POINT(8.57 53.54)	170.9
2022-10-01T00:00:00+00	2022-11-01T00:00:00+00	POINT(8.57 53.54)	24.6
2022-11-01T00:00:00+00	2022-12-01T00:00:00+00	POINT(8.57 53.54)	47.8
2022-12-01T00:00:00+00	2023-01-01T00:00:00+00	POINT(8.57 53.54)	67.1

tsv

date_time_start	date_time_end	geometry	precipitation_mm
2023-01-01T00:00:00+00	2023-02-01T00:00:00+00	POINT(8.57 53.54)	93.0
2023-02-01T00:00:00+00	2023-03-01T00:00:00+00	POINT(8.57 53.54)	41.9
2023-03-01T00:00:00+00	2023-04-01T00:00:00+00	POINT(8.57 53.54)	96.1
2023-04-01T00:00:00+00	2023-05-01T00:00:00+00	POINT(8.57 53.54)	71.0
2023-05-01T00:00:00+00	2023-06-01T00:00:00+00	POINT(8.57 53.54)	14.8
2023-06-01T00:00:00+00	2023-07-01T00:00:00+00	POINT(8.57 53.54)	48.4

Using the input files above results in the following table.

The same result could have been achieved without chunking but simply using different <basename> values. However, this serves to purely demonstrate the chunking principle. To actually see what this would be useful for, checkout the complex example which combines joining and chunking.

Complex

This example consists of one set of layer definition files, one unchunked data file with a join file and one set of two chunked data files with another join file.

_layer_def.csv_layer_def.csvt

tsv

date_time_start	date_time_end	geometry	state	city	area_km²	population	unemployment_%

csv

DateTime,DateTime,WKT,String,String,Real,Real,Real

bremerhaven@2000-2004.sdi.csvbremerhaven@2005-2009.sdi.csvnrw.sdi.csv

tsv

date_time_start	date_time_end	geometry	population	csv_join
2000-01-01T00:00:00+00	2001-01-01T00:00:00+00	POINT(8.54 53.54)	120822	bhv
2001-01-01T00:00:00+00	2002-01-01T00:00:00+00	POINT(8.54 53.54)	118701	bhv
2002-01-01T00:00:00+00	2003-01-01T00:00:00+00	POINT(8.54 53.54)	119111	bhv
2003-01-01T00:00:00+00	2004-01-01T00:00:00+00	POINT(8.54 53.54)	118276	bhv
2004-01-01T00:00:00+00	2005-01-01T00:00:00+00	POINT(8.54 53.54)	117281	bhv

tsv

date_time_start	date_time_end	geometry	unemployment_%	csv_join
2005-01-01T00:00:00+00	2006-01-01T00:00:00+00	POINT(8.54 53.54)	23.7	bhv
2006-01-01T00:00:00+00	2007-01-01T00:00:00+00	POINT(8.54 53.54)	20.7	bhv
2007-01-01T00:00:00+00	2008-01-01T00:00:00+00	POINT(8.54 53.54)	18.5	bhv
2008-01-01T00:00:00+00	2009-01-01T00:00:00+00	POINT(8.54 53.54)	16.7	bhv
2009-01-01T00:00:00+00	2010-01-01T00:00:00+00	POINT(8.54 53.54)	15.4	bhv

tsv

date_time_start	date_time_end	geometry	population	unemployment_%	csv_join	comment
2000-01-01T00:00:00+00	2001-01-01T00:00:00+00	POINT(7.10 51.51)	278695		ge	unemployment data missing
2001-01-01T00:00:00+00	2002-01-01T00:00:00+00	POINT(7.10 51.51)	275835	15.3	ge
2002-01-01T00:00:00+00	2003-01-01T00:00:00+00	POINT(7.10 51.51)	274926	16.0	ge
2003-01-01T00:00:00+00	2004-01-01T00:00:00+00	POINT(7.10 51.51)	273782	17.0	ge
2004-01-01T00:00:00+00	2005-01-01T00:00:00+00	POINT(7.10 51.51)	270109	18.0	ge
2005-01-01T00:00:00+00	2006-01-01T00:00:00+00	POINT(7.10 51.51)	268102	23.4	ge
2006-01-01T00:00:00+00	2007-01-01T00:00:00+00	POINT(7.10 51.51)	266772	20.1	ge
2007-01-01T00:00:00+00	2008-01-01T00:00:00+00	POINT(7.10 51.51)	264765	16.7	ge
2008-01-01T00:00:00+00	2009-01-01T00:00:00+00	POINT(7.10 51.51)	262063	15.2	ge
2009-01-01T00:00:00+00	2010-01-01T00:00:00+00	POINT(7.10 51.51)	259744	15.1	ge
2000-01-01T00:00:00+00	2001-01-01T00:00:00+00	POINT(7.63 51.96)	265609		ms	unemployment data missing
2001-01-01T00:00:00+00	2002-01-01T00:00:00+00	POINT(7.63 51.96)	267197	6.7	ms
2002-01-01T00:00:00+00	2003-01-01T00:00:00+00	POINT(7.63 51.96)	268945	7.3	ms
2003-01-01T00:00:00+00	2004-01-01T00:00:00+00	POINT(7.63 51.96)	269579	7.8	ms
2004-01-01T00:00:00+00	2005-01-01T00:00:00+00	POINT(7.63 51.96)	270038	8.3	ms
2005-01-01T00:00:00+00	2006-01-01T00:00:00+00	POINT(7.63 51.96)	270868	9.1	ms
2006-01-01T00:00:00+00	2007-01-01T00:00:00+00	POINT(7.63 51.96)	272106	8.4	ms
2007-01-01T00:00:00+00	2008-01-01T00:00:00+00	POINT(7.63 51.96)	272951	7.1	ms
2008-01-01T00:00:00+00	2009-01-01T00:00:00+00	POINT(7.63 51.96)	273875	6.4	ms
2009-01-01T00:00:00+00	2010-01-01T00:00:00+00	POINT(7.63 51.96)	275543	6.4	ms

bremerhaven.sdi.join.csvnrw.sdi.join.csv

tsv

csv_join	city	state	area_km²
bhv	Bremerhaven	Bremen	93.8

tsv

csv_join	city	state	area_km²
ge	Gelsenkirchen	NRW	302.9
ms	Münster	NRW	104.8

Using the input files above results in the following table.

Lists and Dictionaries

_layer_def.csv_layer_def.csvt

tsv

date_time_start	geometry	people_dict	people_list

csv

DateTime,WKT,String,String

data-0.sdi.csv

tsv

date_time_start	geometry	people_dict	people_list
2023-06-28T00:00:00+00	POINT(8.5 53.5)	{"Alice": {"age": 18, "favorite_colour": "blue"}, "Bob": {"age": 19, "favorite_colour": "pink"}}
2023-06-28T01:00:00+00	POINT(8.5 53.5)		["Alice", "Bob", "Charlie"]

Using the input files above results in the following table.

Unit Representation

There is no one way to represent units with O2A GeoCSV files. All of the following are valid. However, be constistent!

column naming: e.g. temperature_degc
additional unit column: e.g. temperature and temperature_unit
string values
- e.g. "28.76 °C" instead of "28.76"
- will prohibit range filtering
add unit information to layer abstract (see Data Product Configuration specification)

Example Use Cases

Guides

Specs

O2A GeoCSV (`.sdi.csv`)

Basics

File Types

Technicalities

Metadata Vocabulary

Space/Time

Acquisition

References

Layer Definition Files

Column Names

Column Types

Basic Data Files

Columns

Optional Data Files

Join Files

Chunk Files

Examples

Minimal #1

Minimal #2

Joining

Chunking

Complex

Lists and Dictionaries

Unit Representation

O2A GeoCSV (.sdi.csv) ​

Basics ​

File Types ​

Technicalities ​

Metadata Vocabulary ​

Space/Time ​

Acquisition ​

References ​

Layer Definition Files ​

Column Names ​

Column Types ​

Basic Data Files ​

Columns ​

Optional Data Files ​

Join Files ​

Chunk Files ​

Examples ​

Minimal #1 ​

Minimal #2 ​

Joining ​

Chunking ​

Complex ​

Lists and Dictionaries ​

Unit Representation ​

O2A GeoCSV (`.sdi.csv`)

Basics

File Types

Technicalities

Metadata Vocabulary

Space/Time

Acquisition

References

Layer Definition Files

Column Names

Column Types

Basic Data Files

Columns

Optional Data Files

Join Files

Chunk Files

Examples

Minimal #1

Minimal #2

Joining

Chunking

Complex

Lists and Dictionaries

Unit Representation