Creating nrt-conform files with Python pandas

To submit measurement data to the monitoring services of O2A (e.g. via https://ingest.o2a-data.de/), the data needs to meet the nrt-format requirements.

The easiest and most straightforward approach in Python is to create nrt files with pandas. pandas may need to be installed via pip; random is part of the Python standard library.

python

import pandas as pd
from random import uniform

Creating synthetic data

python

timeAndDate = [
    "2024-12-12T10:10:41.123",
    "2024-12-12T10:10:41.456",
    "2024-12-12T10:10:41.789",
    "2024-12-12T10:10:42.012",
    "2024-12-12T10:10:42.345",
    "2024-12-12T10:10:42.678",
]

value1 = [uniform(2.5, 5.3) for i in range(0, len(timeAndDate))]
value2 = [uniform(5, 5.7) for i in range(0, len(timeAndDate))]

valueHead1 = "vessel:heincke:super-tsg-7294:parameter1 [unit]"
valueHead2 = "vessel:heincke:super-tsg-7294:parameter2 [unit]"

data = pd.DataFrame({"datetime": timeAndDate, valueHead1: value1, valueHead2: value2})

data["datetime"] = pd.to_datetime(data["datetime"], format="%Y-%m-%dT%H:%M:%S.%f")

print(data.to_csv(sep="\t", index=False))
# To write the result to a file instead of printing it:
# data.to_csv("localFile.nrt", sep="\t", index=False)

bash

datetime  vessel:heincke:super-tsg-7294:parameter1  vessel:heincke:super-tsg-7294:parameter2
0 2024-12-12 10:10:41.123                                  4.795156                                  5.357536
1 2024-12-12 10:10:41.456                                  3.065835                                  5.432963
2 2024-12-12 10:10:41.789                                  2.735964                                  5.087254
3 2024-12-12 10:10:42.012                                  4.184826                                  5.019675
4 2024-12-12 10:10:42.345                                  3.561031                                  5.043519
5 2024-12-12 10:10:42.678                                  3.904502                                  5.417007

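The data is held in an ordinary pandas data frame. The listing above shows the data frame in its printed form, with the row index in the first column; to_csv with sep="\t" and index=False writes the same rows as tab-separated text without the index. When converting the datetime column, the format of the datetime strings needs to be known and passed to pd.to_datetime.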
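How the timestamps are written can be controlled through `to_csv` itself. The following is a minimal sketch, assuming the receiving service expects the ISO 8601 `T` separator between date and time (an assumption that should be checked against the nrt-format requirements). It relies on the `date_format` parameter of `DataFrame.to_csv`; the values and the name `frame` are made up for illustration.

```python
import pandas as pd

# Illustrative data; the header follows the pattern used above.
frame = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2024-12-12T10:10:41", "2024-12-12T10:10:42"],
            format="%Y-%m-%dT%H:%M:%S",
        ),
        "vessel:heincke:super-tsg-7294:parameter1 [unit]": [4.8, 3.1],
    }
)

# date_format is applied to every datetime value when the text is written.
text = frame.to_csv(sep="\t", index=False, date_format="%Y-%m-%dT%H:%M:%S")
print(text)
```

Without `date_format`, pandas chooses the timestamp layout itself, so setting it explicitly keeps the output independent of the data.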

Remark: According to ISO 8601, the digits after the decimal point are a decimal fraction of a second, which the %f directive treats as microseconds. Hence, a string such as "2024-12-12T10:10:42.200" needs to keep its trailing zeros.
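The handling of the fractional digits can be checked directly. This is a small sketch using `datetime.strptime` from the standard library, whose `%f` directive reads the digits as a fraction of a second and stores them as microseconds. The timestamps and the variable names are illustrative.

```python
from datetime import datetime

FORMAT = "%Y-%m-%dT%H:%M:%S.%f"

# ".200" is 0.2 s, stored as 200000 microseconds.
full = datetime.strptime("2024-12-12T10:10:42.200", FORMAT)

# ".002" is 2 milliseconds, stored as 2000 microseconds.
two_ms = datetime.strptime("2024-12-12T10:10:42.002", FORMAT)

print(full.microsecond, two_ms.microsecond)

# Writing a fixed three-digit millisecond field keeps every zero in place.
stamp = full.strftime("%Y-%m-%dT%H:%M:%S.") + f"{full.microsecond // 1000:03d}"
print(stamp)
```

Because the position of each digit carries the value, a fixed-width fractional field is the least ambiguous way to write such timestamps.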

Common issues

Duplicated timestamps

Either due to improper data transfer from the sensor head to the data logger (via RS232) or due to faulty rounding of timestamps, datetime entries are occasionally duplicated, or even repeated three or more times.

python

timeAndDate = [
    "12/12/24 11:11:25",
    "12/12/24 11:11:26",
    "12/12/24 11:11:27",
    "12/12/24 11:11:28",
    "12/12/24 11:11:28",
    "12/12/24 11:11:28",
    "12/12/24 11:11:29",
    "12/12/24 11:11:30",
    "12/12/24 11:11:31",
    "12/12/24 11:11:32",
    "12/12/24 11:11:33",
    "12/12/24 11:11:34",
]

value1 = [uniform(2.5, 5.3) for i in range(0, len(timeAndDate))]
value2 = [uniform(5, 5.7) for i in range(0, len(timeAndDate))]

valueHead1 = "vessel:heincke:super-tsg-7294:parameter1"
valueHead2 = "vessel:heincke:super-tsg-7294:parameter2"

dataWithDuplicates = pd.DataFrame(
    {"datetime": timeAndDate, valueHead1: value1, valueHead2: value2}
)

dataWithDuplicates["datetime"] = pd.to_datetime(
    dataWithDuplicates["datetime"], format="%m/%d/%y %H:%M:%S"
)

print(dataWithDuplicates.to_csv(sep="\t", index=False))

bash

datetime	vessel:heincke:super-tsg-7294:parameter1	vessel:heincke:super-tsg-7294:parameter2
2024-12-12 11:11:25	3.05420256630097	5.556311305583497
2024-12-12 11:11:26	4.331679809332293	5.285814060828665
2024-12-12 11:11:27	5.160425046733552	5.0136614459029305
2024-12-12 11:11:28	2.9538311363345517	5.290511082245285
2024-12-12 11:11:28	5.13437778686753	5.351214842185373
2024-12-12 11:11:28	3.643865358733638	5.181282593910026
2024-12-12 11:11:29	4.927755381988099	5.023216020497009
2024-12-12 11:11:30	4.098515724757325	5.6079132752962195
2024-12-12 11:11:31	4.659405017177502	5.078003757624568
2024-12-12 11:11:32	3.9500447754139243	5.444177040938766
2024-12-12 11:11:33	4.799860032246979	5.186303037288317
2024-12-12 11:11:34	4.736753296063766	5.177797932464702
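Before anything is removed, the affected rows can be listed for inspection. The following is a minimal sketch using `Series.duplicated` with `keep=False`, which marks every row of a duplicated group rather than only the repeats. The data and the column name `value` are made up for illustration.

```python
import pandas as pd

# Illustrative data with one timestamp occurring three times.
frame = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2024-12-12 11:11:27",
                "2024-12-12 11:11:28",
                "2024-12-12 11:11:28",
                "2024-12-12 11:11:28",
                "2024-12-12 11:11:29",
            ],
            format="%Y-%m-%d %H:%M:%S",
        ),
        "value": [5.16, 2.95, 5.13, 3.64, 4.93],
    }
)

# keep=False flags all rows that share a timestamp with another row.
mask = frame["datetime"].duplicated(keep=False)
suspects = frame[mask]
print(suspects.to_csv(sep="\t", index=False))

# Number of rows per affected timestamp.
counts = suspects.groupby("datetime").size()
print(counts)
```

Comparing the values within each group helps to decide whether the rows are true repeats or distinct measurements that received the same timestamp.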

After careful inspection, the duplicated entries can be removed via

python

dataWithDuplicates = dataWithDuplicates.drop_duplicates(subset="datetime")

print(dataWithDuplicates.to_csv(sep="\t", index=False))

bash

datetime	vessel:heincke:super-tsg-7294:parameter1	vessel:heincke:super-tsg-7294:parameter2
2024-12-12 11:11:25	3.05420256630097	5.556311305583497
2024-12-12 11:11:26	4.331679809332293	5.285814060828665
2024-12-12 11:11:27	5.160425046733552	5.0136614459029305
2024-12-12 11:11:28	2.9538311363345517	5.290511082245285
2024-12-12 11:11:29	4.927755381988099	5.023216020497009
2024-12-12 11:11:30	4.098515724757325	5.6079132752962195
2024-12-12 11:11:31	4.659405017177502	5.078003757624568
2024-12-12 11:11:32	3.9500447754139243	5.444177040938766
2024-12-12 11:11:33	4.799860032246979	5.186303037288317
2024-12-12 11:11:34	4.736753296063766	5.177797932464702
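By default, `drop_duplicates` keeps the first row of each group, as in the output above. Which row is the right one depends on the cause of the duplication, so the following sketch only illustrates the available options and one alternative (averaging). The data and the column name `value` are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2024-12-12 11:11:27", "2024-12-12 11:11:28", "2024-12-12 11:11:28"],
            format="%Y-%m-%d %H:%M:%S",
        ),
        "value": [1.0, 2.0, 4.0],
    }
)

# Default behaviour: keep the first row of every duplicated timestamp.
first = frame.drop_duplicates(subset="datetime")

# Keep the last row instead.
last = frame.drop_duplicates(subset="datetime", keep="last")

# Discard every row whose timestamp occurs more than once.
unique_only = frame.drop_duplicates(subset="datetime", keep=False)

# Alternative: collapse each group to the mean of its values.
averaged = frame.groupby("datetime", as_index=False).mean()

print(first["value"].tolist(), last["value"].tolist())
print(unique_only["value"].tolist(), averaged["value"].tolist())
```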

Out-of-order dates

python

timeAndDate = [
    1733997585,
    1733997591,
    1733997593,
    1733997588,
    1733997584,
    1733997586,
    1733997589,
    1733997587,
    1733997590,
    1733997592,
]

value2 = [uniform(5, 5.7) for i in range(0, len(timeAndDate))]

valueHead2 = "vessel:heincke:super-tsg-7294:parameter2"

dataUnordered = pd.DataFrame({"datetime": timeAndDate, valueHead2: value2})

dataUnordered["datetime"] = pd.to_datetime(
    dataUnordered["datetime"], unit="s"  # values are Unix timestamps in seconds
)

print(dataUnordered.to_csv(sep="\t", index=False))

bash

datetime	vessel:heincke:super-tsg-7294:parameter2
2024-12-12 09:59:45	5.048745981400773
2024-12-12 09:59:51	5.135944549704515
2024-12-12 09:59:53	5.689464864954616
2024-12-12 09:59:48	5.081178415589162
2024-12-12 09:59:44	5.599896591249113
2024-12-12 09:59:46	5.654179667112868
2024-12-12 09:59:49	5.6923318343564935
2024-12-12 09:59:47	5.039393883234645
2024-12-12 09:59:50	5.397054256182654
2024-12-12 09:59:52	5.434608056651356

This can be fixed by sorting the data frame by its datetime column. In principle, the sorting could be applied to every data frame as a routine step.

python

dataUnordered = dataUnordered.sort_values("datetime")
print(dataUnordered.to_csv(sep="\t", index=False))

bash

datetime	vessel:heincke:super-tsg-7294:parameter2
2024-12-12 09:59:44	5.599896591249113
2024-12-12 09:59:45	5.048745981400773
2024-12-12 09:59:46	5.654179667112868
2024-12-12 09:59:47	5.039393883234645
2024-12-12 09:59:48	5.081178415589162
2024-12-12 09:59:49	5.6923318343564935
2024-12-12 09:59:50	5.397054256182654
2024-12-12 09:59:51	5.135944549704515
2024-12-12 09:59:52	5.434608056651356
2024-12-12 09:59:53	5.689464864954616
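Whether a data frame needs sorting at all can be checked beforehand. The following is a minimal sketch using the `is_monotonic_increasing` property of a pandas Series; the data and the column name `value` are made up for illustration.

```python
import pandas as pd

stamps = pd.to_datetime([1733997585, 1733997591, 1733997584], unit="s")
frame = pd.DataFrame({"datetime": stamps, "value": [5.0, 5.1, 5.6]})

# False here, because the third timestamp is older than the first two.
needs_sorting = not frame["datetime"].is_monotonic_increasing
print(needs_sorting)

# ignore_index=True renumbers the rows after sorting.
ordered = frame.sort_values("datetime", ignore_index=True)
print(ordered["datetime"].is_monotonic_increasing)
```

Note that `is_monotonic_increasing` also accepts equal neighbouring values, so duplicated timestamps pass this check and have to be detected separately.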

Of course, sorting and duplicate removal can also be used in combination.
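The combination could look like the following sketch. The helper `clean_timestamps` is hypothetical (it is not part of pandas or of any O2A tooling), and keeping the first row of each duplicated timestamp is an assumption that should follow from inspecting the data.

```python
import pandas as pd


def clean_timestamps(frame: pd.DataFrame, column: str = "datetime") -> pd.DataFrame:
    """Sort by timestamp and keep the first row of each duplicated timestamp.

    Hypothetical helper for illustration only.
    """
    # mergesort is stable: rows with equal timestamps keep their recorded
    # order, so "first" still refers to the first row as recorded.
    ordered = frame.sort_values(column, kind="mergesort")
    return ordered.drop_duplicates(subset=column).reset_index(drop=True)


raw = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [1733997586, 1733997584, 1733997586, 1733997585], unit="s"
        ),
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

cleaned = clean_timestamps(raw)
print(cleaned.to_csv(sep="\t", index=False))
```

Sorting first and removing duplicates afterwards means the result is ordered and free of repeated timestamps in a single pass over the data frame.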

Remark: The dates need to be sorted in ascending order (oldest first), as in the sorted output above.
