Creating nrt-conform files with Python pandas
To submit measurement data to the monitoring services of O2A (e.g. via https://ingest.o2a-data.de/), the data needs to meet the nrt-format requirements.
The most straightforward approach in Python is to create nrt files with pandas, which may need to be installed via pip. The random module is part of the Python standard library.
python
import pandas as pd
from random import uniform
Creating synthetic data
python
timeAndDate = [
"2024-12-12T10:10:41.123",
"2024-12-12T10:10:41.456",
"2024-12-12T10:10:41.789",
"2024-12-12T10:10:42.012",
"2024-12-12T10:10:42.345",
"2024-12-12T10:10:42.678",
]
value1 = [uniform(2.5, 5.3) for i in range(0, len(timeAndDate))]
value2 = [uniform(5, 5.7) for i in range(0, len(timeAndDate))]
valueHead1 = "vessel:heincke:super-tsg-7294:parameter1 [unit]"
valueHead2 = "vessel:heincke:super-tsg-7294:parameter2 [unit]"
data = pd.DataFrame({"datetime": timeAndDate, valueHead1: value1, valueHead2: value2})
data["datetime"] = pd.to_datetime(data["datetime"], format="%Y-%m-%dT%H:%M:%S.%f")
print(data.to_csv(sep="\t", index=False))
# To write the same content to a file instead of printing it:
# data.to_csv("localFile.nrt", sep="\t", index=False)
bash
datetime vessel:heincke:super-tsg-7294:parameter1 vessel:heincke:super-tsg-7294:parameter2
0 2024-12-12 10:10:41.123 4.795156 5.357536
1 2024-12-12 10:10:41.456 3.065835 5.432963
2 2024-12-12 10:10:41.789 2.735964 5.087254
3 2024-12-12 10:10:42.012 4.184826 5.019675
4 2024-12-12 10:10:42.345 3.561031 5.043519
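5 2024-12-12 10:10:42.678 3.904502 5.417007
The result is an ordinary pandas DataFrame. To convert the datetime column, the format of the datetime strings must be known and passed to pd.to_datetime.
Remark: In ISO 8601 the digits after the decimal point are a decimal fraction of the second, which %f reads with up to microsecond resolution. Hence a string such as "2024-12-12T10:10:42.200" should keep its trailing zeros, so that every timestamp carries the same number of fractional digits.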
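How the fractional digits are interpreted can be checked with the standard library alone, whose `%f` directive accepts one to six digits and pads them with zeros on the right. The sketch below uses made-up timestamp strings (not logger output) and suggests that ".2" and ".200" denote the same instant, whereas ".002" is two milliseconds:

```python
from datetime import datetime

FORMAT = "%Y-%m-%dT%H:%M:%S.%f"

# Made-up example strings; only the fractional part differs.
samples = [
    "2024-12-12T10:10:42.200",
    "2024-12-12T10:10:42.2",
    "2024-12-12T10:10:42.002",
]

# %f reads the digits as a decimal fraction of a second.
microseconds = [datetime.strptime(text, FORMAT).microsecond for text in samples]
print(microseconds)  # [200000, 200000, 2000]
```

A logger that writes an unpadded millisecond counter (for example "42.2" when 2 ms are meant) would therefore be misread, which is one reason to prefer fixed-width fractions.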
Common issues
duplicated timestamps
Due to either improper data transfer from the sensor head to the data logger (via RS232) or faulty rounding of timestamps, datetime entries are occasionally duplicated, triplicated or repeated even more often.
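Before anything is removed, the affected rows can be listed for inspection. The following is a minimal sketch with made-up values and a hypothetical column name `value`; it separates rows that merely share a timestamp from rows that are exact copies of each other:

```python
import pandas as pd

# Made-up example data: the timestamp 11:11:28 occurs twice.
frame = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "12/12/24 11:11:27",
                "12/12/24 11:11:28",
                "12/12/24 11:11:28",
                "12/12/24 11:11:29",
            ],
            format="%m/%d/%y %H:%M:%S",
        ),
        "value": [5.0, 5.1, 5.2, 5.3],
    }
)

# All rows whose timestamp occurs more than once (keep=False marks every member).
repeated = frame[frame["datetime"].duplicated(keep=False)]

# Rows that are identical in every column, i.e. true copies.
exact_copies = frame[frame.duplicated(keep=False)]

print(repeated.to_csv(sep="\t", index=False))
print(f"{len(exact_copies)} exact copies")
```

Here the two rows at 11:11:28 carry different values, so they are not exact copies, and which of them to keep is a decision about the data rather than a purely technical one.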
python
timeAndDate = ["12/12/24 11:11:25", "12/12/24 11:11:26", "12/12/24 11:11:27", "12/12/24 11:11:28", "12/12/24 11:11:28", "12/12/24 11:11:28", "12/12/24 11:11:29", "12/12/24 11:11:30", "12/12/24 11:11:31", "12/12/24 11:11:32", "12/12/24 11:11:33", "12/12/24 11:11:34"]
value1 = [uniform(2.5, 5.3) for i in range(0, len(timeAndDate))]
value2 = [uniform(5, 5.7) for i in range(0, len(timeAndDate))]
valueHead1 = "vessel:heincke:super-tsg-7294:parameter1"
valueHead2 = "vessel:heincke:super-tsg-7294:parameter2"
dataWithDuplicates = pd.DataFrame(
{"datetime": timeAndDate, valueHead1: value1, valueHead2: value2}
)
dataWithDuplicates["datetime"] = pd.to_datetime(
dataWithDuplicates["datetime"], format="%m/%d/%y %H:%M:%S"
)
print(dataWithDuplicates.to_csv(sep="\t", index=False))
bash
datetime vessel:heincke:super-tsg-7294:parameter1 vessel:heincke:super-tsg-7294:parameter2
2024-12-12 11:11:25 3.05420256630097 5.556311305583497
2024-12-12 11:11:26 4.331679809332293 5.285814060828665
2024-12-12 11:11:27 5.160425046733552 5.0136614459029305
2024-12-12 11:11:28 2.9538311363345517 5.290511082245285
2024-12-12 11:11:28 5.13437778686753 5.351214842185373
2024-12-12 11:11:28 3.643865358733638 5.181282593910026
2024-12-12 11:11:29 4.927755381988099 5.023216020497009
2024-12-12 11:11:30 4.098515724757325 5.6079132752962195
2024-12-12 11:11:31 4.659405017177502 5.078003757624568
2024-12-12 11:11:32 3.9500447754139243 5.444177040938766
2024-12-12 11:11:33 4.799860032246979 5.186303037288317
2024-12-12 11:11:34 4.736753296063766 5.177797932464702
After careful inspection, the duplicated entries can be removed with drop_duplicates, which by default keeps the first row of each group of identical timestamps:
python
dataWithDuplicates = dataWithDuplicates.drop_duplicates(subset="datetime")
print(dataWithDuplicates.to_csv(sep="\t", index=False))
bash
datetime vessel:heincke:super-tsg-7294:parameter1 vessel:heincke:super-tsg-7294:parameter2
2024-12-12 11:11:25 3.05420256630097 5.556311305583497
2024-12-12 11:11:26 4.331679809332293 5.285814060828665
2024-12-12 11:11:27 5.160425046733552 5.0136614459029305
2024-12-12 11:11:28 2.9538311363345517 5.290511082245285
2024-12-12 11:11:29 4.927755381988099 5.023216020497009
2024-12-12 11:11:30 4.098515724757325 5.6079132752962195
2024-12-12 11:11:31 4.659405017177502 5.078003757624568
2024-12-12 11:11:32 3.9500447754139243 5.444177040938766
2024-12-12 11:11:33 4.799860032246979 5.186303037288317
2024-12-12 11:11:34 4.736753296063766 5.177797932464702
out of order dates
python
timeAndDate = [
1733997585,
1733997591,
1733997593,
1733997588,
1733997584,
1733997586,
1733997589,
1733997587,
1733997590,
1733997592,
]
value2 = [uniform(5, 5.7) for i in range(0, len(timeAndDate))]
valueHead2 = "vessel:heincke:super-tsg-7294:parameter2"
dataUnordered = pd.DataFrame({"datetime": timeAndDate, valueHead2: value2})
dataUnordered["datetime"] = pd.to_datetime(
dataUnordered["datetime"], unit="s"  # integers are seconds since the Unix epoch
)
print(dataUnordered.to_csv(sep="\t", index=False))
bash
datetime vessel:heincke:super-tsg-7294:parameter2
2024-12-12 09:59:45 5.048745981400773
2024-12-12 09:59:51 5.135944549704515
2024-12-12 09:59:53 5.689464864954616
2024-12-12 09:59:48 5.081178415589162
2024-12-12 09:59:44 5.599896591249113
2024-12-12 09:59:46 5.654179667112868
2024-12-12 09:59:49 5.6923318343564935
2024-12-12 09:59:47 5.039393883234645
2024-12-12 09:59:50 5.397054256182654
2024-12-12 09:59:52 5.434608056651356
This can be fixed by sorting on the datetime column. In principle, the sorting could be applied to every data frame as a routine step.
python
dataUnordered = dataUnordered.sort_values("datetime")
print(dataUnordered.to_csv(sep="\t", index=False))
bash
datetime vessel:heincke:super-tsg-7294:parameter2
2024-12-12 09:59:44 5.599896591249113
2024-12-12 09:59:45 5.048745981400773
2024-12-12 09:59:46 5.654179667112868
2024-12-12 09:59:47 5.039393883234645
2024-12-12 09:59:48 5.081178415589162
2024-12-12 09:59:49 5.6923318343564935
2024-12-12 09:59:50 5.397054256182654
2024-12-12 09:59:51 5.135944549704515
2024-12-12 09:59:52 5.434608056651356
2024-12-12 09:59:53 5.689464864954616
Of course, sorting and duplicate removal can also be used in combination.
Remark: The dates need to be sorted in ascending order (oldest first), as produced by sort_values in the example above.
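The combination of both steps could look like the sketch below. The helper name `tidy_timestamps` and the numeric values are hypothetical and serve only as illustration; the sort is ascending, matching the sorting example above, and a stable sort is requested so that "first" among rows with equal timestamps still means first in the original file.

```python
import pandas as pd


def tidy_timestamps(frame, column="datetime"):
    """Return a copy sorted by `column` (ascending) without repeated timestamps."""
    return (
        frame.sort_values(column, kind="stable")  # stable: ties keep file order
        .drop_duplicates(subset=column, keep="first")
        .reset_index(drop=True)
    )


# Made-up example data: unordered, and 1733997585 occurs twice.
raw = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [1733997586, 1733997584, 1733997585, 1733997585], unit="s"
        ),
        "vessel:heincke:super-tsg-7294:parameter2": [5.1, 5.2, 5.3, 5.4],
    }
)

clean = tidy_timestamps(raw)

# Cheap sanity checks before the file is written.
assert clean["datetime"].is_monotonic_increasing
assert clean["datetime"].is_unique

print(clean.to_csv(sep="\t", index=False))
```

Sorting first and dropping duplicates second keeps the result independent of where in the file the repeated rows happened to appear, although which of the repeated rows should survive still deserves a look at the data.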