Reading Telemachus data¶
A Telemachus dataset is, on disk:
my-dataset/
├── manifest.yaml ← SPEC-02: device, trip, sensors, acc_periods…
├── <device_1>.parquet ← signal, columnar
├── <device_2>.parquet
└── …
The signal parquet is "pure" — only the columns defined by SPEC-01
§3 (ts, lat, lon, speed_mps, ax/ay/az_mps2, optional gx/gy/gz_rad_s,
recommended heading_deg, hdop, n_satellites).
Everything else lives in the manifest.
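As a rough picture of what that manifest holds, here is an illustrative sketch. Only the keys named in this page (hardware.devices, acc_periods) are used; the exact nesting and value formats are assumptions, not the normative SPEC-02 schema:

```yaml
# Illustrative sketch only; field nesting and value formats are
# assumptions here. Consult SPEC-02 for the normative schema.
hardware:
  devices:
    - name: device_1          # inherited as device_id when the column is absent
acc_periods:                  # accelerometer frame per time window (SPEC-02 §4.2)
  - start: "2024-01-01T00:00:00Z"
    end: "2024-01-01T01:00:00Z"
    frame: vehicle
```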
With pandas¶
import pandas as pd
import yaml
from pathlib import Path
ds = Path("my-dataset")
manifest = yaml.safe_load((ds / "manifest.yaml").read_text())
df = pd.concat(
    [pd.read_parquet(p) for p in ds.glob("*.parquet")],
    ignore_index=True,
).sort_values("ts").reset_index(drop=True)
# Inherit device_id from manifest if absent (SPEC-02 §4.1)
if "device_id" not in df.columns:
    devices = manifest.get("hardware", {}).get("devices", [])
    if len(devices) == 1:
        df["device_id"] = devices[0]["name"]
# Tag each row with its accelerometer frame (SPEC-02 §4.2)
def frame_for(ts):
    for p in manifest.get("acc_periods", []):
        if pd.Timestamp(p["start"]) <= ts <= pd.Timestamp(p["end"]):
            return p["frame"]
    return "raw"  # default

df["acc_frame"] = pd.to_datetime(df["ts"]).apply(frame_for)
With DuckDB¶
DuckDB reads parquet natively and is great for ad-hoc exploration:
import duckdb
con = duckdb.connect()
con.sql("SELECT * FROM 'my-dataset/*.parquet' LIMIT 5").show()
con.sql("""
    SELECT
        date_trunc('minute', ts) AS minute,
        AVG(speed_mps) AS v,
        COUNT(*) AS n
    FROM 'my-dataset/*.parquet'
    GROUP BY 1 ORDER BY 1
""").show()
Multi-rate gotcha¶
Telemachus files are timestamped at the IMU rate (typically 10 Hz). GNSS
columns (lat, lon, speed_mps, heading_deg) contain NaN on
rows where no GNSS fix is available.
When computing per-row metrics, drop NaNs explicitly rather than letting them propagate.
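A minimal sketch of that NaN handling, on a toy frame mimicking the multi-rate layout (column names from SPEC-01 §3; the values are made up):

```python
import pandas as pd

# Toy frame: IMU rows at 10 Hz, GNSS columns populated only where a fix exists.
df = pd.DataFrame({
    "ts": pd.date_range("2024-01-01", periods=4, freq="100ms"),
    "ax_mps2": [0.1, 0.2, 0.1, 0.3],
    "speed_mps": [5.0, None, None, 6.0],  # NaN = no GNSS fix on that row
})

# Restrict to rows that actually carry a GNSS fix before computing
# per-row metrics, so NaNs don't silently skew or propagate.
gnss = df.dropna(subset=["speed_mps"])
mean_speed = gnss["speed_mps"].mean()
```

Note that plain `mean()` already skips NaN, but an explicit `dropna` keeps row counts and derived columns honest too.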
Per-trip iteration¶
If your manifest declares trip_carrier_states, you can iterate
trips and filter on is_vehicle_data.
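A sketch of what that iteration could look like. The shape of `trip_carrier_states` below (a list of entries with `trip_id`, `start`, `end`, `is_vehicle_data`) is an assumption for illustration, not the SPEC-02 definition:

```python
import pandas as pd

# Assumed manifest shape -- consult SPEC-02 for the real schema.
manifest = {
    "trip_carrier_states": [
        {"trip_id": "t1", "start": "2024-01-01T00:00:00",
         "end": "2024-01-01T00:00:00.150", "is_vehicle_data": True},
        {"trip_id": "t2", "start": "2024-01-01T00:00:00.200",
         "end": "2024-01-01T00:00:00.300", "is_vehicle_data": False},
    ],
}

# Toy signal frame standing in for the concatenated parquet data.
df = pd.DataFrame({
    "ts": pd.date_range("2024-01-01", periods=4, freq="100ms"),
    "ax_mps2": [0.1, 0.2, 0.1, 0.3],
})

vehicle_trips = {}
for trip in manifest["trip_carrier_states"]:
    if not trip["is_vehicle_data"]:
        continue  # skip trips where the device was not in a vehicle
    mask = df["ts"].between(pd.Timestamp(trip["start"]),
                            pd.Timestamp(trip["end"]))
    vehicle_trips[trip["trip_id"]] = df[mask]
```

Each entry in `vehicle_trips` is then a per-trip slice of the signal frame, ready for trip-level aggregation.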