Concepts
The five ideas you need to grok to read Telemachus data correctly.
Telemachus: functional groups
Telemachus is a flat parquet (the schema is columnar, not nested), but mentally it splits into five functional groups. Knowing these groups makes it much easier to remember why each column is there.
Telemachus = datetime   ts
           + GPS        lat, lon, speed_mps, heading_deg,
                        altitude_gps_m, hdop, n_satellites
           + IMU
             ├── accel    ax_mps2, ay_mps2, az_mps2
             ├── gyro     gx_rad_s, gy_rad_s, gz_rad_s   (optional)
             └── magneto  mx_uT, my_uT, mz_uT            (optional)
           + OBD / CAN  ignition, odometer_m, rpm,
                        speed_obd_mps, fuel_*, …         (optional)
           + extra      x_<source>_<field>               (vendor-specific)
| Group | What it tells you | Rate (typical) |
|---|---|---|
| datetime | When the sample was captured | IMU rate (10 Hz) |
| GPS | Where the vehicle is and how fast | 1 Hz (NaN between fixes) |
| IMU | How the vehicle moves (acc/rot/field) | 10–100 Hz |
| OBD/CAN | What the vehicle reports (bus data) | 1 Hz (varies) |
| extra | Anything vendor-specific that doesn't fit the above | varies |
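As a rough illustration of the grouping above, the sketch below maps flat column names back to their functional group. The column lists come from the diagram; the helper name `functional_group`, the `"unknown"` fallback, and the sample column `fuel_level_pct` are hypothetical and not part of the spec.

```python
# Column-to-group mapping copied from the diagram above. The OBD / CAN list
# is open-ended in the source ("…"), so this set is deliberately incomplete.
GROUPS = {
    "datetime": {"ts"},
    "gps": {"lat", "lon", "speed_mps", "heading_deg",
            "altitude_gps_m", "hdop", "n_satellites"},
    "imu": {"ax_mps2", "ay_mps2", "az_mps2",
            "gx_rad_s", "gy_rad_s", "gz_rad_s",
            "mx_uT", "my_uT", "mz_uT"},
    "obd_can": {"ignition", "odometer_m", "rpm", "speed_obd_mps"},
}


def functional_group(column: str) -> str:
    """Return the functional group a flat column name belongs to."""
    if column.startswith("x_"):      # vendor-specific extras
        return "extra"
    if column.startswith("fuel_"):   # the diagram lists fuel_* under OBD / CAN
        return "obd_can"
    for group, names in GROUPS.items():
        if column in names:
            return group
    return "unknown"                 # assumption: unlisted columns are not guessed


# "fuel_level_pct" is a made-up column name used only for this example.
columns = ["ts", "lat", "ax_mps2", "rpm", "fuel_level_pct", "x_geotab_geofence_id"]
by_group = {}
for col in columns:
    by_group.setdefault(functional_group(col), []).append(col)
print(by_group)
```

Grouping by name keeps the schema flat on disk while still letting a reader reason about one group at a time.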
Why flat columns, not nested structs?
Parquet handles flat columns best (projection pushdown, fast
scans). Nesting imu.accel.x_mps2 looks tidy but costs performance
and tooling compatibility. The mental model is nested; the schema
is flat.
Vendor-specific extra fields
When a vendor exposes a field that has no standard Telemachus
equivalent (a proprietary counter, a device-internal flag, …), use
the x_<source>_<field> naming convention:
| Column | Meaning |
|---|---|
| x_teltonika_ext_voltage_v | Teltonika external power voltage reading |
| x_geotab_geofence_id | Geotab-specific geofence identifier |
| x_danlaw_codec_id | Danlaw firmware codec tag |
The x_ prefix signals "not part of the normative Telemachus contract;
consumers may safely ignore it". The <source> segment keeps names
unambiguous across datasets that merge multiple vendors.
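A minimal sketch of how a consumer might recognise and split these names. It assumes that `<source>` is a single lowercase token without underscores, so that everything after the second underscore is `<field>`; the source text does not state this rule, and the names `parse_extra_column` and `split_columns` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumption: <source> contains no underscore, so the first underscore
# after it marks the start of <field>.
EXTRA_COLUMN = re.compile(r"^x_(?P<source>[a-z0-9]+)_(?P<field>[a-z0-9_]+)$")


class ExtraColumn(NamedTuple):
    source: str
    field: str


def parse_extra_column(name: str) -> Optional[ExtraColumn]:
    """Split an x_<source>_<field> name, or return None for other columns."""
    match = EXTRA_COLUMN.match(name)
    if match is None:
        return None
    return ExtraColumn(match.group("source"), match.group("field"))


def split_columns(columns):
    """Separate non-extra columns from vendor extras, keyed by source."""
    standard, extras = [], {}
    for col in columns:
        parsed = parse_extra_column(col)
        if parsed is None:
            standard.append(col)
        else:
            extras.setdefault(parsed.source, []).append(col)
    return standard, extras


standard, extras = split_columns(
    ["ts", "lat", "x_teltonika_ext_voltage_v", "x_geotab_geofence_id"]
)
print(standard)  # columns a consumer must understand
print(extras)    # columns a consumer may ignore, grouped by vendor
```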
Telemachus record format: the layered model
| Layer | Name | Input | Output |
|---|---|---|---|
| Telemachus | Device | Hardware | Raw parquet — only what the device measures |
| enriched | Cleaned & Contextualised | Telemachus | Enriched Telemachus + map matching, DEM, IMU calibration, signal quality |
| events layer | Events & Situations | enriched | enriched + event column + event table (harsh brake, pothole, curve, …) |
The Telemachus spec (SPEC-01) is normative for the Telemachus layer. Column contracts for the enriched and events layers are documented in SPEC-01 §4, but their algorithms are intentionally out of scope: different consumers can compute these layers differently as long as they emit conformant columns.
Rule of thumb: a column derived from external data (maps, DEM, algorithm output) belongs to the enriched layer or higher, never to Telemachus.
Multi-rate IMU vs GNSS
Most devices stream IMU at 10 Hz and GNSS at 1 Hz. Telemachus is timestamped
at the IMU rate, with GNSS columns containing NaN between fixes:
ts                     lat      lon     speed_mps  ax_mps2  ay_mps2  az_mps2
2025-01-01T08:00:00.0  49.3347  1.3830  5.2        0.12     0.03     9.81
2025-01-01T08:00:00.1  NaN      NaN     NaN        0.15     -0.01    9.80
2025-01-01T08:00:00.2  NaN      NaN     NaN        0.11     0.02     9.82
…
2025-01-01T08:00:01.0  49.3348  1.3831  5.3        0.13     0.01     9.81
When computing GNSS-only metrics (distance, average speed), drop NaNs explicitly. When computing IMU-only metrics (jerk, vibration), use all rows.
The manifest fields sensors.{gps,accelerometer}.rate_hz declare the
expected rates so consumers can pre-allocate buffers and pick
interpolation strategies.
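The difference between GNSS-only and IMU-only metrics can be sketched in plain Python on the four rows shown in the sample. The tuple layout, the haversine distance, and the finite-difference jerk are illustrative choices, not something the spec prescribes; the rows elided by "…" in the sample are simply absent here.

```python
import math

NAN = float("nan")

# (t_seconds, lat, lon, speed_mps, ax_mps2), taken from the sample above.
rows = [
    (0.0, 49.3347, 1.3830, 5.2, 0.12),
    (0.1, NAN, NAN, NAN, 0.15),
    (0.2, NAN, NAN, NAN, 0.11),
    (1.0, 49.3348, 1.3831, 5.3, 0.13),
]


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# GNSS-only metrics: drop rows without a fix explicitly.
fixes = [r for r in rows if not (math.isnan(r[1]) or math.isnan(r[2]))]
distance_m = sum(
    haversine_m(a[1], a[2], b[1], b[2]) for a, b in zip(fixes, fixes[1:])
)
mean_speed_mps = sum(r[3] for r in fixes) / len(fixes)

# IMU-only metrics: use every row. Jerk here is the finite difference of ax.
jerk_mps3 = [(b[4] - a[4]) / (b[0] - a[0]) for a, b in zip(rows, rows[1:])]

print(len(fixes), round(distance_m, 1), mean_speed_mps, len(jerk_mps3))
```

Averaging `speed_mps` over all rows without dropping NaNs would yield NaN in plain Python, which is why the GNSS branch filters first.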
AccPeriod: the accelerometer frame
The same physical accelerometer can output data in different reference frames depending on firmware state:
| Frame | At rest | Behaviour |
|---|---|---|
| raw | \|a\| ≈ 9.81 m/s² | Unprocessed sensor output |
| compensated | \|a\| ≈ 0 m/s² | Firmware has subtracted gravity |
| partial | 0 < \|a\| < g | Imperfect compensation |
This matters because downstream stages (IMU calibration, event detection) need to know whether gravity is in the signal.
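To make the table concrete, here is a hedged sketch that guesses the frame from the mean acceleration norm of samples taken while the device is stationary. The tolerance `tol` and the function name `classify_frame` are assumptions made for this example; this is not the detection method defined by the spec.

```python
import math

G = 9.81  # m/s², the at-rest value used in the table above


def classify_frame(samples_at_rest, tol=0.5):
    """Guess the accelerometer frame from (ax, ay, az) samples at rest.

    tol (m/s²) is an illustrative tolerance, not a value from the spec.
    Returns None when the norm exceeds g, a case the table does not cover.
    """
    if not samples_at_rest:
        raise ValueError("need at least one at-rest sample")
    norms = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in samples_at_rest]
    mean_norm = sum(norms) / len(norms)
    if abs(mean_norm - G) <= tol:
        return "raw"          # gravity still present in the signal
    if mean_norm <= tol:
        return "compensated"  # gravity removed by firmware
    if mean_norm < G:
        return "partial"      # gravity only partly removed
    return None


print(classify_frame([(0.12, 0.03, 9.81), (0.15, -0.01, 9.80)]))
```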
The manifest declares one or more acc_periods segments — each a
contiguous time range with a coherent frame:
acc_periods:
  - start: 2025-01-01T00:00:00Z
    end: 2025-03-15T12:00:00Z
    frame: compensated
    detection_method: empirical
  - start: 2025-03-15T12:00:01Z
    end: present
    frame: raw
    detection_method: profile_change
Default if absent: a single implicit period with frame: "raw". See
SPEC-01 §3.6 for the full normative definition.
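A consumer typically needs the frame that applies to a given sample timestamp. The sketch below assumes the manifest has already been loaded into a list of dicts with string timestamps, treats `end` as inclusive, and returns None for timestamps that fall outside every declared period; those three choices and the name `frame_at` are assumptions, so check SPEC-01 §3.6 for the normative behaviour.

```python
from datetime import datetime, timezone
from typing import Optional

# The acc_periods example from above, as it might look after loading.
acc_periods = [
    {"start": "2025-01-01T00:00:00Z", "end": "2025-03-15T12:00:00Z", "frame": "compensated"},
    {"start": "2025-03-15T12:00:01Z", "end": "present", "frame": "raw"},
]


def _parse(ts: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def frame_at(ts: datetime, periods=None) -> Optional[str]:
    """Return the accelerometer frame in effect at a timezone-aware ts."""
    if not periods:
        return "raw"  # default when acc_periods is absent
    for period in periods:
        if ts < _parse(period["start"]):
            continue
        if period["end"] == "present" or ts <= _parse(period["end"]):
            return period["frame"]
    return None  # assumption: no guess for timestamps outside all periods


print(frame_at(datetime(2025, 2, 1, tzinfo=timezone.utc), acc_periods))
print(frame_at(datetime(2025, 6, 1, tzinfo=timezone.utc), acc_periods))
```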
CarrierState: is this trip real driving?
A telematics device records data continuously, but not all of that data comes from a real driving context. A device left on a workshop bench, manipulated by hand during testing, or temporarily unplugged still emits messages.
The trip-level carrier_state classifies each trip into one of six
contexts:
| State | Description | Vehicle? | Use for analytics? |
|---|---|---|---|
| mounted_driving | Installed in vehicle, vehicle in motion | Yes | Yes |
| mounted_idle | Installed, vehicle stationary | Yes | Yes (ZUPT) |
| unplugged | External power lost | Unknown | Optional |
| desk | Stable surface, no vehicle context | No | No |
| handheld | Being moved by hand | No | No |
| unknown | Insufficient signals | Unknown | No |
Classification combines four signals: external power voltage, GPS speed, accelerometer norm variance, GPS position drift. See SPEC-01 §3.7 for the decision tree.
In the manifest, declare the state of each trip via trip_carrier_states:
trip_carrier_states:
  - trip_id: "T20250410_1053_001"
    carrier_state: "mounted_driving"
    confidence: "high"
Downstream stages MUST filter on is_vehicle_data == True (i.e.
carrier_state ∈ {mounted_driving, mounted_idle}) for any analytics
that assume vehicle context.
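The filter can be sketched as follows, assuming the manifest entries and samples are already loaded as plain dicts. The second and third trip entries, the sample rows, and the helper names are made up for illustration; treating a missing `carrier_state` as `unknown` is also an assumption.

```python
VEHICLE_STATES = {"mounted_driving", "mounted_idle"}

trip_carrier_states = [
    {"trip_id": "T20250410_1053_001", "carrier_state": "mounted_driving", "confidence": "high"},
    # Hypothetical entries added for the example:
    {"trip_id": "T20250410_1400_002", "carrier_state": "desk", "confidence": "high"},
    {"trip_id": "T20250410_1600_003", "carrier_state": "mounted_idle", "confidence": "high"},
]


def is_vehicle_data(carrier_state: str) -> bool:
    """True when the state implies a real vehicle context."""
    return carrier_state in VEHICLE_STATES


def vehicle_trip_ids(entries):
    """Collect the trip ids that are safe for vehicle analytics."""
    return {
        e["trip_id"]
        for e in entries
        if is_vehicle_data(e.get("carrier_state", "unknown"))
    }


# Hypothetical samples carrying a trip_id column.
samples = [
    {"trip_id": "T20250410_1053_001", "ax_mps2": 0.12},
    {"trip_id": "T20250410_1400_002", "ax_mps2": 0.01},
    {"trip_id": "T20250410_1600_003", "ax_mps2": 0.02},
]

keep = vehicle_trip_ids(trip_carrier_states)
vehicle_samples = [s for s in samples if s["trip_id"] in keep]
print(sorted(keep))
```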