Concepts

The five ideas you need to grok to read Telemachus data correctly.

Telemachus — functional groups

Telemachus is a flat Parquet table (every field is a top-level column, with no nested structs), but mentally it splits into five functional groups. Knowing these groups makes it much easier to remember why each column is there.

Telemachus = datetime       ts
   + GPS            lat, lon, speed_mps, heading_deg,
                    altitude_gps_m, hdop, n_satellites
   + IMU
       ├── accel    ax_mps2, ay_mps2, az_mps2
       ├── gyro     gx_rad_s, gy_rad_s, gz_rad_s   (optional)
       └── magneto  mx_uT,    my_uT,    mz_uT      (optional)
   + OBD / CAN      ignition, odometer_m, rpm,
                    speed_obd_mps, fuel_*, …       (optional)
   + extra          x_<source>_<field>             (vendor-specific)
| Group | What it tells you | Rate (typical) |
| --- | --- | --- |
| datetime | When the sample was captured | IMU rate (10 Hz) |
| GPS | Where the vehicle is and how fast | 1 Hz (NaN between fixes) |
| IMU | How the vehicle moves (acc/rot/field) | 10–100 Hz |
| OBD/CAN | What the vehicle reports (bus data) | 1 Hz (varies) |
| extra | Anything vendor-specific that doesn't fit the above | varies |

Why flat columns, not nested structs?

Parquet handles flat columns best (projection pushdown, fast scans). Nesting a field as imu.accel.x_mps2 looks tidy but costs performance and tooling compatibility. The mental model is nested; the schema is flat.
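To make the "nested mental model over a flat schema" idea concrete, here is a minimal sketch that maps flat column names back to their functional group. The helper names `group_of` and `nest` are hypothetical, and the OBD/CAN set only covers the columns named in the diagram above (that list is explicitly partial).

```python
# Columns named in the functional-group diagram. The OBD/CAN list in the
# source is partial, so this set is illustrative rather than exhaustive.
GPS = {"lat", "lon", "speed_mps", "heading_deg",
       "altitude_gps_m", "hdop", "n_satellites"}
ACCEL = {"ax_mps2", "ay_mps2", "az_mps2"}
GYRO = {"gx_rad_s", "gy_rad_s", "gz_rad_s"}
MAGNETO = {"mx_uT", "my_uT", "mz_uT"}
OBD = {"ignition", "odometer_m", "rpm", "speed_obd_mps"}


def group_of(column):
    """Return the functional group of a flat column name (hypothetical helper)."""
    if column == "ts":
        return "datetime"
    if column in GPS:
        return "gps"
    if column in ACCEL:
        return "imu.accel"
    if column in GYRO:
        return "imu.gyro"
    if column in MAGNETO:
        return "imu.magneto"
    if column in OBD or column.startswith("fuel_"):
        return "obd"
    if column.startswith("x_"):
        return "extra"
    return "unknown"


def nest(columns):
    """Group a flat list of column names by functional group, keeping order."""
    grouped = {}
    for col in columns:
        grouped.setdefault(group_of(col), []).append(col)
    return grouped
```

Calling `nest` on a file's column list gives a quick view of which groups a dataset actually carries, without changing the flat schema on disk.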

Vendor-specific extra fields

When a vendor exposes a field that has no standard Telemachus equivalent (a proprietary counter, a device-internal flag, …), use the x_<source>_<field> naming convention:

| Column | Meaning |
| --- | --- |
| x_teltonika_ext_voltage_v | Teltonika external power voltage reading |
| x_geotab_geofence_id | Geotab-specific geofence identifier |
| x_danlaw_codec_id | Danlaw firmware codec tag |

The x_ prefix signals "not part of the normative Telemachus contract, consumer may safely ignore it". The <source> segment keeps names unambiguous across datasets that merge multiple vendors.
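A consumer that wants to inspect or drop vendor columns could parse the convention as sketched below. This is an illustration, not part of the spec: `parse_extra` and `normative_columns` are hypothetical helpers, and the sketch assumes the `<source>` segment contains no underscore, so everything after the second underscore is the field name.

```python
from typing import NamedTuple, Optional


class ExtraColumn(NamedTuple):
    source: str  # vendor segment, e.g. "teltonika"
    field: str   # remainder of the name, e.g. "ext_voltage_v"


def parse_extra(column: str) -> Optional[ExtraColumn]:
    """Split an x_<source>_<field> name; return None for any other column.

    Assumption: <source> has no underscore of its own.
    """
    if not column.startswith("x_"):
        return None
    source, sep, field = column[2:].partition("_")
    if not sep or not source or not field:
        return None
    return ExtraColumn(source, field)


def normative_columns(columns):
    """Keep only columns outside the vendor-specific x_ namespace."""
    return [c for c in columns if not c.startswith("x_")]
```

For example, `parse_extra("x_teltonika_ext_voltage_v")` yields the source `teltonika` and the field `ext_voltage_v`, while `normative_columns` gives a consumer the subset it can rely on across vendors.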

Telemachus record format — the layered model

| Layer | Name | Input | Output |
| --- | --- | --- | --- |
| Telemachus | Device | Hardware | Raw parquet: only what the device measures |
| enriched | Cleaned & Contextualised | Telemachus | Enriched Telemachus + map matching, DEM, IMU calibration, signal quality |
| events layer | Events & Situations | enriched | enriched + event column + event table (harsh brake, pothole, curve, …) |

The Telemachus spec (SPEC-01) is normative for the base Telemachus layer. Column contracts for the enriched and events layers are documented in SPEC-01 §4, but their algorithms are intentionally out of scope: different consumers can compute these layers differently as long as they emit conformant columns.

Rule of thumb: a column derived from external data (maps, DEM, algorithm output) belongs to the enriched layer or higher, never to the base Telemachus layer.

Multi-rate IMU vs GNSS

Most devices stream IMU at 10 Hz and GNSS at 1 Hz. Telemachus is timestamped at the IMU rate, with GNSS columns containing NaN between fixes:

ts                    lat      lon       speed_mps  ax_mps2  ay_mps2  az_mps2
2025-01-01T08:00:00.0 49.3347  1.3830    5.2        0.12     0.03     9.81
2025-01-01T08:00:00.1 NaN      NaN       NaN        0.15    -0.01     9.80
2025-01-01T08:00:00.2 NaN      NaN       NaN        0.11     0.02     9.82
2025-01-01T08:00:01.0 49.3348  1.3831    5.3        0.13     0.01     9.81

When computing GNSS-only metrics (distance, average speed), drop NaNs explicitly. When computing IMU-only metrics (jerk, vibration), use all rows.
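The split between GNSS-only and IMU-only metrics can be sketched with the four sample rows above. This uses plain Python tuples rather than a dataframe to stay dependency-free; the haversine formula and the mean Earth radius are a swapped-in simplification for illustration, not something the spec prescribes.

```python
import math

NAN = float("nan")

# The four sample rows above as (lat, lon, speed_mps, ax_mps2).
rows = [
    (49.3347, 1.3830, 5.2, 0.12),
    (NAN, NAN, NAN, 0.15),
    (NAN, NAN, NAN, 0.11),
    (49.3348, 1.3831, 5.3, 0.13),
]


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# GNSS-only metrics: explicitly keep only rows that carry a fix.
fixes = [r for r in rows if not (math.isnan(r[0]) or math.isnan(r[1]))]
distance_m = sum(
    haversine_m(a[0], a[1], b[0], b[1]) for a, b in zip(fixes, fixes[1:])
)
mean_speed_mps = sum(r[2] for r in fixes) / len(fixes)

# IMU-only metrics: every row has accelerometer data, so use all of them.
mean_ax_mps2 = sum(r[3] for r in rows) / len(rows)
```

Averaging `speed_mps` over all rows without the filter would propagate NaN into the result, which is why the drop is done explicitly rather than left to the aggregation.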

The manifest fields sensors.gps.rate_hz and sensors.accelerometer.rate_hz declare the expected rates so consumers can pre-allocate buffers and pick interpolation strategies.
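As a rough illustration of how the declared rates might be used, the sketch below derives a buffer size and the expected number of rows per GNSS fix. The manifest is shown as an already-parsed dict, and the helper names are hypothetical; only the `sensors.*.rate_hz` keys come from the text, and the sketch assumes rows are emitted at the accelerometer rate.

```python
# Hypothetical parsed manifest fragment; only the rate_hz keys are from the text.
manifest = {
    "sensors": {
        "gps": {"rate_hz": 1},
        "accelerometer": {"rate_hz": 10},
    }
}


def expected_rows(manifest, duration_s):
    """Rows to pre-allocate for a recording of duration_s seconds,
    assuming one row per accelerometer sample."""
    imu_hz = manifest["sensors"]["accelerometer"]["rate_hz"]
    return int(round(imu_hz * duration_s))


def rows_per_fix(manifest):
    """Expected number of rows between consecutive GNSS fixes (inclusive of one fix)."""
    sensors = manifest["sensors"]
    return sensors["accelerometer"]["rate_hz"] / sensors["gps"]["rate_hz"]
```

With the rates above, a consumer would expect roughly nine NaN GNSS rows after each fix, which is a useful sanity check before choosing an interpolation strategy.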

AccPeriod — the accelerometer frame

The same physical accelerometer can output data in different reference frames depending on firmware state:

| Frame | At rest | Behaviour |
| --- | --- | --- |
| raw | \|a\| ≈ 9.81 m/s² | Unprocessed sensor output |
| compensated | \|a\| ≈ 0 m/s² | Firmware has subtracted gravity |
| partial | 0 < \|a\| < g | Imperfect compensation |

This matters because downstream stages (IMU calibration, event detection) need to know whether gravity is in the signal.
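The table above suggests a simple at-rest check. The sketch below classifies a single stationary sample by its acceleration norm; it is not the normative detection method, `classify_frame` is a hypothetical helper, and the 0.5 m/s² tolerance is an arbitrary illustrative value. The sample must come from a moment when the device is known to be at rest.

```python
import math

G = 9.81  # standard gravity in m/s², as used in the frame table


def classify_frame(ax, ay, az, tol=0.5):
    """Guess the accelerometer frame from one at-rest sample.

    tol is a hypothetical tolerance in m/s². Returns None when the norm
    exceeds g + tol, a case the frame table does not cover.
    """
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if abs(norm - G) <= tol:
        return "raw"          # gravity still present in the signal
    if norm <= tol:
        return "compensated"  # gravity removed by firmware
    if norm < G:
        return "partial"      # somewhere in between
    return None
```

In practice a single sample is noisy, so a consumer would more likely apply this to the mean norm over a stationary window.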

The manifest declares one or more acc_periods segments — each a contiguous time range with a coherent frame:

acc_periods:
  - start: 2025-01-01T00:00:00Z
    end:   2025-03-15T12:00:00Z
    frame: compensated
    detection_method: empirical
  - start: 2025-03-15T12:00:01Z
    end:   present
    frame: raw
    detection_method: profile_change

Default if absent: a single implicit period with frame: "raw". See SPEC-01 §3.6 for the full normative definition.
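Looking up the frame for a given sample timestamp could be sketched as follows. The periods are shown as an already-parsed list of dicts with string timestamps (a YAML loader may hand back datetime objects instead), `frame_at` is a hypothetical helper, and treating `end` as inclusive is an assumption; SPEC-01 §3.6 remains the reference.

```python
from datetime import datetime, timezone

# The acc_periods example above, as parsed data with string timestamps.
acc_periods = [
    {"start": "2025-01-01T00:00:00Z", "end": "2025-03-15T12:00:00Z",
     "frame": "compensated"},
    {"start": "2025-03-15T12:00:01Z", "end": "present", "frame": "raw"},
]


def _parse(ts):
    # Replace the trailing Z so fromisoformat works on older Python versions.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def frame_at(ts, periods=None):
    """Return the declared frame for a timezone-aware datetime.

    With no periods declared, fall back to the implicit "raw" default.
    Returns None if periods exist but none covers ts.
    """
    if not periods:
        return "raw"
    for period in periods:
        started = ts >= _parse(period["start"])
        open_ended = period["end"] == "present"
        if started and (open_ended or ts <= _parse(period["end"])):
            return period["frame"]
    return None
```

Returning None for an uncovered timestamp, rather than silently assuming "raw", lets the caller decide how to handle gaps between declared periods.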

CarrierState — is this trip real driving?

A telematics device records data continuously, but not all of that data comes from a real driving context. A device left on a workshop bench, manipulated by hand during testing, or temporarily unplugged still emits messages.

The trip-level carrier_state classifies each trip into one of six contexts:

| State | Description | Vehicle? | Use for analytics? |
| --- | --- | --- | --- |
| mounted_driving | Installed in vehicle, vehicle in motion | Yes | Yes |
| mounted_idle | Installed, vehicle stationary | Yes | Yes (ZUPT) |
| unplugged | External power lost | Unknown | Optional |
| desk | Stable surface, no vehicle context | No | No |
| handheld | Being moved by hand | No | No |
| unknown | Insufficient signals | Unknown | No |

Classification combines four signals: external power voltage, GPS speed, accelerometer norm variance, GPS position drift. See SPEC-01 §3.7 for the decision tree.

In the manifest, declare each trip's state via trip_carrier_states:

trip_carrier_states:
  - trip_id: "T20250410_1053_001"
    carrier_state: "mounted_driving"
    confidence: "high"

Downstream stages MUST filter on is_vehicle_data == True (i.e. carrier_state ∈ {mounted_driving, mounted_idle}) for any analytics that assume vehicle context.
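A minimal sketch of that filter, working from parsed `trip_carrier_states` entries. The second trip entry is invented for illustration, and `is_vehicle_data` is written here as a helper function that mirrors the set membership stated above; how the flag is actually materialised in a given pipeline is not specified in this page.

```python
# States that count as vehicle context, per the rule above.
VEHICLE_STATES = {"mounted_driving", "mounted_idle"}


def is_vehicle_data(carrier_state):
    """True when the trip's carrier_state implies a real vehicle context."""
    return carrier_state in VEHICLE_STATES


trip_carrier_states = [
    {"trip_id": "T20250410_1053_001",
     "carrier_state": "mounted_driving", "confidence": "high"},
    # Hypothetical second trip, added only to show a rejected case.
    {"trip_id": "T_hypothetical_desk",
     "carrier_state": "desk", "confidence": "high"},
]

# Trip IDs that downstream analytics may use.
vehicle_trip_ids = {
    t["trip_id"]
    for t in trip_carrier_states
    if is_vehicle_data(t["carrier_state"])
}
```

Rows whose `trip_id` is not in `vehicle_trip_ids` would then be excluded before any analytics that assume vehicle context.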