Programming Gotchas and Tips

  1. DataTurbine is divided into Sources, Sinks and Plugins.
    1. Sources generate data (writers)
    2. Sinks consume data (readers)
    3. Plugins operate on data and produce derived output, similar to a Unix pipe.
  2. Names in the turbine are hierarchical. The simplest is Source/Channel where a source can have more than one channel, but a channel belongs to exactly one source.
    1. If you have parent-child routing, you will see Parent/Child/Source/Channel
    2. Loops in the graph are very possible due to shortcuts and mirroring. Tree traversal algorithms should expect and deal with cycles.
  3. If you connect with a duplicate Source name, DataTurbine will silently give you an auto-incremented name, e.g. Source_1. However, if you specify “append” when connecting, and you are the only instance, it’ll reconnect to the old data and append.
  4. All sources should call Detach before Close, otherwise your data will vanish when you disconnect. This loss is rarely the desired behaviour.
  5. Server names can be a TCP name or user-defined from the command line via the -n argument.
  6. Sources can be in several native types – INT16/32, float, double, string or blob.
    1. Numeric data is usually stored as float or double (e.g. instrument data)
    2. Audio data, currently experimental, is stored as INT16
    3. Event markers are XML, stored as string type
    4. Video is actually discrete JPGs, stored as binary blobs, one per image
  7. Time synchronization is critical: all machines must be NTP-synced, including those used to run data viewers such as RDV. See this writeup for more details.
  8. Timestamps are stored as doubles, with 32 bits for integer time_t and 32 bits for fractional seconds.
  9. RDV can’t display data sampled faster than 1 kHz at present.
  10. Metadata in DataTurbine is split into two types.
    1. Invariant. These are things that are set once and never change, for example units. They are set using the PutUserInfo call, whose argument is a string of “name=value,name2=value2…” pairs.
    2. Time-varying. These are stored as normal source feeds. An example would be GPS position of a datalogger.
  11. Each and every source defines its own cache parameters. When a source connects, you specify cache and archive size. This allows you to tune server usage, source by source.
  12. Source data is aggregated into ‘frames’, which are pushed to the server via the Flush call. You can aggregate (buffer) data to make larger packets, reduce server/network load and increase efficiency.
  13. DataTurbine exposes its internal metrics as ‘hidden’ channels in the _Metrics folder – memory usage, bandwidth, disk used and more. Very useful.
  14. It also sends its logfiles out as hidden text channels in the _Logs source.
  15. Think of DataTurbine as a very robust abstraction layer: once data is sent, no sink need worry what kind of device it came from, or how it got there. Enforced device abstraction, network transparency and more!
  16. The DataTurbine server has IP-level access control (read, write) similar to /etc/hosts.deny. There’s also a currently-unused mechanism for requiring passwords for sources, but this really needs encryption to be useful and secure.
  17. Getting data into DataTurbine is often the easy part. Once there, you need a good viewer that lets users interact with the data in ways that they find useful.
  18. There are many clients (sinks), as well as DataTurbine->SQL code, file writers, etc., so you can use existing tools.
  19. The ChannelMap.PutDataAsXXXX calls do not copy the data. They just save a reference to it, so be sure to leave the data in a valid, unmodified variable until you call Flush. Otherwise multiple channels can end up with the exact same value, which is very puzzling.
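The reference-not-copy gotcha in tip 19 is easy to reproduce in miniature. The sketch below does not use the DataTurbine API: `ToyChannelMap`, `put_data` and `flush` are hypothetical stand-ins that only mimic the "store a reference, read it at flush time" behaviour described above.

```python
class ToyChannelMap:
    """Hypothetical stand-in (NOT the DataTurbine API) that keeps references."""

    def __init__(self):
        self.pending = {}

    def put_data(self, channel, data):
        # Like the PutDataAsXXXX calls described above: no copy is made here.
        self.pending[channel] = data

    def flush(self):
        # The data is only read at flush time.
        sent = {name: list(values) for name, values in self.pending.items()}
        self.pending.clear()
        return sent


def buggy_reuse():
    """Reuse one buffer for two channels before flushing."""
    cmap = ToyChannelMap()
    buf = [1.5]
    cmap.put_data("temperature", buf)
    buf[0] = 99.0  # overwrites the value the first channel still points at
    cmap.put_data("pressure", buf)
    return cmap.flush()


def safe_separate():
    """Give each channel its own buffer that stays untouched until flush."""
    cmap = ToyChannelMap()
    temperature = [1.5]
    pressure = [99.0]
    cmap.put_data("temperature", temperature)
    cmap.put_data("pressure", pressure)
    return cmap.flush()
```

In `buggy_reuse` both channels end up with 99.0, which is the puzzling symptom the tip warns about. `safe_separate` keeps one live buffer per channel until the flush.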
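Tip 2 notes that shortcuts and mirroring can create loops, so any traversal needs a visited set. A minimal sketch, assuming the hierarchy has already been read into a plain dictionary of node name to child names (the `graph` layout and the function name `walk_channels` are illustrative assumptions, not anything DataTurbine provides):

```python
def walk_channels(graph, start):
    """Depth-first walk that visits each node once, even if the graph has cycles."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # already seen: this is where a cycle is cut
        visited.add(node)
        order.append(node)
        # Reversed so that children are visited in their listed order.
        stack.extend(reversed(graph.get(node, [])))
    return order


# Hypothetical hierarchy with a shortcut from the child back to the parent.
example_graph = {
    "Parent": ["Parent/Child"],
    "Parent/Child": ["Parent/Child/Source", "Parent"],
    "Parent/Child/Source": ["Parent/Child/Source/Channel"],
}
```

Without the `visited` check, the shortcut from `Parent/Child` back to `Parent` would make the walk loop forever.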
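The invariant metadata in tip 10 travels as a single “name=value,name2=value2…” string. The helpers below sketch how such a string might be built and parsed on the client side. The names `build_user_info` and `parse_user_info` are hypothetical, and because the text above does not describe any escaping rules, the sketch simply rejects names and values that contain a comma or an equals sign.

```python
def build_user_info(pairs):
    """Join a dict into a 'name=value,name2=value2' string."""
    items = []
    for name, value in pairs.items():
        text = str(value)
        # Assumption: no escaping mechanism, so separators are not allowed.
        if any(sep in name or sep in text for sep in (",", "=")):
            raise ValueError(f"separator character in {name!r}={text!r}")
        items.append(f"{name}={text}")
    return ",".join(items)


def parse_user_info(text):
    """Split a 'name=value,name2=value2' string back into a dict of strings."""
    result = {}
    for item in text.split(","):
        if not item:
            continue  # tolerate empty input or a trailing comma
        name, _, value = item.partition("=")
        result[name.strip()] = value.strip()
    return result
```

For example, `build_user_info({"units": "degC", "sensor": "thermistor"})` yields a string that could be handed to a PutUserInfo-style call, and `parse_user_info` recovers the pairs on the sink side.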
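Tip 12 describes aggregating samples into frames before calling Flush. The buffering idea can be sketched independently of the real client library. In the sketch below, `FrameBuffer`, `frame_size` and the `send` callback are illustrative assumptions, with `send` standing in for whatever actually pushes a frame to the server.

```python
class FrameBuffer:
    """Collect samples and hand them to `send` in frames of `frame_size`."""

    def __init__(self, frame_size, send):
        if frame_size < 1:
            raise ValueError("frame_size must be at least 1")
        self.frame_size = frame_size
        self.send = send
        self.samples = []

    def put(self, timestamp, value):
        self.samples.append((timestamp, value))
        if len(self.samples) >= self.frame_size:
            self.flush()

    def flush(self):
        # Send whatever is buffered, then start a new frame.
        if self.samples:
            self.send(list(self.samples))
            self.samples.clear()


def demo_frame_sizes():
    """Push 7 samples through a 3-sample buffer and report the frame sizes."""
    frames = []
    buffer = FrameBuffer(3, frames.append)
    for i in range(7):
        buffer.put(float(i), i * 10)
    buffer.flush()  # send the final partial frame
    return [len(frame) for frame in frames]
```

Larger frames mean fewer, bigger packets at the cost of latency, so the frame size is a per-source tuning choice in the same spirit as the cache and archive sizes in tip 11.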