Multicast and DataTurbine

Multicasting (see Wikipedia entry) is a network technology that lets a single source send to many listeners, who can join at will. It spreads the cost of sending to many listeners by moving the routing logic out of the application and onto the routers. As listeners add themselves, the routers construct a spanning tree and determine how to move the data. Multicast is most commonly carried over UDP and used for loss-tolerant, time-critical payloads such as audio and video. Described this way, multicast sounds a lot like DataTurbine, so the obvious question is whether it can be harnessed to increase the scalability and usefulness of DataTurbine.

There are several significant technical barriers, which derive from the differing designs of the two. In the DataTurbine model, a client

  • Connects
  • (Optionally) requests a list of the existing channels
  • Subscribes to some subset of them
  • (Optionally) moves back and forth through the stored data, TiVo-style, perhaps jumping around

Multicast is quite different. Designed for the live-broadcast model, it commonly looks like this:

  • A multicast IP address is set up offline and distributed via a web page or similar
  • Server starts up
  • Clients join the multicast group, and the routing tree is built on the fly based on BGP+ or similar
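
As a rough illustration of the client side of this model, the sketch below shows how a listener might join a group using Python's standard socket module. The group address 239.1.2.3, the port 5007, and the function names are assumptions made for this example; none of them come from DataTurbine.

```python
import ipaddress
import socket


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq value passed to IP_ADD_MEMBERSHIP."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    # 4-byte group address followed by the 4-byte local interface address
    # (0.0.0.0 lets the kernel choose the interface).
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_listener(group: str, port: int) -> socket.socket:
    """Bind a UDP socket and ask the kernel to join the multicast group.

    Joining is what signals the local router that this host wants the
    traffic; the routers then extend the distribution tree toward it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(
        socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group)
    )
    return sock
```

A listener would then call something like `sock = open_listener("239.1.2.3", 5007)` followed by `data, sender = sock.recvfrom(65535)`. Note that nothing here lets the client ask for a channel list or a time range; it simply receives whatever the source is currently sending.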

Note that this does not account for

  • Streaming a subset of the data. You could accommodate this by creating different multicast addresses on the fly for each requested subset, but that has its own issues and is quite complex.
  • Viewing of older data. Multicast is one-way, so you need a separate command channel for the clients to send requests to the server.
  • Reliable transport. Since multicast runs over UDP, you’d have to replicate TCP’s delivery guarantees yourself or use one of the less common reliable-multicast protocols (PGM and NORM are examples) to get the every-packet semantics expected by DataTurbine.
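
To give a sense of what replicating TCP-like guarantees involves, here is a minimal sketch of the receiver-side bookkeeping in a negative-acknowledgement scheme: each datagram is assumed to carry a sequence number, and the receiver works out which numbers it must ask the sender to retransmit. The class and its names are hypothetical, and this is not how DataTurbine or any particular reliable-multicast protocol is implemented. A real design would also need the separate command channel mentioned above to carry the retransmission requests, plus timers and sender-side buffering.

```python
from dataclasses import dataclass, field


@dataclass
class GapTracker:
    """Tracks which sequence numbers have not yet arrived."""

    next_expected: int = 0
    missing: set = field(default_factory=set)

    def observe(self, seq: int) -> list:
        """Record an arrived sequence number.

        Returns the newly detected missing numbers, which the receiver
        would request again over a separate unicast channel.
        """
        if seq in self.missing:
            # A retransmission (or late packet) filled a known hole.
            self.missing.discard(seq)
            return []
        if seq < self.next_expected:
            # Duplicate of something already delivered; ignore it.
            return []
        # Everything between the expected number and this one was skipped.
        new_gaps = list(range(self.next_expected, seq))
        self.missing.update(new_gaps)
        self.next_expected = seq + 1
        return new_gaps
```

For example, after packets 0 and 1 arrive, a call to `observe(4)` reports that 2 and 3 are missing; those stay in `missing` until they show up.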

Additionally, there are deployment issues, both technical and political. Most edge routers do not have multicast enabled, and differing versions of router software add significant risk. These problems are solvable, but solving them requires significant time, communication with network administrators, and a suite of sophisticated network analysis and debugging tools to determine the state of the dynamic network as it is constructed. On the other hand, multicast can be quite useful on private networks under a single administrative domain, and it has seen success in applications such as distributing market data within a financial company.

At present, DataTurbine has programs that

  • Repeat data from DataTurbine out over UDP, unicast or multicast
  • Receive UDP data and insert it into DataTurbine
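
The sending half of such a repeater can be sketched as follows. This is not the code of the DataTurbine utilities; it is a minimal illustration using Python's standard socket module, and the function names and the `frames` argument (any iterable of byte strings already pulled from a data source) are assumptions. The same code path serves unicast and multicast; only the destination address and the TTL option differ.

```python
import ipaddress
import socket
import struct


def make_sender(dest_host: str, ttl: int = 1) -> socket.socket:
    """Create a UDP socket suited to a unicast or multicast destination."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if ipaddress.ip_address(dest_host).is_multicast:
        # TTL bounds how many router hops multicast datagrams may cross;
        # 1 keeps them on the local subnet.
        sock.setsockopt(
            socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", ttl)
        )
    return sock


def repeat(frames, dest_host: str, dest_port: int, ttl: int = 1) -> int:
    """Send each frame as one datagram; return the number of frames sent."""
    sock = make_sender(dest_host, ttl)
    try:
        sent = 0
        for frame in frames:
            sock.sendto(frame, (dest_host, dest_port))
            sent += 1
        return sent
    finally:
        sock.close()
```

Calling `repeat(frames, "239.1.2.3", 5007)` would send to a multicast group (address and port chosen arbitrarily here), while `repeat(frames, "127.0.0.1", 5007)` would send to a single unicast receiver. Either way, delivery is best-effort: a dropped datagram is simply lost.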

We have also evaluated the idea of using multicast for server mirroring, which would allow a single server to mirror its entire contents to more than one other server. There are some reasons why this is of limited utility:

  • Mirroring is done on a per-source, not a per-server, basis, so the multicast should match those semantics. This is certainly doable.
  • As above, we’d need TCP-like transport guarantees.
  • Lastly, there is at present no demand for this from our user communities.

We would be interested in using the existing UDPCast program to exercise this idea: we could easily create a server mirror by having UDPCast send to a multicast network, with each mirror server taking in the data via the UDP receiver. Hopefully we can find a user community that would benefit from such a topology.

More reading on experiences with multicast deployments can be found in the papers attached to this page.

Open Source DataTurbine Initiative © 2016