When you are creating a distributed, event-driven system, you have to think carefully about observability.
That is because such systems are more complex due to their asynchronous nature. Therefore:
- There is less control over the processing flows.
- They are more difficult to test and debug.
This means thinking carefully about your observability infrastructure and providing logging, metrics and trace information, so that you know your system is running properly.
In this video🎞️👇 I share some insights into what you can do to add observability to your DDS system.
When you build a system with DDS middleware, you are building a peer-to-peer system, and when something doesn’t work, it is more difficult to find the bug. That is because there is no central place from which you could get a consolidated log.
The solution is to create a central observability center and use logging and monitoring topics that publish metrics and other essential information about your system state.
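As a rough illustration of that idea, here is a minimal Python sketch of a central collector that subscribes to a logging topic and a monitoring topic. It does not use any real DDS API: `InProcessBus` is a hypothetical in-process stand-in for DDS topics, and the topic names, `LogRecord`, `MetricSample` and `ObservabilityCenter` are assumptions made up for this example.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, Dict, List, Tuple


@dataclass(frozen=True)
class LogRecord:
    participant: str  # name of the publishing node (hypothetical field)
    level: str
    message: str
    timestamp: float = field(default_factory=time.time)


@dataclass(frozen=True)
class MetricSample:
    participant: str
    name: str
    value: float
    timestamp: float = field(default_factory=time.time)


class InProcessBus:
    """Stand-in for DDS topics: hands each sample to every subscriber of a topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, sample: object) -> None:
        for callback in self._subscribers[topic]:
            callback(sample)


class ObservabilityCenter:
    """Central place that collects logs and the latest value of each metric."""

    def __init__(self, bus: InProcessBus) -> None:
        self.logs: List[LogRecord] = []
        self.latest_metrics: Dict[Tuple[str, str], float] = {}
        bus.subscribe("observability/log", self.logs.append)
        bus.subscribe("observability/metrics", self._on_metric)

    def _on_metric(self, sample: MetricSample) -> None:
        # Keep only the most recent value per (participant, metric name).
        self.latest_metrics[(sample.participant, sample.name)] = sample.value


bus = InProcessBus()
center = ObservabilityCenter(bus)

# Any node in the system publishes its state to the shared topics.
bus.publish("observability/log", LogRecord("sensor-node", "WARN", "deadline missed"))
bus.publish("observability/metrics", MetricSample("sensor-node", "latency_ms", 1.8))
```

In a real system the bus would be replaced by DDS writers and readers on dedicated topics, and the collector would be just another participant that subscribes to them.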
DDS provides a health check out of the box thanks to its discovery functionality, but that is not enough for other information about middleware performance and effectiveness.
However, DDS vendors provide plugins and tools that allow you to subscribe to logging and monitoring topics, so you can observe the state of your system with metrics like latency, throughput, bytes sent, bytes lost and discovery information. These tools are vendor specific.
RTI’s Connext has Distributed Logger, RTI Monitor and RTI Admin Console. eProsima’s Fast DDS has the Statistics module and Fast DDS Monitor.
In cases where the vendor plugins and tools do not provide the desired information, you will have to create your own support for logging, metrics or tracing.
Nothing prevents you from using the metrics, logs and tracing information that you would use in the µServices world. You can utilize Prometheus, Grafana, the ELK stack, Jaeger, Zipkin and others, alone or in combination, and leverage tracing standards like OpenTelemetry.
The difference is that with µServices you would probably use the sidecar pattern to add such functionality, whereas in DDS you will generally use plugins and adapters, or a gateway that allows you to integrate different types of connectivity protocols with DDS.
For example, you can utilize Telegraf to inject observability data into InfluxDB or Kafka.
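Telegraf can parse the InfluxDB line protocol as one of its input data formats, so an adapter on the DDS side could emit metrics as such lines. The following is a minimal formatting sketch only. The measurement name `dds_metrics`, the tag and field names and the `to_line_protocol` helper are assumptions, string field values are not handled, and how the line reaches Telegraf (socket, file, HTTP) is left open.

```python
from typing import Dict, Union

Number = Union[bool, int, float]

_TAG_SPECIALS = ",= "          # escaped in tag keys, tag values and field keys
_MEASUREMENT_SPECIALS = ", "   # escaped in the measurement name


def _escape(text: str, specials: str) -> str:
    for char in specials:
        text = text.replace(char, "\\" + char)
    return text


def _format_field(value: Number) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"       # integers carry an "i" suffix
    return repr(float(value))


def to_line_protocol(
    measurement: str,
    tags: Dict[str, str],
    fields: Dict[str, Number],
    timestamp_ns: int,
) -> str:
    """Format one point as: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, _MEASUREMENT_SPECIALS)
    for key, val in sorted(tags.items()):
        head += "," + _escape(key, _TAG_SPECIALS) + "=" + _escape(val, _TAG_SPECIALS)
    body = ",".join(
        _escape(key, _TAG_SPECIALS) + "=" + _format_field(val)
        for key, val in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


line = to_line_protocol(
    "dds_metrics",
    {"participant": "sensor node"},
    {"latency_ms": 1.8, "samples_lost": 3},
    1700000000000000000,
)
print(line)
```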
That is my take on observability.