CERN IT Monitoring: metrics, logs and beyond

  • Date: 18 October 2019, from 14:00 to 16:00

  • Venue: Sala Venturi, CNAF headquarters, viale Berti Pichat, 6/2.

Contact person:

Participants: Luca Magnoni (CERN)

Abstract.

In recent years the CERN IT monitoring infrastructure, which provides monitoring facilities to the CERN Data Centre and to WLCG services, has gone through a major redesign phase.

A number of legacy tools (e.g. Lemon, Experiment Dashboards) have been decommissioned in favor of a modern technology stack based on open-source products. Today more than 40 thousand hosts and more than 150 IT services are successfully monitored using the new data pipeline approach, with Apache Kafka as the core transport layer, different data-gathering agents (e.g. Collectd, Prometheus, Logstash, HTTP) and several storage backends (e.g. Elasticsearch, InfluxDB, HDFS), and with Grafana as the main visualization tool (more than two thousand dashboards and more than one million queries per day).

The new architecture has not only proved to scale well beyond its original design (3.5 TB/day of compressed metrics and logs), but has also been successfully adopted by users doing on-the-fly data processing and analytics with streaming technologies such as Apache Spark and Kafka Streams.
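To give a feel for the scale quoted above, the daily figures can be turned into average per-second rates. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes), takes "more than one million queries per day" at its lower bound, and the variable and function names are illustrative. Averages also hide peaks, so the real instantaneous load is likely higher at times.

```python
# Back-of-the-envelope conversion of the daily figures in the abstract
# into average per-second rates.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(daily_total: float) -> float:
    """Average rate per second for a quantity accumulated over one day."""
    return daily_total / SECONDS_PER_DAY


# Assumption: decimal terabytes (1 TB = 1e12 bytes).
ingest_bytes_per_day = 3.5e12
# Lower bound: the abstract says "more than one million queries per day".
queries_per_day = 1_000_000

avg_ingest_mb_s = per_second(ingest_bytes_per_day) / 1e6  # bytes/s to MB/s
avg_queries_s = per_second(queries_per_day)

print(f"Average ingest: {avg_ingest_mb_s:.1f} MB/s")       # 40.5 MB/s
print(f"Average query rate: {avg_queries_s:.1f} queries/s")  # 11.6 queries/s
```

In other words, the pipeline sustains on the order of 40 MB/s of compressed data on average, while Grafana serves at least a dozen queries per second.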

In this seminar we will explore the architecture of the infrastructure, the technical challenges and the lessons learned.