

Kafka UI Monitoring Tools: Key Kafka Metrics for Monitoring Performance




Apache Kafka is a distributed event streaming platform that allows businesses to build and manage high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Thousands of businesses use it, including 80 percent of Fortune 100 companies, so it is worth learning Apache Kafka before moving on to other streaming technologies.

Although Apache Kafka is the go-to service for real-time data processing and application activity tracking, monitoring and managing a Kafka cluster can be difficult.

To make these processes more efficient and visible, you may need third-party graphical tools, either open-source or commercial, that add administration and monitoring features. Eight popular UI tools for monitoring Kafka are:

  • AKHQ
  • Kowl
  • Conduktor
  • Confluent Control Center (CC)
  • CMAK
  • Lenses
  • Kafdrop
  • UI for Apache Kafka

How does Kafka work?

Kafka is made up of servers and clients that communicate over a high-performance TCP network protocol. Kafka runs as a cluster of one or more servers. Some of these servers, called brokers, store the data. Other servers run Kafka Connect to continuously import and export data as streams of events, integrating Kafka with your existing systems. A Kafka cluster is highly scalable and fault-tolerant: if one of the servers fails, the others take over its work to keep the system running without losing data. Clients let you write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner.

Important features of Kafka

  1. Broker: The broker handles all client requests and stores the data. A cluster can contain one or more brokers.
  2. ZooKeeper: ZooKeeper maintains the cluster’s state.
  3. Producer: The producer sends records to the broker.
  4. Consumer: The consumer receives batches of records from the broker.
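
To make these roles concrete, here is a minimal in-memory sketch of a single partition. It assumes a deliberately simplified model: the producer appends records, the broker assigns sequential offsets, and the consumer fetches batches starting from its own position. The class and method names (PartitionLog, append, fetch, end_offset) are hypothetical and are not part of any Kafka client API. Replication, ZooKeeper, and networking are left out.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Hypothetical, simplified stand-in for one partition stored on a broker."""
    records: list = field(default_factory=list)

    def append(self, value: str) -> int:
        # The broker assigns each new record the next sequential offset.
        self.records.append(value)
        return len(self.records) - 1

    def fetch(self, offset: int, max_records: int) -> list:
        # Return a batch of (offset, value) pairs starting at the requested offset.
        batch = self.records[offset:offset + max_records]
        return [(offset + i, value) for i, value in enumerate(batch)]

    def end_offset(self) -> int:
        # Offset that the next appended record will receive.
        return len(self.records)


log = PartitionLog()

# Producer side: send a few records to the broker.
for event in ["login", "click", "purchase", "logout"]:
    log.append(event)

# Consumer side: pull records in batches of two, tracking its own position.
position = 0
while position < log.end_offset():
    batch = log.fetch(position, max_records=2)
    for offset, value in batch:
        print(offset, value)
    position = batch[-1][0] + 1
```

Because the consumer tracks its position separately from the log's end offset, the gap between the two numbers is exactly what consumer lag measures.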

Advantages of using Kafka

  • Exceptional throughput. Kafka can process hundreds of thousands of messages per second and handle massive amounts of data at high speed.
  • Low latency. Kafka can deliver messages with latencies of a few milliseconds.
  • Fault tolerance. This is one of Kafka’s most significant advantages: even if a node or machine in the cluster fails, Kafka continues to function normally.
  • Durability. Kafka replicates messages across brokers, which protects data against loss when a broker fails.
  • Scalability. Kafka can be scaled out on the fly by adding more nodes.
  • Distributed architecture. Features such as replication and partitioning make Kafka’s distributed architecture scalable.
  • Consumer-friendly. Kafka can integrate with many kinds of consumers and adapt its behavior to each one.

You must regularly monitor Kafka’s state and efficiency to ensure the stable operation of applications that rely on it. If you run Kafka on Amazon MSK, it collects Kafka metrics and sends them to Amazon CloudWatch, where you can view them. In either case, keep an eye on the following key metrics for each component in the cluster:

  • Broker metrics
  • Consumer metrics
  • Zookeeper metrics
  • Producer metrics

Broker Metrics

Every message passes through a broker before consumers can read it, so brokers play a crucial role in Kafka. It is critical to track their performance characteristics, which fall into three categories:

  1. Kafka System Metrics
  2. JVM garbage collector metrics
  3. Host metrics

Kafka System Metrics

Under Replicated Partitions: The total number of partitions on the broker that are under-replicated. Under-replicated partitions are a warning sign that one or more brokers are unavailable.

Isr Shrinks Per Sec/Isr Expands Per Sec: If a broker goes down, the in-sync replica (ISR) sets of some partitions shrink. Once the broker is running again and its replicas have fully caught up, the ISRs expand again.

Active Controller Count: Indicates whether this broker is the active controller (1) or not (0). Only one controller exists at any given time, so the sum across all brokers in the cluster should always be 1.

Offline Partitions Count: The number of partitions that cannot be written to or read from because they have no active leader. A non-zero value usually means that brokers are unavailable.

Leader Election Rate And Time Ms: A partition leader election occurs when the current leader loses its connection to ZooKeeper. An increase in this metric can indicate that a broker has become unavailable.

Unclean Leader Elections Per Sec: An unclean leader election occurs when the partition’s leader is unavailable and an out-of-sync replica is elected as the new leader. Because that replica may be missing messages, a non-zero value signals possible data loss.
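
The alert conditions above can be expressed as a few simple rules. The sketch below assumes you already collect these values per broker (for example, from each broker's JMX metrics) into a plain dictionary. The dictionary layout and the check_broker_health function are hypothetical names chosen for illustration; they are not part of Kafka's API.

```python
def check_broker_health(brokers: dict) -> list:
    """Return human-readable warnings for a cluster.

    `brokers` maps a broker id to a dict of metric values (hypothetical layout)
    with the keys UnderReplicatedPartitions, OfflinePartitionsCount,
    ActiveControllerCount and UncleanLeaderElectionsPerSec.
    """
    warnings = []

    # Exactly one broker in the whole cluster should be the active controller.
    controllers = sum(m["ActiveControllerCount"] for m in brokers.values())
    if controllers != 1:
        warnings.append(f"expected 1 active controller, found {controllers}")

    for broker_id, m in sorted(brokers.items()):
        if m["UnderReplicatedPartitions"] > 0:
            warnings.append(
                f"broker {broker_id}: {m['UnderReplicatedPartitions']} under-replicated partitions")
        if m["OfflinePartitionsCount"] > 0:
            warnings.append(
                f"broker {broker_id}: {m['OfflinePartitionsCount']} offline partitions")
        if m["UncleanLeaderElectionsPerSec"] > 0:
            warnings.append(
                f"broker {broker_id}: unclean leader elections (possible data loss)")
    return warnings


# Hypothetical snapshot of a two-broker cluster.
cluster = {
    1: {"UnderReplicatedPartitions": 0, "OfflinePartitionsCount": 0,
        "ActiveControllerCount": 1, "UncleanLeaderElectionsPerSec": 0},
    2: {"UnderReplicatedPartitions": 3, "OfflinePartitionsCount": 0,
        "ActiveControllerCount": 0, "UncleanLeaderElectionsPerSec": 0},
}
for warning in check_broker_health(cluster):
    print(warning)
```

The controller check sums across brokers rather than checking each one individually, because a single broker reporting 0 is normal.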

Total Time Ms: The total time, in milliseconds, that the broker takes to serve a request, such as a produce or fetch request.

Purgatory Size: The number of requests waiting in purgatory, where produce and fetch requests are held until they can be completed. It can help you find the root causes of latency.

Bytes In PerSec /Bytes Out PerSec: The rate at which brokers receive data from producers and the rate at which consumers read data from brokers. Together they measure the Kafka cluster’s overall throughput, or workload.

Requests Per Second: The rate of requests from producers, consumers, and follower replicas.
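
Throughput metrics such as Bytes In PerSec, Bytes Out PerSec, and Requests Per Second are published as JMX meters, which carry a cumulative Count attribute alongside precomputed rate attributes. If your collector only scrapes the cumulative count, you can derive a per-second rate from two samples, as in this sketch. The per_second_rate helper and the sample numbers are hypothetical.

```python
def per_second_rate(count_before: int, count_after: int, seconds_elapsed: float) -> float:
    """Average rate of a monotonically increasing counter between two samples."""
    if seconds_elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    delta = count_after - count_before
    if delta < 0:
        # A decreasing counter usually means the broker restarted and reset it.
        raise ValueError("counter reset detected; discard this interval")
    return delta / seconds_elapsed


# Hypothetical Bytes In Count samples taken 60 seconds apart.
bytes_in_rate = per_second_rate(1_200_000_000, 1_230_000_000, 60)
print(f"{bytes_in_rate / 1_000_000:.1f} MB/s in")  # prints "0.5 MB/s in"
```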

JVM Garbage Collector Metrics:

Collection Count: The total number of young or old garbage collections performed by the JVM.

Collection Time: The total time, in milliseconds, that the JVM has spent performing young or old garbage collections.
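
Both values are cumulative, so they are most useful as deltas between two readings. The sketch below uses hypothetical sample numbers to turn two readings of collection count and collection time into two figures: the share of wall-clock time spent in garbage collection, and the average time per collection. The gc_overhead function is an illustrative name, not a JVM or Kafka API.

```python
def gc_overhead(count_before, count_after, time_ms_before, time_ms_after, interval_ms):
    """Return (fraction of the interval spent in GC, average ms per collection)."""
    collections = count_after - count_before
    gc_ms = time_ms_after - time_ms_before
    fraction = gc_ms / interval_ms
    # Guard against intervals with no collections at all.
    avg_ms = gc_ms / collections if collections else 0.0
    return fraction, avg_ms


# Hypothetical young-generation readings taken one minute (60,000 ms) apart.
fraction, avg_ms = gc_overhead(1_000, 1_030, 25_000, 25_600, 60_000)
print(f"{fraction:.1%} of time in GC, {avg_ms:.0f} ms per collection")
```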

Host Metrics: 

Page Cache reads Ratio: The number of reads served from the page cache divided by the number of reads from disk.

Use of the hard drive: The amount of disk space consumed compared with the space still available.

CPU utilization: The CPU is rarely the cause of Kafka performance problems, but you should investigate if you notice spikes in CPU usage.

Bytes sent/received over the network: The total volume of network traffic, both incoming and outgoing.
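
Two of these host metrics are easy to compute directly. The sketch below makes two assumptions. First, it uses Python's standard shutil.disk_usage to report how full the disk holding a directory is; the path "/" stands in for your broker's log directory. Second, the page cache ratio helper assumes you already have cache and disk read counts from your OS tooling. The helper names are hypothetical.

```python
import shutil


def disk_used_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is already used."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.used / usage.total


def page_cache_read_ratio(cache_reads: int, disk_reads: int) -> float:
    """Reads served from the page cache divided by reads that went to disk."""
    if disk_reads == 0:
        return float("inf")  # every read was served from memory
    return cache_reads / disk_reads


# Replace "/" with the directory set in the broker's log.dirs setting.
print(f"disk used: {disk_used_fraction('/'):.0%}")
print(f"page cache ratio: {page_cache_read_ratio(9_500, 500):.1f}")
```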

Producer Metrics

Producers are the processes that send messages to topics on the brokers. If producers stop working, consumers will not receive new messages. Let’s look at some of the most important producer metrics.

compression-rate-avg: The average compression rate of sent batches.

response-rate: The average number of responses received per second, per producer.

request-rate: The average number of requests sent per second, per producer.

request-latency-avg: The average request latency, in milliseconds.

outgoing-byte-rate: The average number of bytes sent per second.

io-wait-time-ns-avg: The average time, in nanoseconds, that the I/O thread spent waiting for a socket to be ready for reads or writes.

batch-size-avg: The average number of bytes sent per partition, per request.
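
Kafka's producer documentation defines compression-rate-avg as the ratio of compressed batch size to uncompressed size, so lower values mean better compression. The sketch below illustrates that ratio with Python's built-in zlib module. Kafka's gzip codec also uses DEFLATE, but the real producer adds its own batch framing, so actual values will differ. The sample batch is made up.

```python
import json
import zlib


def compression_rate(batch: bytes) -> float:
    """Compressed size divided by uncompressed size (lower is better)."""
    return len(zlib.compress(batch)) / len(batch)


# A made-up batch of similar JSON events, like those a producer might send.
records = [{"user": i % 10, "action": "click", "page": "/home"} for i in range(200)]
batch = "\n".join(json.dumps(r) for r in records).encode("utf-8")

rate = compression_rate(batch)
print(f"uncompressed: {len(batch)} bytes, compression rate: {rate:.2f}")
```

Repetitive records like these compress well, which is why batching similar messages together tends to lower this metric.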

Consumer Metrics

Consumer metrics show how quickly consumers retrieve data, which can help you detect performance issues in the system. Let’s look at some key consumer metrics.

records-lag: The number of messages by which the consumer lags behind the producer on this partition.

records-lag-max: The maximum lag, in number of records, across the partitions the consumer reads. An increasing value indicates that the consumer is falling behind the producers.

bytes-consumed-rate: The average number of bytes consumed per second, per consumer, for a single topic or across all topics.

records-consumed-rate: The average number of records consumed per second, for a single topic or across all topics.

fetch-rate: The number of fetch requests the consumer makes per second.
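
Consumer lag is the gap between the newest offset in a partition (the log end offset) and the consumer's offset. The kafka-consumer-groups.sh --describe command reports these two numbers as LOG-END-OFFSET and CURRENT-OFFSET (the last committed offset). Given them per partition, the per-partition lag and its maximum can be computed as in the sketch below. The offsets and the partition_lag helper are hypothetical.

```python
def partition_lag(log_end_offsets: dict, consumer_offsets: dict) -> dict:
    """Lag per partition: how many records the consumer still has to read.

    Partitions without a consumer offset are treated as starting at 0,
    which is a simplifying assumption (old data may have been deleted).
    """
    return {
        partition: max(end - consumer_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    }


# Hypothetical offsets for a topic with three partitions.
log_end = {0: 1_500, 1: 1_480, 2: 2_100}
committed = {0: 1_500, 1: 1_400, 2: 1_350}

lag = partition_lag(log_end, committed)
print(lag)                # per-partition lag, comparable to records-lag
print(max(lag.values()))  # worst partition, comparable to records-lag-max
```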

ZooKeeper Metrics

ZooKeeper is a critical component of a Kafka deployment, and shutting it down will bring Kafka to a halt. ZooKeeper stores information about Kafka’s brokers and topics, the quotas that limit the amount of traffic passing through the cluster, and the replicas. The key ZooKeeper metrics are listed below.

Outstanding-requests: The number of requests waiting in the queue.

Avg-latency: The time, in milliseconds, the server takes to respond to a client request.

Num-alive-connections: The number of clients connected to ZooKeeper.

Followers: The number of active followers in the ZooKeeper ensemble.

Pending-syncs: The number of pending syncs from the followers.

Open-file-descriptor-count: The number of file descriptors in use.
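
ZooKeeper reports these values through its mntr four-letter-word command. The command returns one tab-separated key/value pair per line, such as zk_avg_latency, zk_outstanding_requests, and zk_num_alive_connections. The sketch below sends the command over a plain TCP socket and parses the reply into a dictionary. It assumes three things:

  • ZooKeeper listens on the default client port 2181.
  • mntr is allowed by the 4lw.commands.whitelist setting, which is required on ZooKeeper 3.5 and later.
  • The alert threshold of 10 outstanding requests is purely hypothetical.

Note that zk_followers and zk_pending_syncs appear only on the ensemble leader.

```python
import socket


def fetch_mntr(host: str = "localhost", port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'mntr' four-letter word and return ZooKeeper's raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # ZooKeeper closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def parse_mntr(text: str) -> dict:
    """Turn 'key<TAB>value' lines into a dict, converting numbers where possible."""
    stats = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue
        key, value = line.split("\t", 1)
        for convert in (int, float):
            try:
                stats[key] = convert(value)
                break
            except ValueError:
                continue
        else:
            stats[key] = value  # non-numeric values such as zk_server_state
    return stats


# Illustrative reply text (not real output) so the parser can be tried offline;
# against a live server you would call parse_mntr(fetch_mntr()) instead.
sample = ("zk_avg_latency\t0\nzk_outstanding_requests\t0\n"
          "zk_num_alive_connections\t3\nzk_server_state\tleader\n")
stats = parse_mntr(sample)
if stats["zk_outstanding_requests"] > 10:  # hypothetical alert threshold
    print("ZooKeeper is falling behind on requests")
print(stats)
```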


In this article, we looked at how the Kafka event streaming platform works and the advantages of adopting it. We also examined Kafka’s key performance metrics and the UI tools that help you monitor them.





Salman Ahmad is a seasoned writer for CTN News, bringing a wealth of experience and expertise to the platform. With a knack for concise yet impactful storytelling, he crafts articles that captivate readers and provide valuable insights. Ahmad's writing style strikes a balance between casual and professional, making complex topics accessible without compromising depth.
