Naresh Blog: December 2017

We all know the power and advantages of Kafka. Apache Kafka is a publish-subscribe messaging system with three major components:

KAFKA CONSUMER

KAFKA PRODUCER and

KAFKA BROKER

Broker Side Configuration

Producer side

Compression

Batch size

Sync or Async

Consumer Side configuration

Fetch Size

Applies to all consumer instances in a consumer group.

Broker Side Configuration

num.replica.fetchers

This parameter defines the number of threads that replicate data from the leader to the followers. Tune it according to the threads you have available: if there are threads to spare, use more replica fetchers so that replication runs in parallel.

replica.fetch.max.bytes

This parameter controls how much data is fetched from each partition in a single fetch request. Increasing it helps the followers build their replicas faster.

replica.socket.receive.buffer.bytes

If only a few threads are available for replication, we can increase the size of this buffer. It can then hold more data when the replication thread is slow compared to the incoming message rate.

num.partitions

This is a very important setting to get right when running Kafka in production. The number of partitions sets the level of parallelism: the more partitions there are, the more data can be written in parallel, which increases throughput.

However, having more partitions than the operating system configuration can handle slows down performance and throughput.

How many partitions to create for a topic depends on the available threads and disks.
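One commonly cited rule of thumb for picking a partition count is to divide the target throughput by the throughput a single partition achieves on the producer side and on the consumer side, then take the larger result. The sketch below assumes you have measured those per-partition rates yourself; the function name and the numbers are hypothetical and are not part of Kafka.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition):
    """Rough partition count needed to reach a target throughput.

    All three rates are measurements taken on your own cluster;
    nothing here is read from Kafka itself.
    """
    needed_for_producers = target_mb_s / producer_mb_s_per_partition
    needed_for_consumers = target_mb_s / consumer_mb_s_per_partition
    # The slower side dictates how many partitions are required.
    return math.ceil(max(needed_for_producers, needed_for_consumers))


# Hypothetical measurements: 10 MB/s per partition when producing,
# 20 MB/s per partition when consuming, 100 MB/s target.
partitions = estimate_partitions(100, 10, 20)
print(partitions)
```

Treat the result as a starting point only: as noted above, the threads, disks and OS limits of the brokers put an upper bound on how many partitions are practical.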

num.io.threads

The right value for I/O threads depends directly on how many disks you have in your cluster. The server uses these threads to execute requests. We should have at least as many threads as we have disks.
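To make the broker settings above concrete, here is a minimal sketch that collects them into a server.properties fragment. The helper names and every numeric value are illustrative assumptions, not recommendations; the only rule taken from the text is that num.io.threads should be at least the number of disks. The floor of 8 mirrors Kafka's documented default for that property.

```python
def broker_overrides(disk_count, replica_fetchers=4, fetch_mb=4,
                     socket_buffer_kb=256, partitions=8):
    """Build illustrative broker overrides (all values are assumptions)."""
    return {
        "num.replica.fetchers": replica_fetchers,
        "replica.fetch.max.bytes": fetch_mb * 1024 * 1024,
        "replica.socket.receive.buffer.bytes": socket_buffer_kb * 1024,
        "num.partitions": partitions,
        # At least one I/O thread per disk, never below the default of 8.
        "num.io.threads": max(8, disk_count),
    }


def to_properties(settings):
    """Render a dict as key=value lines in server.properties style."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


# Hypothetical broker with 12 data disks.
print(to_properties(broker_overrides(disk_count=12)))
```

Change one value at a time and measure the effect before moving on to the next one.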

Producer:

Compression

compression.codec

Compression reduces the disk footprint, leading to faster reads and writes. Supported values are 'none', 'gzip' and 'snappy'. (compression.codec is the property name in the old Scala producer; the Java producer calls it compression.type.)
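To get a feel for how much compression can save, the sketch below gzips a batch of similar messages using only the Python standard library. It does not involve Kafka at all: the message shape is made up, and snappy is left out because it is not in the standard library.

```python
import gzip
import json

# A hypothetical batch of similar JSON messages, typical of event data.
messages = [
    json.dumps({"user_id": i, "event": "page_view", "page": "/home"}).encode("utf-8")
    for i in range(1000)
]
raw_batch = b"\n".join(messages)
compressed_batch = gzip.compress(raw_batch)

ratio = len(compressed_batch) / len(raw_batch)
print(f"raw={len(raw_batch)} bytes, gzip={len(compressed_batch)} bytes, ratio={ratio:.2f}")
```

Repetitive payloads like these compress very well. Data that is already compressed (images, archives) will gain little, so measure with your own messages.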

Batch Size

batch.size measures the batch size in total bytes rather than in number of messages. It controls how many bytes of data to collect before sending messages to the Kafka broker. Set this as high as possible without exceeding available memory. The default value is 16384.

If you increase the size of your buffer, it might never get full. The producer still sends the data eventually, based on other triggers, such as the linger time in milliseconds (linger.ms).

It is always hard to say which batch size is optimal. A large batch size is great for throughput, but you may notice higher latency. So there is a trade-off between latency and throughput: tuning for one usually costs the other.
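The interaction between batch size and linger time can be illustrated with a small simulation. This is a simplified model and not the producer's real code: it only checks the two triggers when a message arrives, and the function names are hypothetical.

```python
def should_send(batch_bytes, waited_ms, batch_size, linger_ms):
    """A batch is ready when it is full or has waited long enough."""
    return batch_bytes >= batch_size or waited_ms >= linger_ms


def simulate(message_bytes, interval_ms, count, batch_size, linger_ms):
    """Count the batches sent for a steady stream of equal-sized messages."""
    batches = 0
    batch_bytes = 0
    batch_started_ms = 0
    for i in range(count):
        now_ms = i * interval_ms
        if batch_bytes == 0:
            batch_started_ms = now_ms  # first message opens a new batch
        batch_bytes += message_bytes
        if should_send(batch_bytes, now_ms - batch_started_ms, batch_size, linger_ms):
            batches += 1
            batch_bytes = 0
    if batch_bytes:
        batches += 1  # flush whatever is left at the end
    return batches


# 160 messages of 1 KiB, one per millisecond.
full_batches = simulate(1024, 1, 160, batch_size=16384, linger_ms=100)
linger_batches = simulate(1024, 1, 160, batch_size=1048576, linger_ms=10)
print(full_batches, linger_batches)
```

In the first run every batch fills up before the linger time expires. In the second run the 1 MiB batch never fills, so the linger time alone decides when data is sent.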

Async Producers

Publish the message and receive a callback that acknowledges the status of the sent data.

'producer.type=async' makes the producer asynchronous

'queue.buffering.max.ms' = duration of the batch window

'batch.num.messages' = number of messages to be sent in one batch
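The publish-then-callback pattern can be sketched with the standard library alone. In the example below a single-threaded executor stands in for the producer's background sender and deliver() stands in for the round trip to a broker. Both are hypothetical placeholders, not a Kafka client API.

```python
from concurrent.futures import ThreadPoolExecutor

acks = []


def deliver(topic, value):
    """Stand-in for sending a message to a broker (hypothetical)."""
    return {"topic": topic, "bytes": len(value)}


def on_done(future):
    """Callback invoked once the send has completed or failed."""
    error = future.exception()
    if error is not None:
        acks.append(("failed", repr(error)))
    else:
        acks.append(("ok", future.result()))


with ThreadPoolExecutor(max_workers=1) as sender:
    for i in range(3):
        # submit() returns immediately, so the caller never waits on the broker.
        future = sender.submit(deliver, "events", f"message-{i}".encode("utf-8"))
        future.add_done_callback(on_done)

# Leaving the with-block waits for all pending sends to finish.
print(acks)
```

The caller keeps publishing while acknowledgements arrive later through the callback, which is what makes the asynchronous mode faster than waiting for each send.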

Large Messages

Consider placing large files on the standard storage and using Kafka to send a message with the file location. In many cases this can be much faster than using Kafka to send the large file itself.
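A minimal sketch of this idea follows, assuming a temporary directory as a stand-in for the shared storage and a JSON pointer message. The function names and the message fields are hypothetical, and the actual publishing of the message to Kafka is left out.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def build_pointer_message(payload, storage_dir):
    """Store the payload outside Kafka and return a small message pointing to it."""
    digest = hashlib.sha256(payload).hexdigest()
    path = Path(storage_dir) / f"{digest}.bin"
    path.write_bytes(payload)
    pointer = {"location": str(path), "size": len(payload), "sha256": digest}
    return json.dumps(pointer).encode("utf-8")


def resolve_pointer_message(message):
    """Consumer side: read the file the message refers to and verify it."""
    pointer = json.loads(message)
    payload = Path(pointer["location"]).read_bytes()
    if hashlib.sha256(payload).hexdigest() != pointer["sha256"]:
        raise ValueError("payload does not match checksum")
    return payload


with tempfile.TemporaryDirectory() as storage_dir:
    large_payload = b"x" * (5 * 1024 * 1024)  # 5 MiB stand-in for a large file
    message = build_pointer_message(large_payload, storage_dir)
    restored = resolve_pointer_message(message)
    print(len(message), "byte message for a", len(large_payload), "byte payload")
```

Only the small pointer message would travel through Kafka. The checksum lets the consumer detect a file that was changed or truncated after the message was sent.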


Thursday, December 21, 2017

Kafka Performance Tuning