Thursday, December 21, 2017

Kafka Performance Tuning

We all know the power and advantages of Kafka. Apache Kafka is a publish-subscribe messaging system with three major components:

KAFKA CONSUMER
KAFKA PRODUCER and
KAFKA BROKER



                                                                                                                                                      

Broker Side Configuration

Producer side 
Compression
Batch size
Sync or Async

Consumer Side configuration
Fetch Size
Applies to all consumer instances in a consumer group.


Broker Side Configuration

num.replica.fetchers
This parameter sets the number of threads that replicate data from the leader to the follower (the default is 1). If the broker has spare threads available, raise this value so that replication runs with more parallelism.

replica.fetch.max.bytes
This parameter sets how many bytes of data the follower attempts to fetch from each partition in each fetch request. Increasing it can help followers build their replicas faster.

replica.socket.receive.buffer.bytes
If few threads are available for replication, we can increase the size of the socket receive buffer instead. A larger buffer holds more data when the replication thread is slow compared to the incoming message rate.

num.partitions
This is a very important setting to get right before running Kafka in production. It is the default number of partitions for a topic, and the partition count sets the level of parallelism: data can be written to that many partitions in parallel, which increases throughput.

However, having too many partitions slows down performance and throughput if the operating system and hardware cannot handle them.
How many partitions to create for a topic depends on the available threads and disks.
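
One common rule of thumb for picking a partition count is to measure the throughput a single partition sustains when producing and when consuming, then size for the slower side. The sketch below assumes you have those measurements; the function name and the sample numbers are hypothetical, not values from a real cluster.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition):
    """Rough partition count needed to reach target_mb_s.

    All three inputs are numbers you measure on your own cluster;
    nothing here is a Kafka API.
    """
    if min(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition) <= 0:
        raise ValueError("throughput values must be positive")
    needed_for_producers = math.ceil(target_mb_s / producer_mb_s_per_partition)
    needed_for_consumers = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    # The slower side dictates how many partitions are needed.
    return max(needed_for_producers, needed_for_consumers)


# Hypothetical numbers: 100 MB/s target, 10 MB/s per partition when producing,
# 20 MB/s per partition when consuming.
print(estimate_partitions(100, 10, 20))  # 10
```

Treat the result as a starting point only, since the estimate ignores the per-partition overhead on the broker mentioned above.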

num.io.threads
The right value for the I/O threads depends directly on how many disks each broker has. The server uses these threads to execute requests. We should have at least as many threads as we have disks.
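
To make the broker settings above concrete, here is a minimal sketch that builds a server.properties fragment from a description of the host. The helper name, its two inputs and the byte values are assumptions chosen for illustration, not recommendations; only the property names and the defaults noted in the comments come from Kafka.

```python
def broker_overrides(disk_count, spare_cores):
    """Build illustrative server.properties overrides.

    disk_count and spare_cores are hypothetical inputs describing the host.
    """
    settings = {
        # At least one I/O thread per disk; 8 is Kafka's default.
        "num.io.threads": max(8, disk_count),
        # More fetchers only if there are cores to run them; the default is 1.
        "num.replica.fetchers": max(1, spare_cores),
        # Example values only: double the defaults of 1 MB and 64 KB.
        "replica.fetch.max.bytes": 2 * 1024 * 1024,
        "replica.socket.receive.buffer.bytes": 128 * 1024,
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items())


print(broker_overrides(disk_count=12, spare_cores=4))
```

Change one setting at a time and measure, because the right values depend on the hardware and the workload.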

Producer:

Compression
compression.codec
Compression reduces the size of the data on disk and on the network, which leads to faster reads and writes. The property accepts the values none, gzip and snappy. In the newer Java producer the property is named compression.type, and lz4 is accepted as well.
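
Compression pays off most when similar messages are compressed together as a batch rather than one at a time. The sketch below illustrates this with Python's standard gzip module on a made-up batch of JSON records; it does not use a Kafka client, and the record layout is an assumption.

```python
import gzip
import json

# Hypothetical batch of 500 similar JSON log records.
records = [
    json.dumps({"user_id": i, "event": "page_view", "page": "/home"}).encode("utf-8")
    for i in range(500)
]

batch = b"\n".join(records)
one_by_one = sum(len(gzip.compress(record)) for record in records)
as_batch = len(gzip.compress(batch))

print("raw bytes:             ", len(batch))
print("gzip, record by record:", one_by_one)
print("gzip, whole batch:     ", as_batch)
```

The exact numbers depend on the data, but repetitive records such as these compress far better as a batch than individually.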

Batch Size
batch.size measures the batch size in total bytes, not in number of messages. It controls how many bytes of data to collect before sending messages to the Kafka broker. Set this as high as possible, without exceeding available memory. The default value is 16384.

If you increase the batch size, a batch might never get full. The producer still sends the data eventually, based on other triggers such as the linger time in milliseconds (linger.ms).

It is always hard to say which batch size is optimal. A large batch size can give high throughput, but it may also add latency. In other words, there is a trade-off between throughput and latency: tuning for one usually costs some of the other.
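
The interaction between the batch size and the linger time can be sketched with a small simulation. This is a simplified model written for this post, not the Kafka client's actual algorithm: a batch is sent either when the next record would overflow it or when the linger time has passed since its first record. The workload is hypothetical.

```python
def simulate_batches(arrivals, batch_size, linger_ms):
    """Group (arrival_ms, size_bytes) records the way a batching producer might.

    Returns a list of (reason, record_count, total_bytes) tuples.
    """
    sent = []
    count = 0
    total = 0
    started_at = None

    def flush(reason):
        nonlocal count, total, started_at
        if count:
            sent.append((reason, count, total))
        count = 0
        total = 0
        started_at = None

    for arrival_ms, size in arrivals:
        if count and arrival_ms - started_at >= linger_ms:
            flush("linger")  # waited long enough, send what we have
        if count and total + size > batch_size:
            flush("full")  # the next record would overflow the batch
        if count == 0:
            started_at = arrival_ms
        count += 1
        total += size
    flush("linger")  # whatever is left goes out on the timer
    return sent


# Hypothetical workload: one 1 KB record every 2 ms, 40 records in total.
records = [(i * 2, 1024) for i in range(40)]
short_linger = simulate_batches(records, batch_size=16384, linger_ms=5)
long_linger = simulate_batches(records, batch_size=16384, linger_ms=100)
print(len(short_linger), "batches with linger_ms=5")
print(len(long_linger), "batches with linger_ms=100:", long_linger)
```

With the short linger time every batch goes out on the timer while still mostly empty. With the long one the batches fill up first, which means fewer requests but a longer wait for each record.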

Async Producers
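Publish the message and register a callback to receive the acknowledgement (the send status) later.
producer.type=async makes the producer asynchronous (the default is sync).
queue.buffering.max.ms = duration of the batch window, that is, how long messages are buffered before they are sent.
batch.num.messages = number of messages to be sent in one batch.
These three properties belong to the older Scala producer. The newer Java producer always sends asynchronously and is tuned with batch.size and linger.ms instead.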
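
The publish-then-callback pattern can be sketched without a Kafka client. ToyAsyncProducer below is a hypothetical stand-in that appends to an in-memory list on a background thread; it only illustrates that send returns immediately and the acknowledgement arrives later through the callback.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyAsyncProducer:
    """Stand-in for a Kafka producer; it only appends to an in-memory list."""

    def __init__(self):
        self._log = []
        # A single worker keeps records in the order they were sent.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _append(self, topic, value):
        self._log.append((topic, value))
        return len(self._log) - 1  # pretend this is the offset

    def send(self, topic, value, on_done):
        # Returns immediately; on_done(offset, error) fires once the record is stored.
        future = self._pool.submit(self._append, topic, value)

        def notify(done):
            error = done.exception()
            on_done(None if error else done.result(), error)

        future.add_done_callback(notify)
        return future

    def close(self):
        # Wait for all pending sends and their callbacks to finish.
        self._pool.shutdown(wait=True)


acks = []
producer = ToyAsyncProducer()
for text in ("a", "b", "c"):
    producer.send("demo-topic", text, lambda offset, error: acks.append((offset, error)))
producer.close()
print(sorted(acks))
```

A real client follows the same shape: the caller keeps working after the send, and checks the error argument in the callback to find out whether the send failed.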

Large Messages  
Consider placing large files on standard storage and using Kafka to send a message with the file location. In many cases this can be much faster than using Kafka to send the large file itself.
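
A minimal sketch of this approach, assuming the producer and the consumer can both reach the same storage. Here a temporary file stands in for the storage and a plain list stands in for the topic; the function names and the message fields are hypothetical.

```python
import json
import os
import tempfile


def publish_file_reference(path, send):
    """Send a small JSON pointer to the file instead of the file's bytes.

    `send` is a hypothetical callable standing in for a producer's send method.
    """
    message = {"location": os.path.abspath(path), "size_bytes": os.path.getsize(path)}
    send(json.dumps(message).encode("utf-8"))


def read_referenced_file(raw_message):
    """Consumer side: follow the pointer and load the payload from storage."""
    message = json.loads(raw_message.decode("utf-8"))
    with open(message["location"], "rb") as handle:
        return handle.read()


# Demo with a temp file as the "storage" and a list as the "topic".
topic = []
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as big_file:
    big_file.write(b"x" * 5_000_000)  # 5 MB payload

publish_file_reference(big_file.name, topic.append)
payload = read_referenced_file(topic[0])
print(len(topic[0]), "bytes on the topic,", len(payload), "bytes in storage")
os.remove(big_file.name)
```

The message on the topic stays tiny no matter how large the file is, so the broker never has to handle the payload itself.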