Learn Event Streaming With This Apache Kafka Tutorial (2022)

Need a streaming platform to handle large amounts of data? You’ve undoubtedly heard of Apache Kafka. Kafka is built for real-time data processing, and it’s becoming increasingly popular. Installing Apache Kafka on Linux can be a bit tricky, but no worries, this tutorial has you covered.

In this tutorial, you’ll learn to install and configure Apache Kafka, so you can start processing your data like a pro, making your business more efficient and productive.

Read on and start streaming data with Apache Kafka today!


Prerequisites

This tutorial will be a hands-on demonstration. If you’d like to follow along, be sure you have the following.

  • A Linux machine – This demo uses Debian 10, but any Linux distribution will work.
  • A dedicated non-root user account with sudo privileges for running Kafka – This tutorial uses a sudo user named kafka.
  • Java – Java is an integral part of the Apache Kafka installation.
  • Git – This tutorial uses Git for downloading the Apache Kafka unit files.

Installing Apache Kafka

Before streaming data, you’ll first have to install Apache Kafka on your machine. Since you have a dedicated account for Kafka, you can install Kafka without worrying about breaking your system.

1. Run the mkdir command below to create the /home/kafka/Downloads directory. You can name the directory as you prefer, but this demo calls it Downloads. This directory will store the Kafka binaries and ensures that all your files for Kafka are available to the kafka user.

mkdir Downloads

2. Next, run the below apt update command to update your system’s package index.

sudo apt update -y

Enter the password for your kafka user when prompted.


3. Run the curl command below to download the Kafka binaries from the Apache Foundation website, saving the output (-o) to a file named kafka.tgz in your ~/Downloads directory. You will use this archive to install Kafka.

Be sure to replace kafka/3.1.0/kafka_2.13-3.1.0.tgz with the latest version of Kafka binaries. As of this writing, the current Kafka version is 3.1.0.

curl "https://dlcdn.apache.org/kafka/3.1.0/kafka_2.13-3.1.0.tgz" -o ~/Downloads/kafka.tgz

Related: How to Download Files with Python Wget (A Curl Alternative)

4. Now, run the tar command below to extract (-x) the Kafka binaries (~/Downloads/kafka.tgz) into your current working directory, stripping the archive’s top-level directory as it extracts. The options in the tar command perform the following:

  • -v – Tells the tar command to list all files as they get extracted.
  • -z – Tells the tar command that the archive is gzip-compressed, so tar decompresses it while extracting. The .tgz extension indicates a gzipped tarball, which is why this option is needed here.
  • -f – Tells the tar command which archive file to extract.
  • --strip 1 – Instructs the tar command to strip the first level of directories from the extracted file paths, so the archive’s contents land directly in your current directory instead of under a versioned subdirectory.
tar -xvzf ~/Downloads/kafka.tgz --strip 1
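
If you are curious what --strip 1 removes, list the archive’s contents first. Every path in the listing begins with a top-level kafka_2.13-3.1.0 directory, which is the level that gets stripped:

tar -tzf ~/Downloads/kafka.tgz | head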

Configuring the Apache Kafka Server

At this point, you have downloaded the Kafka binaries to your ~/Downloads directory and extracted them. You can’t use the Kafka server just yet since, by default, Kafka does not allow you to delete or modify topics, the categories Kafka uses to organize log messages.

To configure your Kafka server, you will have to edit the Kafka configuration file (/etc/kafka/server.properties).

1. Open the Kafka configuration file (/etc/kafka/server.properties) in your preferred text editor.

2. Next, add the delete.topic.enable = true line at the bottom of the /etc/kafka/server.properties file, save the changes, and close the editor.

This configuration property gives you permission to delete or modify topics, so be sure you know what you are doing before deleting a topic. Deleting a topic deletes that topic’s partitions as well, and any data stored in those partitions is no longer accessible once they are gone.

Be sure there are no spaces at the beginning of each line, or else the file will not be recognized, and your Kafka server will not work.
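
If you prefer appending the setting from the shell instead of a text editor, a tee one-liner does the same thing (assuming the same file path as above):

echo "delete.topic.enable = true" | sudo tee -a /etc/kafka/server.properties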


3. Run the git command below to clone the apache-kafka project to your local machine so that you can modify its files for use as unit files for your Kafka service.

sudo git clone https://github.com/Adam-the-Automator/apache-kafka.git

4. Next, run the below commands to move into the apache-kafka directory and list the files inside.


cd apache-kafka
ls

Now that you are in the apache-kafka directory, you can see that it contains two files: kafka.service and zookeeper.service.


5. Open the zookeeper.service file in your preferred text editor. You’ll use this file as a reference to create the kafka.service file.

Customize each section of the zookeeper.service file below as needed, though this demo uses the file as is, without modifications. A minimal sketch of such a unit file follows the list.

  • The [Unit] section configures the startup properties for this unit. It tells systemd what to use when starting the ZooKeeper service.
  • The [Service] section defines how, when, and where to start the ZooKeeper service using the zookeeper-server-start.sh script. This section also defines basic information such as the name, description, and command-line arguments (what follows ExecStart=).
  • The [Install] section tells systemd to start the service when the system enters multi-user mode.
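
For reference, a minimal zookeeper.service along these lines might look like the sketch below. The paths are illustrative assumptions (they presume the Kafka binaries were extracted under /home/kafka, as in the earlier steps); adjust them to match your layout rather than copying verbatim:

[Unit]
Description=Apache ZooKeeper service
Requires=network.target
After=network.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/bin/zookeeper-server-start.sh /home/kafka/config/zookeeper.properties
ExecStop=/home/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target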

6. Open the kafka.service file in your preferred text editor, and configure how your Kafka server runs as a systemd service.

This demo uses the default values in the kafka.service file, but you can customize the file as needed. Note that this file refers to the zookeeper.service file, which you might modify at some point; a sketch follows.
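
As an illustrative sketch only (the same assumed /home/kafka layout as above, not necessarily the repository’s exact contents), a kafka.service that depends on ZooKeeper might look like this:

[Unit]
Description=Apache Kafka server
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/bin/kafka-server-start.sh /home/kafka/config/server.properties
ExecStop=/home/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

The Requires= and After= lines are what let systemd start ZooKeeper automatically before Kafka.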


7. Run the below command to start the kafka service.

sudo systemctl start kafka

Remember to stop and start your Kafka server as a service. If you don’t, the process will remain in memory, and you can only stop the process by killing it. This behavior can lead to data loss if you have topics that are being written or updated as the process shuts down.

Since you’ve created kafka.service and zookeeper.service files, you can also run either of the commands below to stop or restart your systemd-based Kafka server.

sudo systemctl stop kafka
sudo systemctl restart kafka
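
To have both services start automatically at boot, you can also enable the units, assuming the unit files are installed where systemd can find them (for example, /etc/systemd/system):

sudo systemctl enable zookeeper
sudo systemctl enable kafka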

Related: Controlling Systemd services with Ubuntu systemctl

8. Now, run the journalctl command below to verify that the service has started up successfully.

This command lists all of the logs for the kafka service.


sudo journalctl -u kafka

If you’ve configured everything correctly, you’ll see a message that says Started kafka.service. Congratulations! You now have a fully functional Kafka server running as a systemd service.
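
To watch the log live while you test, journalctl’s -f flag streams new entries as they arrive; press Ctrl+C to stop following:

sudo journalctl -u kafka -f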


Restricting the Kafka User

At this point, the Kafka service runs as the kafka user. The kafka user is a system-level account and should not be exposed to users who connect to Kafka.

Any client who connects to Kafka through this broker will effectively have root-level access on the broker machine, which is not recommended. To mitigate the risk, you’ll remove the kafka user from the sudoers file and disable the password for the kafka user.

1. Run the exit command below to switch back to your normal user account.

exit

2. Next, run the sudo deluser kafka sudo command below and press Enter to confirm that you want to remove the kafka user from the sudo group.

sudo deluser kafka sudo

3. Run the below command to disable the password for the kafka user. Doing so further improves the security of your Kafka installation.

sudo passwd kafka -l
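
You can confirm the account is locked with passwd’s status flag; the second field of the output shows L for a locked password:

sudo passwd -S kafka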

4. Now, rerun the following command to verify that the kafka user has been removed from the sudo group; deluser will report that the user is not a member of the group.

sudo deluser kafka sudo

5. Run the below su command to switch to the kafka user. With the password disabled, only authorized users such as root or other sudo users can run commands as the kafka user.


sudo su - kafka

6. Next, run the below command to create a new Kafka topic named ATA to verify that your Kafka server is running correctly. If you extracted Kafka somewhere other than /usr/local/kafka-server, change the cd path to match your installation directory.

Kafka topics are feeds of messages to and from the server, which helps eliminate the complications of messy, unorganized data in your Kafka servers.

cd /usr/local/kafka-server && bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic ATA
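
To confirm the topic exists, list all topics on the broker:

bin/kafka-topics.sh --list --bootstrap-server localhost:9092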

7. Run the below command to create a Kafka producer using the kafka-console-producer.sh script. Kafka producers write data to topics.

echo "Hello World, this sample provided by ATA" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic ATA > /dev/null

8. Finally, run the below command to create a Kafka consumer using the kafka-console-consumer.sh script. This command consumes all of the messages in the Kafka topic (--topic ATA) from the beginning and prints out each message’s value.

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic ATA --from-beginning

You’ll see your message in the output as the Kafka console consumer prints the contents of the ATA Kafka topic. The consumer script continues to run at this point, waiting for more messages.

You can open another terminal to add more messages to your topic and press Ctrl+C to stop the consumer script once you are done testing.
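
For example, from a second terminal in the same Kafka directory, publish another message with the same producer script and watch it appear in the running consumer:

echo "Another message from a second terminal" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic ATA > /dev/null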


Conclusion

Throughout this tutorial, you’ve learned to set up and configure Apache Kafka on your machine. You’ve also produced messages to a Kafka topic with a Kafka producer and consumed them with a Kafka consumer, the foundation of effective event log management.

Now, why not build on this newfound knowledge by installing Kafka with Flume to better distribute and manage your messages? You can also explore Kafka’s Streams API and build applications that read and write data to Kafka, transforming data as needed before writing it out to another system like HDFS, HBase, or Elasticsearch.

FAQs

How much time will it take to learn Apache Kafka?

Reading about the motivation behind Kafka’s design choices and what makes Kafka efficient is a good starting point, and it is also a very engaging read if you are interested in systems. It will get you started very quickly and let you learn the most important concepts in less than two hours.

What is event streaming in Kafka?

Event streaming is the digital equivalent of the human body's central nervous system, and Kafka is the technology that powers this nervous system. It is the foundation for the 'always-on' world, where businesses are increasingly software-defined and automated, and where the user of software is more software.

What is an Apache Kafka tutorial?

An Apache Kafka tutorial provides the basic and advanced concepts of Apache Kafka and is designed for both beginners and professionals. Apache Kafka is an open-source stream-processing software platform used to handle real-time data storage.

Is Kafka good for video streaming?

Kafka is an event streaming platform for messaging, storage, processing, and integration at scale in real-time with zero downtime or data loss. Kafka is often used as a central streaming integration layer with these characteristics.

Is Kafka a good skill to learn?

Is it worth learning Kafka? Yes, Kafka is one of the most in-demand skills in the field of data collection and processing. Around 80 to 90 percent of Fortune 100 companies use Kafka for collecting, processing, storing, and analyzing data at scale.

Is Kafka tough to learn?

Developers new to Apache Kafka might find it difficult to grasp the concept of Kafka brokers, clusters, partitions, topics, logs, and so on. The learning curve is steep. You'll need extensive training to learn Kafka's basic foundations and the core elements of an event streaming architecture.

How do I become a Kafka expert?

  1. Confluent Certification.
  2. Get in touch about the Certification at certification@confluent.io.
  3. Using Event Modeling to Architect Event-Driven Information Systems ft. ...
  4. Learn Apache Kafka to build and scale modern applications.
  5. Project Metamorphosis.
  6. Join the Confluent Community Slack.

How do I start Kafka?

Starting the Kafka server:
  1. Open the folder where Apache Kafka is installed: cd <MDM_INSTALL_HOME>/kafka_<version>
  2. Start Apache ZooKeeper: ./zookeeper-server-start.sh ../config/zookeeper.properties
  3. Start the Kafka server: ./kafka-server-start.sh ../config/server.properties

How popular is Kafka?

Apache Kafka is the most popular open-source stream-processing software for collecting, processing, storing, and analyzing data at scale. Best known for its excellent performance, low latency, fault tolerance, and high throughput, it’s capable of handling thousands of messages per second.
