Build a real-time data capability through a Kafka message backbone in AWS
Kafka is used by tens of thousands of organizations, including over a third of the Fortune 500. These include the top ten travel companies, 7 of the top ten banks, 8 of the top ten insurance companies, and 9 of the top ten telecom companies. LinkedIn, Microsoft, and Netflix each process over a trillion (1,000,000,000,000 — a "four comma" number) messages a day with Kafka. It's at the heart of a movement toward managing and processing streams of data.
Kafka is often used in real-time streaming data architectures to provide real-time analytics. Because Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system, it fits use cases where JMS and RabbitMQ may not even be considered due to volume and responsiveness. Its higher throughput, reliability, and replication characteristics make it suitable for workloads such as tracking every service call or ingesting IoT sensor data, where a traditional MOM might not be considered.
In this post, I will walk through installing Kafka on Ubuntu Linux in AWS and run some simple examples. I will cover how Kafka works, along with use cases in Java and Node, in future posts.
Sign in to your AWS account, or create a new one if you don't have one. You won't be charged if you stay within Free Tier services. Find out more about the Free Tier here.
Let's generate a key pair. Go to EC2->Key Pairs. The key pair will allow you to SSH into your EC2 instance. Select your preferred region and click Create Key Pair, or import an existing one.
Download your Key Pair. Make sure to restrict permissions on your file:
chmod 400 <key pair name>.pem
Provision your Instance
Let’s provision the EC2 instance with an Ubuntu image. Go to the EC2 Dashboard and click Launch Instance. Select the Ubuntu image.
Leave all the options at their defaults and click Review and Launch.
Please note that the official recommendation for Kafka is a minimum of t2.medium, but that instance type is outside the Free Tier and will cost you money, so for this exercise we will use a t2.micro.
Select the key pair generated in the previous step.
Go back to the Instances list and you should see your new EC2 instance launched.
Connect to your Instance
Click on the instance, and on the Description tab copy the Public DNS (IPv4).
If you are on Windows, download PuTTY and connect to your instance.
If you are on a Mac, connect to your instance using the command below:
ssh -i ~/.ssh/<key pair name.pem> ubuntu@<Public DNS (IPv4)>
You should see the Ubuntu Welcome screen:
Great! You are connected to your EC2 instance.
Let’s begin the installation. The first step is to update the instance and install Java.
Run the following commands:
sudo apt-get update
sudo apt-get upgrade
sudo add-apt-repository ppa:linuxuprising/java
sudo apt install openjdk-11-jre-headless
Test your Java installation:
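The command itself is missing from the original; with the OpenJDK package installed above, a simple check is to print the runtime version (the exact build string will depend on your package version):

```shell
# Prints the installed Java runtime version (output goes to stderr)
java -version
```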
Now, let's install Confluent Kafka with the following commands:
wget -qO - https://packages.confluent.io/deb/5.2/archive.key | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://packages.confluent.io/deb/5.2 stable main"
sudo apt-get update && sudo apt-get install confluent-platform-2.12
Congratulations! You now have a Kafka server installed on Ubuntu in AWS.
Let's configure Kafka for a minimal install with 1 Kafka broker and 1 topic.
Navigate to your Kafka configuration and edit the file server.properties:
sudo nano /etc/kafka/server.properties
Uncomment the following lines:
metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter
confluent.metrics.reporter.bootstrap.servers=localhost:9092
confluent.metrics.reporter.topic.replicas=1
Save with Ctrl+O and exit with Ctrl+X.
Edit the file control-center.properties:
sudo nano /etc/confluent-control-center/control-center.properties
Uncomment the following lines and edit to your server name or localhost:
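The lines themselves are missing from the original post; in the Confluent 5.2 control-center.properties template, the commented-out entries to enable are typically these (assuming a single broker and ZooKeeper on the same host):

```properties
bootstrap.servers=localhost:9092
zookeeper.connect=localhost:2181
```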
Add the following lines at the end of the file:
confluent.controlcenter.internal.topics.partitions=1
confluent.controlcenter.internal.topics.replication=1
confluent.controlcenter.command.topic.replication=1
confluent.monitoring.interceptor.topic.partitions=1
confluent.monitoring.interceptor.topic.replication=1
Edit the file connect-distributed.properties:
sudo nano /etc/kafka/connect-distributed.properties
Add the following lines at the end of the file:
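The lines are missing from the original post; for Control Center's stream monitoring of Connect, Confluent's documentation enables the monitoring interceptors in connect-distributed.properties, along these lines:

```properties
consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
```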
Let’s start the Kafka Services:
sudo systemctl start confluent-zookeeper
sudo systemctl start confluent-kafka
sudo systemctl start confluent-schema-registry
sudo systemctl start confluent-kafka-connect
sudo systemctl start confluent-kafka-rest
sudo systemctl start confluent-ksql
sudo systemctl start confluent-control-center
You can check service status with this command:
systemctl status confluent*
The services will be running on the following ports:
| Service | Port |
|---|---|
| Kafka brokers (plain text) | 9092 |
| Confluent Control Center | 9021 |
| Kafka Connect REST API | 8083 |
| KSQL Server REST API | 8088 |
| Schema Registry REST API | 8081 |
To access your Control Center, make sure to open port 9021 in AWS.
Go to your instance's Description tab and click on the security group that was created. Add a new Inbound Rule for port 9021. You can limit it to your IP or leave it accessible to all.
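If you prefer the command line, the same rule can be added with the AWS CLI. The security group ID below is a placeholder; replace it with the one shown on your instance's Description tab:

```shell
# Open TCP 9021 (Control Center) to all IPs.
# Narrow --cidr to <your-ip>/32 to restrict access to just your machine.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 9021 \
  --cidr 0.0.0.0/0
```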
Navigate to <Public DNS (IPv4)>:9021 in your browser and you should see your Control Center.
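To exercise the broker end to end, you can also create and list a topic from the instance. This is a sketch assuming the Confluent 5.2 CLI on the same host as the broker; the topic name `test` is just an example:

```shell
# Create a single-partition, single-replica topic on the local broker
kafka-topics --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic test

# Confirm the topic exists
kafka-topics --list --zookeeper localhost:2181
```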