Build a real-time data capability through a Kafka message backbone in AWS

Install a Kafka Cluster on Ubuntu in AWS

How to install a Confluent Kafka Cluster on Linux Ubuntu in AWS

Jul 16, 2019

  • Kafka
  • AWS

  • Introduction

    Kafka is used by tens of thousands of organizations, including over a third of the Fortune 500. These companies include the top ten travel companies, 7 of the top ten banks, 8 of the top ten insurance companies, 9 of the top ten telecom companies, and many more. LinkedIn, Microsoft, and Netflix process "four-comma" message volumes (1,000,000,000,000) a day with Kafka. It’s at the heart of a movement towards managing and processing streams of data.

    Kafka is often used in real-time streaming data architectures to provide real-time analytics. Because Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system, it is used in cases where JMS and RabbitMQ may not even be considered because of volume and responsiveness requirements. Its higher throughput, reliability, and replication characteristics make it applicable to things like tracking service calls (tracking every call) or tracking IoT sensor data, where a traditional message-oriented middleware (MOM) might not be considered.

    In this post, I will walk through the process of installing Kafka on Ubuntu Linux in AWS and getting its services running. I will cover how Kafka works and some use cases using Java and Node in future posts.

    Getting Started

    Sign in to your AWS account, or create one if you don’t have one. You won’t be charged as long as you stay within Free Tier services; see the AWS Free Tier page for details.

    Let’s generate a key pair, which will allow you to SSH into your EC2 instance. Go to EC2->Key Pairs, select your preferred Region, and click Create Key Pair, or import an existing one.

    Download your Key Pair. Make sure to restrict permissions on your file:

    chmod 400 <key pair name>.pem
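
    SSH generally refuses a private key that other users can read, so it can help to confirm the mode before connecting. The sketch below is one way to do that; `check_key_perms` is a hypothetical helper name, and it assumes GNU `stat` as shipped on Ubuntu.

```shell
# Hypothetical helper: succeed only if the key file has mode 400.
# Assumes GNU coreutils `stat` (Linux); macOS would need `stat -f %Lp` instead.
check_key_perms() {
  key_file="$1"
  mode="$(stat -c '%a' "$key_file")" || return 2
  if [ "$mode" = "400" ]; then
    echo "OK: $key_file has mode 400"
  else
    echo "WARNING: $key_file has mode $mode; run: chmod 400 $key_file" >&2
    return 1
  fi
}

# Example (placeholder file name): check_key_perms ~/.ssh/my-key.pem
```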
    

    Provision your Instance

    Let’s provision the EC2 instance with an Ubuntu image. Go to the EC2 Dashboard and click Launch Instance. Select the Ubuntu image.

    Leave all the options at their defaults and click Review and Launch.
    Please note that the official recommendation for Kafka is a minimum of t2.medium, but that instance type is outside the Free Tier and will cost you money, so for this exercise we will use t2.micro.

    Select the key pair you generated in the previous step.

    Go back to the Instances list and you should see your new EC2 instance launched.

    Connect to your Instance

    Click on the instance and, on the Description tab, copy the Public DNS (IPv4).

    • If you are on Windows, download PuTTY and connect to your instance.

    • If you are on Mac, connect to your instance using the command below:

    ssh -i ~/.ssh/<key pair name>.pem ubuntu@<Public DNS (IPv4)>
    

    You should see the Ubuntu Welcome screen:


    Great!! You are connected to your EC2 instance :smile:

    Install Java

    Let’s begin the installation. The first step is to update the instance and install Java.
    Run the following commands:

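    # Refresh the package index and apply pending upgrades
    sudo apt-get update
    sudo apt upgrade

    # Add the Java PPA, then install a headless Java 11 runtime
    sudo add-apt-repository ppa:linuxuprising/java
    sudo apt install openjdk-11-jre-headless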
    


    Test your Java installation:
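
    A version check along these lines is one way to confirm the install. This is a minimal sketch; `check_java` is a hypothetical helper name, not part of any package.

```shell
# Print the installed Java version, or a hint if no `java` binary is on the PATH.
check_java() {
  if command -v java >/dev/null 2>&1; then
    java -version 2>&1   # `java -version` writes to stderr, so merge it into stdout
  else
    echo "java not found on PATH" >&2
    return 1
  fi
}

check_java || echo "Install Java before continuing"
```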

    Install Kafka

    Now, let’s install the Confluent Kafka with the following commands:

    wget -qO - https://packages.confluent.io/deb/5.2/archive.key | sudo apt-key add -
    sudo add-apt-repository "deb [arch=amd64] https://packages.confluent.io/deb/5.2 stable main"
    sudo apt-get update && sudo apt-get install confluent-platform-2.12
    

    Congratulations!! You now have a Kafka server installed on Ubuntu in AWS.

    Configure Kafka

    Let’s configure Kafka for a minimal install with 1 Kafka broker and 1 topic.
    The configuration files used below live under /etc.

    Kafka Broker

    Edit the file server.properties

    sudo nano /etc/kafka/server.properties
    

    Uncomment the following lines:

    metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter
    confluent.metrics.reporter.bootstrap.servers=localhost:9092
    confluent.metrics.reporter.topic.replicas=1
    

    Save with Ctrl+O and exit with Ctrl+X.
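
    If you would rather script this step than edit the file by hand, the same change can be sketched with `sed`. `uncomment_property` is a hypothetical helper, and it assumes each property is commented out as `#key=value`; because it edits the file in place, the script would need to run with sudo for files under /etc.

```shell
# Hypothetical helper: remove the leading '#' from the line that sets a given property.
# Assumes GNU sed (for -i) and a line of the form "#key=value" or "# key=value".
uncomment_property() {
  file="$1"
  key="$2"
  # Escape dots so they match literally in the sed pattern.
  pattern="$(printf '%s' "$key" | sed 's/\./\\./g')"
  sed -i "s/^#[[:space:]]*\(${pattern}=\)/\1/" "$file"
}

# Example, run as root:
# for key in metric.reporters \
#            confluent.metrics.reporter.bootstrap.servers \
#            confluent.metrics.reporter.topic.replicas; do
#   uncomment_property /etc/kafka/server.properties "$key"
# done
```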

    Control Center

    Edit the file control-center-production.properties:

    sudo nano /etc/confluent-control-center/control-center-production.properties
    


    Uncomment the following lines and edit to your server name or localhost:

    bootstrap.servers=localhost:9092
    zookeeper.connect=localhost:2181
    


    Add the following lines at the end of the file:

    confluent.controlcenter.internal.topics.partitions=1
    confluent.controlcenter.internal.topics.replication=1
    confluent.controlcenter.command.topic.replication=1
    confluent.monitoring.interceptor.topic.partitions=1
    confluent.monitoring.interceptor.topic.replication=1
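
    Appending by hand works, but re-running the setup can leave duplicate lines. A small sketch of an idempotent append follows; `append_property` is a hypothetical helper, and it needs write access to the target file.

```shell
# Hypothetical helper: append a "key=value" line unless that exact line already exists.
append_property() {
  file="$1"
  line="$2"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example, run as root:
# append_property /etc/confluent-control-center/control-center-production.properties \
#   "confluent.controlcenter.internal.topics.partitions=1"
```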
    

    Connector

    Edit the file connect-distributed.properties:

    sudo nano /etc/kafka/connect-distributed.properties
    

    Add the following lines at the end of the file:

    consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
    producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
    


    Start Kafka

    Let’s start the Kafka Services:

    sudo systemctl start confluent-zookeeper
    sudo systemctl start confluent-kafka
    sudo systemctl start confluent-schema-registry
    
    sudo systemctl start confluent-kafka-connect
    sudo systemctl start confluent-kafka-rest
    sudo systemctl start confluent-ksql
    
    sudo systemctl start confluent-control-center
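
    The order matters: ZooKeeper comes up first, then the broker, then the services that depend on the broker. The loop below is a sketch of the same sequence that stops at the first failure. `start_confluent` and the `SYSTEMCTL` override are hypothetical names added for illustration; setting `SYSTEMCTL=echo` prints the commands instead of running them.

```shell
# Start the Confluent services in dependency order and stop at the first failure.
# SYSTEMCTL is a hypothetical override for dry runs, e.g. SYSTEMCTL=echo.
start_confluent() {
  for svc in confluent-zookeeper confluent-kafka confluent-schema-registry \
             confluent-kafka-connect confluent-kafka-rest confluent-ksql \
             confluent-control-center; do
    ${SYSTEMCTL:-sudo systemctl} start "$svc" || {
      echo "Failed to start $svc" >&2
      return 1
    }
  done
}

# Dry run: SYSTEMCTL=echo start_confluent
```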
    

    You can check service status with this command:

    systemctl status 'confluent*'
    


    The services will be running on the following ports:

    Component                     Port
    Kafka brokers (plain text)    9092
    Confluent Control Center      9021
    Kafka Connect REST API        8083
    KSQL Server REST API          8088
    REST Proxy                    8082
    Schema Registry REST API      8081
    ZooKeeper                     2181
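
    To see which of these ports are actually accepting connections on the instance, a quick probe like the one below may help. It is a sketch that relies on bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command; `port_open` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Uses bash's /dev/tcp pseudo-device; gives up after 2 seconds.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Probe each port from the table above on the local machine.
for port in 9092 9021 8083 8088 8082 8081 2181; do
  if port_open localhost "$port"; then
    echo "port $port: listening"
  else
    echo "port $port: not reachable"
  fi
done
```

    Some services, Control Center in particular, can take a while to start listening, so a "not reachable" result right after startup is not necessarily a failure.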


    To access Control Center, make sure port 9021 is open in AWS.
    Go to your instance's Description tab and click on the security group that was created. Add a new Inbound Rule for port 9021. You can limit access to your own IP or leave it open to all.

    Navigate to http://<Public DNS (IPv4)>:9021 in your browser and you should see Control Center.
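
    The same inbound rule can also be added from the AWS CLI, assuming it is installed and configured for your account. In the sketch below the security group ID and CIDR are placeholders, and `open_control_center_port` and the `AWS_CMD` override are hypothetical names; setting `AWS_CMD=echo` prints the call instead of making it.

```shell
# Open TCP port 9021 on a security group to a single CIDR range.
# AWS_CMD is a hypothetical override for dry runs, e.g. AWS_CMD=echo.
open_control_center_port() {
  group_id="$1"
  cidr="$2"
  ${AWS_CMD:-aws} ec2 authorize-security-group-ingress \
    --group-id "$group_id" --protocol tcp --port 9021 --cidr "$cidr"
}

# Dry run with placeholder values:
# AWS_CMD=echo open_control_center_port sg-0123456789abcdef0 203.0.113.10/32
```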


    Congratulations!! :trophy: