1. What is Kafka?

Message Queue (MQ)

What is a message queue? It is one of the ways for processes or program instances to exchange data with one another. A message queue is a kind of middleware, and because a typical MQ handles messages in a memory-based queue, it is fast.

Kafka, however, is based on the file system rather than on memory, and this is what gives it **persistence**. Persistence means that messages consumed in the past can be consumed again later, without the publisher having to publish them again.
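This replay is easy to see with the plain Java client: a consumer can rewind to the beginning of a partition and re-read messages it has already processed. A minimal sketch, assuming a local broker at localhost:9092 and a hypothetical topic named demo-topic:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "replay-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("demo-topic", 0); // hypothetical topic
            consumer.assign(List.of(partition));
            // Because messages live on disk, we can rewind and consume them again
            consumer.seekToBeginning(List.of(partition));
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```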

2. Implementation Model (Pub-Sub)

Kafka is a distributed MQ built on the Publish-Subscribe model.

3. How It Works

Sending a message through Kafka is very similar to sending mail through a post office.

A Publisher writes down the recipient (Topic) and the contents (Message) and hands them to the post office (the Kafka broker). The post office then stores the mail sorted by Topic (in Kafka, each Topic's messages are appended to logs on disk).
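In code, handing a letter to the post office is simply a producer sending a record to a topic. A minimal sketch with the plain Java client; the broker address and topic name are assumptions for illustration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PostOfficeProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // the "post office" (broker)
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "demo-topic" is the recipient (Topic); the value is the letter (Message)
            producer.send(new ProducerRecord<>("demo-topic", "key-1", "hello, kafka"));
            producer.flush(); // make sure the letter actually reaches the broker
        }
    }
}
```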

When a Subscriber is ready to read, it asks the post office for the mail addressed to it (its Topic). If the Kafka broker holds messages published to that Topic, it delivers them to the Subscriber.

If a published Message were somehow lost, the Publisher would have to go through the trouble of rewriting a Message it already wrote. To keep Messages safe, Kafka creates copies (Replicas) of each published Message and stores them across several post offices (brokers).
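How many post offices hold a copy is configured per topic through its replication factor. A sketch using the Java AdminClient; the broker address, topic name, and replication factor of 3 are illustrative assumptions (the cluster needs at least three brokers for this to succeed):

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, each kept as 3 replicas on different brokers
            NewTopic topic = new NewTopic("demo-topic", 3, (short) 3);
            admin.createTopics(List.of(topic)).all().get(); // wait for the topic to be created
        }
    }
}
```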

4. Implementing Reactive Kafka

Reactor Kafka has two core interfaces:

  1. reactor.kafka.sender.KafkaSender, which publishes messages to Kafka

  2. reactor.kafka.receiver.KafkaReceiver, which consumes messages from Kafka

**Publish (KafkaSender)**
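A minimal sketch of publishing with KafkaSender; the broker address, topic name, and the integer correlation metadata are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import reactor.core.publisher.Flux;
import reactor.kafka.sender.KafkaSender;
import reactor.kafka.sender.SenderOptions;
import reactor.kafka.sender.SenderRecord;

public class ReactiveSenderExample {
    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        KafkaSender<String, String> sender =
                KafkaSender.create(SenderOptions.<String, String>create(props));

        // Each ProducerRecord is wrapped in a SenderRecord; the second argument is
        // correlation metadata that comes back with the corresponding send result.
        sender.send(Flux.range(1, 10)
                        .map(i -> SenderRecord.create(
                                new ProducerRecord<>("demo-topic", "key-" + i, "message-" + i), i)))
              .doOnNext(result -> System.out.printf("sent #%d to offset %d%n",
                      result.correlationMetadata(), result.recordMetadata().offset()))
              .blockLast(); // blocking is for this demo only

        sender.close();
    }
}
```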

**Subscribe (KafkaReceiver)**
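A matching sketch of consuming with KafkaReceiver; again the broker address, group id, and topic name are assumptions:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import reactor.kafka.receiver.KafkaReceiver;
import reactor.kafka.receiver.ReceiverOptions;

public class ReactiveReceiverExample {
    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        ReceiverOptions<String, String> options =
                ReceiverOptions.<String, String>create(props)
                        .subscription(Collections.singleton("demo-topic"));

        KafkaReceiver.create(options).receive()
                .doOnNext(record -> {
                    System.out.printf("received %s (offset %d)%n", record.value(), record.offset());
                    record.receiverOffset().acknowledge(); // mark as processed so the offset can be committed
                })
                .take(10)     // demo only: stop after 10 records instead of running forever
                .blockLast();
    }
}
```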

    **Apache Kafka: Architecture, Real-Time CDC, and Python Integration**

    https://miro.medium.com/v2/resize:fit:700/1*QPv4owWnCo09Q39_rVBBNA.png

    Apache Kafka is a distributed streaming platform that has gained significant popularity for its ability to handle high-throughput, fault-tolerant messaging among applications and systems. At its core, Kafka is designed to provide durable storage and stream processing capabilities, making it an ideal choice for building real-time streaming data pipelines and applications. This article will delve into the architecture of Kafka, its key components, and how to interact with Kafka using Python.

      - Kafka Architecture Overview
      - Topics and Partitions
      - Producers and Consumers
      - Brokers
      - ZooKeeper
      - Kafka’s Distributed Nature
      - Kafka Streams and Connect
      - Handling Changes and State
      - Understanding Offsets
      - Practical Implications
      - Interacting with Kafka using Python
      - Creating a Topic
      - Listing Topics
      - Deleting a Topic
      - Producing Messages to Kafka
      - Consuming Messages from Kafka
    - Using kafka-python
      - Installation
      - Producer Example
      - Consumer Example
      - Key Points
      - Order Maintenance in Kafka
      - Dead Letter Queues
      - Purpose of Dead Letter Queues
      - How Dead Letter Queues Work
      - Use in Various Technologies
      - Best Practices
      - Caching Strategies with Kafka
      - 1. Client-Side Caching
      - 2. Kafka Streams State Stores
      - 3. Interactive Queries in Kafka Streams
      - 4. External Caching Systems
      - Considerations for Caching with Kafka
    - Example: Using Kafka Streams State Store for Caching
    - Prerequisites
    - Example Overview
    - Redis as a State Store
    - Consuming Messages and Caching Results
    - Considerations
      - Partitioning in Apache Kafka
      - Partitioning in Databases
      - Partitioning in Distributed File Systems
      - Challenges and Considerations
      - How Kafka Supports CDC
      - Implementing CDC with Kafka
      - CDC Workflow with Kafka
      - Considerations
      - CDC Data from Kafka with Python
      - Pre-requisites
      - Consuming CDC Data from Kafka with Python
      - Using confluent-kafka-python
      - Using kafka-python
      - Notes

    Kafka Architecture Overview

    https://miro.medium.com/v2/0*z5e9GMyQ9R-RFbJ2

    The architecture of Apache Kafka is built around a few core concepts: producers, consumers, brokers, topics, partitions, and the ZooKeeper coordination system. Understanding these components is crucial to leveraging Kafka effectively.

    Topics and Partitions

    A topic is a named stream of records. Each topic is divided into partitions: ordered, append-only logs that let data be spread across brokers and consumed in parallel.

    Producers and Consumers

    Producers write records to topics; consumers read them, usually as members of a consumer group that divides a topic's partitions among themselves.

    Brokers

    A broker is a Kafka server that stores partitions and serves producer and consumer requests; a Kafka cluster consists of one or more brokers.

    ZooKeeper

    ZooKeeper coordinates the cluster, tracking broker membership and cluster metadata. (Recent Kafka versions can run without it using the built-in KRaft mode.)

    Kafka’s Distributed Nature

    Kafka’s architecture is inherently distributed. This design allows Kafka to be highly available and scalable. You can add more brokers to a Kafka cluster to increase its capacity and fault tolerance. Data is replicated across multiple brokers to prevent data loss in the case of a broker failure.
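One way to observe this distributed layout from code is to ask the cluster to describe itself. A small sketch with the Java AdminClient; the bootstrap address is an assumption:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;

public class ClusterInfo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        try (AdminClient admin = AdminClient.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            System.out.println("cluster id: " + cluster.clusterId().get());
            System.out.println("controller: " + cluster.controller().get());
            // Every node below is a broker; adding brokers grows capacity and fault tolerance.
            cluster.nodes().get().forEach(node ->
                    System.out.printf("broker %d at %s:%d%n", node.id(), node.host(), node.port()));
        }
    }
}
```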

    https://miro.medium.com/v2/0*Y60EovrnihBBsZ7B

    Kafka Streams and Connect

    Kafka Streams is a client library for building stream-processing applications on top of Kafka; Kafka Connect is a framework for streaming data between Kafka and external systems through reusable source and sink connectors.

    Handling Changes and State

    Stateful operations in Kafka Streams (aggregations, joins, windowing) keep their working state in local state stores, which are backed by changelog topics in Kafka so the state can be rebuilt after a failure.

    Understanding Offsets

    https://miro.medium.com/v2/0*xUDOJWtIpo-Ax1Wh

    Here are some key points about offsets in Kafka:

    - An offset is a monotonically increasing integer that uniquely identifies each record within a partition.
    - Offsets are per-partition: ordering is guaranteed within a single partition, not across a whole topic.
    - Each consumer group tracks, for every partition it reads, the offset of the next record to consume.
    - Committed offsets let a consumer resume where it left off after a restart or rebalance, and seeking back to an earlier offset replays old records.

    Practical Implications

    In summary, offsets are a fundamental concept in Kafka that enables efficient, ordered, and reliable message processing in distributed systems. They facilitate Kafka’s high-throughput capabilities while supporting consumer scalability and fault-tolerant design.
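The offset mechanics are visible in a few lines with the plain Java consumer: check the current position, process some records, then commit. Broker, topic, and group names below are assumptions:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "offset-demo-group");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // we commit manually below
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("demo-topic", 0);
            consumer.assign(List.of(tp));
            System.out.println("next offset to read: " + consumer.position(tp));

            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
            consumer.commitSync(); // record progress so a restarted consumer resumes here
        }
    }
}
```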

    https://miro.medium.com/v2/resize:fit:700/1*_FXxr5Ua1IzWeuwvHyXzLw.png

    Interacting with Kafka using Python

    Python developers can interact with Kafka through the confluent-kafka-python library or the kafka-python library. Both provide comprehensive tools to produce and consume messages from a Kafka cluster.

    Installation
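Both libraries are published on PyPI, so installation is a single pip command (pick whichever client you prefer):

```bash
pip install confluent-kafka
# or
pip install kafka-python
```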