Data Lake

Posts

Kafka Messages and Data Consistency

With its scalability and fault tolerance, Kafka has become increasingly popular in large-scale, real-time enterprise applications. Kafka messages are published to partitions, which are usually located on different nodes, and are consumed by multiple consumers, each of which reads messages from a single partition. Having multiple partitions and consumers raises a data consistency issue. For example, if a security in a trading system is modified twice within a very short time, the two messages could be published to two different partitions. The two messages are then processed by two different consumers, and there is no guarantee that the later message is the one that ends up in your application or your data storage. How can this issue be resolved?

Kafka Key With Single Threaded Consumer

A Kafka message is published with a key and a payload. Messages with the same key are published to the same partition, which will be consumed by the same c...
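The key-based routing described above can be sketched as a small, self-contained simulation. This is not Kafka client code: the partition count, the security IDs, and the helper names (`partition_for`, `publish`, `consume`) are made up for illustration, and CRC32 stands in for whatever hash the real partitioner applies to the key. The only point is that a deterministic hash of the key keeps both updates to one security in one partition log, in publish order.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition. CRC32 is a stand-in, not Kafka's own hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(messages, num_partitions: int = NUM_PARTITIONS):
    """Append (key, payload) pairs to per-partition logs in arrival order."""
    partitions = defaultdict(list)
    for key, payload in messages:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


def consume(partition_log):
    """Single-threaded consumer: apply messages in log order, last write wins."""
    state = {}
    for key, payload in partition_log:
        state[key] = payload
    return state


# Security SEC-1 is modified twice within a short time.
messages = [
    ("SEC-1", {"price": 100}),
    ("SEC-2", {"price": 50}),
    ("SEC-1", {"price": 101}),
]

partitions = publish(messages)

# One single-threaded consumer per partition; keys never span partitions,
# so merging the per-partition results cannot reorder updates to one key.
final_state = {}
for log in partitions.values():
    final_state.update(consume(log))
```

Because both `SEC-1` messages hash to the same partition and that partition is read sequentially, the second update is always applied after the first, regardless of how the other partitions are scheduled.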

Kafka Connect, Kafka and Data Pipeline

With Kafka Connect, Avro, Kafka and Nifi working together, we can build a close to real time data pipeline for system integration. Here is how it works.

1. Kafka Connect captures the data changes in a traditional RDBMS like MySQL or SQL Server using a JDBC connector in real time.
2. The connector serializes the captured data using Avro and publishes it to a Kafka topic.
3. Nifi consumes the data using the built-in ConsumeKafka processor, with Avro used for deserialization.
4. Nifi transforms, enriches, and persists the data to MongoDB or other big data storage.

The following is a diagram showing how the data pipeline works.
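Separately from the diagram, the capture step can be sketched as a connector definition. The post does not say which JDBC connector or settings were used, so this assumes Confluent's JDBC source connector with the Avro converter and a Schema Registry; the connector name, host names, credentials, table, and column names are all hypothetical. The script only builds the JSON body that would be POSTed to the Kafka Connect REST API at `/connectors`; it does not contact any server.

```python
import json

# Hypothetical connector definition; every host, table and column name
# below is a placeholder, not something taken from the original post.
connector = {
    "name": "mysql-securities-source",
    "config": {
        # Assumes Confluent's JDBC source connector is installed.
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:mysql://db.example.com:3306/trading",
        "connection.user": "connect_user",
        "connection.password": "change-me",
        "table.whitelist": "securities",
        # Detect new and updated rows via a timestamp plus an id column.
        "mode": "timestamp+incrementing",
        "timestamp.column.name": "updated_at",
        "incrementing.column.name": "id",
        # Rows from table "securities" go to topic "mysql-securities".
        "topic.prefix": "mysql-",
        # Serialize record values as Avro, with schemas in Schema Registry.
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://schema-registry.example.com:8081",
    },
}

# JSON request body for POST /connectors on a Kafka Connect worker.
request_body = json.dumps(connector, indent=2)
print(request_body)
```

On the Nifi side, the ConsumeKafka processor would then subscribe to the resulting topic, here `mysql-securities` under the assumed prefix and table name.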

Apache Nifi and System Integration

Introduction

Apache Nifi is a distributed data platform based on enterprise integration patterns (EIP). With its large number of built-in processors, it is a very powerful tool for building data pipelines. In today's service-oriented architectures, or in systems composed of microservices, the flow of data among systems is fundamental to building enterprise applications. Among integration tools (Mule ESB, Apache Camel, Apache Nifi), Nifi is my favorite because of its built-in processors, ease of use, and dynamic/hot redeployment, all of which lead to high productivity. When it is used together with a Kafka connector, we can build a real-time, bidirectional CDC (change data capture) integration system. This feature is especially useful in application re-engineering and migration.

Nifi Basics

Unlike Camel, Nifi is a web-based integration tool where you configure your processors, the building blocks of Nifi, to pipeline your data from your source system to your target ...
