
An Introduction to Kafka


Kafka is a distributed, partitioned, replicated commit log service.
What does that mean?
First, let's review some basic messaging terminology:
- Kafka maintains feeds of messages in categories called topics.
- Processes that publish messages to a Kafka topic are called producers.
- Processes that subscribe to topics and process the feed of published messages are called consumers.
- Kafka is run as a cluster of one or more servers, each of which is called a broker.

So, at a high level: producers send messages over the network to the Kafka cluster, and consumers subscribed to the relevant topics process those messages.
As shown in the figure:
[Figure: producers publish to the Kafka cluster; consumers read from it]

Communication between clients and servers is done with a simple, high-performance, language-agnostic TCP protocol. A Java client is provided for Kafka, but clients are available in many other languages.


Topics and Logs

Let's first dive into the high-level abstraction Kafka provides: the topic.
A topic is a category or feed name to which messages are published.

For each topic, the Kafka cluster maintains a partitioned log that looks like this:
[Figure: anatomy of a topic — each partition is an ordered log of messages, appended at the end]

Each partition is an ordered, immutable sequence of messages that is continually appended to, like a commit log.

The messages in a partition are each assigned a sequential id number called the offset, which uniquely identifies each message within that partition.

The Kafka cluster retains all published messages, whether or not they have been consumed, for a configurable period of time.

For example, if log retention is set to two days, then for two days after a message is published it is available for consumption, after which it is discarded to free up space.
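
A minimal sketch of that two-day policy as broker configuration (the values here are illustrative; retention can also be overridden per topic with retention.ms):

```properties
# server.properties — broker-level retention defaults (example values)
log.retention.hours=48
# Retention can alternatively be bounded by size instead of time:
# log.retention.bytes=1073741824
```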

Kafka's performance is effectively constant with respect to data size, so retaining lots of data is not a problem.

In fact, the only metadata retained on a per-consumer basis is the consumer's position in the log, called the "offset".

The offset is controlled by the consumer: normally a consumer advances its offset linearly as it reads messages, but in fact the position is entirely under the consumer's control, and it can consume messages in any order it likes.
For example, a consumer can reset to an older offset to reprocess messages.
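
To make "the offset is controlled by the consumer" concrete, here is a minimal sketch using the modern Java client (newer than the client this article was written against); the topic name, partition number, and target offset are assumptions for illustration:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(tp));
            // The position is entirely under the consumer's control:
            // jump back to offset 0 and re-read the partition from the start.
            consumer.seek(tp, 0L);
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
            }
        }
    }
}
```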

This combination of features means that Kafka consumers are very cheap: they can come and go with little impact on the cluster or on other consumers.
For example, you can use the command-line tools to "tail" the contents of any topic without changing what is consumed by any existing consumers.
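
That "tail" idea looks like this with the console consumer that ships with Kafka (very old releases take --zookeeper instead of --bootstrap-server; the topic name is a placeholder):

```bash
# Print everything currently in the topic, then keep following new messages.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic my-topic --from-beginning
```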

The partitions in the log serve several purposes.
First, they allow the log to scale beyond a size that fits on a single server. Each individual partition must fit on the server that hosts it, but a topic may have many partitions, so it can handle an arbitrary amount of data.
Second, they act as the unit of parallelism.
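
The partition count is chosen when the topic is created, for example with the kafka-topics tool (the counts here are arbitrary examples; older releases use --zookeeper instead of --bootstrap-server):

```bash
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
    --topic my-topic --partitions 3 --replication-factor 2
```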

Distribution

The partitions of the log are distributed over the servers in the Kafka cluster, with each server handling data and requests for a share of the partitions.
Each partition is replicated across a configurable number of servers for fault tolerance.
Each partition has one server which acts as the "leader" and zero or more servers which act as "followers".
The leader handles all read and write requests for the partition, while the followers passively replicate the leader's data. If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others, so load is well balanced within the cluster.
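
You can inspect a topic's leader/follower layout with the --describe option; the output shape sketched in the comments below is illustrative only (the broker ids are made up):

```bash
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic my-topic
# Output looks roughly like (illustrative, not real output):
# Topic: my-topic  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2
# Topic: my-topic  Partition: 1  Leader: 2  Replicas: 2,0  Isr: 2,0
```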

Producers

Producers publish data to the topics of their choice. The producer is responsible for choosing which message to assign to which partition within the topic.
This can be done in a round-robin fashion simply to balance load, or according to some semantic partition function (say, based on some key in the message). More on the use of partitioning below.
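A sketch with the Java producer: records sent with a key are hashed to a partition, so all messages for the same key land in the same partition in order, while unkeyed records are spread across partitions. The topic name and key are assumptions for illustration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keyed record: the key is hashed to pick the partition, so every
            // message with key "user-42" goes to the same partition, in order.
            producer.send(new ProducerRecord<>("my-topic", "user-42", "page_view"));
            // Unkeyed record: the producer spreads these across partitions
            // (round-robin in older clients, sticky batching in newer ones).
            producer.send(new ProducerRecord<>("my-topic", "another event"));
        }
    }
}
```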

Consumers

Messaging traditionally has two models: queuing and publish-subscribe.
In a queue, a pool of consumers may read from a server, and each message goes to one of them;
in publish-subscribe, the message is broadcast to all consumers.
Kafka offers a single consumer abstraction that generalizes both of these: the consumer group.

Consumers label themselves with a consumer group name, and each message published to a topic is delivered to one consumer instance within each subscribing consumer group.
Consumer instances can be in separate processes or on separate machines.

If all the consumer instances have the same consumer group, then this works just like a traditional queue balancing load over the consumers.
If all the consumer instances have different consumer groups, then this works like publish-subscribe and all messages are broadcast to all consumers.

More commonly, however, we find that topics have a small number of consumer groups, one for each "logical subscriber". Each group is composed of many consumer instances for scalability and fault tolerance. This is nothing more than publish-subscribe semantics where the subscriber is a cluster of consumers instead of a single process.
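
A sketch of one group member with the Java consumer: every instance started with the same group.id below divides the topic's partitions among themselves (queue semantics within the group), while instances started under a different group id each receive every message (publish-subscribe across groups). The group and topic names are assumptions:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupMember {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // All instances sharing this id split the partitions between them.
        props.put("group.id", "billing-service");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
                }
            }
        }
    }
}
```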

Kafka has stronger ordering guarantees than a traditional messaging system.

A traditional queue retains messages in order on the server, and if multiple consumers consume from the queue, the server hands out messages in the order they are stored. However, even though the server hands out messages in order, they are delivered to consumers asynchronously, so they may arrive out of order at different consumers.



This effectively means the ordering of the messages is lost in the presence of parallel consumption.
Messaging systems often work around this by having a notion of “exclusive consumer” that allows only one process to consume from a queue, but of course this means that there is no parallelism in processing.

Kafka does it better. By having a notion of parallelism—the partition—within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. Since there are many partitions this still balances the load over many consumer instances. Note however that there cannot be more consumer instances than partitions.

Kafka only provides a total order over messages within a partition, not between different partitions in a topic. Per-partition ordering combined with the ability to partition data by key is sufficient for most applications. However, if you require a total order over messages this can be achieved with a topic that has only one partition, though this will mean only one consumer process.
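
In command form, such a totally ordered topic is simply created with a single partition, at the cost of allowing only one consumer process per group (a sketch; older releases take --zookeeper instead of --bootstrap-server):

```bash
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
    --topic globally-ordered --partitions 1 --replication-factor 3
```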

Guarantees

At a high-level Kafka gives the following guarantees:
Messages sent by a producer to a particular topic partition will be appended in the order they are sent. That is, if a message M1 is sent by the same producer as a message M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.
A consumer instance sees messages in the order they are stored in the log.
For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any messages committed to the log.
More details on these guarantees are given in the design section of the documentation.
1.2 Use Cases

Here is a description of a few of the popular use cases for Apache Kafka. For an overview of a number of these areas in action, see this blog post.
Messaging

Kafka works well as a replacement for a more traditional message broker. Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc). In comparison to most messaging systems Kafka has better throughput, built-in partitioning, replication, and fault-tolerance which makes it a good solution for large scale message processing applications.
In our experience messaging uses are often comparatively low-throughput, but may require low end-to-end latency and often depend on the strong durability guarantees Kafka provides.

In this domain Kafka is comparable to traditional messaging systems such as ActiveMQ or RabbitMQ.

Website Activity Tracking

The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type. These feeds are available for subscription for a range of use cases including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting.
Activity tracking is often very high volume as many activity messages are generated for each user page view.

Metrics

Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.
Log Aggregation

Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption. In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, and much lower end-to-end latency.
Stream Processing

Many users end up doing stage-wise processing of data where data is consumed from topics of raw data and then aggregated, enriched, or otherwise transformed into new Kafka topics for further consumption. For example a processing flow for article recommendation might crawl article content from RSS feeds and publish it to an “articles” topic; further processing might help normalize or deduplicate this content to a topic of cleaned article content; a final stage might attempt to match this content to users. This creates a graph of real-time data flow out of the individual topics. Storm and Samza are popular frameworks for implementing these kinds of transformations.
Event Sourcing

Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka’s support for very large stored log data makes it an excellent backend for an application built in this style.
Commit Log

Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data. The log compaction feature in Kafka helps support this usage. In this usage Kafka is similar to Apache BookKeeper project.

Author: stark_summer
Source: https://blog.csdn.net/stark_summer/article/details/48831149
