Kafka core knowledge points
Core concepts
Broker: one Kafka instance, used to store messages; a Kafka cluster can run multiple brokers
Topic: producers write messages to a topic, consumers consume messages from a topic
Partition: a topic can be split into multiple partitions, which divides one data set into several parts stored in different partitions; a topic has one or more partitions, and messages are ordered within a partition
Replication: a partition can have one or more replicas, for fault tolerance and high availability
Producer: message producer
ConsumerGroup: consumer group; a ConsumerGroup contains one or more consumers, and within the same group a message is consumed by only one consumer
Consumer: message consumer; a consumer pulls messages from a topic (a minimal producer/consumer sketch follows this list)
Zookeeper: cluster coordination and management; Kafka stores its metadata in ZooKeeper and uses it for dynamic cluster scaling, broker load balancing, partition leader election, and so on
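To make these roles concrete, here is a minimal sketch using the Java client (kafka-clients); the broker address node01:9092, the topic test-topic and the group id demo-group are illustrative assumptions, not values from these notes.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class QuickStart {
    public static void main(String[] args) {
        // Producer: write one message to the topic
        Properties p = new Properties();
        p.put("bootstrap.servers", "node01:9092");
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "hello kafka"));
        }

        // Consumer: join a consumer group and pull messages from the topic
        Properties c = new Properties();
        c.put("bootstrap.servers", "node01:9092");
        c.put("group.id", "demo-group");   // consumers sharing a group.id split the partitions among themselves
        c.put("auto.offset.reset", "earliest");
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(Collections.singletonList("test-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("partition=%d offset=%d value=%s%n", r.partition(), r.offset(), r.value());
            }
        }
    }
}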
Storage mechanism
One directory per partition
topicName-partitionID
Partition numbers start at 0
Each partition is split into multiple segments
xxx.index xxx.log
xxx is the starting offset, within the partition, of the first message in that segment (an example layout follows)
log.segment.bytes=1073741824
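As an illustration (topic name and offsets invented), the directory for partition 0 of a topic test-topic could hold segments named after the 20-digit zero-padded base offset of their first message:

/usr/local/kafka/kafka-logs/test-topic-0/
    00000000000000000000.index
    00000000000000000000.log
    00000000000000170410.index
    00000000000000170410.log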
Index files
Since version 0.10 there are two index files
xxx.index
xxx.timeindex
Sparse index
log.index.interval.bytes sets the index interval (a conceptual lookup sketch follows this list)
.index entry content
offset,position
Timestamp index file
Entry content
timestamp,offset
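Because the index is sparse (roughly one entry per log.index.interval.bytes of messages), a read first finds the largest indexed offset that is not greater than the target, then scans the log from that file position. A minimal conceptual sketch of that floor lookup, with invented entries; this is not Kafka's actual implementation.

import java.util.Map;
import java.util.TreeMap;

public class SparseIndexLookup {
    public static void main(String[] args) {
        // offset -> byte position in the .log file (illustrative entries)
        TreeMap<Long, Long> index = new TreeMap<>();
        index.put(0L, 0L);
        index.put(45L, 4140L);
        index.put(91L, 8312L);

        long targetOffset = 60L;
        // greatest indexed offset <= targetOffset; scan the log forward from its position
        Map.Entry<Long, Long> floor = index.floorEntry(targetOffset);
        System.out.printf("start scanning at offset %d, file position %d%n",
                floor.getKey(), floor.getValue());
    }
}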
High availability
Multiple partitions, multiple replicas
Different replicas of a partition are placed on different brokers
Among a partition's replicas one is elected leader; the leader handles reads and writes, and the other replicas act as followers that replicate messages from the leader
default.replication.factor=N
replication-factor 3 (3 copies of the data in total; see the AdminClient sketch below)
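A sketch of creating a topic with 3 partitions and a replication factor of 3 through the Java AdminClient; the broker address and the topic name ha-demo are assumptions.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "node01:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 3: each partition's replicas land on different brokers
            NewTopic topic = new NewTopic("ha-demo", 3, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}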
Controller election
One broker in the cluster is elected as the Controller
The Controller manages the whole cluster: broker management, topic management, partition leader election, and so on
Election process: every broker asks ZooKeeper to create the same ephemeral znode; the broker that creates it first wins and becomes the Controller, while the others watch the Controller znode and wait for the next election (a conceptual sketch follows)
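The "first broker to create the ephemeral znode wins" pattern can be sketched with the plain ZooKeeper Java client; this is a conceptual illustration, not Kafka's own controller code, and the addresses and broker id are assumptions.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ControllerElectionSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("node01:2181", 6000, event -> { });
        try {
            // Ephemeral node: it disappears when this broker's session dies,
            // which is what triggers the next controller election.
            zk.create("/controller", "broker-0".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("won the election, acting as controller");
        } catch (KeeperException.NodeExistsException e) {
            // lost the race: watch the existing znode and wait for the next election
            zk.exists("/controller", true);
            System.out.println("another broker is controller, watching for changes");
        }
    }
}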
Partition Leader election
The Controller is responsible for partition leader election
The ISR list is stored in ZooKeeper
Followers pull data from the leader
The leader tracks the followers that stay in sync, the ISR (In-Sync Replicas), which is the candidate list for the next leader election
A follower whose heartbeat times out or that falls too far behind is removed from the ISR (see the producer-side note after this list)
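On the producer side the ISR matters for durability: with acks=all the leader only acknowledges a write after every replica currently in the ISR has it, and min.insync.replicas (a broker/topic config) sets the minimum ISR size for such writes to succeed. A sketch of the relevant producer settings; broker address and topic name are assumptions.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "node01:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // acks=all: the leader waits until every replica currently in the ISR has the message
        props.put("acks", "all");
        props.put("retries", "3");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("ha-demo", "key", "durable message"));
        }
    }
}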
Configuration
server.properties
broker.id=0
port=9092
host.name=node01
# allow topic deletion
delete.topic.enable=true
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# data storage path
log.dirs=/usr/local/kafka/kafka-logs
# default number of partitions per topic
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# data retention time, default 7 days, in hours
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# ZooKeeper addresses, comma-separated
zookeeper.connect=node01:2181,node02:2181,node03:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000