Kafka Core Knowledge Points
Core Concepts
Broker: a single Kafka instance that stores messages; a Kafka cluster can run multiple brokers
Topic: producers write messages to a topic, and consumers consume messages from that topic
Partition: a topic can have one or more partitions, which splits the data set into multiple parts stored separately in different partitions; messages are ordered within a partition
Replication: a partition can have one or more replicas, providing fault tolerance and high availability
Producer: message producer, writes messages to a topic
ConsumerGroup: consumer group; a ConsumerGroup contains one or more consumers, and within the same group a message is consumed by only one consumer
Consumer: message consumer, pulls messages from a topic (see the producer/consumer sketch after this list)
Zookeeper: cluster coordination and management; Kafka stores its metadata in ZooKeeper, which supports dynamic cluster scaling, broker load balancing, partition leader election, etc.
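To make the producer / consumer / consumer-group relationship concrete, here is a minimal sketch using the Kafka Java client. The broker address node01:9092, the topic name demo-topic and the group id demo-group are assumptions for illustration only, not values taken from these notes.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class QuickStart {
    public static void main(String[] args) {
        // Producer: writes one message to the (hypothetical) topic "demo-topic"
        Properties p = new Properties();
        p.put("bootstrap.servers", "node01:9092");  // assumed broker address
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.send(new ProducerRecord<>("demo-topic", "key-1", "hello kafka"));
        }

        // Consumer: all consumers sharing group.id "demo-group" form one ConsumerGroup;
        // each partition (and therefore each message) is handled by exactly one of them.
        Properties c = new Properties();
        c.put("bootstrap.servers", "node01:9092");
        c.put("group.id", "demo-group");
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("auto.offset.reset", "earliest");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        r.partition(), r.offset(), r.value());
            }
        }
    }
}
```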
Storage Mechanism
One directory per partition
topicName-partitionID
Partition numbering starts at 0
Each partition is split into multiple segments
xxx.index xxx.log
xxx is the starting offset of the segment's messages within the partition (see the naming sketch below)
log.segment.bytes=1073741824
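The base-offset naming can be shown with a tiny Java sketch. The 20-digit zero-padding matches how Kafka names segment files; the offset value 368769 and the directory name demo-topic-0 are made up for illustration.

```java
public class SegmentNaming {
    // Kafka names each segment file after the offset of its first message,
    // zero-padded to 20 digits, e.g. 00000000000000368769.log / .index
    static String segmentName(long baseOffset, String suffix) {
        return String.format("%020d%s", baseOffset, suffix);
    }

    public static void main(String[] args) {
        long baseOffset = 368769L;  // hypothetical starting offset of one segment
        // Files would live in the partition directory, e.g. demo-topic-0/
        System.out.println(segmentName(baseOffset, ".log"));    // 00000000000000368769.log
        System.out.println(segmentName(baseOffset, ".index"));  // 00000000000000368769.index
    }
}
```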
Index Files
Since version 0.10 there are two index files
xxx.index
xxx.timeindex
Sparse index
log.index.interval.bytes sets the index interval
Offset index entries
offset,position
Timestamp index file
Index entries (see the lookup sketch below)
timestamp,offset
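A conceptual sketch of how a sparse offset index is used, not Kafka's actual implementation: find the largest indexed offset that is not greater than the target, then scan the log forward from the recorded file position. The in-memory map and its sample entries are hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SparseIndexLookup {
    public static void main(String[] args) {
        // Hypothetical sparse offset index for one segment: roughly one entry per
        // log.index.interval.bytes of appended data, mapping offset -> position in the .log file
        NavigableMap<Long, Long> offsetIndex = new TreeMap<>();
        offsetIndex.put(0L, 0L);
        offsetIndex.put(35L, 4096L);
        offsetIndex.put(72L, 8192L);

        long target = 50L;
        // Step 1: binary-search the index for the greatest indexed offset <= target
        Map.Entry<Long, Long> entry = offsetIndex.floorEntry(target);
        // Step 2: scan the .log file sequentially from that position until the target offset is found
        System.out.printf("scan 00000000000000000000.log from position %d to find offset %d%n",
                entry.getValue(), target);
        // The .timeindex file works the same way but maps timestamp -> offset, so a
        // time-based lookup resolves an offset first and then repeats the steps above.
    }
}
```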
High Availability
Multiple partitions, multiple replicas
Different replicas of a partition are distributed across different brokers
One of a partition's replicas is elected leader; the leader handles reads and writes, and the other replicas act as followers that replicate messages from the leader
default.replication.factor=N
replication-factor 3 (3 copies of the data in total; see the topic-creation sketch below)
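A minimal sketch of creating a topic with 3 partitions and 3 replicas per partition using the Java AdminClient; the broker address and topic name are assumptions.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "node01:9092");  // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 3: each partition's replicas are placed
            // on different brokers, and one replica per partition is elected leader.
            NewTopic topic = new NewTopic("demo-topic", 3, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```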
Controller Election
One broker in the cluster is elected as the Controller node
The Controller manages the whole cluster: broker management, topic management, partition leader election, etc.
Election process: every broker asks ZooKeeper to create the same ephemeral znode; the broker that succeeds wins and becomes the Controller, while the remaining brokers watch the Controller znode and wait for the next election
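The create-ephemeral-znode race can be sketched with the plain ZooKeeper Java client. This is only an illustration of the election pattern, not Kafka's controller code; the connect string, znode payload, and broker id are assumptions.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ControllerElectionSketch {
    public static void main(String[] args) throws Exception {
        int brokerId = 0;  // this broker's id (assumed)
        ZooKeeper zk = new ZooKeeper("node01:2181", 6000, event -> { });
        try {
            // Every broker races to create the same ephemeral znode; only one create succeeds.
            zk.create("/controller", String.valueOf(brokerId).getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("broker " + brokerId + " won the race and acts as controller");
            // A real broker keeps its ZooKeeper session alive; when the session dies,
            // the ephemeral znode disappears and a new election starts.
        } catch (KeeperException.NodeExistsException e) {
            // Lost the race: watch the controller znode and wait for the next election round.
            zk.exists("/controller", event ->
                    System.out.println("controller znode changed: " + event));
            System.out.println("broker " + brokerId + " is watching /controller");
        }
    }
}
```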
Partition Leader Election
The Controller is responsible for partition leader election
The ISR list is stored in ZooKeeper
Followers pull data from the leader
The leader tracks the list of followers that stay in sync, the ISR (In-Sync Replicas), which serves as the candidate list for the next leader election
A follower whose heartbeat times out or that falls too far behind is removed from the ISR
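The ISR matters to producers through the acks setting: with acks=all the leader only acknowledges a write once every replica currently in the ISR has it, so a failover to an ISR member does not lose the message. A minimal sketch, reusing the assumed node01:9092 broker and demo-topic from above.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DurableProducer {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "node01:9092");  // assumed broker address
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // acks=all: wait until the full ISR has replicated the record before acknowledging
        p.put("acks", "all");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.send(new ProducerRecord<>("demo-topic", "key-1", "durable message"));
        }
    }
}
```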
Configuration
server.properties
broker.id=0
port=9092
host.name=node01
# Allow topic deletion
delete.topic.enable=true
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# Data storage path
log.dirs=/usr/local/kafka/kafka-logs
# Default number of partitions per topic
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# Data retention time, default 7 days, unit: hours
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# ZooKeeper addresses, comma-separated
zookeeper.connect=node01:2181,node02:2181,node03:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000