
Kafka core concepts     Category: kafka
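Core concepts
Broker         A Kafka server instance that stores messages; a Kafka cluster can run multiple brokers
Topic          Producers write messages to a topic, and consumers consume messages from it
Partition      A topic can have one or more partitions, which splits its data set into parts that are stored separately; messages within a partition are ordered
Replication    Replicas: a partition can have one or more replicas, for fault tolerance and high availability
Producer       Message producer; writes messages to topics
ConsumerGroup  Consumer group: contains one or more consumers; within a group, each message is consumed by only one consumer
Consumer       Message consumer; pulls messages from topics
Zookeeper      Cluster coordination: Kafka stores its metadata in ZooKeeper, which supports dynamic cluster scaling, broker load balancing, partition leader election, etc.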
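
The rule that a message is consumed by only one consumer in a group follows from each partition being owned by exactly one consumer of that group. Below is a minimal sketch of that idea using a simple round-robin assignment. The function name assign_partitions and the consumer ids are made up for illustration, and this is not Kafka's actual assignor implementation.

```python
def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer of the group (round-robin).

    Illustrative only: real Kafka assignors (range, round-robin, sticky, ...)
    are more involved, but they keep the same one-owner-per-partition rule.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    # 4 partitions shared by a group of 2 consumers
    print(assign_partitions(range(4), ["c1", "c2"]))
    # More consumers than partitions: the extra consumer stays idle
    print(assign_partitions(range(2), ["c1", "c2", "c3"]))
```

With more consumers than partitions, the extra consumers receive nothing, which is why the partition count caps the useful parallelism of one group.
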


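Storage mechanism
Each partition is a directory named topicName-partitionID; partition numbers start at 0
Each partition consists of multiple segments, each with the files xxx.index and xxx.log
xxx is the offset within the partition of the first message in the segment
log.segment.bytes=1073741824    maximum size of one segment file (1 GiB)
Index files: versions after 0.10 have two index files per segment, xxx.index and xxx.timeindex
Both are sparse indexes; log.index.interval.bytes sets the index interval
xxx.index       offset index, entries are offset,position
xxx.timeindex   timestamp index, entries are timestamp,offset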
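
To make the lookup path concrete, here is a rough sketch of how a target offset could be resolved: first pick the segment by base offset, then use the sparse offset index to find a file position to start scanning from. The helper names are hypothetical, and the index entries use absolute offsets for readability (an assumption; the on-disk format is more compact).

```python
from bisect import bisect_right


def segment_file_name(base_offset):
    """Segment files are named after their base offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}.log"


def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment that contains target_offset.

    base_offsets must be sorted ascending.
    """
    i = bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset is before the first segment")
    return base_offsets[i]


def lookup_position(index_entries, target_offset):
    """Return the file position of the closest index entry at or before target_offset.

    index_entries is a sorted list of (offset, position) pairs. Because the
    index is sparse, the reader scans the .log file forward from the returned
    position until it reaches the exact offset.
    """
    i = bisect_right(index_entries, (target_offset, float("inf"))) - 1
    return index_entries[i][1] if i >= 0 else 0


if __name__ == "__main__":
    base_offsets = [0, 368769, 737337]             # made-up segment base offsets
    base = find_segment(base_offsets, 500000)
    print(segment_file_name(base))
    sparse_index = [(0, 0), (100, 4096), (200, 8192)]  # made-up index entries
    print(lookup_position(sparse_index, 150))
```

Both steps are binary searches, so a lookup stays cheap even with many segments, and the sparse index keeps the index files small at the cost of a short forward scan.
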
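High availability
Multiple partitions, multiple replicas: the replicas of a partition are distributed across different brokers
The replicas of a partition elect one leader; the leader handles reads and writes, and the other replicas act as followers that sync messages from the leader
default.replication.factor=N
replication-factor 3    (three copies of the data in total)
Controller election
One broker in the cluster is elected as the Controller, which manages the whole cluster: broker management, topic management, partition leader election, etc.
Election process: every broker asks ZooKeeper to create an ephemeral znode; the broker that creates it successfully becomes the Controller, and the others watch the Controller's znode and wait for the next election
Partition leader election
The Controller is responsible for electing partition leaders
The ISR list is stored in ZooKeeper
Followers pull data from the leader
The leader tracks the list of followers that are keeping up, the ISR (In-Sync Replicas), which is the candidate list for the next leader election
A follower whose heartbeat times out or that falls too far behind is removed from the ISR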
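
The ISR bookkeeping and leader choice described above can be sketched roughly as follows. This is a simplified in-memory model, not the Controller's real code: the function names, the max_lag_ms parameter, and the broker ids are all illustrative assumptions.

```python
def shrink_isr(isr, last_caught_up_ms, now_ms, max_lag_ms):
    """Keep only replicas that caught up with the leader within max_lag_ms.

    last_caught_up_ms maps broker id -> time (ms) the replica was last in sync.
    A replica with no entry is treated as out of sync.
    """
    return {
        broker_id
        for broker_id in isr
        if now_ms - last_caught_up_ms.get(broker_id, float("-inf")) <= max_lag_ms
    }


def elect_leader(replicas, isr, live_brokers):
    """Pick the first replica, in assignment order, that is alive and in the ISR.

    Returns None when no in-sync replica is available.
    """
    for broker_id in replicas:
        if broker_id in isr and broker_id in live_brokers:
            return broker_id
    return None


if __name__ == "__main__":
    replicas = [1, 2, 3]                      # replica assignment of one partition
    isr = shrink_isr({1, 2, 3}, {1: 9_900, 2: 9_950, 3: 1_000},
                     now_ms=10_000, max_lag_ms=500)
    print(sorted(isr))                        # broker 3 fell too far behind
    # Broker 1 (the old leader) has died, so the next ISR member takes over
    print(elect_leader(replicas, isr, live_brokers={2, 3}))
```

Restricting candidates to the ISR is what lets a new leader take over without losing messages that were already replicated to all in-sync replicas.
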
Configuration
server.properties
broker.id=0
port=9092
host.name=node01
# Allow topics to be deleted
delete.topic.enable=true
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600

############################# Log Basics #############################
# Data storage path
log.dirs=/usr/local/kafka/kafka-logs
# Default number of partitions per topic
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# Data retention time in hours; the default is 168 (7 days)
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################
# ZooKeeper addresses; separate multiple addresses with commas
zookeeper.connect=node01:2181,node02:2181,node03:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
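
As a quick sanity check on the values above, the following sketch parses a few of the settings and converts them to friendlier units. The parser is a deliberately minimal stand-in (it only handles key=value lines and # comments), and SAMPLE repeats a subset of the listing rather than reading a real file.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# A subset of the server.properties values listed above
SAMPLE = """
# retention and segment settings
log.retention.hours=168
#log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=node01:2181,node02:2181,node03:2181
"""

props = parse_properties(SAMPLE)
retention_days = int(props["log.retention.hours"]) / 24        # 168 h = 7 days
segment_gib = int(props["log.segment.bytes"]) / 2**30          # 1073741824 B = 1 GiB
check_minutes = int(props["log.retention.check.interval.ms"]) / 60_000
zk_hosts = props["zookeeper.connect"].split(",")

if __name__ == "__main__":
    print(retention_days, segment_gib, check_minutes, zk_hosts)
```

Commented-out keys such as log.retention.bytes do not appear in the parsed result, which mirrors how the broker falls back to its built-in default for them.
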
