
Flink Glossary
https://ci.apache.org/projects/flink/flink-docs-release-1.12/concepts/glossary.html

Flink JobManager


The JobManager is the orchestrator of a Flink Cluster. 
It contains three distinct components: 
Flink Resource Manager, Flink Dispatcher and one Flink JobMaster per running Flink Job.

Flink JobMaster
JobMasters are one of the components running in the JobManager. 
A JobMaster is responsible for supervising the execution of the Tasks of a single job.
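
The split of responsibilities can be made concrete with a toy model in plain Python. This is not Flink's implementation. The class names mirror the glossary terms, but every method (`submit`, `request_slots`, `allocate`) is hypothetical. The model only illustrates two points: the Dispatcher starts one JobMaster per submitted job, and each JobMaster looks after the tasks of its own job, obtaining slots from the Resource Manager.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceManager:
    """Hands out task slots (toy model; the real one manages TaskManager resources)."""
    free_slots: int

    def allocate(self, n: int) -> bool:
        # Grant the request only if enough slots are still free.
        if n > self.free_slots:
            return False
        self.free_slots -= n
        return True


@dataclass
class JobMaster:
    """Supervises the tasks of exactly one job."""
    job_name: str
    tasks: List[str] = field(default_factory=list)

    def request_slots(self, rm: ResourceManager, num_tasks: int) -> None:
        # The JobMaster asks the Resource Manager for slots for its own tasks.
        if not rm.allocate(num_tasks):
            raise RuntimeError(f"not enough slots for {self.job_name}")
        self.tasks = [f"{self.job_name}-task-{i}" for i in range(num_tasks)]


@dataclass
class Dispatcher:
    """Accepts job submissions and starts one JobMaster per job."""
    resource_manager: ResourceManager
    job_masters: Dict[str, JobMaster] = field(default_factory=dict)

    def submit(self, job_name: str, num_tasks: int) -> JobMaster:
        master = JobMaster(job_name)
        master.request_slots(self.resource_manager, num_tasks)
        self.job_masters[job_name] = master
        return master


rm = ResourceManager(free_slots=4)
dispatcher = Dispatcher(rm)
dispatcher.submit("wordcount", 2)
dispatcher.submit("etl", 1)
print(len(dispatcher.job_masters), rm.free_slots)  # 2 1
```

In this model, a third job that asks for more than the one remaining slot fails immediately. The sketch is only meant to show the one-JobMaster-per-job structure, not real scheduling behaviour.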


Flink TaskManager
TaskManagers are the worker processes of a Flink Cluster. 
Tasks are scheduled to TaskManagers for execution. 
They communicate with each other to exchange data between subsequent Tasks.


Event  
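An event is a statement about a change of the state of the domain modelled by the application. 
Events can be input and/or output of a stream or batch processing application. 
Events are special types of records.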

Record
Records are the constituent elements of a data set or data stream. 
Operators and Functions receive records as input and emit records as output.



ExecutionGraph / Physical Graph
A physical graph is the result of translating a Logical Graph for execution in a distributed runtime. 
The nodes are Tasks and the edges indicate input/output relationships or partitions of data streams or data sets.


JobGraph / Logical Graph
A logical graph is a directed graph where the nodes are Operators 
and the edges define input/output relationships of the operators 
and correspond to data streams or data sets. 
A logical graph is created by submitting jobs from a Flink Application.
Logical graphs are also often referred to as dataflow graphs.
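
The translation from logical graph to physical graph can be sketched in plain Python under simplifying assumptions:

- each operator has a `parallelism` (the numbers below are made up);
- each parallel instance becomes one task;
- an edge between operators with the same parallelism and no repartitioning becomes a one-to-one (forward) connection;
- any other edge connects every upstream task to every downstream task.

Flink's real translation also performs operator chaining and supports more partitioning patterns. This sketch leaves both out.

```python
# Logical graph: operator name -> parallelism (illustrative values).
logical_nodes = {"source": 2, "map": 2, "keyBy-sum": 3, "sink": 1}
# Directed edges: (upstream, downstream, repartitions?); True marks e.g. a keyBy.
logical_edges = [
    ("source", "map", False),
    ("map", "keyBy-sum", True),
    ("keyBy-sum", "sink", False),
]


def to_physical(nodes, edges):
    """Expand each operator into one task per parallel instance."""
    tasks = {op: [f"{op}[{i}]" for i in range(p)] for op, p in nodes.items()}
    physical_edges = []
    for up, down, repartition in edges:
        if not repartition and nodes[up] == nodes[down]:
            # Forward connection: task i feeds task i.
            physical_edges += list(zip(tasks[up], tasks[down]))
        else:
            # Repartitioning or a parallelism change:
            # every upstream task may send to every downstream task.
            physical_edges += [(u, d) for u in tasks[up] for d in tasks[down]]
    return tasks, physical_edges


tasks, edges = to_physical(logical_nodes, logical_edges)
print(sum(len(t) for t in tasks.values()), len(edges))  # 8 11
```

Four logical nodes become eight tasks. The single forward edge becomes two physical edges, and the two all-to-all edges become 2 × 3 + 3 × 1 = 9 physical edges.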


Operator
Node of a Logical Graph. 
An Operator performs a certain operation, which is usually executed by a Function. 
Sources and Sinks are special Operators for data ingestion and data egress.

Function
Functions are implemented by the user and encapsulate the application logic of a Flink program. 
Most Functions are wrapped by a corresponding Operator.
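
The relationship between an Operator and the Function it wraps can be shown with a toy model. This is loosely analogous to how Flink's `StreamMap` operator wraps a user `MapFunction`. The classes `MapOperator` and `FilterOperator` below are hypothetical plain-Python stand-ins, not Flink classes. The user supplies only the Function (the application logic), and the operator takes care of receiving records and emitting results.

```python
from typing import Callable, Iterable, Iterator


class MapOperator:
    """Operator that applies a user-supplied map Function to every record."""

    def __init__(self, fn: Callable):
        self.fn = fn  # the wrapped Function: pure application logic

    def process(self, records: Iterable) -> Iterator:
        for record in records:
            yield self.fn(record)  # exactly one output record per input record


class FilterOperator:
    """Operator that keeps only the records for which the Function returns True."""

    def __init__(self, predicate: Callable):
        self.predicate = predicate

    def process(self, records: Iterable) -> Iterator:
        for record in records:
            if self.predicate(record):
                yield record


# The user writes only the functions; the operators handle the records.
words = ["flink", "", "state", "task"]
non_empty = FilterOperator(lambda w: w != "")
upper = MapOperator(str.upper)
print(list(upper.process(non_empty.process(words))))  # ['FLINK', 'STATE', 'TASK']
```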


Operator Chain
An Operator Chain consists of two or more consecutive Operators without any repartitioning in between. 
Operators within the same Operator Chain forward records to each other directly without going through serialization or Flink’s network stack.

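
The benefit of chaining can be illustrated with a small simulation. The functions `chain` and `unchained` are hypothetical, and Python's `pickle` merely stands in for Flink's serializers and network stack. In the chained version, records pass from one function to the next through direct calls. In the unchained version, the record is serialized and deserialized at every hand-over between operators.

```python
import pickle
from typing import Callable, List


def chain(functions: List[Callable]) -> Callable:
    """Fuse consecutive per-record functions: records are handed over by direct calls."""
    def chained(record):
        for fn in functions:
            record = fn(record)
        return record
    return chained


def unchained(functions: List[Callable], record, stats: dict):
    """Simulate separate tasks: each hand-over serializes and deserializes the record."""
    for i, fn in enumerate(functions):
        if i > 0:
            # Stand-in for sending the record to the next task over the network.
            record = pickle.loads(pickle.dumps(record))
            stats["serializations"] += 1
        record = fn(record)
    return record


steps = [str.strip, str.lower, len]
stats = {"serializations": 0}
print(chain(steps)("  Flink "), unchained(steps, "  Flink ", stats), stats)
# 5 5 {'serializations': 2}
```

Both paths compute the same result; only the hand-over cost differs. Chaining therefore only works where no repartitioning sits between the operators, because a repartitioning step has to route records to other parallel tasks anyway.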



Task
Node of a Physical Graph. A task is the basic unit of work, which is executed by Flink’s runtime. 
Tasks encapsulate exactly one parallel instance of an Operator or Operator Chain.

Sub-Task
A Sub-Task is a Task responsible for processing a partition of the data stream. 
The term “Sub-Task” emphasizes that there are multiple parallel Tasks for the same Operator or Operator Chain.


Partition
A partition is an independent subset of the overall data stream or data set. 
A data stream or data set is divided into partitions by assigning each record to one or more partitions. 
Partitions of data streams or data sets are consumed by Tasks during runtime. 
A transformation which changes the way a data stream or data set is partitioned is often called repartitioning.
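
Here is a minimal sketch of key-based partitioning and repartitioning in plain Python. It assumes records are `(key, value)` tuples and that a CRC32 hash modulo the partition count picks the target partition. Flink's `keyBy` uses its own hashing through key groups, so the concrete assignment differs. The sketch only shows two things: the key alone determines the partition, and repartitioning redistributes all records over a new set of partitions.

```python
import zlib
from collections import defaultdict


def partition_by_key(records, key_fn, num_partitions):
    """Assign each record to exactly one partition based on its key."""
    partitions = defaultdict(list)
    for record in records:
        key = key_fn(record)
        # crc32 is deterministic across runs, unlike Python's salted hash() for str.
        index = zlib.crc32(str(key).encode("utf-8")) % num_partitions
        partitions[index].append(record)
    return dict(partitions)


def repartition(partitions, key_fn, new_count):
    """Gather all records and redistribute them over new_count partitions."""
    all_records = [r for part in partitions.values() for r in part]
    return partition_by_key(all_records, key_fn, new_count)


events = [("user-a", 1), ("user-b", 2), ("user-a", 3), ("user-c", 4)]
by_user = partition_by_key(events, lambda e: e[0], 2)
rescaled = repartition(by_user, lambda e: e[0], 3)
print(sum(len(p) for p in rescaled.values()))  # 4
```

Because every record with the same key lands in the same partition, a single Sub-Task sees all records for that key.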

 
Instance
The term instance is used to describe a specific instance of a specific type (usually Operator or Function) during runtime. 
In the context of Apache Flink, the term parallel instance is also frequently used to emphasize that multiple instances of the same Operator or Function type are running in parallel.
 

 
Flink Application
A Flink application is a Java Application that submits one or multiple Flink Jobs from the main() method (or by some other means). 
Submitting jobs is usually done by calling execute() on an execution environment.

Flink Job
A Flink Job is the runtime representation of a logical graph (also often called dataflow graph) 
that is created and submitted by calling execute() in a Flink Application.
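
In a real DataStream program, the environment comes from `StreamExecutionEnvironment.getExecutionEnvironment()`, and `env.execute(...)` submits the job. The toy model below does not use Flink at all; `ToyEnvironment`, `add` and `Job` are hypothetical names. It illustrates only the relationship described above. Operators defined in `main()` accumulate into a logical graph, and each call to `execute()` turns what has been defined so far into one submitted job. An application that calls it twice therefore submits two jobs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    logical_graph: List[str]  # operator names, in definition order


class ToyEnvironment:
    """Hypothetical stand-in for an execution environment (not Flink's API)."""

    def __init__(self):
        self.pending: List[str] = []
        self.submitted: List[Job] = []

    def add(self, operator_name: str) -> "ToyEnvironment":
        self.pending.append(operator_name)
        return self

    def execute(self, job_name: str) -> Job:
        # Turn the operators defined so far into a job, submit it, then start fresh.
        job = Job(job_name, list(self.pending))
        self.submitted.append(job)
        self.pending.clear()
        return job


def main():
    env = ToyEnvironment()
    env.add("source").add("map").add("sink")
    env.execute("first-job")
    env.add("source").add("filter").add("sink")
    env.execute("second-job")
    return env


env = main()
print([j.name for j in env.submitted])  # ['first-job', 'second-job']
```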


Managed State
Managed State describes application state which has been registered with the framework. 
For Managed State, Apache Flink takes care of persistence and rescaling, among other things.
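
What "registered with the framework" buys can be sketched with a toy registry. All names here (`ToyStateRegistry`, `register`, `snapshot`, `restore`) are hypothetical. Redistributing keys by CRC32 modulo the new parallelism is a simplification of how Flink actually rescales keyed state, which it does through key groups. The point is the division of work: user code only reads and writes the registered state, while the framework can snapshot it and later restore it onto a different number of parallel instances.

```python
import copy
import zlib


class ToyStateRegistry:
    """Hypothetical framework side: it knows about all registered keyed state."""

    def __init__(self):
        self.states = {}  # state name -> {key: value}

    def register(self, name):
        # Returns the live dict, so user updates are visible to the framework.
        return self.states.setdefault(name, {})

    def snapshot(self):
        # The framework, not user code, decides how state is persisted.
        return copy.deepcopy(self.states)

    @staticmethod
    def restore(snapshot, parallelism):
        """Rescale: split every keyed state over `parallelism` new instances."""
        instances = [ToyStateRegistry() for _ in range(parallelism)]
        for name, entries in snapshot.items():
            for key, value in entries.items():
                target = zlib.crc32(key.encode("utf-8")) % parallelism
                instances[target].register(name)[key] = value
        return instances


registry = ToyStateRegistry()
counts = registry.register("clicks-per-user")
for user in ["a", "b", "a"]:
    counts[user] = counts.get(user, 0) + 1

checkpoint = registry.snapshot()
counts["c"] = 1  # updates made after the snapshot are not part of it
restored = ToyStateRegistry.restore(checkpoint, parallelism=2)
print(sorted(k for r in restored for k in r.states.get("clicks-per-user", {})))  # ['a', 'b']
```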

State Backend
For stream processing programs, 
the State Backend of a Flink Job determines how its state is stored on each TaskManager 
(on the TaskManager's Java heap or in an embedded RocksDB instance) 
as well as where it is written upon a checkpoint (the JobManager's Java heap or a file system).
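
A State Backend covers two concerns: where working state lives while the job runs, and where a checkpoint is written. The toy class `ToyHeapBackend` below is hypothetical and illustrates both. It keeps working state in a Python dict, standing in for the TaskManager heap. It writes checkpoints as JSON files to a directory, standing in for a checkpoint file system. The file name and format are made up: real Flink checkpoints use Flink's own formats, and with RocksDB the working state would live in an embedded RocksDB instance rather than on the heap.

```python
import json
import os
import tempfile


class ToyHeapBackend:
    """Hypothetical backend: working state on the heap, checkpoints to a directory."""

    def __init__(self, checkpoint_dir):
        self.working_state = {}  # stands in for state kept in TaskManager memory
        self.checkpoint_dir = checkpoint_dir

    def checkpoint(self, checkpoint_id):
        # Write a copy of the current working state to durable storage.
        path = os.path.join(self.checkpoint_dir, f"chk-{checkpoint_id}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.working_state, f)
        return path

    @staticmethod
    def load(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    backend = ToyHeapBackend(tmp)
    backend.working_state["a"] = 2
    path = backend.checkpoint(1)
    backend.working_state["a"] = 3  # later updates do not touch the written checkpoint
    restored_state = ToyHeapBackend.load(path)
    print(os.path.basename(path), restored_state)  # chk-1.json {'a': 2}
```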

(Runtime) Execution Mode
DataStream API programs can be executed in one of two execution modes: BATCH or STREAMING.
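
The practical difference between the two modes can be sketched with a plain-Python word count. This is a simplified model, not Flink code. In STREAMING mode, a rolling aggregation emits an updated result for every incoming record. In BATCH mode, the input is known to be bounded, so only the final result per key is emitted. In an actual DataStream program, the mode is chosen with `env.setRuntimeMode(RuntimeExecutionMode.BATCH)` (or `STREAMING`), which is available since Flink 1.12.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def streaming_word_count(words: Iterable[str]) -> List[Tuple[str, int]]:
    """STREAMING-style: emit an updated (word, count) after every input record."""
    counts = Counter()
    out = []
    for w in words:
        counts[w] += 1
        out.append((w, counts[w]))
    return out


def batch_word_count(words: Iterable[str]) -> List[Tuple[str, int]]:
    """BATCH-style: the input is bounded, so emit one final result per key at the end."""
    return sorted(Counter(words).items())


words = ["to", "be", "or", "not", "to", "be"]
print(streaming_word_count(words))
# [('to', 1), ('be', 1), ('or', 1), ('not', 1), ('to', 2), ('be', 2)]
print(batch_word_count(words))
# [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```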
