Flink DataSet: testing with a large dataset
Question: can Flink still process the data correctly when the input file is larger than the JVM heap?
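// Imports needed by the fragment below
import java.time.LocalDateTime;
import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple5;
import org.apache.flink.core.fs.FileSystem.WriteMode;

final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
String file = "file:/tmp/events5000w.txt";
// Each CSV row is read as four String fields followed by one Double field
DataSet<Tuple5<String, String, String, String, Double>> csvInput = env.readCsvFile(file)
    .types(String.class, String.class, String.class, String.class, Double.class);
// Group by field 2, then sort each group by field 3 ascending and field 1 descending
csvInput.groupBy(2)
    .sortGroup(3, Order.ASCENDING)
    .sortGroup(1, Order.DESCENDING)
    .combineGroup(new MyGroupCombineFunction())
    .writeAsCsv("file:///tmp/events_result", WriteMode.OVERWRITE);
// /tmp/events_result is a directory
// Note: execute() must be called, otherwise the job never runs
env.execute();
System.out.println("EventsGroupAndSort2 done," + LocalDateTime.now());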
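To make the ordering of the chained calls in the job above concrete: groupBy(2) groups on the third field, the first sortGroup is the primary sort key within each group, and the second sortGroup is the secondary key. The following is a plain-Java, in-memory sketch of that ordering on a few rows. It is only an analogy, not how Flink executes the job (Flink sorts in managed memory and can spill to disk), and the class name GroupSortSketch and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupSortSketch {
    /** Groups rows by field 2, then orders each group by field 3 ascending and field 1 descending. */
    static Map<String, List<String[]>> groupAndSort(List<String[]> rows) {
        Comparator<String[]> order = Comparator
                .comparing((String[] r) -> r[3])
                .thenComparing((String[] r) -> r[1], Comparator.reverseOrder());
        Map<String, List<String[]>> groups = new TreeMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[2], k -> new ArrayList<>()).add(row);
        }
        for (List<String[]> group : groups.values()) {
            group.sort(order);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical rows: five fields each, the last one numeric as in the job's types(...) call
        List<String[]> rows = Arrays.asList(
                new String[] {"id1", "b", "g1", "t2", "1.0"},
                new String[] {"id2", "a", "g1", "t1", "2.0"},
                new String[] {"id3", "c", "g1", "t1", "3.0"},
                new String[] {"id4", "z", "g2", "t9", "4.0"});
        for (Map.Entry<String, List<String[]>> e : groupAndSort(rows).entrySet()) {
            for (String[] row : e.getValue()) {
                System.out.println(e.getKey() + " -> " + String.join(",", row));
            }
        }
    }
}
```

Within group g1 the rows come out as id3, id2, id1: the two t1 rows first (field 3 ascending), with c before a (field 1 descending), followed by the t2 row.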
The test data is generated by EventDataGen:
https://gitee.com/dyyx/demos/blob/master/flinkdemo/src/main/java/dyyx/zb/EventDataGen.java
50 million events, a little over 3 GB:
3383358338 events5000w.txt
Heap size is set to 1 GB:
-Xmx1g -Xms1g
Local machine: 8 GB RAM, 4 cores
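The premise of the test, that the input is several times larger than the heap, can be checked from inside the JVM. This is a minimal sketch: the class name HeapVsFile is made up, the file size is the one listed above, and Runtime.maxMemory() is only roughly equal to the -Xmx value (it can report slightly less depending on the collector).

```java
class HeapVsFile {
    /** How many times larger the input is than the heap. */
    static double ratio(long fileBytes, long heapBytes) {
        return (double) fileBytes / heapBytes;
    }

    public static void main(String[] args) {
        long fileBytes = 3_383_358_338L;                 // size of events5000w.txt from the listing above
        long maxHeap = Runtime.getRuntime().maxMemory(); // roughly the -Xmx value
        System.out.printf("max heap: %d bytes, input is %.2fx the heap%n",
                maxHeap, ratio(fileBytes, maxHeap));
    }
}
```

With a heap of exactly 1 GiB the ratio is about 3.15, so the input cannot fit in the heap.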
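After a fairly long wait, the job completed successfully and produced its results.
Processing was slow: the whole run took about 15 minutes.
Checking GC activity during the run showed nonstop full GCs, and I thought the job had died.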
EventsGroupAndSort2 start,2021-04-26T07:39:53.581
EventsGroupAndSort2 done,2021-04-26T07:54:48.158
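The "about 15 minutes" figure can be derived from the two timestamps in the log above. The sketch below uses java.time to compute the elapsed time and a rough throughput. It assumes the file holds exactly 50,000,000 rows (the source only says "50 million"), and the class name ElapsedTime is made up.

```java
import java.time.Duration;
import java.time.LocalDateTime;

class ElapsedTime {
    /** Elapsed time between two ISO-8601 local timestamps. */
    static Duration elapsed(String start, String end) {
        return Duration.between(LocalDateTime.parse(start), LocalDateTime.parse(end));
    }

    /** Average items per second over the given duration. */
    static double perSecond(long count, Duration d) {
        return count / (d.toMillis() / 1000.0);
    }

    public static void main(String[] args) {
        Duration d = elapsed("2021-04-26T07:39:53.581", "2021-04-26T07:54:48.158");
        System.out.println(d); // PT14M54.577S
        long rows = 50_000_000L;       // assumption: exactly 50 million rows
        long bytes = 3_383_358_338L;   // input file size
        System.out.printf("%.0f rows/s, %.2f MB/s%n",
                perSecond(rows, d), perSecond(bytes, d) / 1e6);
    }
}
```

Under that assumption the run averaged roughly 56,000 rows per second, or a little under 4 MB/s of input.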
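The output consists of 4 files (presumably one per parallel task, since the default local parallelism equals the number of CPU cores):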
ls -l
870757716 1
870757790 2
870757675 3
871085157 4
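As a quick sanity check, the sizes of the four part files can be added up and compared with the input size. This is plain arithmetic on the numbers listed above; the class name OutputSizes is made up.

```java
class OutputSizes {
    /** Sums file sizes given in bytes. */
    static long total(long... sizes) {
        long sum = 0;
        for (long size : sizes) {
            sum += size;
        }
        return sum;
    }

    public static void main(String[] args) {
        long input = 3_383_358_338L; // events5000w.txt
        long output = total(870_757_716L, 870_757_790L, 870_757_675L, 871_085_157L);
        System.out.println(output);         // 3483358338
        System.out.println(output - input); // 100000000
    }
}
```

The output is exactly 100,000,000 bytes larger than the input, which is 2 bytes per row if the file holds exactly 50 million rows. The source does not show what accounts for the difference.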
jstat -gc 25677 3000
S0C      S1C      S0U  S1U  EC        EU        OC        OU        MC       MU       CCSC    CCSU    YGC  YGCT   FGC   FGCT     GCT
43520.0  43520.0  0.0  0.0  262144.0  262144.0  699392.0  699010.5  41344.0  38861.5  5760.0  5386.6  5    0.563  3535  171.979  172.543
43520.0  43520.0  0.0  0.0  262144.0  156005.0  699392.0  699010.5  41344.0  38861.5  5760.0  5386.6  5    0.563  3547  172.680  173.244
43520.0  43520.0  0.0  0.0  262144.0  33604.6   699392.0  699010.5  41344.0  38861.5  5760.0  5386.6  5    0.563  3562  173.392  173.955
43520.0  43520.0  0.0  0.0  262144.0  94774.2   699392.0  699010.5  41344.0  38861.5  5760.0  5386.6  5    0.563  3576  174.037  174.600
43520.0  43520.0  0.0  0.0  262144.0  262144.0  699392.0  699013.2  41344.0  38863.6  5760.0  5386.6  5    0.563  3591  174.696  175.260
43520.0  43520.0  0.0  0.0  262144.0  14358.2   699392.0  699013.2  41344.0  38863.6  5760.0  5386.6  5    0.563  3606  175.403  175.966
43520.0  43520.0  0.0  0.0  262144.0  262144.0  699392.0  699013.2  41344.0  38863.6  5760.0  5386.6  5    0.563  3621  176.058  176.622
43520.0  43520.0  0.0  0.0  262144.0  209689.3  699392.0  699013.2  41344.0  38863.6  5760.0  5386.6  5    0.563  3637  176.815  177.378
43520.0  43520.0  0.0  0.0  262144.0  262144.0  699392.0  699013.2  41344.0  38863.6  5760.0  5386.6  5    0.563  3652  177.528  178.091
43520.0  43520.0  0.0  0.0  262144.0  5164.5    699392.0  699013.2  41344.0  38863.6  5760.0  5386.6  5    0.563  3666  178.306  178.869
43520.0  43520.0  0.0  0.0  262144.0  169872.0  699392.0  699013.2  41344.0  38863.6  5760.0  5386.6  5    0.563  3682  179.027  179.591
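The jstat samples above can be turned into numbers that show how heavy the full GC activity was. The sketch below reads the FGC (full GC count) and FGCT (cumulative full GC time in seconds) columns from the first and last sample and divides by the elapsed time. It assumes the 11 samples are exactly 3 seconds apart, as requested on the jstat command line; the class and method names are made up.

```java
import java.util.Arrays;
import java.util.List;

class JstatFullGc {
    /** Returns the value of the named column in one whitespace-separated jstat row. */
    static double column(String header, String row, String name) {
        List<String> names = Arrays.asList(header.trim().split("\\s+"));
        int index = names.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("no such column: " + name);
        }
        return Double.parseDouble(row.trim().split("\\s+")[index]);
    }

    /** Full GCs per second between two samples taken the given number of seconds apart. */
    static double fullGcRate(String header, String first, String last, double seconds) {
        return (column(header, last, "FGC") - column(header, first, "FGC")) / seconds;
    }

    /** Fraction of wall-clock time spent in full GC between the two samples. */
    static double fullGcTimeFraction(String header, String first, String last, double seconds) {
        return (column(header, last, "FGCT") - column(header, first, "FGCT")) / seconds;
    }

    public static void main(String[] args) {
        String header = "S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT";
        String first = "43520.0 43520.0 0.0 0.0 262144.0 262144.0 699392.0 699010.5 "
                + "41344.0 38861.5 5760.0 5386.6 5 0.563 3535 171.979 172.543";
        String last = "43520.0 43520.0 0.0 0.0 262144.0 169872.0 699392.0 699013.2 "
                + "41344.0 38863.6 5760.0 5386.6 5 0.563 3682 179.027 179.591";
        double seconds = 10 * 3.0; // assumption: 11 samples, 10 intervals of a nominal 3000 ms
        System.out.printf("full GCs per second: %.1f%n",
                fullGcRate(header, first, last, seconds));
        System.out.printf("share of time in full GC: %.1f%%%n",
                100 * fullGcTimeFraction(header, first, last, seconds));
    }
}
```

For these samples this gives about 4.9 full GCs per second, with roughly 23% of wall-clock time spent in full GC. The old generation is also essentially full throughout (OU 699013.2 KB of OC 699392.0 KB), and the young GC count stays at 5 while only the full GC count grows.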