Flink DataSet large data set test
Can a job still run correctly when the data file is larger than the JVM heap?


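import java.time.LocalDateTime;

import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple5;
import org.apache.flink.core.fs.FileSystem.WriteMode;

        // Body of main(String[] args) throws Exception
        // Start marker, matching the "start" line in the log output
        System.out.println("EventsGroupAndSort2 start," + LocalDateTime.now());

        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        String file = "file:/tmp/events5000w.txt";
        DataSet<Tuple5<String, String, String, String, Double>> csvInput = env.readCsvFile(file)
                .types(String.class, String.class, String.class, String.class, Double.class);

        // Group by field 2; within each group sort by field 3 ascending, then field 1 descending
        csvInput.groupBy(2)
                .sortGroup(3, Order.ASCENDING)
                .sortGroup(1, Order.DESCENDING)
                .combineGroup(new MyGroupCombineFunction())
                // /tmp/events_result is a directory
                .writeAsCsv("file:///tmp/events_result", WriteMode.OVERWRITE);

        // Note: execute() must be called, otherwise the job never runs
        env.execute();

        System.out.println("EventsGroupAndSort2 done," + LocalDateTime.now());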


Test data is generated by EventDataGen:
https://gitee.com/dyyx/demos/blob/master/flinkdemo/src/main/java/dyyx/zb/EventDataGen.java

50 million events (5000w), a little over 3 GB of data:

3383358338  events5000w.txt
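
To put that byte count in perspective, here is a small arithmetic sketch. It only uses the two numbers given above (file size and event count); the class and method names are made up for illustration. It works out to roughly 67.7 bytes per event and about 3.15 GiB in total.

```java
final class InputSizeStats {
    static final long FILE_BYTES = 3_383_358_338L; // size of events5000w.txt reported above
    static final long EVENTS = 50_000_000L;        // 5000w = 50 million events

    /** Average bytes per event line, line terminator included. */
    static double bytesPerEvent() {
        return (double) FILE_BYTES / EVENTS;
    }

    /** File size in GiB (1 GiB = 2^30 bytes). */
    static double sizeGiB() {
        return FILE_BYTES / (double) (1L << 30);
    }

    public static void main(String[] args) {
        System.out.printf("%.2f bytes/event, %.2f GiB%n", bytesPerEvent(), sizeGiB());
    }
}
```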

Heap size set to 1 GB:
-Xmx1g  -Xms1g
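
One way to confirm that the heap limit actually took effect is to ask the running JVM. This is a minimal sketch, not part of the original test program; the class name is hypothetical. Note that `Runtime.maxMemory()` may report slightly less than 1024 MiB under `-Xmx1g`, depending on the garbage collector in use.

```java
final class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with: java -Xmx1g -Xms1g HeapCheck
        System.out.println("max heap (MiB): " + maxHeapMiB());
    }
}
```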

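Local machine: 8 GB RAM, 4 cores.
After a fairly long wait, the job finished and produced results.

Processing is on the slow side: the whole run took about 15 minutes.

I checked GC activity during the run: the JVM was doing full GCs almost nonstop, and I thought it had hung.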

EventsGroupAndSort2 start,2021-04-26T07:39:53.581
EventsGroupAndSort2 done,2021-04-26T07:54:48.158
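
The two timestamps can be turned into an exact duration and a rough throughput figure. A minimal sketch using `java.time`, with the timestamps copied from the log lines above and a hypothetical class name: the run took 894577 ms (just under 15 minutes), which is roughly 55,900 events per second.

```java
import java.time.Duration;
import java.time.LocalDateTime;

final class RunDuration {
    // Timestamps copied from the start/done log lines
    static final LocalDateTime START = LocalDateTime.parse("2021-04-26T07:39:53.581");
    static final LocalDateTime DONE = LocalDateTime.parse("2021-04-26T07:54:48.158");
    static final long EVENTS = 50_000_000L; // 5000w events

    /** Wall-clock time between the two log lines. */
    static Duration elapsed() {
        return Duration.between(START, DONE);
    }

    /** Average number of input events processed per second of wall-clock time. */
    static double eventsPerSecond() {
        return EVENTS / (elapsed().toMillis() / 1000.0);
    }

    public static void main(String[] args) {
        System.out.println("elapsed: " + elapsed());
        System.out.println("events/s: " + Math.round(eventsPerSecond()));
    }
}
```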

Output: 4 files
ls -l
870757716  1
870757790  2
870757675  3
871085157  4
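
Four part files would be consistent with the job running at parallelism 4 on this 4-core machine, though that is an assumption: the parallelism is not set anywhere in the code shown. As a quick sanity check, the sketch below (hypothetical class name, numbers copied from the listings above) sums the part sizes and compares them with the input. The total is 3,483,358,338 bytes, exactly 100,000,000 bytes more than the input, i.e. 2 extra bytes per event. What accounts for those bytes depends on what MyGroupCombineFunction emits, which is not shown here.

```java
final class OutputSizeCheck {
    static final long INPUT_BYTES = 3_383_358_338L; // events5000w.txt
    static final long EVENTS = 50_000_000L;
    // Sizes of output files 1..4 from the ls -l listing
    static final long[] PART_BYTES = {870_757_716L, 870_757_790L, 870_757_675L, 871_085_157L};

    /** Sum of all output part file sizes. */
    static long totalOutputBytes() {
        long sum = 0;
        for (long b : PART_BYTES) {
            sum += b;
        }
        return sum;
    }

    /** Bytes in the output beyond the input size. */
    static long extraBytes() {
        return totalOutputBytes() - INPUT_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("total output bytes: " + totalOutputBytes());
        System.out.println("extra bytes: " + extraBytes());
        System.out.println("extra bytes per event: " + (double) extraBytes() / EVENTS);
    }
}
```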





jstat -gc 25677 3000
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT  
43520.0 43520.0  0.0    0.0   262144.0 262144.0  699392.0   699010.5  41344.0 38861.5 5760.0 5386.6      5    0.563 3535   171.979  172.543
43520.0 43520.0  0.0    0.0   262144.0 156005.0  699392.0   699010.5  41344.0 38861.5 5760.0 5386.6      5    0.563 3547   172.680  173.244
43520.0 43520.0  0.0    0.0   262144.0 33604.6   699392.0   699010.5  41344.0 38861.5 5760.0 5386.6      5    0.563 3562   173.392  173.955
43520.0 43520.0  0.0    0.0   262144.0 94774.2   699392.0   699010.5  41344.0 38861.5 5760.0 5386.6      5    0.563 3576   174.037  174.600
43520.0 43520.0  0.0    0.0   262144.0 262144.0  699392.0   699013.2  41344.0 38863.6 5760.0 5386.6      5    0.563 3591   174.696  175.260
43520.0 43520.0  0.0    0.0   262144.0 14358.2   699392.0   699013.2  41344.0 38863.6 5760.0 5386.6      5    0.563 3606   175.403  175.966
43520.0 43520.0  0.0    0.0   262144.0 262144.0  699392.0   699013.2  41344.0 38863.6 5760.0 5386.6      5    0.563 3621   176.058  176.622
43520.0 43520.0  0.0    0.0   262144.0 209689.3  699392.0   699013.2  41344.0 38863.6 5760.0 5386.6      5    0.563 3637   176.815  177.378
43520.0 43520.0  0.0    0.0   262144.0 262144.0  699392.0   699013.2  41344.0 38863.6 5760.0 5386.6      5    0.563 3652   177.528  178.091
43520.0 43520.0  0.0    0.0   262144.0  5164.5   699392.0   699013.2  41344.0 38863.6 5760.0 5386.6      5    0.563 3666   178.306  178.869
43520.0 43520.0  0.0    0.0   262144.0 169872.0  699392.0   699013.2  41344.0 38863.6 5760.0 5386.6      5    0.563 3682   179.027  179.591
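
To quantify the "nonstop full GC" impression, the first and last rows of the sample above can be compared. This is a rough sketch with a hypothetical class name; it assumes the column order of the header shown above and that the 11 rows are spaced exactly 3 seconds apart (the interval passed to jstat). Under those assumptions it comes to about 4.9 full GCs per second, roughly 23% of wall-clock time spent in full GC, and an old generation that is more than 99.9% full, while the young GC count stays at 5.

```java
final class JstatSample {
    // Zero-based column positions, following the jstat -gc header above
    static final int OC = 6, OU = 7, FGC = 14, FGCT = 15;

    // First and last sample rows, copied from the output above
    static final String FIRST =
        "43520.0 43520.0  0.0    0.0   262144.0 262144.0  699392.0   699010.5  41344.0 38861.5 5760.0 5386.6      5    0.563 3535   171.979  172.543";
    static final String LAST =
        "43520.0 43520.0  0.0    0.0   262144.0 169872.0  699392.0   699013.2  41344.0 38863.6 5760.0 5386.6      5    0.563 3682   179.027  179.591";

    // 11 rows sampled every 3000 ms give 10 intervals (assumed exact)
    static final double WINDOW_SECONDS = 10 * 3.0;

    /** Splits one jstat row on whitespace and parses every column as a number. */
    static double[] parse(String row) {
        String[] parts = row.trim().split("\\s+");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i]);
        }
        return values;
    }

    /** Full GC events per second over the sampled window. */
    static double fullGcsPerSecond() {
        return (parse(LAST)[FGC] - parse(FIRST)[FGC]) / WINDOW_SECONDS;
    }

    /** Fraction of the sampled window spent in full GC. */
    static double fullGcTimeFraction() {
        return (parse(LAST)[FGCT] - parse(FIRST)[FGCT]) / WINDOW_SECONDS;
    }

    /** Old generation used / capacity in the last row. */
    static double oldGenOccupancy() {
        double[] last = parse(LAST);
        return last[OU] / last[OC];
    }

    public static void main(String[] args) {
        System.out.println("full GCs per second: " + fullGcsPerSecond());
        System.out.println("full GC time fraction: " + fullGcTimeFraction());
        System.out.println("old gen occupancy: " + oldGenOccupancy());
    }
}
```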


