ClickHouse Basic Usage
DDL operations

Column-level ALTER statements are supported only by the MergeTree family, Merge, and Distributed table engines.

CREATE TABLE tb_test2
(
    `id` Int8,
    `name` String DEFAULT 'VIP' COMMENT 'user name' -- sets the column default; DEFAULT must come before COMMENT
)
ENGINE = MergeTree()
ORDER BY id ;

-- Add columns
alter table tb_test2 add column age UInt8 ;
alter table tb_test2 add column gender String after name ;
-- Drop a column
alter table tb_test2 drop column age ;
-- Change a column's data type
alter table tb_test2 modify column gender UInt8 default 0 ;
-- Change or add a column comment
alter table tb_test2 comment column name 'user name' ;

-- Rename a table (assumes a table named tb_test1 already exists)
rename table tb_test1 to t1 ;
-- Rename several tables at once
rename table tb_test2 to t2 , t1 to tt1 ;
-- Move a table to another database (the database test1 must already exist)
rename table t2 to test1.t ;




Partitions
Only the MergeTree family of table engines supports data partitioning.

create table test_partition(
id String, 
ctime DateTime
)engine=MergeTree() 
partition by toYYYYMM(ctime)
order by (id) ;
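
To see partitioning in action, the sketch below inserts a few rows spanning two months and then lists the resulting partitions. The row values are made up for illustration; each distinct toYYYYMM(ctime) value gets its own partition, created automatically at insert time.

```sql
-- Illustrative rows only: two in May 2021, one in June 2021.
INSERT INTO test_partition VALUES
    ('a1', '2021-05-01 10:00:00'),
    ('a2', '2021-05-20 18:30:00'),
    ('b1', '2021-06-03 09:15:00');

-- Two partitions, 202105 and 202106, should be listed afterwards.
SELECT DISTINCT partition
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'test_partition'
  AND active = 1;
```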

Viewing a table's partitions

SELECT partition_id,name,table,partition,active 
FROM system.parts WHERE table = 'test_partition' AND active = 1 ;

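Adding/dropping partitions
Partitions are created automatically when data is inserted. Dropping a partition deletes all of the data in it.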

alter table test_partition drop partition '202105' ;

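Merging data parts
OPTIMIZE ... FINAL forces an unscheduled merge of the data parts inside each partition; parts that belong to different partitions are never merged together.
optimize table test_partition final;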

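Copying partitions

A partition's data can be copied from table A to table B, which is useful for fast data loading, synchronizing data between tables, and backups.

Copying a partition has two prerequisites:
the two tables must have the same partition key
the two tables must have exactly the same structure


Create the table
create table test_partition1 as test_partition ;

Copy a partition from one table into another (any data already in that partition of the target table is replaced)
alter table test_partition1 replace partition '202106' from test_partition ;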

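Resetting partition data

If one column in a partition holds bad data, it can be reset to its initial value: the column's default value if one is defined, otherwise the system default for the column's type.

Note: primary key and partition key columns cannot be reset.

-- Assumes the table has a non-key column called name; test_partition1 as created above only has id and ctime, which are both key columns.
alter table test_partition1 clear column name in partition '202105';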


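Detaching/attaching partitions
Use case: migrating and backing up partition data

Detaching a partition (DETACH)

After a partition is detached, its physical data is not deleted; it is moved to the detached subdirectory of the table's data directory.
ClickHouse no longer manages that directory and never cleans up those files on its own.

alter table test_partition detach partition '202105';

Attaching a partition (ATTACH)
Loads a partition from the detached subdirectory back into the table.
alter table test_partition attach partition '202105';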



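Views

Normal views
A normal view stores no data; it is only a saved query.
create view test_view as select id,upper(name),role from tb_test;

Materialized views
How the data is stored is determined by the view's table engine.
When new data is written to the source table, the materialized view is updated as well.

The POPULATE modifier controls how a materialized view is initialized: with POPULATE, the data already present in the source table is loaded into the view while it is being created,
as if an INSERT INTO ... SELECT had been run.
Materialized views do not currently propagate deletes: rows deleted from the source table remain in the materialized view.
create materialized view mater_test_view engine=Log 
populate as select * from tb_test;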
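
The behaviour described above can be checked with a small experiment. This is a minimal sketch: the table mv_src and the view mv_demo are hypothetical names introduced only for this illustration.

```sql
-- Hypothetical source table and view, for illustration only.
CREATE TABLE mv_src (id UInt32, name String)
ENGINE = MergeTree() ORDER BY id;

INSERT INTO mv_src VALUES (1, 'a'), (2, 'b');

-- POPULATE copies the two existing rows into the view at creation time.
CREATE MATERIALIZED VIEW mv_demo ENGINE = Log
POPULATE AS SELECT id, name FROM mv_src;

-- Rows inserted into the source afterwards are pushed to the view as well.
INSERT INTO mv_src VALUES (3, 'c');

-- Deleting from the source does not touch the rows stored in the view.
ALTER TABLE mv_src DELETE WHERE id = 1;

-- The view still contains the row with id = 1.
SELECT id, name FROM mv_demo ORDER BY id;
```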



Query syntax

WITH clause

ClickHouse supports CTEs (Common Table Expressions).

Defining a variable
with pow(2,2) as res select pow(res,2);

Calling a function
WITH toDate(create_time) AS bday
SELECT user_id,score,bday FROM test_a;

Defining a subquery (a scalar subquery, which may return at most one row)
with (
    select user_id,score from test_a limit 1
    ) as sub
select user_id,score,sub from test_a;


FROM clause
Specifies where the data is read from:

a table
a subquery
a table function

file: the data file must be located under the designated directory, by default /var/lib/clickhouse/user_files
SELECT * FROM file('demo.csv', 'CSV', 'id Int8,name String , age UInt8');

numbers
select * from numbers(2,10) ;
select toDate('2021-01-01') + number as date from numbers(365);

mysql: query data directly from a MySQL server
SELECT * FROM mysql('localhost:3306', 'test', 't_sku', 'root', '123456');

hdfs
SELECT * FROM hdfs('hdfs://hdfs1:9000/test', 'TSV', 'column1 UInt32, column2 UInt32');



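Joins


JOIN

Join strictness: ALL (the default), ANY, and ASOF are supported.
The default strictness can be changed with the join_default_strictness setting.

ALL
If a row in the left table matches several rows in the right table, all of the matching right-table rows are returned.

ANY
If a row in the left table matches several rows in the right table, only the first matching right-table row is returned.

ASOF
A fuzzy join: after the join keys, an additional closest-match condition on an asof_column is specified.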
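
The three strictness levels are easiest to compare on the same data. The following is a rough sketch; the trades and quotes tables and their contents are hypothetical and exist only for this illustration.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE trades (symbol String, ts DateTime, price Float64) ENGINE = Memory;
CREATE TABLE quotes (symbol String, ts DateTime, bid Float64) ENGINE = Memory;

INSERT INTO trades VALUES ('AAA', '2021-01-01 10:00:05', 10.2);
INSERT INTO quotes VALUES
    ('AAA', '2021-01-01 10:00:00', 10.0),
    ('AAA', '2021-01-01 10:00:04', 10.1),
    ('AAA', '2021-01-01 10:00:09', 10.3);

-- ALL: every matching quote is returned, so the single trade yields three rows.
SELECT t.ts AS trade_ts, q.ts AS quote_ts, q.bid
FROM trades AS t
ALL INNER JOIN quotes AS q ON t.symbol = q.symbol;

-- ANY: at most one matching quote is returned for the trade.
SELECT t.ts AS trade_ts, q.ts AS quote_ts, q.bid
FROM trades AS t
ANY LEFT JOIN quotes AS q ON t.symbol = q.symbol;

-- ASOF: equality on symbol plus a closest-match condition on ts;
-- the trade is paired with the latest quote that is not newer than it (10:00:04).
SELECT t.ts AS trade_ts, q.ts AS quote_ts, q.bid
FROM trades AS t
ASOF LEFT JOIN quotes AS q ON t.symbol = q.symbol AND t.ts >= q.ts;
```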



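Join types
outer joins (LEFT/RIGHT/FULL)
inner join (INNER)
cross join (CROSS)

ARRAY JOIN
Joins each row against one of its own array or nested-type columns, expanding a row that holds an array into multiple rows.
Similar to the explode function in Hive.

A single SELECT statement can contain only one ARRAY JOIN clause.
INNER ARRAY JOIN (the default): rows with empty arrays are excluded
LEFT ARRAY JOIN: rows with empty arrays are kept
When several array columns are array-joined at once, they are combined position by position (not as a Cartesian product), so the arrays must have the same length.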
drop table if exists test_arrayjoin;
CREATE TABLE test_arrayjoin
(
    id    String,
    hobby Array(String)
)ENGINE = Memory;

insert into test_arrayjoin values ('1', ['eat','drink','sleep']),('2', ['study','read','sport']),('3', []);

select id
     ,hobby
     ,hb
from test_arrayjoin
array join hobby as hb;
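
For comparison, here is a minimal sketch of the same query using LEFT ARRAY JOIN. With the sample data above, the plain ARRAY JOIN drops the row with id 3 because its array is empty, whereas the LEFT variant keeps it and fills hb with the type's default value (an empty string).

```sql
-- Same query as above, but rows whose array is empty are kept.
SELECT id
     , hobby
     , hb
FROM test_arrayjoin
LEFT ARRAY JOIN hobby AS hb;
```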

Array-joining several array columns at once
select id
     ,hobby
     ,hb
     ,arrayMap(x ->concat(x,'ABC'),hobby) as hobbyCon
     ,hyc
     ,arrayEnumerate(hobby) as hobbyEnum
     ,hbe
from test_arrayjoin
array join hobby as hb
,hobbyCon as hyc
,hobbyEnum as hbe;


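WITH modifiers for GROUP BY

WITH CUBE
With n aggregation keys, 2^n key combinations are aggregated.
WITH ROLLUP
Rolls the data up from right to left along the aggregation keys, producing subtotals and a grand total from the aggregate functions; with n aggregation keys there are n+1 combinations.
WITH TOTALS
Adds a grand total computed by the aggregate functions over all of the data.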
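
The three modifiers can be tried on the system.parts table with two aggregation keys (n = 2). This is a sketch rather than a prescribed usage; in subtotal and total rows, the keys that were rolled up show the default value of their type (an empty string here).

```sql
-- ROLLUP: n + 1 = 3 grouping levels: (database, table), (database), ().
SELECT database, table, sum(bytes_on_disk) AS bytes
FROM system.parts
GROUP BY database, table WITH ROLLUP
ORDER BY database, table;

-- CUBE: 2^n = 4 grouping levels: (database, table), (database), (table), ().
SELECT database, table, sum(bytes_on_disk) AS bytes
FROM system.parts
GROUP BY database, table WITH CUBE
ORDER BY database, table;

-- TOTALS: the ordinary groups plus one extra row holding the grand total.
SELECT database, table, sum(bytes_on_disk) AS bytes
FROM system.parts
GROUP BY database, table WITH TOTALS
ORDER BY database, table;
```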

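ORDER BY clause

ORDER BY in a MergeTree table definition sorts the data inside each partition (more precisely, inside each data part) according to the given keys; this is a local sort.
The ORDER BY clause of a query specifies a global sort of the result.
NULL ordering: NULL values can be placed last (NULLS LAST) or first (NULLS FIRST).

NULLs last
select arrayJoin([1,2,null,0/0,3]) as v1 order by v1 desc nulls last;

NULLs first
select arrayJoin([1,2,null,0/0,3]) as v1 order by v1 desc nulls first;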

LIMIT BY clause

Runs after ORDER BY and before LIMIT. It returns at most the first n rows for each distinct value of the specified expressions, and is commonly used for top-n queries.


LIMIT and LIMIT BY can be used in the same query.

select database,table,max(bytes_on_disk) as bytes from system.parts
group by database, table
order by database,bytes desc
limit 2 by database;

LIMIT BY also supports an offset

select database,table,max(bytes_on_disk) as bytes from system.parts
group by database, table
order by database,bytes desc
limit 3 offset 1 by database;