StarRocks2.1 vs Clickhouse vs Apache Druid® SSB单表性能测试对比报告
本文发表于: &{ new Date(1644854400000).toLocaleDateString() }
1. 测试结论
Star schema benchmark(以下简称SSB)是学术界和工业界广泛使用的一个星型模型测试集(来源论文),通过这个测试集合可以方便的对比各种OLAP产品的基础性能指标。Clickhouse 通过改写SSB,将星型模型打平转化成宽表,改造成了一个单表测试benchmark(参考链接)。本报告记录了StarRocks、Apache Druid®和Clickhouse在SSB单表数据集上的性能对比结果,并记录了在用户经常碰到的低基数聚合场景下StarRocks和ClickHouse的性能对比结果。测试结论如下:
- 在标准测试数据集的13个查询上,ClickHouse的整体查询时间是StarRocks的1.7倍,Apache Druid® 的整体查询时间是StarRocks的2.2倍。
- 在StarRocks启用bitmap index和cache的情况下,性能更胜一筹,尤其在 Q2.2 Q2.3 Q3.3 上有显著提升。整体性能是ClickHouse的2.2倍,Apache Druid®的2.9倍。
- 在标准测试数据集上,我们选取了一些常见的低基数聚合场景。ClickHouse的整体查询时间是StarRocks的2.26倍。
在SSB单表和用户经常碰到的低基数聚合场景下对比了StarRocks和ClickHouse的性能指标。采用3x16core 64GB内存的云主机,在6亿行的数据规模进行测试。
2. 测试准备
2.1 硬件环境
机器 | 3台 阿里云主机 |
CPU | 16coreIntel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHzcache size : 36608 KB |
内存 | 64GB |
网络带宽 | 5Gbits/s |
磁盘 | ESSD |
2.2 软件环境
StarRocks, Apache Druid®和Clickhouse部署在相同配置的机器上分别进行启动测试。
- StarRocks部署3BE 1FE;
- Clickhouse部署三个节点后建立分布式表;
- Apache Druid® 多一台8core的master主机,部署了Broker/Coordinator/Overlord/Router(但是测试压力不在master,影响较小可以忽略),Historical/MiddleManager混合部署在与SR,CK同等配置的主机上。
内核版本:Linux 3.10.0-1127.13.1.el7.x86_64
操作系统版本:CentOS Linux release 7.8.2003
软件版本:StarRocks 2.1,Aapche Druid 0.20.1,ClickHouse 21.9
3. 测试数据与结果
3.1 测试数据
表名 | 行数 | 解释 |
lineorder | 6亿 | SSB商品订单表 |
customer | 300万 | SSB客户表 |
part | 140万 | SSB 零部件表 |
supplier | 20万 | SSB 供应商表 |
dates | 2556 | 日期表 |
lineorder_flat | 6亿 | SSB打平后的宽表 |
3.2 测试SQL
单表测试SQL
--Q1.1
SELECT sum(lo_extendedprice * lo_discount) AS `revenue`
FROM lineorder_flat
WHERE lo_orderdate >= '1993-01-01' and lo_orderdate <= '1993-12-31' AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;
--Q1.2
SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat
WHERE lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-01-31' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35;
--Q1.3
SELECT sum(lo_extendedprice * lo_discount) AS revenue
FROM lineorder_flat
WHERE weekofyear(lo_orderdate) = 6 AND lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-12-31'
AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35;
--Q2.1
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q2.2
SELECT
sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand >= 'MFGR#2221' AND p_brand <= 'MFGR#2228' AND s_region = 'ASIA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q2.3
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand = 'MFGR#2239' AND s_region = 'EUROPE'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q3.1
SELECT c_nation, s_nation, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat
WHERE c_region = 'ASIA' AND s_region = 'ASIA' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_nation, s_nation, year
ORDER BY year ASC, revenue DESC;
--Q3.2
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q3.3
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ( 'UNITED KI1' ,'UNITED KI5') AND s_city in ( 'UNITED KI1' ,'UNITED KI5') AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q3.4
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ('UNITED KI1', 'UNITED KI5') AND s_city in ( 'UNITED KI1', 'UNITED KI5') AND lo_orderdate >= '1997-12-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q4.1
SELECT year(lo_orderdate) AS year, c_nation, sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, c_nation
ORDER BY year ASC, c_nation ASC;
--Q4.2
SELECT year(lo_orderdate) AS year,
s_nation, p_category, sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, s_nation, p_category
ORDER BY year ASC, s_nation ASC, p_category ASC;
--Q4.3
SELECT year(lo_orderdate) AS year, s_city, p_brand,
sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE s_nation = 'UNITED STATES' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_category = 'MFGR#14'
GROUP BY year, s_city, p_brand
ORDER BY year ASC, s_city ASC, p_brand ASC;
单表低基数测试SQL
--Q1
select count(*),lo_shipmode from lineorder_flat group by lo_shipmode;
--Q2
select count(distinct lo_shipmode) from lineorder_flat;
--Q3
select count(*),lo_shipmode,lo_orderpriority from lineorder_flat group by lo_shipmode,lo_orderpriority;
--Q4
select count(*),lo_shipmode,lo_orderpriority from lineorder_flat group by lo_shipmode,lo_orderpriority,lo_shippriority;
--Q5
select count(*),lo_shipmode,s_city from lineorder_flat group by lo_shipmode,s_city;
--Q6
select count(*) from lineorder_flat group by c_city,s_city;
--Q7
select count(*) from lineorder_flat group by lo_shipmode,lo_orderdate;
--Q8
select count(*) from lineorder_flat group by lo_orderdate,s_nation,s_region;
--Q9
select count(*) from lineorder_flat group by c_city,s_city,c_nation,s_nation;
--Q10
select count(*) from (select count(*) from lineorder_flat group by lo_shipmode,lo_orderpriority,p_category,s_nation,c_nation) t;
--Q11
select count(*) from (select count(*) from lineorder_flat_distributed group by lo_shipmode,lo_orderpriority,p_category,s_nation,c_nation,p_mfgr) t;
--Q12
select count(*) from (select count(*) from lineorder_flat group by substr(lo_shipmode,2),lower(lo_orderpriority),p_category,s_nation,c_nation,s_region,p_mfgr) t;
3.3 测试结果
SSB单表测试结果
低基数聚合测试结果
4. 测试步骤
ClickHouse的建表导入参考官方文档,StarRocks的数据生成导入流程如下:
4.1 数据生成
首先下载ssb-poc工具包并编译
wget https://starrocks-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-0.10.0.zip
unzip ssb-poc-0.10.0.zip
cd ssb-poc-0.10.0
cd ssb-poc
make && make install
所有相关工具安装到output目录。
进入output目录,生成数据
cd output
bin/gen-ssb.sh 100 data_dir
4.2 创建表结构
修改配置文件conf/starrocks.conf,指定脚本操作的集群地址
# for mysql cmd
mysql_host: 192.168.1.1
mysql_port: 9030
mysql_user: root
mysql_password:
database: ssb
# cluster ports
http_port: 8030
be_heartbeat_port: 9050
broker_port: 8000
# parallel_fragment_exec_instance_num 设置并行度,建议是每个集群节点逻辑核数的一半,以下以8为例
parallel_num: 8
...
执行脚本建表
# 测试100G数据
bin/create_db_table.sh ddl_100
以下为"lineorder_flat"表建表语句。在上一步脚本中已经创建"lineorder_flat"表,并进行了默认分桶数配置。您可以删除该表,然后根据集群规模节点配置重新规划分桶数再进行创建,可实现更好测试效果。
CREATE TABLE `lineorder_flat` (
`lo_orderdate` date NOT NULL COMMENT "",
`lo_orderkey` int(11) NOT NULL COMMENT "",
`lo_linenumber` tinyint(4) NOT NULL COMMENT "",
`lo_custkey` int(11) NOT NULL COMMENT "",
`lo_partkey` int(11) NOT NULL COMMENT "",
`lo_suppkey` int(11) NOT NULL COMMENT "",
`lo_orderpriority` varchar(100) NOT NULL COMMENT "",
`lo_shippriority` tinyint(4) NOT NULL COMMENT "",
`lo_quantity` tinyint(4) NOT NULL COMMENT "",
`lo_extendedprice` int(11) NOT NULL COMMENT "",
`lo_ordtotalprice` int(11) NOT NULL COMMENT "",
`lo_discount` tinyint(4) NOT NULL COMMENT "",
`lo_revenue` int(11) NOT NULL COMMENT "",
`lo_supplycost` int(11) NOT NULL COMMENT "",
`lo_tax` tinyint(4) NOT NULL COMMENT "",
`lo_commitdate` date NOT NULL COMMENT "",
`lo_shipmode` varchar(100) NOT NULL COMMENT "",
`c_name` varchar(100) NOT NULL COMMENT "",
`c_address` varchar(100) NOT NULL COMMENT "",
`c_city` varchar(100) NOT NULL COMMENT "",
`c_nation` varchar(100) NOT NULL COMMENT "",
`c_region` varchar(100) NOT NULL COMMENT "",
`c_phone` varchar(100) NOT NULL COMMENT "",
`c_mktsegment` varchar(100) NOT NULL COMMENT "",
`s_region` varchar(100) NOT NULL COMMENT "",
`s_nation` varchar(100) NOT NULL COMMENT "",
`s_city` varchar(100) NOT NULL COMMENT "",
`s_name` varchar(100) NOT NULL COMMENT "",
`s_address` varchar(100) NOT NULL COMMENT "",
`s_phone` varchar(100) NOT NULL COMMENT "",
`p_name` varchar(100) NOT NULL COMMENT "",
`p_mfgr` varchar(100) NOT NULL COMMENT "",
`p_category` varchar(100) NOT NULL COMMENT "",
`p_brand` varchar(100) NOT NULL COMMENT "",
`p_color` varchar(100) NOT NULL COMMENT "",
`p_type` varchar(100) NOT NULL COMMENT "",
`p_size` tinyint(4) NOT NULL COMMENT "",
`p_container` varchar(100) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`lo_orderdate`, `lo_orderkey`)
COMMENT "OLAP"
PARTITION BY RANGE(`lo_orderdate`)
(START ("1992-01-01") END ("1999-01-01") EVERY (INTERVAL 1 YEAR))
DISTRIBUTED BY HASH(`lo_orderkey`) BUCKETS 48
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);
同时修改BE的page_cache参数,并重启BE:
disable_storage_page_cache=false; -- 开启page_cache
storage_page_cache_limit=4294967296; --设置page_cache的大小
如您希望测试创建bitmap_index情况下的性能,可以进行如下操作。如您希望直接测试标准性能,请跳过此步骤进行数据导入。
对所有字符串列创建bitmap_index:
#对 lo_orderpriority、lo_shipmode、c_name、c_address、c_city、c_nation、c_region、c_phone、c_mktsegment、s_region、s_nation、s_city、s_name、s_address、s_phone、p_name、p_mfgr、p_category、p_brand、p_color、p_type、p_container 创建bitmap_index
CREATE INDEX bitmap_lo_orderpriority ON lineorder_flat (lo_orderpriority) USING BITMAP;
CREATE INDEX bitmap_lo_shipmode ON lineorder_flat (lo_shipmode) USING BITMAP;
CREATE INDEX bitmap_c_name ON lineorder_flat (c_name) USING BITMAP;
CREATE INDEX bitmap_c_address ON lineorder_flat (c_address) USING BITMAP;
CREATE INDEX bitmap_c_city ON lineorder_flat (c_city) USING BITMAP;
CREATE INDEX bitmap_c_nation ON lineorder_flat (c_nation) USING BITMAP;
CREATE INDEX bitmap_c_region ON lineorder_flat (c_region) USING BITMAP;
CREATE INDEX bitmap_c_phone ON lineorder_flat (c_phone) USING BITMAP;
CREATE INDEX bitmap_c_mktsegment ON lineorder_flat (c_mktsegment) USING BITMAP;
CREATE INDEX bitmap_s_region ON lineorder_flat (s_region) USING BITMAP;
CREATE INDEX bitmap_s_nation ON lineorder_flat (s_nation) USING BITMAP;
CREATE INDEX bitmap_s_city ON lineorder_flat (s_city) USING BITMAP;
CREATE INDEX bitmap_s_name ON lineorder_flat (s_name) USING BITMAP;
CREATE INDEX bitmap_s_address ON lineorder_flat (s_address) USING BITMAP;
CREATE INDEX bitmap_s_phone ON lineorder_flat (s_phone) USING BITMAP;
CREATE INDEX bitmap_p_name ON lineorder_flat (p_name) USING BITMAP;
CREATE INDEX bitmap_p_mfgr ON lineorder_flat (p_mfgr) USING BITMAP;
CREATE INDEX bitmap_p_category ON lineorder_flat (p_category) USING BITMAP;
CREATE INDEX bitmap_p_brand ON lineorder_flat (p_brand) USING BITMAP;
CREATE INDEX bitmap_p_color ON lineorder_flat (p_color) USING BITMAP;
CREATE INDEX bitmap_p_type ON lineorder_flat (p_type) USING BITMAP;
CREATE INDEX bitmap_p_container ON lineorder_flat (p_container) USING BITMAP;
修改BE参数并重启BE:
bitmap_max_filter_ratio=1000;
4.3 数据导入
使用Stream load导入单表数据
bin/stream_load.sh data_dir
插入数据到宽表lineorder_flat
bin/flat_insert.sh
4.4 数据查询
1. SSB query
bin/benchmark.sh -p -d ssb
bin/benchmark.sh -p -d ssb-flat
2. 低基数query
bin/benchmark.sh -p -d ssb-low_cardinality