StarRocks v1.10 vs Clickhouse 20.13 SSB性能测试对比报告
本文发表于: &{ new Date(1631030400000).toLocaleDateString() }
一、测试方法和结论
Star schema benchmark(以下简称SSB)是学术界和工业界广泛使用的一个星型模型测试集(来源论文),通过这个测试集合可以方便的对比各种OLAP产品的基础性能指标。Clickhouse 通过改写SSB,将星型模型打平转化成宽表,改造成了一个单表测试benchmark(参考链接)。本报告记录了StarRocks、Apache Doris和Clickhouse在SSB单表和多表数据集上进行了性能对比测试的结果。测试结论如下:
- 在单表测试的13个查询中,Apache Doris耗时最长;有9个查询StarRocks查询速度最快,另外4个查询Clickhouse速度最快。
- 单表测试中,13个查询Clickhouse总耗时是StarRocks的1.33倍。
- 在多表测试的13个查询中,StarRocks平均比Apache Doris快6.8倍,部分查询快10倍以上。
(注:由于Clickhouse对多表Join支持有限,所以在多表测试中并未将Clickhouse加入测试)
以下是具体测试结果。
二、测试准备
2.1 硬件环境
机器 | 3台 阿里云主机 |
CPU | 16core Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz cache size : 36608 KB |
内存 | 64GB |
网络带宽 | 10Gbits/s |
磁盘 | ESSD高效云盘 |
2.2 软件环境
StarRocks,Apache Doris和Clickhouse部署在相同的机器上分别进行启动测试,StarRocks和Apache Doris部署3BE 1FE,其中FE和一台BE混合部署,Clickhouse部署三个节点后建立分布式表。
内核版本:Linux 3.10.0-1127.13.1.el7.x86_64
操作系统版本:CentOS Linux release 7.8.2003
软件版本:StarRocks 1.10,Aapche Doris 0.13,Clickhouse 20.13
三、测试数据与结果
3.1 测试数据
表名 | 行数 | 解释 |
lineorder | 6亿 | SSB商品订单表 |
customer | 300万 | SSB客户表 |
part | 140万 | SSB 零部件表 |
supplier | 20万 | SSB 供应商表 |
dates | 2556 | 日期表 |
lineorder_flat | 6亿 | SSB打平后的宽表 |
3.2 测试SQL
单表测试SQL
--Q1.1
SELECT sum(lo_extendedprice * lo_discount) AS `revenue`
FROM lineorder_flat
WHERE lo_orderdate >= '1993-01-01' and lo_orderdate <= '1993-12-31' AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;
--Q1.2
SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat
WHERE lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-01-31' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35;
--Q1.3
SELECT sum(lo_extendedprice * lo_discount) AS revenue
FROM lineorder_flat
WHERE weekofyear(lo_orderdate) = 6 AND lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-12-31'
AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35;
--Q2.1
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q2.2
SELECT
sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand >= 'MFGR#2221' AND p_brand <= 'MFGR#2228' AND s_region = 'ASIA'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q2.3
SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand = 'MFGR#2239' AND s_region = 'EUROPE'
GROUP BY year, p_brand
ORDER BY year, p_brand;
--Q3.1
SELECT c_nation, s_nation, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat
WHERE c_region = 'ASIA' AND s_region = 'ASIA' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_nation, s_nation, year
ORDER BY year ASC, revenue DESC;
--Q3.2
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q3.3
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ( 'UNITED KI1' ,'UNITED KI5') AND s_city in ( 'UNITED KI1' ,'UNITED KI5') AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q3.4
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_city in ('UNITED KI1', 'UNITED KI5') AND s_city in ( 'UNITED KI1', 'UNITED KI5') AND lo_orderdate >= '1997-12-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;
--Q4.1
SELECT year(lo_orderdate) AS year, c_nation, sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, c_nation
ORDER BY year ASC, c_nation ASC;
--Q4.2
SELECT year(lo_orderdate) AS year,
s_nation, p_category, sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, s_nation, p_category
ORDER BY year ASC, s_nation ASC, p_category ASC;
--Q4.3
SELECT year(lo_orderdate) AS year, s_city, p_brand,
sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE s_nation = 'UNITED STATES' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_category = 'MFGR#14'
GROUP BY year, s_city, p_brand
ORDER BY year ASC, s_city ASC, p_brand ASC;
多表测试SQL
--Q1.1
select sum(lo_revenue) as revenue
from lineorder join dates on lo_orderdate = d_datekey
where d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25;
--Q1.2
select sum(lo_revenue) as revenue
from lineorder
join dates on lo_orderdate = d_datekey
where d_yearmonthnum = 199401
and lo_discount between 4 and 6
and lo_quantity between 26 and 35;
--Q1.3
select sum(lo_revenue) as revenue
from lineorder
join dates on lo_orderdate = d_datekey
where d_weeknuminyear = 6 and d_year = 1994
and lo_discount between 5 and 7
and lo_quantity between 26 and 35;
--Q2.1
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
inner join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_category = 'MFGR#12' and s_region = 'AMERICA'
group by d_year, p_brand
order by d_year, p_brand;
--Q2.2
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_brand between 'MFGR#2221' and 'MFGR#2228' and s_region = 'ASIA'
group by d_year, p_brand
order by d_year, p_brand;
--Q2.3
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_brand = 'MFGR#2239' and s_region = 'EUROPE'
group by d_year, p_brand
order by d_year, p_brand;
--Q3.1
select c_nation, s_nation, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where c_region = 'ASIA' and s_region = 'ASIA'and d_year >= 1992 and d_year <= 1997
group by c_nation, s_nation, d_year
order by d_year asc, lo_revenue desc;
--Q3.2
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES'
and d_year >= 1992 and d_year <= 1997
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;
--Q3.3
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where (c_city='UNITED KI1' or c_city='UNITED KI5')
and (s_city='UNITED KI1' or s_city='UNITED KI5')
and d_year >= 1992 and d_year <= 1997
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;
--Q3.4
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_yearmonth
= 'Dec1997'
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;
--Q4.1
select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, c_nation
order by d_year, c_nation;
--Q4.2
select d_year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'and s_region = 'AMERICA'
and (d_year = 1997 or d_year = 1998)
and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, s_nation, p_category
order by d_year, s_nation, p_category;
--Q4.3
select d_year, s_city, p_brand, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'and s_nation = 'UNITED STATES'
and (d_year = 1997 or d_year = 1998)
and p_category = 'MFGR#14'
group by d_year, s_city, p_brand
order by d_year, s_city, p_brand;
3.3 测试结果
单表测试结果如下:
SQL | StarRocks 用时(ms) | Apache Doris 用时(ms) | Clickhouse用时(ms) | StarRocks/Clickhouse性能比例 |
Q1.1 | 56 | 276 | 101 | 1.80 |
Q1.2 | 20 | 39 | 22 | 1.10 |
Q1.3 | 37 | 166 | 15 | 0.41 |
Q2.1 | 296 | 1046 | 519 | 1.75 |
Q2.2 | 468 | 1049 | 429 | 0.92 |
Q2.3 | 207 | 932 | 387 | 1.87 |
Q3.1 | 398 | 1496 | 586 | 1.47 |
Q3.2 | 250 | 873 | 297 | 1.19 |
Q3.3 | 180 | 861 | 177 | 0.98 |
Q3.4 | 17 | 61 | 16 | 0.94 |
Q4.1 | 438 | 1492 | 680 | 1.55 |
Q4.2 | 158 | 520 | 208 | 1.32 |
Q4.3 | 108 | 410 | 144 | 1.33 |
多表测试结果如下:
SQL | StarRocks 用时(ms) | Apache Doris 用时(ms) | StarRocks/Apache Doris 性能比例 |
Q1.1 | 71 | 267 | 3.76 |
Q1.2 | 30 | 51 | 1.70 |
Q1.3 | 25 | 35 | 1.40 |
Q2.1 | 457 | 4743 | 10.38 |
Q2.2 | 354 | 3997 | 11.29 |
Q2.3 | 322 | 3664 | 11.38 |
Q3.1 | 826 | 5419 | 6.56 |
Q3.2 | 395 | 4108 | 10.40 |
Q3.3 | 334 | 3710 | 11.11 |
Q3.4 | 62 | 159 | 2.56 |
Q4.1 | 999 | 6645 | 6.65 |
Q4.2 | 445 | 2226 | 5.00 |
Q4.3 | 261 | 1846 | 7.07 |
四、测试流程
Clickhouse的建表导入参考官方文档,StarRocks的数据生成导入流程如下:
4.1 数据生成
首先下载ssb-poc工具包并编译
wget https://starrocks-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-0.9.3.zip
unzip ssb-poc-0.9.3.zip
cd ssb-poc
make && make install
所有相关工具安装到output目录。
进入output目录,生成数据
# 生成100G数据脚本
cd output
bin/gen-ssb.sh 100 data_dir
# 生成1T数据脚本
cd output
bin/gen-ssb.sh 1000 data_dir
4.2 创建表结构
修改配置文件conf/starrocks.conf,指定脚本操作的集群地址
# for mysql cmd
mysql_host: 192.168.1.1
mysql_port: 9030
mysql_user: root
mysql_password:
database: ssb
# cluster ports
http_port: 8030
be_heartbeat_port: 9050
broker_port: 8000
# parallel_fragment_exec_instance_num 设置并行度,建议是每个集群节点逻辑核数的一半,以下以8为例
parallel_num: 8
...
执行脚本建表
# 测试100G数据
bin/create_db_table.sh ddl_100
# 测试1T数据
bin/create_db_table.sh ddl_1000
以下为"lineorder_flat"表建表语句。在上一步脚本中已经创建"lineorder_flat"表,并进行了默认分桶数配置。您可以删除该表,然后根据集群规模节点配置重新规划分桶数再进行创建,可实现更好测试效果。
# 测试数据量级是100G时
CREATE TABLE `lineorder_flat` (
`lo_orderdate` date NOT NULL COMMENT "",
`lo_orderkey` int(11) NOT NULL COMMENT "",
`lo_linenumber` tinyint(4) NOT NULL COMMENT "",
`lo_custkey` int(11) NOT NULL COMMENT "",
`lo_partkey` int(11) NOT NULL COMMENT "",
`lo_suppkey` int(11) NOT NULL COMMENT "",
`lo_orderpriority` varchar(100) NOT NULL COMMENT "",
`lo_shippriority` tinyint(4) NOT NULL COMMENT "",
`lo_quantity` tinyint(4) NOT NULL COMMENT "",
`lo_extendedprice` int(11) NOT NULL COMMENT "",
`lo_ordtotalprice` int(11) NOT NULL COMMENT "",
`lo_discount` tinyint(4) NOT NULL COMMENT "",
`lo_revenue` int(11) NOT NULL COMMENT "",
`lo_supplycost` int(11) NOT NULL COMMENT "",
`lo_tax` tinyint(4) NOT NULL COMMENT "",
`lo_commitdate` date NOT NULL COMMENT "",
`lo_shipmode` varchar(100) NOT NULL COMMENT "",
`c_name` varchar(100) NOT NULL COMMENT "",
`c_address` varchar(100) NOT NULL COMMENT "",
`c_city` varchar(100) NOT NULL COMMENT "",
`c_nation` varchar(100) NOT NULL COMMENT "",
`c_region` varchar(100) NOT NULL COMMENT "",
`c_phone` varchar(100) NOT NULL COMMENT "",
`c_mktsegment` varchar(100) NOT NULL COMMENT "",
`s_region` varchar(100) NOT NULL COMMENT "",
`s_nation` varchar(100) NOT NULL COMMENT "",
`s_city` varchar(100) NOT NULL COMMENT "",
`s_name` varchar(100) NOT NULL COMMENT "",
`s_address` varchar(100) NOT NULL COMMENT "",
`s_phone` varchar(100) NOT NULL COMMENT "",
`p_name` varchar(100) NOT NULL COMMENT "",
`p_mfgr` varchar(100) NOT NULL COMMENT "",
`p_category` varchar(100) NOT NULL COMMENT "",
`p_brand` varchar(100) NOT NULL COMMENT "",
`p_color` varchar(100) NOT NULL COMMENT "",
`p_type` varchar(100) NOT NULL COMMENT "",
`p_size` tinyint(4) NOT NULL COMMENT "",
`p_container` varchar(100) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`lo_orderdate`, `lo_orderkey`)
COMMENT "OLAP"
PARTITION BY RANGE(`lo_orderdate`)
(START ("1992-01-01") END ("1999-01-01") EVERY (INTERVAL 1 YEAR))
DISTRIBUTED BY HASH(`lo_orderkey`) BUCKETS 48
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);
# 测试数据量级是1T时
CREATE TABLE `lineorder_flat` (
`LO_ORDERDATE` date NOT NULL COMMENT "",
`LO_ORDERKEY` bigint(20) NOT NULL COMMENT "",
`LO_LINENUMBER` tinyint(4) NOT NULL COMMENT "",
`LO_CUSTKEY` int(11) NOT NULL COMMENT "",
`LO_PARTKEY` int(11) NOT NULL COMMENT "",
`LO_SUPPKEY` int(11) NOT NULL COMMENT "",
`LO_ORDERPRIORITY` varchar(100) NOT NULL COMMENT "",
`LO_SHIPPRIORITY` tinyint(4) NOT NULL COMMENT "",
`LO_QUANTITY` tinyint(4) NOT NULL COMMENT "",
`LO_EXTENDEDPRICE` int(11) NOT NULL COMMENT "",
`LO_ORDTOTALPRICE` int(11) NOT NULL COMMENT "",
`LO_DISCOUNT` tinyint(4) NOT NULL COMMENT "",
`LO_REVENUE` int(11) NOT NULL COMMENT "",
`LO_SUPPLYCOST` int(11) NOT NULL COMMENT "",
`LO_TAX` tinyint(4) NOT NULL COMMENT "",
`LO_COMMITDATE` date NOT NULL COMMENT "",
`LO_SHIPMODE` varchar(100) NOT NULL COMMENT "",
`C_NAME` varchar(100) NOT NULL COMMENT "",
`C_ADDRESS` varchar(100) NOT NULL COMMENT "",
`C_CITY` varchar(100) NOT NULL COMMENT "",
`C_NATION` varchar(100) NOT NULL COMMENT "",
`C_REGION` varchar(100) NOT NULL COMMENT "",
`C_PHONE` varchar(100) NOT NULL COMMENT "",
`C_MKTSEGMENT` varchar(100) NOT NULL COMMENT "",
`S_NAME` varchar(100) NOT NULL COMMENT "",
`S_ADDRESS` varchar(100) NOT NULL COMMENT "",
`S_CITY` varchar(100) NOT NULL COMMENT "",
`S_NATION` varchar(100) NOT NULL COMMENT "",
`S_REGION` varchar(100) NOT NULL COMMENT "",
`S_PHONE` varchar(100) NOT NULL COMMENT "",
`P_NAME` varchar(100) NOT NULL COMMENT "",
`P_MFGR` varchar(100) NOT NULL COMMENT "",
`P_CATEGORY` varchar(100) NOT NULL COMMENT "",
`P_BRAND` varchar(100) NOT NULL COMMENT "",
`P_COLOR` varchar(100) NOT NULL COMMENT "",
`P_TYPE` varchar(100) NOT NULL COMMENT "",
`P_SIZE` tinyint(4) NOT NULL COMMENT "",
`P_CONTAINER` varchar(100) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`LO_ORDERDATE`, `LO_ORDERKEY`)
COMMENT "OLAP"
PARTITION BY RANGE(`LO_ORDERDATE`)
(START ("1992-01-01") END ("1999-01-01") EVERY (INTERVAL 1 YEAR))
DISTRIBUTED BY HASH(`LO_ORDERKEY`) BUCKETS 120
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);
4.3 数据导入
使用Stream load导入单表数据
bin/stream_load.sh data_dir
插入数据到宽表lineorder_flat
bin/flat_insert.sh
4.4 数据查询
首先在客户端执行命令,修改并行度(类似clickhouse set max_threads= 8)
# 设置并行度,建议是每个集群节点逻辑核数的一半,以下以8为例
set global parallel_fragment_exec_instance_num = 8;
然后执行SQL
bin/benchmark.sh -p -d ssb
bin/benchmark.sh -p -d ssb-flat