StarRocks2.1 vs Clickhouse vs Apache Druid® SSB单表性能测试对比报告

本文发表于: &{ new Date(1644854400000).toLocaleDateString() }

1. 测试结论

Star schema benchmark(以下简称SSB)是学术界和工业界广泛使用的一个星型模型测试集(来源论文),通过这个测试集合可以方便的对比各种OLAP产品的基础性能指标。Clickhouse 通过改写SSB,将星型模型打平转化成宽表,改造成了一个单表测试benchmark(参考链接)。本报告记录了StarRocks、Apache Druid®和Clickhouse在SSB单表数据集上的性能对比结果,并记录了在用户经常碰到的低基数聚合场景下StarRocks和ClickHouse的性能对比结果。测试结论如下:

  • 在标准测试数据集的13个查询上,ClickHouse的整体查询时间是StarRocks的1.7倍,Apache Druid® 的整体查询时间是StarRocks的2.2倍。
  • 在StarRocks启用bitmap index和cache的情况下,性能更胜一筹,尤其在 Q2.2 Q2.3 Q3.3 上有显著提升。整体性能是ClickHouse的2.2倍,Apache Druid®的2.9倍。
  • 在标准测试数据集上,我们选取了一些常见的低基数聚合场景。ClickHouse的整体查询时间是StarRocks的2.26倍。

在SSB单表和用户经常碰到的低基数聚合场景下对比了StarRocks和ClickHouse的性能指标。采用3x16core 64GB内存的云主机,在6亿行的数据规模进行测试。

 

2. 测试准备

2.1 硬件环境

机器3台 阿里云主机
CPU16coreIntel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHzcache size : 36608 KB
内存64GB
网络带宽5Gbits/s
磁盘ESSD

2.2 软件环境

StarRocks, Apache Druid®和Clickhouse部署在相同配置的机器上分别进行启动测试。

  • StarRocks部署3BE 1FE;
  • Clickhouse部署三个节点后建立分布式表;
  • Apache Druid® 多一台8core的master主机,部署了Broker/Coordinator/Overlord/Router(但是测试压力不在master,影响较小可以忽略),Historical/MiddleManager混合部署在与SR,CK同等配置的主机上。

内核版本:Linux 3.10.0-1127.13.1.el7.x86_64

操作系统版本:CentOS Linux release 7.8.2003

软件版本:StarRocks 2.1,Aapche Druid 0.20.1,ClickHouse 21.9

 

3. 测试数据与结果

3.1 测试数据

表名行数解释
lineorder6亿SSB商品订单表
customer300万SSB客户表
part140万SSB 零部件表
supplier20万SSB 供应商表
dates2556日期表
lineorder_flat6亿SSB打平后的宽表

3.2 测试SQL

单表测试SQL

--Q1.1 
SELECT sum(lo_extendedprice * lo_discount) AS `revenue` 
FROM lineorder_flat 
WHERE lo_orderdate >= '1993-01-01' and lo_orderdate <= '1993-12-31' AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25; 
 
--Q1.2 
SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat  
WHERE lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-01-31' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35; 
 
--Q1.3 
SELECT sum(lo_extendedprice * lo_discount) AS revenue 
FROM lineorder_flat 
WHERE weekofyear(lo_orderdate) = 6 AND lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-12-31' 
 AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35; 
 
 
--Q2.1 
SELECT sum(lo_revenue), year(lo_orderdate) AS year,  p_brand 
FROM lineorder_flat 
WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA' 
GROUP BY year,  p_brand 
ORDER BY year, p_brand; 
 
--Q2.2 
SELECT 
sum(lo_revenue), year(lo_orderdate) AS year, p_brand 
FROM lineorder_flat 
WHERE p_brand >= 'MFGR#2221' AND p_brand <= 'MFGR#2228' AND s_region = 'ASIA' 
GROUP BY year,  p_brand 
ORDER BY year, p_brand; 
  
--Q2.3 
SELECT sum(lo_revenue),  year(lo_orderdate) AS year, p_brand 
FROM lineorder_flat 
WHERE p_brand = 'MFGR#2239' AND s_region = 'EUROPE' 
GROUP BY  year,  p_brand 
ORDER BY year, p_brand; 
 
 
--Q3.1 
SELECT c_nation, s_nation,  year(lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat 
WHERE c_region = 'ASIA' AND s_region = 'ASIA' AND lo_orderdate  >= '1992-01-01' AND lo_orderdate   <= '1997-12-31' 
GROUP BY c_nation, s_nation, year 
ORDER BY  year ASC, revenue DESC; 
 
--Q3.2 
SELECT  c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat 
WHERE c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND lo_orderdate  >= '1992-01-01' AND lo_orderdate <= '1997-12-31' 
GROUP BY c_city, s_city, year 
ORDER BY year ASC, revenue DESC; 
 
--Q3.3 
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue 
FROM lineorder_flat 
WHERE c_city in ( 'UNITED KI1' ,'UNITED KI5') AND s_city in ( 'UNITED KI1' ,'UNITED KI5') AND lo_orderdate  >= '1992-01-01' AND lo_orderdate <= '1997-12-31' 
GROUP BY c_city, s_city, year 
ORDER BY year ASC, revenue DESC; 
 
--Q3.4 
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue 
FROM lineorder_flat 
WHERE c_city in ('UNITED KI1', 'UNITED KI5') AND s_city in ( 'UNITED KI1',  'UNITED KI5') AND  lo_orderdate  >= '1997-12-01' AND lo_orderdate <= '1997-12-31' 
GROUP BY c_city,  s_city, year 
ORDER BY year ASC, revenue DESC; 
 
 
--Q4.1 
SELECT year(lo_orderdate) AS year, c_nation,  sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat 
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2') 
GROUP BY year, c_nation 
ORDER BY year ASC, c_nation ASC; 
 
--Q4.2 
SELECT year(lo_orderdate) AS year, 
    s_nation, p_category, sum(lo_revenue - lo_supplycost) AS profit 
FROM lineorder_flat 
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND  p_mfgr in ( 'MFGR#1' , 'MFGR#2') 
GROUP BY year, s_nation,  p_category 
ORDER BY  year ASC, s_nation ASC, p_category ASC; 
 
--Q4.3 
SELECT year(lo_orderdate) AS year, s_city, p_brand, 
    sum(lo_revenue - lo_supplycost) AS profit 
FROM lineorder_flat 
WHERE s_nation = 'UNITED STATES' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_category = 'MFGR#14' 
GROUP BY  year,  s_city, p_brand 
ORDER BY year ASC,  s_city ASC,  p_brand ASC; 

单表低基数测试SQL

--Q1
select count(*),lo_shipmode from lineorder_flat group by lo_shipmode;
--Q2
select count(distinct lo_shipmode) from lineorder_flat;
--Q3
select count(*),lo_shipmode,lo_orderpriority from lineorder_flat group by lo_shipmode,lo_orderpriority;
--Q4
select count(*),lo_shipmode,lo_orderpriority from lineorder_flat group by lo_shipmode,lo_orderpriority,lo_shippriority;
--Q5
select count(*),lo_shipmode,s_city from lineorder_flat group by lo_shipmode,s_city;
--Q6
select count(*) from lineorder_flat group by c_city,s_city;
--Q7
select count(*) from lineorder_flat group by lo_shipmode,lo_orderdate;
--Q8
select count(*) from lineorder_flat group by lo_orderdate,s_nation,s_region;
--Q9
select count(*) from lineorder_flat group by c_city,s_city,c_nation,s_nation;
--Q10
select count(*) from (select count(*) from lineorder_flat group by lo_shipmode,lo_orderpriority,p_category,s_nation,c_nation) t;
--Q11
select count(*) from (select count(*) from lineorder_flat_distributed group by lo_shipmode,lo_orderpriority,p_category,s_nation,c_nation,p_mfgr) t;
--Q12
select count(*) from (select count(*) from lineorder_flat group by substr(lo_shipmode,2),lower(lo_orderpriority),p_category,s_nation,c_nation,s_region,p_mfgr) t;

3.3 测试结果

SSB单表测试结果

低基数聚合测试结果

 

4. 测试步骤

ClickHouse的建表导入参考官方文档,StarRocks的数据生成导入流程如下:

4.1 数据生成

首先下载ssb-poc工具包并编译

wget https://starrocks-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-0.10.0.zip
unzip ssb-poc-0.10.0.zip
cd ssb-poc-0.10.0
cd ssb-poc
make && make install  

所有相关工具安装到output目录。

进入output目录,生成数据

cd output
bin/gen-ssb.sh 100 data_dir

4.2 创建表结构

修改配置文件conf/starrocks.conf,指定脚本操作的集群地址

 # for mysql cmd
 mysql_host: 192.168.1.1
 mysql_port: 9030
 mysql_user: root
 mysql_password:
 database: ssb
 
# cluster ports
  http_port: 8030
  be_heartbeat_port: 9050
  broker_port: 8000

# parallel_fragment_exec_instance_num 设置并行度,建议是每个集群节点逻辑核数的一半,以下以8为例
parallel_num: 8

 ...

执行脚本建表

# 测试100G数据
 bin/create_db_table.sh ddl_100

以下为"lineorder_flat"表建表语句。在上一步脚本中已经创建"lineorder_flat"表,并进行了默认分桶数配置。您可以删除该表,然后根据集群规模节点配置重新规划分桶数再进行创建,可实现更好测试效果。

CREATE TABLE `lineorder_flat` (
  `lo_orderdate` date NOT NULL COMMENT "",
  `lo_orderkey` int(11) NOT NULL COMMENT "",
  `lo_linenumber` tinyint(4) NOT NULL COMMENT "",
  `lo_custkey` int(11) NOT NULL COMMENT "",
  `lo_partkey` int(11) NOT NULL COMMENT "",
  `lo_suppkey` int(11) NOT NULL COMMENT "",
  `lo_orderpriority` varchar(100) NOT NULL COMMENT "",
  `lo_shippriority` tinyint(4) NOT NULL COMMENT "",
  `lo_quantity` tinyint(4) NOT NULL COMMENT "",
  `lo_extendedprice` int(11) NOT NULL COMMENT "",
  `lo_ordtotalprice` int(11) NOT NULL COMMENT "",
  `lo_discount` tinyint(4) NOT NULL COMMENT "",
  `lo_revenue` int(11) NOT NULL COMMENT "",
  `lo_supplycost` int(11) NOT NULL COMMENT "",
  `lo_tax` tinyint(4) NOT NULL COMMENT "",
  `lo_commitdate` date NOT NULL COMMENT "",
  `lo_shipmode` varchar(100) NOT NULL COMMENT "",
  `c_name` varchar(100) NOT NULL COMMENT "",
  `c_address` varchar(100) NOT NULL COMMENT "",
  `c_city` varchar(100) NOT NULL COMMENT "",
  `c_nation` varchar(100) NOT NULL COMMENT "",
  `c_region` varchar(100) NOT NULL COMMENT "",
  `c_phone` varchar(100) NOT NULL COMMENT "",
  `c_mktsegment` varchar(100) NOT NULL COMMENT "",
  `s_region` varchar(100) NOT NULL COMMENT "",
  `s_nation` varchar(100) NOT NULL COMMENT "",
  `s_city` varchar(100) NOT NULL COMMENT "",
  `s_name` varchar(100) NOT NULL COMMENT "",
  `s_address` varchar(100) NOT NULL COMMENT "",
  `s_phone` varchar(100) NOT NULL COMMENT "",
  `p_name` varchar(100) NOT NULL COMMENT "",
  `p_mfgr` varchar(100) NOT NULL COMMENT "",
  `p_category` varchar(100) NOT NULL COMMENT "",
  `p_brand` varchar(100) NOT NULL COMMENT "",
  `p_color` varchar(100) NOT NULL COMMENT "",
  `p_type` varchar(100) NOT NULL COMMENT "",
  `p_size` tinyint(4) NOT NULL COMMENT "",
  `p_container` varchar(100) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`lo_orderdate`, `lo_orderkey`)
COMMENT "OLAP"
PARTITION BY RANGE(`lo_orderdate`)
(START ("1992-01-01") END ("1999-01-01") EVERY (INTERVAL 1 YEAR))
DISTRIBUTED BY HASH(`lo_orderkey`) BUCKETS 48
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);

同时修改BE的page_cache参数,并重启BE:

disable_storage_page_cache=false; -- 开启page_cache
storage_page_cache_limit=4294967296; --设置page_cache的大小

如您希望测试创建bitmap_index情况下的性能,可以进行如下操作。如您希望直接测试标准性能,请跳过此步骤进行数据导入。

对所有字符串列创建bitmap_index:

#对 lo_orderpriority、lo_shipmode、c_name、c_address、c_city、c_nation、c_region、c_phone、c_mktsegment、s_region、s_nation、s_city、s_name、s_address、s_phone、p_name、p_mfgr、p_category、p_brand、p_color、p_type、p_container 创建bitmap_index
CREATE INDEX bitmap_lo_orderpriority ON lineorder_flat (lo_orderpriority) USING BITMAP;
CREATE INDEX bitmap_lo_shipmode ON lineorder_flat (lo_shipmode) USING BITMAP;
CREATE INDEX bitmap_c_name ON lineorder_flat (c_name) USING BITMAP;
CREATE INDEX bitmap_c_address ON lineorder_flat (c_address) USING BITMAP;
CREATE INDEX bitmap_c_city ON lineorder_flat (c_city) USING BITMAP;
CREATE INDEX bitmap_c_nation ON lineorder_flat (c_nation) USING BITMAP;
CREATE INDEX bitmap_c_region ON lineorder_flat (c_region) USING BITMAP;
CREATE INDEX bitmap_c_phone ON lineorder_flat (c_phone) USING BITMAP;
CREATE INDEX bitmap_c_mktsegment ON lineorder_flat (c_mktsegment) USING BITMAP;
CREATE INDEX bitmap_s_region ON lineorder_flat (s_region) USING BITMAP;
CREATE INDEX bitmap_s_nation ON lineorder_flat (s_nation) USING BITMAP;
CREATE INDEX bitmap_s_city ON lineorder_flat (s_city) USING BITMAP;
CREATE INDEX bitmap_s_name ON lineorder_flat (s_name) USING BITMAP;
CREATE INDEX bitmap_s_address ON lineorder_flat (s_address) USING BITMAP;
CREATE INDEX bitmap_s_phone ON lineorder_flat (s_phone) USING BITMAP;
CREATE INDEX bitmap_p_name ON lineorder_flat (p_name) USING BITMAP;
CREATE INDEX bitmap_p_mfgr ON lineorder_flat (p_mfgr) USING BITMAP;
CREATE INDEX bitmap_p_category ON lineorder_flat (p_category) USING BITMAP;
CREATE INDEX bitmap_p_brand ON lineorder_flat (p_brand) USING BITMAP;
CREATE INDEX bitmap_p_color ON lineorder_flat (p_color) USING BITMAP;
CREATE INDEX bitmap_p_type ON lineorder_flat (p_type) USING BITMAP;
CREATE INDEX bitmap_p_container ON lineorder_flat (p_container) USING BITMAP;

修改BE参数并重启BE:

bitmap_max_filter_ratio=1000; 

4.3 数据导入

使用Stream load导入单表数据

bin/stream_load.sh data_dir

插入数据到宽表lineorder_flat

bin/flat_insert.sh 

4.4 数据查询

1. SSB query

bin/benchmark.sh -p -d ssb
bin/benchmark.sh -p -d ssb-flat

2. 低基数query

bin/benchmark.sh -p -d ssb-low_cardinality