VictoriaLogs Beats Elasticsearch, MongoDB and PostgreSQL in ClickBench

11 points by valyala 2 days ago | 1 comment

Hi, I'm the core developer of VictoriaLogs - a zero-config, schemaless, open-source database for logs. Recently I tried running the ClickHouse benchmark (ClickBench) on it [1]. This benchmark executes various analytical queries over a table with 100 million rows. Every row represents an ad serving event with ~100 different fields such as UserID, ClientIP, AdvEngineID and so on. You can look at the table definition at [2].

Why run an analytical benchmark against a database for logs? VictoriaLogs is designed as a database for logs, and it supports both plaintext and structured logs. Structured logs can be written to a single table with pre-defined columns for every field seen in the structured logs. So rows with ad serving events from ClickBench can be treated as structured logs with many fields. Such logs are also known as wide events [3].

VictoriaLogs has some optimizations for wide events. For example, it stores the data for each column in separate blocks on disk (aka column-oriented storage). This allows reading only the columns a query needs during its execution. That's why I decided to give VictoriaLogs a try at ClickBench [4].
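
To illustrate the idea of column-oriented storage, here is a toy Python sketch. It is not VictoriaLogs' actual on-disk format: the helper names `to_columns` and `count_nonzero` and the sample values are made up for illustration. It only shows why grouping values by field lets a query touch one column and skip the rest.

```python
# Toy illustration of a column-oriented layout.
# Assumption: this is NOT how VictoriaLogs stores data; values are made up.

rows = [
    {"UserID": 1, "ClientIP": "10.0.0.1", "AdvEngineID": 0},
    {"UserID": 2, "ClientIP": "10.0.0.2", "AdvEngineID": 3},
    {"UserID": 3, "ClientIP": "10.0.0.3", "AdvEngineID": 3},
]


def to_columns(rows):
    """Regroup rows so that all values of one field sit together in one list."""
    columns = {}
    for i, row in enumerate(rows):
        for field, value in row.items():
            # A field seen for the first time gets None for all earlier rows.
            columns.setdefault(field, [None] * i).append(value)
        for values in columns.values():
            # Fields missing from this row get None, so all columns stay aligned.
            if len(values) <= i:
                values.append(None)
    return columns


def count_nonzero(columns, field):
    """Scan a single column; the other columns are never read."""
    return sum(1 for value in columns[field] if value)


columns = to_columns(rows)
nonzero_adv = count_nonzero(columns, "AdvEngineID")
print(nonzero_adv)
```

With ~100 fields per event, a query that filters on one field reads roughly one column's worth of data in this layout instead of every full row.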

It turned out that VictoriaLogs shows good results there:

- it needs 4.5x less disk space than Elasticsearch and PostgreSQL, and 5x less disk space than MongoDB

- it loads data 4.7x faster than Elasticsearch and 21x faster than MongoDB

- it executes queries 9.3x faster than Elasticsearch, 111x faster than PostgreSQL and 51x faster than MongoDB on average

You can easily reproduce the benchmark on your own hardware by cloning ClickBench [5] and executing the benchmark.sh script for each database you want to test (each benchmark.sh script is located in the folder named after the corresponding database).
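
The reproduction steps above can be sketched in Python as follows. The repository URL and the `victorialogs` folder name come from [5] and [4]; the helper names `benchmark_commands` and `run_benchmark` and the `ClickBench` checkout directory are assumptions made for this sketch. The script may also expect a particular OS and install packages, so read benchmark.sh before running it.

```python
import subprocess

CLICKBENCH_REPO = "https://github.com/ClickHouse/ClickBench"


def benchmark_commands(database, checkout_dir="ClickBench"):
    """Return (argv, cwd) pairs: clone ClickBench, then run one database's script."""
    return [
        (["git", "clone", CLICKBENCH_REPO, checkout_dir], None),
        # Each database has its own folder with a benchmark.sh script inside.
        (["./benchmark.sh"], f"{checkout_dir}/{database}"),
    ]


def run_benchmark(database, checkout_dir="ClickBench"):
    """Execute the commands in order; stop at the first failure."""
    for argv, cwd in benchmark_commands(database, checkout_dir):
        subprocess.run(argv, cwd=cwd, check=True)


# Only print the plan here; call run_benchmark("victorialogs") to execute it.
for argv, cwd in benchmark_commands("victorialogs"):
    print(cwd or ".", " ".join(argv))
```

Printing the plan first makes it easy to check which folder the script will run in before anything is downloaded or installed.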

[1] https://benchmark.clickhouse.com/

[2] https://github.com/ClickHouse/ClickBench/blob/main/clickhouse/create.sql

[3] https://jeremymorrell.dev/blog/a-practitioners-guide-to-wide-events/

[4] https://github.com/ClickHouse/ClickBench/tree/main/victorialogs

[5] https://github.com/ClickHouse/ClickBench

valyala 2 days ago |

Note that VictoriaLogs doesn't need any configuration or table schema in this benchmark - it is enough to ingest JSON lines, where every line contains a single ad serving event [1].
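
As a rough illustration of what "JSON lines" means here, the Python sketch below serializes a couple of events, one JSON object per line. The helper name `to_json_lines` and the sample values are made up, and the real events have ~100 fields. Sending the payload to VictoriaLogs is left out of the sketch.

```python
import json


def to_json_lines(events):
    """Serialize every event as one compact JSON object per line."""
    return "".join(
        json.dumps(event, separators=(",", ":")) + "\n" for event in events
    )


# Made-up sample events; no schema is declared anywhere,
# and the second event simply carries one field fewer.
events = [
    {"UserID": 1, "ClientIP": "10.0.0.1", "AdvEngineID": 0},
    {"UserID": 2, "AdvEngineID": 3},
]

payload = to_json_lines(events)
print(payload, end="")
```

Because every line is self-describing, events with different sets of fields can sit in the same stream without any table definition.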

It is also interesting to compare the original SQL queries [2] with the LogsQL queries [3]. The LogsQL queries are shorter and easier to understand, even if you aren't familiar with LogsQL. See how to convert SQL to LogsQL [4].

[1] https://github.com/ClickHouse/ClickBench/tree/main/victorial...

[2] https://github.com/ClickHouse/ClickBench/blob/main/clickhous...

[3] https://github.com/ClickHouse/ClickBench/blob/main/victorial...

[4] https://docs.victoriametrics.com/victorialogs/sql-to-logsql/