Skip to main content

· 2 min read

Question

When running ClickHouse in Docker, Docker is complaining about the lack of CAP_IPC_LOCK and CAP_SYS_NICE capabilities in the system. How can I resolve it?

Here is what the no CAP_SYS_NICE or CAP_SYS_NICE capability log messages look like:

docker run -d --name clickhouse-server \
--ulimit nofile=262144:262144 \
--network clickhouse-net \
-p 8123:8123 -p 9000:9000 -p 9009:9009 -p 9363:9363 \
clickhouse/clickhouse-server:23.2
2023.04.19 08:04:10.022720 [ 1 ] {} <Information> Application: It looks like the process has no CAP_IPC_LOCK capability, binary mlock will be disabled. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_ipc_lock=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems.

2023.04.19 08:04:10.065860 [ 1 ] {} <Information> Application: It looks like the process has no CAP_SYS_NICE capability, the setting 'os_thread_priority' will have no effect. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_sys_nice=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems.

Answer

  1. Add two --cap-add arguments to provide the container with the IPC_LOCK and SYS_NICE capabilities:
docker run -d --name clickhouse-server \
--cap-add=SYS_NICE \
--cap-add=IPC_LOCK \
--ulimit nofile=262144:262144 \
--network clickhouse-net \
-p 8123:8123 -p 9000:9000 -p 9009:9009 -p 9363:9363 \
clickhouse/clickhouse-server:23.2
  1. Check that the capabilities are visible in the container using the following command:
apt-get update > /dev/null && apt-get install -y libcap2-bin > /dev/null && capsh --print

The response is similar to:

debconf: delaying package configuration, since apt-utils is not installed
WARNING: libcap needs an update (cap=40 should have a name).
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_ipc_lock,cap_sys_chroot,cap_sys_nice,cap_mknod,cap_audit_write,cap_setfcap+ep
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_ipc_lock,cap_sys_chroot,cap_sys_nice,cap_mknod,cap_audit_write,cap_setfcap
Ambient set =
Securebits: 00/0x0/1'b0
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
secure-no-ambient-raise: no (unlocked)
uid=0(root) euid=0(root)
gid=0(root)
groups=0(root)
Guessed mode: UNCERTAIN (0)
  1. Manually set both the capabilities for ClickHouse
setcap "cap_ipc_lock=+ep cap_sys_nice=+ep" /usr/bin/clickhouse
  1. Check that the capabilities are applied.
getcap -v /usr/bin/clickhouse

You should see the following:

/usr/bin/clickhouse = cap_ipc_lock,cap_sys_nice+ep
  1. Restart the ClickHouse server and the log messages should not be shown.

Check out this article on Linux capabilities for more details.

· 2 min read

The following useful query shows which of your executed queries used the most memory. A couple of comments about this query:

  • the results are computed from the past day (now() - toIntervalDay(1))) but you can easily modify the time interval
  • it assumes you have a cluster named default, which is the name of your cluster in ClickHouse Cloud. Change default to the name of your cluster
  • if you do not have a cluster, see the query listed at the end of this article
SELECT
count() as nb_query,
user,
query,
sum(memory_usage) AS memory,
normalized_query_hash
FROM
clusterAllReplicas(default, system.query_log)
WHERE
(event_time >= (now() - toIntervalDay(1)))
AND query_kind = 'Select'
AND type = 'QueryFinish'
and user != 'monitoring-internal'
GROUP BY
normalized_query_hash,
query,
user
ORDER BY
memory DESC;

The response looks like:

┌─nb_query─┬─user────┬─query─────────────────────────────────────────────────────────┬───memory─┬─normalized_query_hash─┐
│ 11 │ default │ select version() │ 46178924 │ 7202516440347714159 │
│ 2 │ default │ SELECT * FROM "system"."table_functions" LIMIT 31 OFFSET 0 │ 8391544 │ 12830067173062987695 │
└──────────┴─────────┴───────────────────────────────────────────────────────────────┴──────────┴───────────────────────┘
Note

If you do not have a system.query_log table, then you likely do not have query logging enabled. View the details of the query_log setting for details on how to enable it.

If you do not have a cluster, use can just query your one system.query_log table directly:

SELECT
count() as nb_query,
user,
query,
sum(memory_usage) AS memory,
normalized_query_hash
FROM
system.query_log
WHERE
(event_time >= (now() - toIntervalDay(1)))
AND query_kind = 'Select'
AND type = 'QueryFinish'
and user != 'monitoring-internal'
GROUP BY
normalized_query_hash,
query,
user
ORDER BY
memory DESC;

· One min read

We often get asked about a good schema migration tool for ClickHouse and what is the best practice to manage database schemas in ClickHouse that might change over time? There is no standard schema migration tool for ClickHouse, but we have compiled the following list (in no particular order) of automatic schema migration tools with support for ClickHouse that we know:

· One min read

Question

How do I view the number of active or queued mutations?

Answer

Monitoring the number of active or queued mutations is important if you are performing a lot of ALTER or UPDATE statements on your tables. These queries rewrite data parts and are not atomic - they are ordered by their creation part and applied to each part in that order. You can find more details on mutations in the docs.

Each mutation generates an entry in the system.mutations table. When performing a large number of mutations, you can monitor the count running and queued mutations with this:

SELECT
hostname() AS host,
count()
FROM clusterAllReplicas('default', 'system.mutations') WHERE not is_done
GROUP BY host;
Note

This query assumes you are running a cluster named default, which is the name of your cluster in ClickHouse Cloud. Replace default with the name of your cluster.

If you do not have a cluster, use this command:

SELECT
hostname() AS host,
count()
FROM system.mutations WHERE not is_done
GROUP BY host;

We also recommend reading this recent blog on updates and deletes.

· 2 min read

ClickHouse Keeper provides the coordination system for data replication and distributed DDL queries execution. ClickHouse Keeper is compatible with ZooKeeper, but it might not be obvious why you should use ClickHouse Keeper instead of ZooKeeper. This article discusses some of the benefits of Keeper.

Answer

ClickHouse Cloud uses clickhouse-keeper at large scale for thousands of services in a multi-tenant environment. We designed and built Keeper so that we could remove our dependency on the Java-based ZooKeeper implementation. ClickHouse Keeper solves many well-known drawbacks of ZooKeeper and makes additional improvements, including:

  • Snapshots and logs consume much less disk space due to better compression
  • No limit on the default packet and node data size (it is 1 MB in ZooKeeper)
  • No zxid overflow issue (it forces a restart for every 2B transactions in ZooKeeper)
  • Faster recovery after network partitions due to the use of a better distributed consensus protocol
  • It uses less memory for the same volume of data
  • It is easier to setup, and it does not require specifying the JVM heap size or a custom garbage collection implementation
  • A few custom commands in the protocol enable faster operations in ReplicatedMergeTree tables
  • A larger coverage by Jepsen tests

In addition, ClickHouse Support has observed a massive decrease in cluster problems in cases with sites who use clickhouse-keeper rather than ZooKeeper.

Check out the Keeper docs page for more details on how to configure and run ClickHouse Keeper.

· 2 min read

If a column is sparse (empty or contains mostly zeros), ClickHouse can encode it in a sparse format and automatically optimize calculations - the data does not require full decompression during queries. In fact, if you know how sparse a column is, you can define its ratio using the ratio_of_defaults_for_sparse_serialization setting to optimize serialization.

This handy query can take a while, but it analyzes every row in your table and determines the ratio of values that are zero (or the default) in every column in the specified table:

SELECT *
APPLY x -> (x = defaultValueOfArgumentType(x)) APPLY avg APPLY x -> round(x, 3)
FROM table_name
FORMAT Vertical

For example, we ran this query above on the environmental sensors dataset table named sensors which has over 20B rows and 19 columns:

SELECT *
APPLY x -> (x = defaultValueOfArgumentType(x)) APPLY avg APPLY x -> round(x, 3)
FROM sensors
FORMAT Vertical

Here is response:


Row 1:
──────
round(avg(equals(sensor_id, defaultValueOfArgumentType(sensor_id))), 3): 0
round(avg(equals(sensor_type, defaultValueOfArgumentType(sensor_type))), 3): 0.159
round(avg(equals(location, defaultValueOfArgumentType(location))), 3): 0
round(avg(equals(lat, defaultValueOfArgumentType(lat))), 3): 0.001
round(avg(equals(lon, defaultValueOfArgumentType(lon))), 3): 0.001
round(avg(equals(timestamp, defaultValueOfArgumentType(timestamp))), 3): 0
round(avg(equals(P1, defaultValueOfArgumentType(P1))), 3): 0.474
round(avg(equals(P2, defaultValueOfArgumentType(P2))), 3): 0.475
round(avg(equals(P0, defaultValueOfArgumentType(P0))), 3): 0.995
round(avg(equals(durP1, defaultValueOfArgumentType(durP1))), 3): 0.999
round(avg(equals(ratioP1, defaultValueOfArgumentType(ratioP1))), 3): 0.999
round(avg(equals(durP2, defaultValueOfArgumentType(durP2))), 3): 1
round(avg(equals(ratioP2, defaultValueOfArgumentType(ratioP2))), 3): 1
round(avg(equals(pressure, defaultValueOfArgumentType(pressure))), 3): 0.83
round(avg(equals(altitude, defaultValueOfArgumentType(altitude))), 3): 1
round(avg(equals(pressure_sealevel, defaultValueOfArgumentType(pressure_sealevel))), 3): 1
round(avg(equals(temperature, defaultValueOfArgumentType(temperature))), 3): 0.532
round(avg(equals(humidity, defaultValueOfArgumentType(humidity))), 3): 0.544

1 row in set. Elapsed: 992.041 sec. Processed 20.69 billion rows, 1.39 TB (20.86 million rows/s., 1.40 GB/s.)

From the results above:

  • the sensor_id columns is not sparse at all. In fact, every row has a non-zero value
  • the sensor_type is only sparse about 15.9% of the time
  • the P0 column is very sparse: 99.9% of the values are zero
  • the pressure column is quite sparse at 83%
  • and temperature column has 53.2% of its values missing or zero

Like we said, it's a handy query for computing how sparse your columns are in a ClickHouse table!

· 2 min read

Question

How can I validate that two queries return the same resultsets?

Answer

You can use the below approach:

WITH
(
SELECT sum(cityHash64(*))
FROM
(
-- your query 1 here
-- SELECT ...
)
) AS q1_resultset_hash,
(
SELECT sum(cityHash64(*))
FROM
(
-- your query 2 here
-- SELECT ...
)
) AS q2_resultset_hash
SELECT equals(q1_resultset_hash,q2_resultset_hash) as Q1_equals_Q2

The example uses a CTE to calculate sums of the cityHash value of each row in these two queries and will return 1 if the two resultsets are identical.

Using some integers sequence data and some pretty formatting:

WITH
(
SELECT sum(cityHash64(*))
FROM
(
SELECT *
FROM numbers(10)
ORDER BY number DESC
)
) AS q1_resultset_hash,
(
SELECT sum(cityHash64(*))
FROM
(
SELECT *
FROM numbers(10)
ORDER BY number ASC
)
) AS q2_resultset_hash
SELECT q1_resultset_hash = q2_resultset_hash AS Q1_equals_Q2
FORMAT Pretty

will return:

┏━━━━━━━━━━━━━━┓
┃ Q1_equals_Q2 ┃
┡━━━━━━━━━━━━━━┩
│ 1 │
└──────────────┘

While this can be handy in many scenarios, it can't be considered as a silver bullet to validate equality of resultsets for all types and there are caveats to using it, for example if any row contains NULL values the above approach will fail.

· One min read

Cause of the Error

This error occurs while trying to connect to a ClickHouse server using clickhouse-client. The cause of the error is either:

  • the client configuration file config.xml is missing the root certificate in the machine CA default store, or
  • there is a self-signed or internal CA certificate that is not configured

Solution

If using an internal or self-signed CA, configure the CA root certificate in config.xml in the client directory (e.g. /etc/clickhouse-client) and disable the loading of the default root CA certificates from the default location.

Here is an example configuration:

<openSSL>
<client>
<loadDefaultCAFile>false</loadDefaultCAFile>
<caConfig>/etc/clickhouse-server/certs/marsnet_ca.crt</caConfig>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
<invalidCertificateHandler>
<name>RejectCertificateHandler</name>
</invalidCertificateHandler>
</client>
</openSSL>

Additional resources

View https://clickhouse.com/docs/en/interfaces/cli/#configuration_files

· 2 min read

When this error occurs, a table shows as readonly and the error states intersecting parts. You can see the error in the logs or by

SELECT *
FROM system.replicas
WHERE is_readonly = 1

The error message looks like:

Code: 49. DB::Exception: Part XXXXX intersects previous part YYYYY. It is a bug or a result of manual intervention in the ZooKeeper data. (LOGICAL_ERROR) (version 21.12.4.1 (official build))

Cause of the Error

This error can be caused by a race condition between mergeSelectingTask and queue reinitialization.

Solution

Execute the following queries on all replicas:

DETACH TABLE table_name;  -- Required for DROP REPLICA

SYSTEM DROP REPLICA 'replica_name' FROM ZK PATH '/table_path_in_zk/'; -- It will remove everything from /table_path_in_zk

ATTACH TABLE table_name; -- Table will be in readonly mode, because there is no metadata in ZK

Then execute the following on all replicas:

SYSTEM RESTORE REPLICA table_name;  -- It will detach all partitions, re-create metadata in ZK (like it's new empty table), and then attach all partitions back

SYSTEM SYNC REPLICA table_name; -- Wait for replicas to synchronize parts. Also it's recommended to check `system.detached_parts` on all replicas after recovery is finished.
Tip

You should upgrade to the latest version of ClickHouse

Additional resources

Related PRs and GitHub issues:

Versions affected:

ClickHouse v 22.12 and prior