QAN issues¶
This section focuses on problems with QAN, such as queries not being retrieved.
Missing data¶
Why don’t I see any query-related information?¶
There might be multiple places where the problem might come from:
- connection problem between pmm-agent and pmm-managed
- PMM-agent cannot connect to the database.
- data source is not properly configured.
Why don’t I see the whole query?¶
Long query examples and fingerprints are truncated to 2048 symbols by default to reduce space usage. In this case, the Explain section will not work. Max query size can be configured using flag --max-query-length while adding a service.
Incorrect metrics: unrealistic query execution times¶
If you’re seeing query execution times that seem impossible (like 50,000+ seconds for simple SELECT statements), this is typically caused by metric calculation errors rather than actual performance issues.
This is because enabling query plans causes pg_stat_monitor to create multiple records for each query, leading to incorrect timing calculations.
To fix the issue, disable query plan collection:
-- Check if query plan collection is enabled
SHOW pg_stat_monitor.pgsm_enable_query_plan;
-- If it shows 'on', disable it
ALTER SYSTEM SET pg_stat_monitor.pgsm_enable_query_plan = off;
SELECT pg_reload_conf();
-- Verify the change took effect
SHOW pg_stat_monitor.pgsm_enable_query_plan;
After disabling query plan collection, new metrics should show realistic execution times within minutes.
QAN service fails after upgrade¶
After upgrading PMM Server, the QAN service may fail to start with BACKOFF, FATAL, or EXITED status, preventing the QAN dashboard from loading. You’ll see the following error in /srv/logs/qan-api2.log, where x is the migration version number:
stdlog: Migrations: Dirty database version x. Fix and force version.
This happens when the ClickHouse schema migration is interrupted during the upgrade.
Resolution¶
- PMM 3.5.0 and later: The issue is fixed automatically. PMM detects and completes the interrupted schema migration upon restart.
-
Earlier versions: Use the following manual workaround:
-
Access the PMM container:
podman exec -it pmm-server /bin/bash -
Connect to ClickHouse:
clickhouse client --username=<clickhouse_user> --password=<clickhouse_password> -d pmm -
Fix the migration state. Replace
xwith the version number from your error logs.USE pmm; INSERT INTO schema_migrations (version, dirty, sequence) VALUES (x, 0, toUnixTimestamp(NOW())*1000000000); EXIT; -
Restart QAN:
supervisorctl restart qan-api2 -
Verify QAN is running:
supervisorctl status qan-api2
-
ClickHouse memory issues in low-memory environments¶
If you’re running PMM Server with less than 16 GB RAM and seeing “memory limit exceeded” errors in ClickHouse logs, switch to the low-memory configuration.
PMM includes two ClickHouse profiles:
- default: optimized for performance (16 GB+ RAM)
- low-memory: optimized for constrained environments, based on ClickHouse recommendations
Switch to low-memory configuration¶
Select the profile with the PMM_CLICKHOUSE_CONFIG environment variable when you create the container:
docker run -e PMM_CLICKHOUSE_CONFIG=low-memory ... percona/pmm-server:3
Configuration details
Both configuration files are located in /etc/clickhouse-server/ inside the PMM Server container:
default-config.xml: default profilelow-memory-config.xml: low-memory profile
The switch-config.sh script is deprecated and will be removed in a future PMM release; use PMM_CLICKHOUSE_CONFIG instead.