ADR-006: Metrics And Logs
Status
Done
Context
We want to collect business and infrastructure metrics from both the observer and the Int3face node binaries.
Implementation
- Implement log aggregation.
- Add metrics for the number of incoming and outgoing transactions.
- Add metrics for total vault supply.
- Add metrics for total Cosmos-represented asset supply.
- Calculate and monitor the ratio between vault and Cosmos asset supplies.
- Monitor incoming and outgoing transaction volume.
- Monitor node status, including health status and current height.
- Implement TSS (threshold signature scheme) metrics for signing and key generation.
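The supply-ratio item in the list above comes down to one division and a tolerance check. Below is a minimal Python sketch built on these assumptions:

- Both supplies are expressed in the same base unit of the same asset.
- A ratio of 1 means the Cosmos-represented supply is fully backed by the vault.
- The tolerance is a hypothetical alerting parameter.

The function names are illustrative and do not come from the observer code. Decimal is used so that large token amounts are not rounded the way floats would round them.

```python
from decimal import Decimal


def supply_ratio(vault_supply: Decimal, cosmos_supply: Decimal) -> Decimal:
    """Return vault supply divided by Cosmos-represented supply.

    Assumes both amounts use the same base unit of the same asset.
    """
    if cosmos_supply == 0:
        raise ValueError("Cosmos-represented supply is zero; ratio is undefined")
    return vault_supply / cosmos_supply


def is_within_tolerance(ratio: Decimal, tolerance: Decimal) -> bool:
    """Hypothetical check: treat 1 as fully backed and flag drift beyond tolerance."""
    return abs(ratio - 1) <= tolerance
```

For example, a vault holding 990 units against 1000 Cosmos-side units gives a ratio of 0.99. That passes a tolerance of 0.01 and fails a tolerance of 0.005.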
Steps to implement
- Determine how to scrape metrics from nodes: 71.
- Set up metrics storage: 72.
- Research methods to expose and modify Cosmos-based metrics: 73.
- Add business and infrastructure metrics for the Int3face node: 74.
- Add business and infrastructure metrics for the Observer node: 75.
- Deploy the metrics infrastructure to Hetzner: 76.
- Set up a metrics dashboard using the reference template: 77.
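Prometheus uses a pull model. Each binary serves its current metric values over HTTP, conventionally at /metrics, and Prometheus scrapes that endpoint on a fixed interval.

The standard-library Python sketch below shows what a scrape sees, using the observer_transfer_success_total family from the metrics overview as the example. The chain IDs, counter values and helper names are hypothetical. A real Go binary would normally rely on a client library such as prometheus/client_golang rather than rendering the text format by hand.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

# Hypothetical in-memory counter values keyed by (from_chain, to_chain).
TRANSFER_SUCCESS = {("chain-a", "chain-b"): 3}


def _escape(value: str) -> str:
    """Escape a label value as the text exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(counts) -> str:
    """Render one counter family in the Prometheus text exposition format."""
    lines = [
        "# HELP observer_transfer_success_total Number of successful transfers",
        "# TYPE observer_transfer_success_total counter",
    ]
    for (src, dst), value in sorted(counts.items()):
        labels = f'from_chain="{_escape(src)}",to_chain="{_escape(dst)}"'
        lines.append(f"observer_transfer_success_total{{{labels}}} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(TRANSFER_SUCCESS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; remove this override to see access logs


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve /metrics on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_metrics(port: int):
    """Do what a Prometheus scrape does: a plain HTTP GET of /metrics."""
    with urlopen(f"http://127.0.0.1:{port}/metrics", timeout=5) as resp:
        return resp.headers["Content-Type"], resp.read().decode("utf-8")
```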
Solution
- Implemented monitoring based on Prometheus and Grafana.
- Prometheus and Grafana are deployed with docker-compose.
- Also running as daemons:
- node-exporter to collect host (hardware and OS) metrics.
- cosmos-exporter to collect Cosmos metrics.
- The monitoring repository, int3face-monitoring, contains:
- Prometheus & Grafana configs
- Deployment scripts
- Docker-compose file
- Dashboards
- Grafana dashboard (Hetzner): int3face-monitoring
- Login: admin
- Password: ask a team member
- Prometheus (Hetzner): prometheus-ui
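Besides its web UI, Prometheus answers PromQL queries over its HTTP API. Instant queries use GET /api/v1/query, which is convenient for scripted checks alongside the Grafana dashboards.

Below is a minimal standard-library sketch. The real Hetzner address is not reproduced here, so http://localhost:9090 (Prometheus's default port) stands in for it, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for Prometheus's /api/v1/query endpoint."""
    query = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_vector(payload: str):
    """Return (labels, value) pairs from an instant-query response of type "vector"."""
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "Prometheus query failed"))
    data = doc["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample value is a [unix_timestamp, "string_value"] pair.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


def instant_query(base_url: str, promql: str, timeout: float = 5.0):
    """Run an instant query against a live Prometheus and parse the vector result."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

For example, instant_query("http://localhost:9090", "sum by (from_chain, to_chain) (rate(observer_transfer_failed_total[5m]))") would return per-route failure rates, provided those series are being scraped.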
Metrics overview
Observer Node Metrics
Configuration block in observer.toml file:

| Metric name | Metric type | Labels | Description |
|---|---|---|---|
| observer_transfer_success_total | Counter | [from_chain, to_chain] | Number of successful transfers |
| observer_transfer_failed_total | Counter | [from_chain, to_chain] | Number of failed transfers |
| observer_transfer_duration_seconds | Histogram | [from_chain, to_chain] | Time spent processing a transfer |
| observer_transfers_queue_size | Gauge | [chain_id] | Size of the transfer queue |
| observer_transferred_amount_total | Counter | [from_chain, to_chain] | Total amount of transferred assets |
| observer_tss_sign_success_total | Counter | [] | Number of successful TSS signings |
| observer_tss_sign_failed_total | Counter | [] | Number of failed TSS signings |
| observer_tss_sign_duration_seconds | Histogram | [] | Time spent on TSS signing |
| observer_keygen_processing_success_total | Counter | [] | Number of successful key generation processes |
| observer_keygen_processing_failed_total | Counter | [] | Number of failed key generation processes |
| observer_keygen_processing_duration_seconds | Histogram | [] | Time spent on key generation |
| observer_vault_migration_duration_seconds | Histogram | [] | Time spent on vault migration |
| observer_chain_client_health | Gauge | [chain_id] | Chain client health status |
| observer_chain_height | Gauge | [chain_id] | Chain height |
| observer_chain_last_observed_height | Gauge | [chain_id] | Last observed chain height |
| observer_total_supply | Gauge | [chain_id, asset_id, vault_address] | Total supply of assets |
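The three chain gauges in the table can be combined into a simple liveness check. An observer whose last observed height falls far behind the chain height, or whose chain client reports unhealthy, probably needs attention.

This sketch assumes that observer_chain_client_health is 1 when healthy and that both heights are block numbers on the same chain. The ChainStatus type and the max_lag threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChainStatus:
    chain_id: str
    health: float        # observer_chain_client_health; assumed 1 = healthy
    height: int          # observer_chain_height
    last_observed: int   # observer_chain_last_observed_height


def chains_needing_attention(statuses, max_lag: int):
    """Return (chain_id, lag) for chains that are unhealthy or lag by more than max_lag blocks."""
    flagged = []
    for s in statuses:
        lag = s.height - s.last_observed
        if s.health != 1 or lag > max_lag:
            flagged.append((s.chain_id, lag))
    return flagged
```

In PromQL, a roughly equivalent alert expression would be observer_chain_height - observer_chain_last_observed_height > 5, where 5 is an example threshold. It matches the two series one-to-one on chain_id and the scrape labels.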
Int3face Node Metrics
Configuration block in config/config.toml file:
- Consensus metrics: link to metrics
- Cosmos-exporter metrics (every metric provided by cosmos-exporter has one of the following prefixes):
- cosmos_validator_* - metrics related to a single validator
- cosmos_validators_* - metrics related to the validator set
- cosmos_wallet_* - metrics related to a single wallet
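The node's own consensus metrics and the cosmos-exporter metrics arrive in the same Prometheus text format. In CometBFT, metrics are served when prometheus = true is set in the [instrumentation] section of config/config.toml. The default listen address is :26660, and names are prefixed by the configured namespace: cometbft by default, while older Tendermint-based nodes use tendermint. The exact defaults depend on the version in use.

The simplified parser below splits a scraped body into samples and filters them by a name prefix. It assumes underscore-separated metric names, and the function names are hypothetical.

```python
def parse_samples(text: str):
    """Parse Prometheus text-format sample lines into (name, labels, value) tuples.

    Simplified sketch: comment lines are skipped, an optional trailing timestamp
    is ignored, and label values must not contain commas or escaped quotes.
    """
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        labels = {}
        if "{" in line:
            name, rest = line.split("{", 1)
            label_part, value_part = rest.rsplit("}", 1)
            for pair in filter(None, label_part.split(",")):
                key, val = pair.split("=", 1)
                labels[key.strip()] = val.strip().strip('"')
        else:
            name, value_part = line.split(None, 1)
        # value_part is "<value>" or "<value> <timestamp>"; keep only the value.
        samples.append((name.strip(), labels, float(value_part.split()[0])))
    return samples


def with_prefix(samples, prefix: str):
    """Keep only samples whose metric name starts with prefix."""
    return [s for s in samples if s[0].startswith(prefix)]
```

For production use, an official Prometheus client-library parser handles the full escaping rules that this sketch skips.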
Node Exporter Metrics
- Node exporter metrics: link to metrics
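node_exporter serves host metrics on port 9100 by default. Among them are node_filesystem_size_bytes and node_filesystem_avail_bytes for each mountpoint, which together give disk usage. This is worth watching because chain data directories tend to grow over time.

The sketch below does the arithmetic and flags nearly full filesystems. The threshold and helper names are hypothetical.

```python
def used_ratio(size_bytes: float, avail_bytes: float) -> float:
    """Share of a filesystem not available to unprivileged users.

    Inputs correspond to node_exporter's node_filesystem_size_bytes and
    node_filesystem_avail_bytes for the same mountpoint.
    """
    if size_bytes <= 0:
        raise ValueError("filesystem size must be positive")
    return 1.0 - avail_bytes / size_bytes


def nearly_full(filesystems, threshold: float):
    """Return mountpoints whose used ratio is at or above threshold.

    filesystems maps mountpoint -> (size_bytes, avail_bytes).
    """
    return sorted(
        mountpoint
        for mountpoint, (size, avail) in filesystems.items()
        if used_ratio(size, avail) >= threshold
    )
```

The same ratio can be computed directly in PromQL as 1 - node_filesystem_avail_bytes / node_filesystem_size_bytes.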