
Monitoring Your Validator

GenLayer validators expose comprehensive metrics in a Prometheus-compatible format, allowing you to monitor your validator's performance, health, and resource usage with Prometheus and other monitoring tools.

Accessing Metrics

The metrics endpoint is exposed on the operations port (default: 9153) configured in your config.yaml:

node:
  ops:
    port: 9153 # Metrics port
    endpoints:
      metrics: true # Enable metrics endpoint

Once your node is running, you can access the metrics at:

http://localhost:9153/metrics

Available Metrics

The validator provides several metric collectors, each of which can be enabled and configured individually:

  • Node Metrics: Core validator performance metrics including block processing, transaction handling, and consensus participation
  • GenVM Metrics: Virtual machine performance metrics, including execution times and resource usage
  • WebDriver Metrics: Metrics related to web access and external data fetching

Configuring Metrics Collection

You can customize metrics collection in your config.yaml:

metrics:
  interval: "15s"  # Default collection interval
  collectors:
    node:
      enabled: true
      interval: "30s"  # Override interval for specific collector
    genvm:
      enabled: true
      interval: "20s"
    webdriver:
      enabled: true
      interval: "60s"

Example Metrics Query

To check if metrics are working correctly:

# Get all available metrics
curl http://localhost:9153/metrics
 
# Check specific metric (example)
curl -s http://localhost:9153/metrics | grep genlayer_node_

In addition to /metrics, the same ops port serves /health and /balance endpoints for additional monitoring.
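
For a quick check from the validator host, you can query these endpoints directly (the exact response payloads depend on your node version):

# Check the node's health endpoint
curl http://localhost:9153/health
 
# Check the validator's balance endpoint
curl http://localhost:9153/balance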

Monitoring Best Practices

  1. Set up alerts for critical metrics like node synchronization status and missed blocks (see the example rule after this list)
  2. Monitor resource usage to ensure your validator has sufficient CPU, memory, and disk space
  3. Track GenVM performance to optimize LLM provider selection and configuration
  4. Use visualization tools like Grafana to create dashboards for easy monitoring
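
As an illustration of item 1, the sketch below shows a minimal Prometheus alerting rule that fires when the metrics endpoint stops being scraped. It relies only on Prometheus's built-in up metric, since the GenLayer-specific metric names available to you depend on your node version; the job label genlayer-validator is an assumption and must match the scrape job you configure (see the scrape configuration sketch after the next paragraph).

# alert-rules.yml (hypothetical file name; reference it from rule_files in prometheus.yml)
groups:
  - name: genlayer-validator-alerts
    rules:
      - alert: ValidatorMetricsDown
        # "up" is set to 0 by Prometheus when the last scrape of the target failed
        expr: up{job="genlayer-validator"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "GenLayer validator metrics endpoint is unreachable"
          description: "Prometheus could not scrape {{ $labels.instance }} for 5 minutes."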

For production validators, we recommend setting up a complete monitoring stack with Prometheus and Grafana. This gives you real-time visibility into your validator's performance and helps you catch issues before they affect its operation.
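
As a minimal sketch, the scrape configuration below tells a Prometheus server running on the same host to collect the validator's metrics from the default ops port. The job name and labels are illustrative and can be adjusted; Grafana can then use this Prometheus instance as a data source for dashboards.

# prometheus.yml (sketch, assuming Prometheus runs on the validator host)
global:
  scrape_interval: 15s

rule_files:
  - alert-rules.yml  # alerting rules such as the example above

scrape_configs:
  - job_name: genlayer-validator
    static_configs:
      - targets: ["localhost:9153"]  # validator ops port from config.yaml
        labels:
          validator_name: validatorname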

Logs and Metrics Forwarding

You can forward your logs and metrics to external systems for centralized monitoring and alerting by using the alloy service provided in the docker-compose.yaml from the extracted tarball.

Centralized Push to GenLayer Foundation Grafana Cloud (using Alloy)

To contribute your node's metrics and logs to the centralized GenLayer Foundation Grafana Cloud dashboard (improving aggregate network visibility, alerts, and community monitoring), use the built-in Alloy service.

Why contribute?

  • Helps the Foundation and community track overall testnet health (validator participation, latency, resource usage).
  • May positively influence testnet points/rewards (visible healthy nodes are prioritized).
  • Setup takes 10–15 minutes once credentials are provided.

Prerequisites

  • Metrics enabled in config.yaml (endpoints.metrics: true — default in recent versions).
  • Ops port 9153 exposed in docker-compose (ports: - "9153:9153").
  • Credentials from the Foundation team (ask in #testnet-asimov):
    • CENTRAL_MONITORING_URL — Prometheus remote write base URL (e.g., https://prometheus-prod-XX.grafana.net)
    • CENTRAL_LOKI_URL — Loki push base URL (e.g., https://logs-prod-XX.grafana.net)
    • MONITORING_USERNAME — Instance ID (a number)
    • MONITORING_PASSWORD — Grafana Cloud API Key (with write permissions for metrics and logs)

Steps

  1. Create or update .env (next to your docker-compose.yaml):
# Grafana Cloud credentials (request from Foundation team in Discord)
CENTRAL_MONITORING_URL=https://prometheus-prod-...grafana.net
CENTRAL_LOKI_URL=https://logs-prod-...grafana.net
MONITORING_USERNAME=1234567890          # your instance ID
MONITORING_PASSWORD=glc_xxxxxxxxxxxxxxxxxxxxxxxxxxxx  # API key

# Your node labels (customize for easy filtering in dashboards)
NODE_ID=0xYourValidatorAddressOrCustomID
VALIDATOR_NAME=validatorname

# Usually defaults are fine
NODE_METRICS_ENDPOINT=localhost:9153
LOG_FILE_PATTERN=/var/log/genlayer/node*.log
METRICS_SCRAPE_INTERVAL=15s
  2. Add or verify the Alloy service in docker-compose.yaml (copy it in if missing):
alloy:
  image: grafana/alloy:latest
  container_name: genlayer-node-alloy
  command:
    - run
    - /etc/alloy/config.river
    - --server.http.listen-addr=0.0.0.0:12345
    - --storage.path=/var/lib/alloy/data
  volumes:
    - ./alloy-config.river:/etc/alloy/config.river:ro
    - ${NODE_LOGS_PATH:-./data/node/logs}:/var/log/genlayer:ro
    - alloy_data:/var/lib/alloy
  environment:
    - CENTRAL_LOKI_URL=${CENTRAL_LOKI_URL}
    - CENTRAL_MONITORING_URL=${CENTRAL_MONITORING_URL}
    - MONITORING_USERNAME=${MONITORING_USERNAME}
    - MONITORING_PASSWORD=${MONITORING_PASSWORD}
    - NODE_ID=${NODE_ID}
    - VALIDATOR_NAME=${VALIDATOR_NAME}
    - NODE_METRICS_ENDPOINT=${NODE_METRICS_ENDPOINT}
    - SCRAPE_TARGETS_JSON=${SCRAPE_TARGETS_JSON:-}
    - METRICS_SCRAPE_INTERVAL=${METRICS_SCRAPE_INTERVAL:-15s}
    - METRICS_SCRAPE_TIMEOUT=${METRICS_SCRAPE_TIMEOUT:-10s}
    - ALLOY_SELF_MONITORING_INTERVAL=${ALLOY_SELF_MONITORING_INTERVAL:-60s}
    - LOG_FILE_PATTERN=${LOG_FILE_PATTERN:-/var/log/genlayer/node*.log}
    - LOKI_BATCH_SIZE=${LOKI_BATCH_SIZE:-1MiB}
    - LOKI_BATCH_WAIT=${LOKI_BATCH_WAIT:-1s}
  ports:
    - "12345:12345"  # Alloy UI for debugging
  restart: unless-stopped
  profiles:
    - monitoring
 
volumes:
  alloy_data:
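
Before starting the service, you can optionally render the resolved configuration to confirm that the monitoring profile is recognized and your .env values are substituted:

# Render the resolved compose file, including the alloy service and its environment
docker compose --profile monitoring config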
  3. Create or update ./alloy-config.river (use the provided version; it handles both log and metrics forwarding):
// Grafana Alloy Configuration for GenLayer Node Telemetry
// Handles both log collection and metrics forwarding

// ==========================================
// Log Collection and Forwarding
// ==========================================

local.file_match "genlayer_logs" {
  path_targets = [{
    __path__ = coalesce(env("LOG_FILE_PATTERN"), "/var/log/genlayer/node*.log"),
  }]
}

discovery.relabel "add_labels" {
  targets = local.file_match.genlayer_logs.targets

  rule {
    target_label = "instance"
    replacement  = env("NODE_ID")
  }

  rule {
    target_label = "validator_name"
    replacement  = env("VALIDATOR_NAME")
  }

  rule {
    target_label = "component"
    replacement  = "alloy"
  }

  rule {
    target_label = "job"
    replacement  = "genlayer-node"
  }
}

loki.source.file "genlayer" {
  targets    = discovery.relabel.add_labels.output
  forward_to = [loki.write.central.receiver]
  tail_from_end = true
}

loki.write "central" {
  endpoint {
    url = env("CENTRAL_LOKI_URL") + "/loki/api/v1/push"
    basic_auth {
      username = env("MONITORING_USERNAME")
      password = env("MONITORING_PASSWORD")
    }
    batch_size = coalesce(env("LOKI_BATCH_SIZE"), "1MiB")
    batch_wait = coalesce(env("LOKI_BATCH_WAIT"), "1s")
  }
}

// ==========================================
// Prometheus Metrics Collection and Forwarding
// ==========================================

prometheus.scrape "genlayer_node" {
  targets = json_decode(coalesce(env("SCRAPE_TARGETS_JSON"), format("[{\"__address__\":\"%s\",\"instance\":\"%s\",\"validator_name\":\"%s\"}]", coalesce(env("NODE_METRICS_ENDPOINT"), "localhost:9153"), coalesce(env("NODE_ID"), "local"), coalesce(env("VALIDATOR_NAME"), "default"))))
  forward_to = [prometheus.relabel.metrics.receiver]
  scrape_interval = coalesce(env("METRICS_SCRAPE_INTERVAL"), "15s")
  scrape_timeout  = coalesce(env("METRICS_SCRAPE_TIMEOUT"), "10s")
}

prometheus.relabel "metrics" {
  forward_to = [prometheus.remote_write.central.receiver]
  // Optional: filter only GenLayer metrics to save bandwidth
  // rule {
  //   source_labels = ["__name__"]
  //   regex        = "genlayer_.*"
  //   action       = "keep"
  // }
}

prometheus.remote_write "central" {
  endpoint {
    url = env("CENTRAL_MONITORING_URL") + "/api/v1/write"
    basic_auth {
      username = env("MONITORING_USERNAME")
      password = env("MONITORING_PASSWORD")
    }
    queue_config {
      capacity          = 10000
      max_shards        = 5
      max_samples_per_send = 500
      batch_send_deadline = "15s"
    }
  }
}

// ==========================================
// Alloy Self-Monitoring
// ==========================================

prometheus.exporter.self "alloy" {}

prometheus.scrape "alloy" {
  targets    = prometheus.exporter.self.alloy.targets
  forward_to = []
  scrape_interval = coalesce(env("ALLOY_SELF_MONITORING_INTERVAL"), "60s")
}
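
Optionally, you can run a quick syntax check on the file with the same image before starting the service. This is a sketch that assumes the grafana/alloy image's entrypoint is the alloy binary (as the compose command above implies); alloy fmt validates and pretty-prints the configuration syntax but does not verify that the pipeline is wired correctly.

# Syntax-check alloy-config.river using the alloy binary inside the container
docker run --rm -v "$(pwd)/alloy-config.river:/etc/alloy/config.river:ro" grafana/alloy:latest fmt /etc/alloy/config.river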
  4. Start Alloy:
docker compose --profile monitoring up -d
  5. Verify that it works:
docker logs genlayer-node-alloy | grep "sent batch"
docker logs genlayer-node-alloy | grep "remote_write"

Look for messages indicating successful batch sending (no error codes like 401, 403, 500).
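
You can also probe Alloy's own HTTP server on the debug port exposed in the compose file; Alloy serves readiness and health endpoints there (if these paths are not available in your Alloy version, open http://localhost:12345 in a browser instead):

# Alloy readiness and health probes on the UI port
curl http://localhost:12345/-/ready
curl http://localhost:12345/-/healthy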

  • In Foundation Grafana Cloud: search for metrics with labels
    instance="${NODE_ID}" or validator_name="${VALIDATOR_NAME}"
    (example: genlayer_node_uptime_seconds{instance="0xYourID"}).

Troubleshooting

  • No local metrics: verify that the endpoint returns Prometheus-formatted data:
curl http://localhost:9153/metrics
  • Authentication errors (401/403): double-check MONITORING_USERNAME and MONITORING_PASSWORD in .env.
  • No data pushed: make sure the URLs in .env have no trailing slash.
  • Need help: share your Alloy logs with the Foundation team:
docker logs genlayer-node-alloy