A Deep Dive Into My Self-Hosted Grafana LGTM Stack
From setup to dashboards — a hands‑on guide to running Grafana, Loki, Tempo, and Mimir in a self‑hosted LGTM observability stack.

After spending what felt like countless hours wrestling with configs, I've finally wrangled my self-hosted observability stack into a state I'm happy with. This is a walkthrough of my Grafana LGTM ("Looks Good To Me": Loki, Grafana, Tempo, Mimir) stack, built on a single TrueNAS node. It's designed to be a comprehensive solution for collecting metrics, logs, traces, and profiles from all my services.
This setup is powerful and flexible, but be warned: it is not a simple one-click install. If you're looking for a quick and easy solution, a platform like SigNoz might be a better fit. But if you're like me and enjoy the challenge of building a custom, deeply integrated system from the ground up, then let's dive in.
The Big Picture: Architecture Overview
The core of this stack consists of Grafana, Loki, Tempo, Mimir, and Pyroscope, with Grafana Alloy acting as the central nervous system. At a high level, the roles are simple:
- Loki: For logs.
- Mimir: For metrics.
- Tempo: For traces.
- Pyroscope: For continuous profiling.
- Grafana: The single pane of glass to visualize it all.
The real magic, however, lies in Grafana Alloy. Alloy is Grafana's distribution of the OpenTelemetry (OTel) Collector, but with powerful additions like its own configuration language and built-in components that simplify complex data pipelines. It's responsible for receiving telemetry from various sources, processing it, and routing it to the correct storage backend.
Here's a visual representation of the data flow through Alloy, which we'll be breaking down in detail:

The Heart of the System: Grafana Alloy
Alloy's configuration is a graph of interconnected components. Data flows from receivers, through processors and connectors, and finally out to exporters.
How Data Gets In: Receivers
My stack ingests data in several ways:
1. OTLP Endpoints: Alloy exposes standard OTel gRPC and HTTP endpoints. This allows any OTel-instrumented application to send telemetry directly. I use this for a separate Raspberry Pi cluster that pushes its data over the network.
// Grafana Services/alloy/config.alloy
// OpenTelemetry grpc and http receiver ports
otelcol.receiver.otlp "otlp" {
  grpc { }
  http { }

  output {
    traces  = [otelcol.processor.resourcedetection.default.input]
    metrics = [otelcol.processor.resourcedetection.default.input]
    logs    = [otelcol.processor.resourcedetection.default.input]
  }
}
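On the client side, nothing Alloy-specific is required: any OTel SDK can be pointed at these endpoints with the standard exporter environment variables. Here's a minimal sketch for a service on the Pi cluster (the hostname and image are placeholders, not my actual setup):
# illustrative client-side compose fragment for an OTel-instrumented app
services:
  my-app:
    image: my-app:latest   # hypothetical application image
    environment:
      # standard OpenTelemetry SDK variables; truenas.lan is a placeholder hostname
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://truenas.lan:4318"   # Alloy's OTLP HTTP port
      OTEL_EXPORTER_OTLP_PROTOCOL: "http/protobuf"
      OTEL_SERVICE_NAME: "my-app"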
2. Prometheus Scrapes: Alloy actively scrapes Prometheus metrics from various sources:
- cAdvisor: To get fine-grained container metrics (CPU, memory, network, etc.).
- Manual Endpoints: I scrape the LGTM components themselves (Mimir, Tempo, Loki) to monitor the monitoring stack.
- Graphite Exporter: TrueNAS exports metrics in the Graphite format. I use a dedicated `graphite-exporter` container to convert these to Prometheus format, which Alloy then scrapes.
// Grafana Services/alloy/config.alloy
// "Pull" metrics from specific exporters
prometheus.scrape "exporters" {
  targets = [
    {"__address__" = "graphite-exporter:9108"},
    {"__address__" = "mimir:9090"},
    {"__address__" = "tempo:3200"},
    {"__address__" = "loki:3100"},
  ]
  forward_to = [otelcol.receiver.prometheus.default.receiver]
}
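The `forward_to` above points at `otelcol.receiver.prometheus`, a small bridge component that converts scraped Prometheus samples into OTLP so they join the same pipeline as everything else. cAdvisor runs embedded inside Alloy via `prometheus.exporter.cadvisor` (which is why the compose file later mounts several host paths for it). Roughly, as a sketch rather than my verbatim config:
// Sketch: bridge scraped Prometheus metrics into the OTel pipeline
otelcol.receiver.prometheus "default" {
  output {
    metrics = [otelcol.processor.resourcedetection.default.input]
  }
}

// Sketch: embedded cAdvisor exporter for per-container CPU/memory/network stats
prometheus.exporter.cadvisor "containers" {
  docker_host = "unix:///var/run/docker.sock"
}

prometheus.scrape "cadvisor" {
  targets    = prometheus.exporter.cadvisor.containers.targets
  forward_to = [otelcol.receiver.prometheus.default.receiver]
}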
3. eBPF for Automatic Instrumentation: This is one of the most powerful features. Using eBPF, Alloy can automatically trace requests and generate profiles for running containers without any code changes.
- Beyla (`beyla.ebpf`): Sits at the Linux kernel level, observing network traffic between containers. It automatically generates traces and metrics for HTTP requests, propagating trace IDs across services.
- Pyroscope (`pyroscope.ebpf`): Does the same for continuous profiling, capturing CPU profiles from all containers.
This is a game-changer because it provides deep visibility into third-party applications or legacy services where you can't modify the source code to add manual instrumentation.
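The Alloy wiring for Beyla is small; a rough sketch, with illustrative port ranges and component labels (the Pyroscope side is shown in its own section further down):
// Sketch: Beyla auto-instrumentation via eBPF
beyla.ebpf "auto" {
  // watch anything listening on these ports; the range here is illustrative
  open_port = "80,443,3000-9999"

  output {
    traces = [otelcol.processor.resourcedetection.default.input]
  }
}

// Beyla exposes the metrics it generates as a scrape target
prometheus.scrape "beyla" {
  targets    = beyla.ebpf.auto.targets
  forward_to = [otelcol.receiver.prometheus.default.receiver]
}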
How Data is Processed: Processors & Connectors
Once data is received, it flows through a processing pipeline:
- Resource Detection: The `otelcol.processor.resourcedetection` component automatically attaches metadata like hostname and OS to all incoming telemetry. I have `override = false` set to ensure that telemetry from my Raspberry Pi cluster retains its original host information.
- Span-to-Log Conversion: The `otelcol.connector.spanlogs` component is a neat trick. It inspects traces and generates a log entry for the root span of each trace. This means that for every incoming request captured by Beyla, I get a corresponding log line with the HTTP method, status code, and trace ID, even if the application itself doesn't log it.
- Batching: Finally, the `otelcol.processor.batch` component groups telemetry together before sending it to the exporters. This is more efficient than sending a network request for every single log or metric, especially in a busy system. A condensed sketch of the whole chain follows this list.
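Here is that chain in condensed form. Component labels and the exporter hand-offs are illustrative rather than copied from my config; the write side of each exporter is covered in the component sections below.
// Sketch: processing chain from receivers to exporters
otelcol.processor.resourcedetection "default" {
  detectors = ["env", "system"]
  override  = false // keep the original host info from the Pi cluster

  output {
    traces  = [otelcol.connector.spanlogs.default.input, otelcol.processor.batch.default.input]
    metrics = [otelcol.processor.batch.default.input]
    logs    = [otelcol.processor.batch.default.input]
  }
}

// Emit one log line per root span (HTTP method, status code, trace ID)
otelcol.connector.spanlogs "default" {
  roots = true

  output {
    logs = [otelcol.processor.batch.default.input]
  }
}

// Batch everything before it leaves Alloy
otelcol.processor.batch "default" {
  output {
    traces  = [otelcol.exporter.otlp.tempo.input]
    metrics = [otelcol.exporter.prometheus.mimir.input]
    logs    = [otelcol.exporter.loki.default.input]
  }
}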
Relabeling: For discovered Docker containers, I use `discovery.relabel` to clean up the container name and use it as the `service_name` label, which is a standard across the observability world.
// Grafana Services/alloy/config.alloy
discovery.relabel "local_containers" {
  targets = discovery.docker.linux.targets

  rule {
    action        = "replace"
    source_labels = ["__meta_docker_container_name"]
    regex         = "^/(.*)"
    replacement   = "$1"
    target_label  = "service_name"
  }
}
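The `discovery.docker.linux.targets` reference comes from a `discovery.docker` component watching the Docker socket; roughly:
// Sketch: discover all containers on the local Docker daemon
discovery.docker "linux" {
  host = "unix:///var/run/docker.sock"
}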
Component Deep Dive
Now let's look at how each backend storage component is configured to work with Alloy.
Logging with Loki
Loki receives logs from two primary sources via Alloy:
- Directly from container `stdout`: Alloy's `loki.source.docker` component uses the Docker socket to discover all running containers and stream their logs, just like running `docker logs`.
- From the OTel endpoint: Any service can push structured logs directly to Alloy.
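Both paths end at the same `loki.write` component. A minimal sketch, reusing the relabeled targets from earlier and assuming Loki is reachable at `loki:3100`:
// Sketch: tail container stdout/stderr and push to Loki
loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.relabel.local_containers.output
  forward_to = [loki.write.default.receiver]
}

// Sketch: structured logs from the OTel pipeline take the same exit
otelcol.exporter.loki "default" {
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}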
A crucial piece of configuration in Loki itself is ensuring it recognizes the `service_name` label we attached in Alloy. The `discover_service_name` block tells Loki which attributes to look for to identify the service.
# Grafana Services/loki/loki-config.yaml
limits_config:
  # ...
  discover_service_name:
    - svc
    - service_name
    - service
    - app
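With that in place, the cleaned-up `service_name` label is what you query on in Grafana. For example, a LogQL query along these lines (container name illustrative) pulls error lines for a single service:
# Example LogQL query — service_name comes from the relabeled container name
{service_name="graphite-exporter"} |= "error"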

Metrics with Mimir
Mimir is my store for Prometheus-compatible metrics. Since this is a single-node home lab setup, I've disabled multi-tenancy and configured it to use the local filesystem for storage. I also bumped up some of the default limits to handle the initial burst of metrics when services restart.


# Grafana Services/mimir/mimir.yaml
multitenancy_enabled: false

common:
  storage:
    backend: filesystem
    filesystem:
      dir: /data/common

limits:
  ingestion_burst_size: 1000000 # 5x the default
  max_global_series_per_user: 1000000
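On the Alloy side, metrics leave the OTel pipeline through `otelcol.exporter.prometheus` and a plain `prometheus.remote_write` pointed at Mimir's push endpoint (the same URL Tempo uses in the next section). A sketch, with component labels of my choosing:
// Sketch: convert OTel metrics back to Prometheus and push them to Mimir
otelcol.exporter.prometheus "mimir" {
  forward_to = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir:9090/api/v1/push"
  }
}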
Tracing with Tempo & Automatic Service Graphs
Tempo stores and queries traces. One of the neat features I've enabled is having Tempo generate its own service graphs and span metrics. I initially tried to do this within Alloy, but it was unstable and crashed frequently. The official recommendation is to let Tempo handle it, and it works well.

Tempo processes the traces it receives, generates metrics (e.g., "service A called service B," request latency, error counts), and then writes those metrics directly to Mimir.
# Grafana Services/tempo/tempo.yaml
metrics_generator:
  # ...
  storage:
    path: /data/generator/wal
    # pushes Prometheus metrics to Mimir
    remote_write:
      - url: http://mimir:9090/api/v1/push
        send_exemplars: true

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics, local-blocks]
This is a powerful feedback loop: traces come into Tempo, which generates metrics and sends them to Mimir, all of which can be correlated in Grafana.
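For completeness, the Alloy side of this loop is just an OTLP exporter pointed at Tempo. A sketch, assuming Tempo's OTLP gRPC receiver is enabled on its default port and that traffic stays on the internal Docker network:
// Sketch: ship batched traces from Alloy to Tempo over OTLP gRPC
otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"

    tls {
      insecure = true // plain-text inside the Docker network
    }
  }
}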

Profiling with Pyroscope
Pyroscope stores continuous profiling data. Thanks to the `pyroscope.ebpf` component in Alloy, I get CPU profiles for every running container automatically. This is incredibly useful for debugging performance issues. Grafana allows you to compare profiles from different time ranges, so you can see a diff between a "busy" period and an "idle" one to pinpoint exactly what code paths are causing high CPU usage.
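The Alloy side is equally small; a sketch, assuming Pyroscope is reachable at `pyroscope:4040` and reusing the relabeled Docker targets from earlier:
// Sketch: eBPF CPU profiles for every discovered container
pyroscope.ebpf "containers" {
  targets    = discovery.relabel.local_containers.output
  forward_to = [pyroscope.write.default.receiver]
}

pyroscope.write "default" {
  endpoint {
    url = "http://pyroscope:4040"
  }
}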

Putting it All Together: Docker Compose
All of these services are orchestrated with a single `docker-compose.yml` file. The `alloy` service definition is the most complex, as it requires special permissions for eBPF and cAdvisor to work.
services:
  alloy:
    image: grafana/alloy:v1.9.1
    container_name: alloy
    command: ["run", "/etc/alloy/config.alloy", "--server.http.listen-addr=0.0.0.0:12345", "--stability.level=experimental"]
    privileged: true # Required for eBPF profiling
    pid: "host"      # Required for eBPF profiling
    volumes:
      - /mnt/storage_pool/docker/appdata/grafana/services/alloy/config.alloy:/etc/alloy/config.alloy
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # allows cAdvisor to read container metadata
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "12345:12345" # Debugging UI
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
Key takeaways here:
- `--stability.level=experimental`: This flag is required to enable the eBPF components.
- `privileged: true` and `pid: "host"`: These grant Alloy the necessary kernel-level access for eBPF.
- Volume Mounts: We mount the Docker socket for container discovery and several read-only host paths for cAdvisor to access container stats.
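The backend services themselves are comparatively boring: each is an image, a config mount, and a port or two on the shared network. A trimmed sketch of how they might sit alongside `alloy` (image tags and paths are illustrative, not my exact ones):
# illustrative fragment — continues under the same services: key as alloy above
  loki:
    image: grafana/loki:latest
    command: ["-config.file=/etc/loki/loki-config.yaml"]
    volumes:
      - ./loki/loki-config.yaml:/etc/loki/loki-config.yaml

  mimir:
    image: grafana/mimir:latest
    command: ["-config.file=/etc/mimir/mimir.yaml"]
    volumes:
      - ./mimir/mimir.yaml:/etc/mimir/mimir.yaml

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"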
Final Thoughts
Setting up this stack was a significant investment of time and effort. The configuration is complex, and the components have many moving parts. However, the result is an incredibly powerful and tightly integrated observability platform that gives me deep insights into everything running on my server, often with zero instrumentation effort.
For anyone who wants full control and is willing to get their hands dirty, this is a fantastic learning experience and a powerful tool. For everyone else, I'd still recommend looking at a more managed solution like SigNoz to save yourself the 40+ hours of configuration. But for me, the journey was worth it.