When I set up Grafana Alloy across my homelab Kubernetes cluster, the first question was: how many instances do I actually need? Most tutorials show a single Alloy deployment handling everything. That works for a proof of concept but it papers over a real architectural question — one that comes down to a single word: clustering.
My setup runs three separate Alloy deployments inside Kubernetes, plus standalone Alloy on bare-metal nodes outside the cluster. The reasons for the split are not aesthetic. The primary driver is clustering — some collection tasks need it and others must not use it. The secondary driver is resource isolation: if alloy-cluster starts OOMing under a burst of ServiceMonitor scrapes, I do not want host metrics collection to stop. Keeping them separate means a problem in one deployment cannot starve the others.
What Alloy Clustering Actually Does
Alloy clustering uses a gossip protocol to form a peer mesh between instances. When a component like prometheus.operator.servicemonitors has clustering { enabled = true }, all Alloy instances in the cluster share a hash ring and each one independently computes which targets it owns. The result is that a set of N replicas collectively scrapes all targets, with each target scraped exactly once.
Peer discovery works via DNS against a Kubernetes headless Service — which is why clustering requires a StatefulSet. StatefulSets give pods stable DNS identities; Deployments and DaemonSets do not.
This is enormously useful when you want high-availability metrics collection with multiple replicas. Without clustering, N replicas each scrape all targets independently, producing N× duplicate time series and out-of-order sample errors downstream.
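Clustering is opt-in per component rather than a single global switch. As a sketch (the component names and discovery source here are illustrative, not taken from my config), an ordinary scrape component joins the ring like this:

```alloy
// Illustrative only: names and the discovery source are assumptions.
prometheus.scrape "cluster_wide" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [prometheus.remote_write.default.receiver]

  // With clustering enabled, each replica hashes the shared target list
  // and scrapes only the targets it owns on the ring.
  clustering {
    enabled = true
  }
}
```

Components without a `clustering` block keep scraping their full target list on every replica, which is exactly the duplicate-series failure mode described above.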
Why DaemonSet Pods Do Not Need Clustering
Here is the thing: DaemonSet pods are already partitioned by Kubernetes. There is one pod per node. Each pod only ever scrapes resources local to its own node — the host filesystem, the local kubelet endpoint, the local cAdvisor endpoint. There is no shared pool of targets to distribute.
Enabling clustering on a DaemonSet would achieve nothing. The Grafana Alloy clustering docs are blunt about this:
“A particularly common mistake is enabling clustering on logs collecting DaemonSets. Collecting logs from Pods on the mounted node doesn’t benefit from having clustering enabled since each instance typically collects logs only from Pods on its own node.”
Each DaemonSet pod is an entirely independent instance. The work partitioning is handled by Kubernetes, not by Alloy’s gossip protocol.
The Three Deployments
This is where the topology comes from. Tasks divide into three categories: inherently node-local (DaemonSet, no clustering), cluster-wide with HA (StatefulSet, clustering on), and singleton (single Deployment, no clustering). Running one Alloy that tries to do all three would either require clustering on things that don’t need it, or no clustering on things that do. It would also mean a single resource budget covering everything — one OOM kill and both host metrics and cluster-wide scraping go down together.
alloy-node — DaemonSet, no clustering
One pod per node. Tolerates all taints so it runs on control plane nodes too. Runs with hostNetwork: true and hostPID: true so it can see the host’s process tree and network interfaces.
What it collects:
- Host metrics via the built-in `prometheus.exporter.unix` — Alloy's native node_exporter. No separate binary needed.
- cAdvisor metrics (container CPU/memory) scraped from the local kubelet endpoint only. The key is filtering discovery results to the local node using `constants.hostname`, so each pod only scrapes itself:
discovery.relabel "local_node_only_cadvisor" {
  targets = discovery.kubernetes.nodes.targets

  rule {
    source_labels = ["__meta_kubernetes_node_name"]
    action        = "keep"
    regex         = constants.hostname
  }
}
Without this filter, every DaemonSet pod would attempt to scrape every node’s cAdvisor — 7 pods × 7 nodes = 49 scrape attempts for what should be 7.
- kubelet metrics using the same local-node-only pattern.
- Pod logs from `/var/log/pods/**/*.log` with CRI parsing and label extraction from the file path (namespace, pod name, container name).
- systemd journal logs via `loki.source.journal` — picks up kubelet, containerd, and any other systemd units on the host.
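The pod-log side of this pipeline can be sketched roughly as follows; the component names are assumptions and the real config almost certainly carries more stages:

```alloy
// Sketch of the pod-log pipeline shape; names are assumptions.
local.file_match "pod_logs" {
  path_targets = [{"__path__" = "/var/log/pods/*/*/*.log"}]
}

loki.source.file "pod_logs" {
  targets    = local.file_match.pod_logs.targets
  forward_to = [loki.process.pod_logs.receiver]
}

loki.process "pod_logs" {
  forward_to = [loki.write.default.receiver]

  // Parse the CRI log line format (timestamp, stream, flags, content).
  stage.cri { }
}
```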
No clustering block anywhere in this config. Each pod runs entirely independently.
alloy-cluster — StatefulSet, clustering on
Two replicas with clustering.enabled: true. This is for cluster-wide metric collection — anything that requires Kubernetes API access to discover targets and that would produce duplicates if scraped by multiple independent instances.
prometheus.operator.servicemonitors "services" {
  forward_to = [prometheus.remote_write.default.receiver]

  clustering {
    enabled = true
  }
}

prometheus.operator.podmonitors "pods" {
  forward_to = [prometheus.remote_write.default.receiver]

  clustering {
    enabled = true
  }
}
The two replicas share the ServiceMonitor and PodMonitor workload via the hash ring. If one goes down, the other takes the full load. When it comes back, targets are automatically rebalanced.
It also handles Mimir rule synchronisation — reading PrometheusRule CRDs from Kubernetes and syncing them into Mimir’s ruler:
mimir.rules.kubernetes "local" {
  address   = "http://mimir-ruler.mimir-system.svc.cluster.local:8080"
  tenant_id = "1"
}
And an OTLP receiver for anything that wants to push telemetry in OpenTelemetry format, exposed via a regular Service backed by both replicas.
The StatefulSet uses 1Gi persistent storage (Rook/Ceph SSD) for Alloy’s write-ahead log, which buffers data locally if Mimir or Loki are temporarily unavailable.
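The WAL behaviour is tunable on the remote_write side. A sketch of what that can look like, with a hypothetical Mimir URL and illustrative retention values rather than the ones I actually run:

```alloy
prometheus.remote_write "default" {
  endpoint {
    // Hypothetical URL for illustration.
    url = "http://mimir-gateway.mimir-system.svc/api/v1/push"
  }

  // WAL retention bounds how long samples survive a backend outage
  // before being dropped; values here are illustrative.
  wal {
    truncate_frequency = "2h"
    min_keep_time      = "5m"
    max_keep_time      = "8h"
  }
}
```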
alloy-kube-events — single Deployment, no clustering
Kubernetes events exist only in the API server and are garbage collected after a short window. This deployment runs a single replica that watches the events API continuously and ships everything to Loki:
loki.source.kubernetes_events "kubernetes_events" {
  job_name   = "integrations/kubernetes/eventhandler"
  log_format = "json"
  forward_to = [loki.process.kubernetes_events.receiver]
}
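The loki.process component this forwards to is not shown here; a plausible sketch (the parsed fields and promoted labels are assumptions, not my actual pipeline) would parse the JSON payload and lift a couple of low-cardinality fields into stream labels:

```alloy
// Plausible sketch only; fields and labels are assumptions.
loki.process "kubernetes_events" {
  forward_to = [loki.write.default.receiver]

  // Pull low-cardinality fields out of the JSON-formatted event...
  stage.json {
    expressions = {
      namespace = "namespace",
      kind      = "kind",
    }
  }

  // ...and promote them to stream labels for filtering in Loki.
  stage.labels {
    values = {
      namespace = "",
      kind      = "",
    }
  }
}
```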
A single replica is correct here. Events are cluster-wide objects, not node-local, so a DaemonSet would forward each event from every node. But you also do not need or want clustering — a second replica with clustering enabled would not help, and a second unclustered replica would duplicate all events. One instance, watching the API, is the right answer.
Very lightweight: 50m CPU request, 128Mi RAM.
Cardinality: An Ongoing Job
Getting the topology right is the structural problem. Cardinality is the operational one. In a Kubernetes cluster with many pods and containers, the default label sets from node_exporter and cAdvisor generate an enormous number of time series — and Mimir has to store all of them. Left unchecked, this drives up memory usage across the whole observability stack.
The approach is the same everywhere: drop labels and metrics you will never query, as close to the source as possible.
Virtual network interfaces
A Kubernetes node running many pods will have hundreds of virtual network interfaces — one veth pair per pod, plus Calico (cali*) interfaces. The node_exporter netclass and netdev collectors would create a separate set of time series for every one of them. They are not useful for node-level network monitoring:
prometheus.exporter.unix "node_exporter_metrics" {
  netclass {
    ignored_devices = "^(veth.*|cali.*|[a-f0-9]{15})$"
  }

  netdev {
    device_exclude = "^(veth.*|cali.*|[a-f0-9]{15})$"
  }
}
The regex also catches 15-character hex strings — the interface names Kubernetes generates for container network namespaces. Without this, every pod churn event adds and then expires a batch of time series.
Container and virtual filesystems
A Kubernetes node also mounts a huge number of ephemeral filesystems: one overlay mount per container layer, tmpfs for secrets and service account tokens, cgroup hierarchies, proc, devtmpfs, and so on. None of these are useful for disk space monitoring:
filesystem {
  fs_types_exclude     = "^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|tmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$"
  mount_points_exclude = "^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)"
}
Dropping unused collectors entirely
The ipvs collector is disabled outright — the cluster uses iptables, not IPVS. There is no point scraping metrics for something that is not running:
prometheus.exporter.unix "node_exporter_metrics" {
  disable_collectors = ["ipvs"]
}
cAdvisor container labels
cAdvisor attaches id and name labels to container metrics. The id label is the full container runtime ID — a long hex string that is unique per container instance and changes every time a pod restarts. Keeping it would mean every pod restart permanently adds a new set of time series that never get reused:
prometheus.relabel "drop_cadvisor" {
  rule {
    action = "labeldrop"
    regex  = "id|name|instance"
  }
}
Pod log stream labels
The filename label on pod logs is the full path on the host: /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container>/<n>.log. The pod UID component is unique per pod instance, so every pod restart creates a new log stream label value that Loki has to index. Dropping it keeps the stream cardinality manageable:
stage.label_drop {
  values = ["filename", "flags"]
}
The useful labels — namespace, pod, container, stream — are extracted separately from the file path via a regex stage and kept. Only the high-cardinality junk is dropped.
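My config does that extraction with a regex stage; an equivalent shape using standard relabel rules over the discovered file path looks roughly like this (component names and the exact regex are assumptions):

```alloy
// Sketch: derive namespace/pod/container labels from the log file path
// (/var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container>/<n>.log)
// before tailing. Names and regex are assumptions, not my actual config.
discovery.relabel "pod_logs" {
  targets = local.file_match.pod_logs.targets

  rule {
    source_labels = ["__path__"]
    regex         = "/var/log/pods/([^_]+)_([^_]+)_[^/]+/([^/]+)/.*"
    target_label  = "namespace"
    replacement   = "$1"
  }

  rule {
    source_labels = ["__path__"]
    regex         = "/var/log/pods/([^_]+)_([^_]+)_[^/]+/([^/]+)/.*"
    target_label  = "pod"
    replacement   = "$2"
  }

  rule {
    source_labels = ["__path__"]
    regex         = "/var/log/pods/([^_]+)_([^_]+)_[^/]+/([^/]+)/.*"
    target_label  = "container"
    replacement   = "$3"
  }
}
```

Either way, the resulting stream labels are stable across pod restarts, unlike the UID-bearing filename.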
Internal scrape metrics
node_exporter emits node_scrape_collector_* metrics that track its own internal scrape performance per collector. Useful for debugging node_exporter itself, but not worth storing long-term in Mimir:
rule {
  source_labels = ["__name__"]
  regex         = "node_scrape_collector_.+"
  action        = "drop"
}
This is all incremental work. As new exporters get added or existing dashboards evolve, there are always new labels to audit and unused metrics to prune. The cardinality pressure does not go away — it just needs to be managed continuously.
Standalone Alloy on Bare-Metal Nodes
Not everything in the lab runs inside Kubernetes. Proxmox hypervisors, the VyOS router, and Raspberry Pi systems all run standalone Alloy as a systemd service. The config is simpler — no pod log collection, no Kubernetes API access, just host metrics and journal logs forwarded to the same Mimir and Loki endpoints as the cluster:
prometheus.exporter.unix "node" { }

prometheus.scrape "node" {
  targets    = prometheus.exporter.unix.node.targets
  forward_to = [prometheus.remote_write.default.receiver]
}

loki.source.journal "journal" {
  forward_to = [loki.write.default.receiver]

  labels = {
    job      = "integrations/systemd-journal",
    instance = constants.hostname,
  }
}
The same datacentre and cluster external labels are set on the remote_write and loki.write blocks, so in Grafana I can use a single dashboard and filter between Kubernetes nodes and bare-metal hosts.
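Concretely, that means setting `external_labels` on both write components. A sketch with placeholder URLs and label values:

```alloy
// Placeholder URLs and label values; adjust per host.
prometheus.remote_write "default" {
  endpoint {
    url = "http://mimir-gateway.mimir-system.svc/api/v1/push"
  }

  external_labels = {
    datacentre = "lab-lon1",
    cluster    = "bare-metal",
  }
}

loki.write "default" {
  endpoint {
    url = "http://loki-gateway.loki-system.svc/loki/api/v1/push"
  }

  external_labels = {
    datacentre = "lab-lon1",
    cluster    = "bare-metal",
  }
}
```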
Summary
graph LR
subgraph bare["Bare-metal nodes"]
BM["alloy (systemd)
Proxmox · VyOS · Raspberry Pi"]
end
subgraph k8s["Kubernetes Cluster — lab-lon1-uk"]
subgraph ds["DaemonSet — 1 per node"]
AN["alloy-node
Host metrics · cAdvisor
Kubelet · Pod logs
Journal logs"]
end
subgraph sts["StatefulSet ×2, clustering on"]
AC["alloy-cluster
ServiceMonitors · PodMonitors
Mimir rules sync · OTLP receiver"]
end
subgraph dep["Deployment ×1"]
AE["alloy-kube-events
Kubernetes events"]
end
end
subgraph obs["Observability backends"]
MIMIR[("Mimir")]
LOKI[("Loki")]
GRAFANA["Grafana"]
end
AN -->|metrics| MIMIR
AN -->|logs| LOKI
AC -->|metrics| MIMIR
AC -->|logs| LOKI
AE -->|logs| LOKI
BM -->|metrics| MIMIR
BM -->|logs| LOKI
MIMIR --> GRAFANA
LOKI --> GRAFANA

| Deployment | Type | Clustering | Why |
|---|---|---|---|
| alloy-node | DaemonSet | No | Node-local collection — Kubernetes already partitions by node |
| alloy-cluster | StatefulSet (×2) | Yes | Cluster-wide ServiceMonitor/PodMonitor scraping — needs HA without duplicates |
| alloy-kube-events | Deployment (×1) | No | Single-instance by design — duplicate event forwarding would be wrong |
| Standalone | systemd | n/a | Bare-metal hosts outside the cluster |
The organising principle is clustering, not aesthetics. If a task is node-local, a DaemonSet handles partitioning naturally. If a task is cluster-wide and you want more than one replica, clustering is what prevents duplicate data. And if a task must run exactly once, you use a single Deployment and keep clustering out of the picture entirely.
This Is Also What Grafana Does
It is worth noting that Grafana’s own k8s-monitoring Helm chart arrives at the same topology. Their chart deploys:
- `alloy-metrics` — StatefulSet, for cluster-wide metrics collection
- `alloy-logs` — DaemonSet, for node-local pod and host log collection
- `alloy-singleton` — single Deployment, for cluster events and other once-only tasks
The names are different and the internals diverge — their chart is considerably more opinionated, with its own abstraction layer over the raw Alloy config — but the underlying reasoning is identical.
My current setup rolls its own Helm releases and Alloy configs directly. I plan to migrate to the k8s-monitoring chart, which also brings in the Alloy Operator for lifecycle management of the collector instances. When that migration happens I will write it up.