How Prometheus Monitoring Works: From /proc to Dashboard
When you look at a Grafana dashboard showing CPU usage, memory, network traffic - have you ever wondered where that data comes from? This guide explains the complete journey from hardware to visualization.
The Complete Pipeline
Hardware generates events
↓
Linux kernel counts events in /proc and /sys
↓
node_exporter reads kernel files every scrape
↓
node_exporter exposes metrics on :9100/metrics
↓
Prometheus scrapes HTTP endpoint every 15s
↓
Prometheus stores time-series in database
↓
Grafana queries Prometheus with PromQL
↓
Grafana renders dashboard
Let's walk through each layer.
Layer 1: The Linux Kernel
The Linux kernel tracks everything happening on your system and exposes statistics through special filesystems.
/proc - Process and System Information
The /proc filesystem is a virtual filesystem - the files don't exist on disk; they're generated on the fly by the kernel.
CPU statistics:
cat /proc/stat
Output:
cpu 194342 7410 463657 176838 23208 0 1196 0 0 0
cpu0 48585 1852 115914 44209 5802 0 299 0 0 0
cpu1 48589 1853 115915 44210 5802 0 299 0 0 0
cpu2 48584 1852 115914 44209 5802 0 299 0 0 0
cpu3 48584 1853 115914 44210 5802 0 299 0 0 0
Each number is cumulative CPU time in USER_HZ ticks (usually 1/100th of a second), in this order:
user nice system idle iowait irq softirq steal guest guest_nice
- user: Time running user processes
- nice: Time running low-priority (niced) processes
- system: Time running kernel code
- idle: Time doing nothing
- iowait: Time waiting for I/O
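The fields above can be pulled apart with a few lines of code. This is a minimal sketch (not how node_exporter does it); `parse_cpu_line` and the example line are illustrative:

```python
# Hypothetical parser for a /proc/stat "cpu" line.
# Field order follows the kernel's documented layout.
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]

def parse_cpu_line(line):
    parts = line.split()
    ticks = [int(x) for x in parts[1:]]
    return {"cpu": parts[0], **dict(zip(FIELDS, ticks))}

stats = parse_cpu_line("cpu 194342 7410 463657 176838 23208 0 1196 0 0 0")
# Everything except idle and iowait counts as "busy" time
busy = sum(v for k, v in stats.items() if k not in ("cpu", "idle", "iowait"))
total = sum(v for k, v in stats.items() if k != "cpu")
print(f"busy fraction since boot: {busy / total:.2%}")
```

Because the counters are cumulative since boot, a single snapshot only gives the lifetime average; real monitoring takes two snapshots and looks at the difference.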
Memory statistics:
cat /proc/meminfo
Output:
MemTotal: 16659512 kB
MemFree: 4234152 kB
MemAvailable: 12456784 kB
Buffers: 123456 kB
Cached: 7654321 kB
SwapTotal: 8388608 kB
SwapFree: 8388608 kB
...
Every field is a different aspect of memory usage.
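The `Key: value kB` layout is easy to parse. A sketch, using the example values above (the helper `parse_meminfo` is hypothetical, not part of any library):

```python
# Hypothetical parser for /proc/meminfo-style lines, normalizing to bytes.
def parse_meminfo(text):
    mem = {}
    for line in text.strip().splitlines():
        key, rest = line.split(":", 1)
        parts = rest.split()
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024  # kernel reports kB; normalize to bytes
        mem[key] = value
    return mem

sample = """MemTotal:       16659512 kB
MemFree:         4234152 kB
MemAvailable:   12456784 kB"""
mem = parse_meminfo(sample)
# MemAvailable is the kernel's estimate of memory available for new work
used_pct = (mem["MemTotal"] - mem["MemAvailable"]) / mem["MemTotal"] * 100
print(f"memory in use: {used_pct:.1f}%")
```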
Network statistics:
cat /proc/net/dev
Output:
Inter-| Receive | Transmit
face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
eth0: 123456789 1234567 0 0 0 0 0 0 98765432 987654 0 0 0 0 0 0
lo: 12345678 123456 0 0 0 0 0 0 12345678 123456 0 0 0 0 0 0
Columns show bytes/packets received and transmitted per interface.
Disk statistics:
cat /proc/diskstats
Output:
8 0 sda 12345 67890 123456 234567 45678 56789 234567 345678 0 123456 567890
8 1 sda1 1234 6789 12345 23456 4567 5678 23456 34567 0 12345 56789
Columns include reads completed, sectors read, time reading, writes completed, etc.
/sys - Hardware Monitoring
The /sys filesystem exposes hardware information.
Temperature sensors:
cat /sys/class/hwmon/hwmon0/temp1_input
Output:
45000
This is 45°C (temperatures are in millidegrees Celsius).
Fan speeds:
cat /sys/class/hwmon/hwmon0/fan1_input
Output:
2400
Fan spinning at 2400 RPM.
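Reading these files is just reading integers; the only trick is knowing the unit. A small sketch (`read_hwmon` is a hypothetical helper; the sysfs path only works on a real Linux host):

```python
from pathlib import Path

# Hypothetical helper: hwmon sysfs files contain one plain integer.
def read_hwmon(path):
    return int(Path(path).read_text().strip())

# On a real host:
# millideg = read_hwmon("/sys/class/hwmon/hwmon0/temp1_input")
millideg = 45000  # example value from above
print(f"{millideg / 1000:.0f}°C")  # temperatures are millidegrees Celsius
```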
Layer 2: node_exporter
node_exporter is a Prometheus exporter - a program that reads system statistics and exposes them in Prometheus format.
How It Works
When node_exporter starts, it:
- Registers collectors - modules that know how to read specific metrics
- Starts HTTP server on port 9100
- Waits for scrape requests
When Prometheus scrapes http://localhost:9100/metrics:
- Each collector runs - reads files from /proc, /sys, etc.
- Parses the data - converts kernel format to numbers
- Generates Prometheus metrics - formats as text
- Returns via HTTP - sends to Prometheus
The Transformation
What the kernel exposes:
cat /proc/stat
# cpu 194342 7410 463657 176838 23208 0 1196 0 0 0
What node_exporter exposes:
curl http://localhost:9100/metrics | grep node_cpu_seconds_total
Output:
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="user"} 194342
node_cpu_seconds_total{cpu="0",mode="nice"} 7410
node_cpu_seconds_total{cpu="0",mode="system"} 463657
node_cpu_seconds_total{cpu="0",mode="idle"} 176838
node_cpu_seconds_total{cpu="0",mode="iowait"} 23208
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="softirq"} 1196
node_cpu_seconds_total{cpu="0",mode="steal"} 0
Notice:
- Raw numbers become labeled metrics
- One line becomes 8 separate metrics (one per mode)
- Added metadata: HELP (description) and TYPE (counter)
- node_exporter also divides the raw tick counts by USER_HZ so values are in seconds (the example keeps the raw numbers for easy comparison)
Metric Format
Prometheus metrics follow this format:
metric_name{label1="value1",label2="value2"} numeric_value timestamp
Example:
node_cpu_seconds_total{cpu="0",mode="idle"} 176838 1706542123000
Breaking it down:
- Metric name: node_cpu_seconds_total
- Labels: cpu="0",mode="idle" (dimensions)
- Value: 176838 (CPU seconds in idle mode)
- Timestamp: 1706542123000 (Unix timestamp in milliseconds; optional - usually omitted, in which case Prometheus uses the scrape time)
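The format is simple enough to generate by hand. A minimal sketch of the exposition text format (`render_metric` is a hypothetical helper, not a client-library API; real exporters also escape label values):

```python
# Hypothetical formatter for a single Prometheus exposition-format line.
def render_metric(name, labels, value, ts_ms=None):
    label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
    line = f"{name}{{{label_str}}} {value}"
    return f"{line} {ts_ms}" if ts_ms is not None else line

print(render_metric("node_cpu_seconds_total",
                    {"cpu": "0", "mode": "idle"},
                    176838, 1706542123000))
```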
Collectors
node_exporter has many collectors, each responsible for specific metrics:
CPU Collector:
- Reads /proc/stat
- Exports node_cpu_seconds_total
Memory Collector:
- Reads /proc/meminfo
- Exports node_memory_* (MemTotal, MemFree, MemAvailable, etc.)
Filesystem Collector:
- Reads /proc/mounts and filesystem stats
- Exports node_filesystem_size_bytes, node_filesystem_avail_bytes
Network Collector:
- Reads /proc/net/dev
- Exports node_network_receive_bytes_total, node_network_transmit_bytes_total
Disk Collector:
- Reads /proc/diskstats
- Exports node_disk_read_bytes_total, node_disk_written_bytes_total
Hardware Monitoring Collector:
- Reads /sys/class/hwmon
- Exports node_hwmon_temp_celsius, node_hwmon_fan_rpm
You can check which collectors ran and whether they succeeded:
curl http://localhost:9100/metrics | grep node_scrape_collector_success
Layer 3: Prometheus Scraping
Prometheus operates on a pull model - it reaches out to targets and pulls metrics.
The Scrape Configuration
In our setup, we use file-based service discovery:
# prometheus.yml
scrape_configs:
- job_name: 'file-sd'
file_sd_configs:
- files:
- '/etc/prometheus/targets/*.yml'
refresh_interval: 30s
This tells Prometheus: "Read YAML files in the targets directory, check for changes every 30 seconds."
Target Files
# targets/core.yml
- targets:
- '192.168.68.58:9100'
labels:
job: 'raspberry-pi'
instance: 'raspberry-pi'
role: 'network-core'
- targets:
- '192.168.68.11:9100'
labels:
job: 'proxmox-host'
instance: 'proxmox-host'
role: 'hypervisor'
Each entry defines:
- Target address: Where to scrape (host:port)
- Labels: Metadata attached to all metrics from this target
The Scrape Process
Every 15 seconds (the scrape_interval), for each target:
1. Prometheus makes HTTP GET request:
GET http://192.168.68.58:9100/metrics
2. node_exporter responds with metrics:
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 176838.45
node_cpu_seconds_total{cpu="0",mode="user"} 194342.12
...
3. Prometheus parses the response:
- Extracts metric names
- Extracts labels from curly braces
- Extracts values
- Records current timestamp
4. Prometheus adds configured labels:
Original:
node_cpu_seconds_total{cpu="0",mode="idle"} 176838.45
After adding target labels:
node_cpu_seconds_total{cpu="0",mode="idle",job="raspberry-pi",instance="raspberry-pi",role="network-core"} 176838.45
5. Prometheus stores in time-series database
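Steps 3 and 4 can be sketched in a few lines. This is a simplified illustration of the parse-then-relabel step, not Prometheus's actual parser; the regex handles only the simple labeled-line case, and the target labels echo the file-sd example above:

```python
import re

# Hypothetical parser for one exposition-format line with labels.
LINE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+([0-9.eE+-]+)$')

def parse_line(line):
    m = LINE_RE.match(line)
    name, raw_labels, value = m.group(1), m.group(2), float(m.group(3))
    labels = dict(pair.split("=", 1) for pair in raw_labels.split(","))
    labels = {k: v.strip('"') for k, v in labels.items()}
    return name, labels, value

name, labels, value = parse_line(
    'node_cpu_seconds_total{cpu="0",mode="idle"} 176838.45')
# Step 4: merge in the labels configured for this target
labels.update({"job": "raspberry-pi", "instance": "raspberry-pi",
               "role": "network-core"})
print(name, labels, value)
```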
Scrape Success Tracking
Prometheus automatically creates a metric for each scrape:
up{job="raspberry-pi",instance="raspberry-pi"}
Values:
- 1 = scrape succeeded
- 0 = scrape failed (target down or unreachable)
You can check if hosts are up:
up == 1
Layer 4: Time-Series Storage
Prometheus stores metrics as time-series - sequences of values over time.
What is a Time-Series?
A time-series is identified by:
- Metric name: node_cpu_seconds_total
- Label set: {cpu="0",mode="idle",instance="raspberry-pi"}
Every unique combination creates a separate series:
Series 1: node_cpu_seconds_total{instance="raspberry-pi",cpu="0",mode="idle"}
1706542123 → 176838.45
1706542138 → 176842.78
1706542153 → 176847.23
1706542168 → 176851.67
...
Series 2: node_cpu_seconds_total{instance="raspberry-pi",cpu="0",mode="user"}
1706542123 → 194342.12
1706542138 → 194345.89
1706542153 → 194349.56
...
Series 3: node_cpu_seconds_total{instance="proxmox-host",cpu="0",mode="idle"}
1706542123 → 583421.34
1706542138 → 583426.12
...
Storage Efficiency
With our setup:
- Scrape interval: 15 seconds
- Data points per minute: 4
- Data points per hour: 240
- Data points per day: 5,760
- Data points per month (30-day retention): 172,800
For 100 metrics across 10 hosts:
- Total series: 1,000
- Total data points/month: 172,800,000 (173 million!)
Prometheus handles this through:
Compression:
- Delta encoding (store differences, not absolute values)
- Varbit encoding (use fewer bits for small numbers)
- Typical compression: 1.3 bytes per sample
Chunking:
- Groups samples into 2-hour chunks
- Compresses entire chunks
- Stores chunks on disk
Indexing:
- Builds inverted indexes on labels
- Fast lookups: "give me all series where instance=raspberry-pi"
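The delta-encoding idea is worth seeing concretely. A minimal sketch - real Prometheus goes further with delta-of-delta timestamps and XOR float compression, but the principle is the same:

```python
# Store the first value, then only the differences between neighbors.
# Regular 15s scrapes make the deltas tiny and highly compressible.
def delta_encode(values):
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

timestamps = [1706542123, 1706542138, 1706542153, 1706542168]
encoded = delta_encode(timestamps)
print(encoded)  # [1706542123, 15, 15, 15]
assert delta_decode(encoded) == timestamps
```

A repeating small delta like 15 needs far fewer bits than a full 10-digit timestamp, which is how Prometheus gets down to roughly 1.3 bytes per sample.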
Layer 5: Querying with PromQL
When Grafana needs data, it queries Prometheus using PromQL.
Example Query Flow
Grafana panel query:
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
What Prometheus does:
1. Parse the query - understand the structure
2. Select series - find all series matching node_cpu_seconds_total{mode="idle"}
3. Fetch data - get the last 5 minutes of samples for those series
4. Calculate irate - compute the instant rate for each series
5. Aggregate - average by instance
6. Apply math - multiply by 100, subtract from 100
7. Return result - send to Grafana
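The arithmetic in steps 4-6 can be reproduced in plain code. A rough re-implementation of the query's math over illustrative sample values (not PromQL's actual engine - real irate also handles counter resets):

```python
# irate: instant rate from only the last two samples in the window.
def irate(samples):
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    return (v2 - v1) / (t2 - t1)

# idle-mode counter samples, keyed by (instance, cpu)
idle_samples = {
    ("raspberry-pi", "0"): [(1706542138, 176842.78), (1706542153, 176847.23)],
}

# avg by (instance): average the per-cpu idle rates per instance
by_instance = {}
for (instance, cpu), samples in idle_samples.items():
    by_instance.setdefault(instance, []).append(irate(samples))

# 100 - (avg idle rate * 100) = CPU usage percentage
usage = {inst: 100 - (sum(r) / len(r)) * 100 for inst, r in by_instance.items()}
print(usage)
```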
Result Format
Prometheus returns JSON:
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"instance": "raspberry-pi"
},
"value": [1706542123, "5.5"]
},
{
"metric": {
"instance": "proxmox-host"
},
"value": [1706542123, "15.2"]
}
]
}
}
Each result has:
- metric: Labels identifying the series
- value: [timestamp, value] pair
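Consuming this response is a straightforward JSON walk. Decoding the example above into an {instance: value} mapping, as a dashboard client might:

```python
import json

response = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"instance": "raspberry-pi"}, "value": [1706542123, "5.5"]},
      {"metric": {"instance": "proxmox-host"}, "value": [1706542123, "15.2"]}
    ]
  }
}
""")
# Note: the API returns sample values as strings, so convert to float
values = {r["metric"]["instance"]: float(r["value"][1])
          for r in response["data"]["result"]}
print(values)  # {'raspberry-pi': 5.5, 'proxmox-host': 15.2}
```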
Layer 6: Grafana Visualization
Grafana takes Prometheus query results and renders them.
Dashboard Panel Configuration
A Grafana panel has:
1. Query configuration:
{
"expr": "100 - (avg by (instance) (irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
"legendFormat": ""
}
2. Visualization settings:
- Panel type: Timeseries (line chart)
- Y-axis unit: Percent
- Min: 0, Max: 100
3. Refresh interval:
- Auto-refresh every 30 seconds
- Queries Prometheus on each refresh
The Render Process
Every 30 seconds, Grafana:
1. Sends query to Prometheus:
GET http://prometheus:9090/api/v1/query_range?query=...&start=...&end=...&step=15s
2. Receives time-series data:
{ "raspberry-pi": [[t1, v1], [t2, v2], [t3, v3], ...], "proxmox-host": [[t1, v1], [t2, v2], [t3, v3], ...] }
3. Renders the graph:
- X-axis: Time
- Y-axis: CPU usage percentage
- One line per instance
4. Applies styling:
- Colors from palette
- Thresholds (green < 60%, yellow < 80%, red > 80%)
- Legend with current/avg/max values
Real-World Example: CPU Usage Journey
Let's trace a single CPU usage measurement from hardware to dashboard.
T=0: Hardware Event
Your Raspberry Pi executes instructions. The CPU is busy.
T=0.01s: Kernel Accounting
The Linux kernel increments counters in /proc/stat:
Before: cpu0 ... 44209 (idle time)
After: cpu0 ... 44210 (idle time)
One more USER_HZ tick of idle time recorded.
T=15s: Prometheus Scrapes
Prometheus makes HTTP request:
GET http://192.168.68.58:9100/metrics
T=15.1s: node_exporter Reads /proc
# node_exporter internally does:
open("/proc/stat", O_RDONLY)
read(fd, buffer, 4096)
# buffer now contains: "cpu0 ... 44210 ..."
parse(buffer)
T=15.2s: node_exporter Returns Metrics
HTTP/1.1 200 OK
Content-Type: text/plain
node_cpu_seconds_total{cpu="0",mode="idle"} 176838.45
node_cpu_seconds_total{cpu="0",mode="user"} 194342.12
...
T=15.3s: Prometheus Stores
Prometheus writes to time-series database:
Series: node_cpu_seconds_total{instance="raspberry-pi",cpu="0",mode="idle"}
Sample: (timestamp=1706542123, value=176838.45)
This gets appended to the series.
T=30s: Next Scrape
Same process, new value:
Sample: (timestamp=1706542138, value=176842.78)
Now we have two points - we can calculate a rate!
T=60s: Grafana Queries
User opens dashboard. Grafana sends query:
irate(node_cpu_seconds_total{instance="raspberry-pi",mode="idle"}[5m])
T=60.1s: Prometheus Calculates
Last two samples:
(1706542138, 176842.78)
(1706542153, 176847.23)
Difference: 176847.23 - 176842.78 = 4.45 seconds
Time span: 1706542153 - 1706542138 = 15 seconds
Rate: 4.45 / 15 = 0.2967 (29.67% idle)
T=60.2s: Grafana Renders
Dashboard shows:
- X-axis: 10:01:00
- Y-axis: 70.33% CPU usage (100 - 29.67)
- Line graph point plotted
You see the result on screen!
How I Create Dashboards
Now that you know the full pipeline, here's my process:
Step 1: Explore Available Metrics
# See what node_exporter exposes
curl http://192.168.68.58:9100/metrics | less
I look for:
- Metric names (e.g., node_memory_MemTotal_bytes)
- Labels (e.g., {instance="...", device="..."})
- Metric types (counter vs gauge)
Step 2: Test Query in Prometheus
Open http://prometheus:9090/graph and experiment:
# Start simple
node_memory_MemTotal_bytes
# Add filters
node_memory_MemTotal_bytes{instance="raspberry-pi"}
# Calculate percentage
(node_memory_MemTotal_bytes - node_memory_MemFree_bytes) / node_memory_MemTotal_bytes * 100
Check the Console tab to see actual values.
Step 3: Build the Dashboard Panel
Create JSON structure:
{
"type": "timeseries",
"title": "Memory Usage",
"targets": [
{
"expr": "memory usage query here",
"legendFormat": ""
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100
}
}
}
Step 4: Refine Styling
- Choose colors (palette-classic, thresholds)
- Set units (percent, bytes, seconds)
- Configure legend (show mean, max, current)
- Add thresholds (green < 60%, yellow < 80%, red > 80%)
Step 5: Test and Iterate
- Check with live data
- Verify all hosts appear
- Ensure values make sense
- Adjust time ranges if needed
Common Questions
Why 15-second scrape interval?
Trade-offs:
- Shorter (5s): More granular, higher storage, more CPU
- 15s: Good balance (our choice)
- Longer (60s): Less storage, might miss spikes
For most infrastructure, 15s is ideal.
Why pull model instead of push?
Prometheus pulls metrics instead of having exporters push them:
Advantages:
- Prometheus controls scrape timing (no stampedes)
- Failed scrapes are visible (up == 0)
- Targets don't need to know about Prometheus
- Easy to add/remove targets dynamically
Disadvantages:
- Targets must be reachable from Prometheus
- Short-lived jobs need special handling (Pushgateway)
How much storage does Prometheus use?
Formula:
Storage = samples/sec × retention × bytes/sample
Our setup:
- 1000 series × 4 samples/min = 4000 samples/min = 67 samples/sec
- 30 days retention
- ~1.3 bytes/sample (compressed)
67 × 30×24×60×60 × 1.3 bytes = ~226 MB
Pretty efficient!
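The same back-of-envelope math as code (exact arithmetic gives ~225 MB; rounding to 67 samples/sec, as above, gives ~226 MB):

```python
# Storage = samples/sec x retention x bytes/sample
series = 1000
samples_per_sec = series * 4 / 60     # 4 samples/min per series
retention_sec = 30 * 24 * 60 * 60    # 30 days
bytes_per_sample = 1.3               # typical compressed size
storage_bytes = samples_per_sec * retention_sec * bytes_per_sample
print(f"{storage_bytes / 1e6:.0f} MB")
```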
Can Prometheus lose data?
Yes, in these scenarios:
- Scrape fails (network issue, target down)
- Prometheus crashes (recent data in memory lost)
- Disk full (new samples rejected)
Mitigations:
- High availability: Run multiple Prometheus servers
- Remote storage: Send data to long-term storage
- Local retention: Keep 30 days local, more remote
Key Takeaways
- The kernel is the source of truth - All metrics originate from /proc and /sys
- node_exporter is a translator - Converts kernel format to Prometheus format
- Prometheus is the database - Stores time-series efficiently, indexed by labels
- PromQL is the query language - Aggregates and transforms raw data into insights
- Grafana is the visualization layer - Renders queries as graphs and gauges
- Labels are crucial - They create dimensions for slicing data
- Scraping is pull-based - Prometheus controls when and how often to collect
- Time-series storage is efficient - Compression and chunking handle millions of samples
The beauty of this architecture: Each layer does one thing well. node_exporter knows hardware, Prometheus knows time-series, Grafana knows visualization. Simple, composable, powerful.
Further Exploration
To understand metrics better:
# Explore what the kernel exposes
cat /proc/stat
cat /proc/meminfo
cat /proc/diskstats
# See what node_exporter exposes
curl http://localhost:9100/metrics | less
# Check Prometheus storage
du -sh /opt/monitoring/prometheus-data
To debug scraping issues:
# Check Prometheus targets
curl http://localhost:9090/api/v1/targets
# See what Prometheus is actually storing
curl 'http://localhost:9090/api/v1/query?query=up'
To learn more about metrics:
- Read the /proc documentation: man proc
- Explore node_exporter collectors: GitHub
- Study Prometheus architecture: Docs
This guide is based on hands-on experience building production monitoring infrastructure. Every step described here is actually happening in your homelab right now.