evergreen · prometheus · monitoring · node-exporter · observability · linux · metrics

How Prometheus Monitoring Works: From /proc to Dashboard

When you look at a Grafana dashboard showing CPU usage, memory, network traffic - have you ever wondered where that data comes from? This guide explains the complete journey from hardware to visualization.


The Complete Pipeline

Hardware generates events
    ↓
Linux kernel counts events in /proc and /sys
    ↓
node_exporter reads kernel files every scrape
    ↓
node_exporter exposes metrics on :9100/metrics
    ↓
Prometheus scrapes HTTP endpoint every 15s
    ↓
Prometheus stores time-series in database
    ↓
Grafana queries Prometheus with PromQL
    ↓
Grafana renders dashboard

Let's walk through each layer.


Layer 1: The Linux Kernel

The Linux kernel tracks everything happening on your system and exposes statistics through special filesystems.

/proc - Process and System Information

The /proc filesystem is a virtual filesystem - the files don't exist on disk, they're generated on-the-fly by the kernel.

CPU statistics:

cat /proc/stat

Output:

cpu  194342 7410 463657 176838 23208 0 1196 0 0 0
cpu0 48585 1852 115914 44209 5802 0 299 0 0 0
cpu1 48589 1853 115915 44210 5802 0 299 0 0 0
cpu2 48584 1852 115914 44209 5802 0 299 0 0 0
cpu3 48584 1853 115914 44210 5802 0 299 0 0 0

Each number represents CPU time in USER_HZ (usually 1/100th of a second):

user nice system idle iowait irq softirq steal guest guest_nice
  • user: Time running user processes
  • nice: Time running low-priority processes
  • system: Time running kernel code
  • idle: Time doing nothing
  • iowait: Time waiting for I/O
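A minimal Python sketch of how such a line can be parsed (field order per man 5 proc; the helper name is mine, not from any library):

```python
# Parse a /proc/stat "cpu" line into named jiffy counters.
# Field order follows man 5 proc.
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]

def parse_cpu_line(line):
    parts = line.split()
    label = parts[0]                       # "cpu", "cpu0", "cpu1", ...
    values = [int(v) for v in parts[1:]]
    return label, dict(zip(FIELDS, values))

label, jiffies = parse_cpu_line(
    "cpu  194342 7410 463657 176838 23208 0 1196 0 0 0")
# jiffies["idle"] is the aggregate idle jiffy count (176838 in this sample)
```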

Memory statistics:

cat /proc/meminfo

Output:

MemTotal:       16659512 kB
MemFree:         4234152 kB
MemAvailable:   12456784 kB
Buffers:          123456 kB
Cached:          7654321 kB
SwapTotal:       8388608 kB
SwapFree:        8388608 kB
...

Every field is a different aspect of memory usage.
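The `key: value kB` layout parses naturally into a dictionary; a small sketch (helper name is illustrative):

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its numeric value (usually in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        info[key.strip()] = int(rest.split()[0])  # drop the "kB" unit
    return info

sample = """MemTotal:       16659512 kB
MemFree:         4234152 kB
MemAvailable:   12456784 kB"""
info = parse_meminfo(sample)
```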

Network statistics:

cat /proc/net/dev

Output:

Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 123456789  1234567    0    0    0     0          0         0 98765432   987654    0    0    0     0       0          0
    lo: 12345678   123456     0    0    0     0          0         0 12345678   123456    0    0    0     0       0          0

Columns show bytes/packets received and transmitted per interface.

Disk statistics:

cat /proc/diskstats

Output:

   8       0 sda 12345 67890 123456 234567 45678 56789 234567 345678 0 123456 567890
   8       1 sda1 1234 6789 12345 23456 4567 5678 23456 34567 0 12345 56789

Columns include reads completed, sectors read, time reading, writes completed, etc.

/sys - Hardware Monitoring

The /sys filesystem exposes hardware information.

Temperature sensors:

cat /sys/class/hwmon/hwmon0/temp1_input

Output:

45000

This is 45°C (temperatures are in millidegrees Celsius).

Fan speeds:

cat /sys/class/hwmon/hwmon0/fan1_input

Output:

2400

Fan spinning at 2400 RPM.
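The millidegree convention makes the conversion trivial; here is a hedged sketch (the hwmon path varies from machine to machine and is only an example):

```python
def millidegrees_to_celsius(raw: int) -> float:
    """hwmon temperatures are reported in millidegrees Celsius."""
    return raw / 1000.0

def read_hwmon(path: str) -> int:
    # Example path only - hwmon numbering differs per machine.
    with open(path) as f:
        return int(f.read().strip())

# millidegrees_to_celsius(45000) -> 45.0
```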


Layer 2: node_exporter

node_exporter is a Prometheus exporter - a program that reads system statistics and exposes them in Prometheus format.

How It Works

When node_exporter starts, it:

  1. Registers collectors - modules that know how to read specific metrics
  2. Starts HTTP server on port 9100
  3. Waits for scrape requests

When Prometheus scrapes http://localhost:9100/metrics:

  1. Each collector runs - reads files from /proc, /sys, etc.
  2. Parses the data - converts kernel format to numbers
  3. Generates Prometheus metrics - formats as text
  4. Returns via HTTP - sends to Prometheus

The Transformation

What the kernel exposes:

cat /proc/stat
# cpu0 48585 1852 115914 44209 5802 0 299 0 0 0

What node_exporter exposes:

curl http://localhost:9100/metrics | grep node_cpu_seconds_total

Output:

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="user"} 485.85
node_cpu_seconds_total{cpu="0",mode="nice"} 18.52
node_cpu_seconds_total{cpu="0",mode="system"} 1159.14
node_cpu_seconds_total{cpu="0",mode="idle"} 442.09
node_cpu_seconds_total{cpu="0",mode="iowait"} 58.02
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="softirq"} 2.99
node_cpu_seconds_total{cpu="0",mode="steal"} 0

Notice:

  • Raw jiffies become labeled metrics in seconds (divided by USER_HZ, typically 100)
  • One line becomes 8 separate metrics (one per mode)
  • Added metadata: HELP (description) and TYPE (counter)
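This translation step can be sketched roughly as follows - jiffies divided by USER_HZ (assumed 100 here), then formatted as exposition text (function names are mine, not node_exporter's internals):

```python
# Sketch of node_exporter's translation: jiffy counters -> exposition text.
# USER_HZ is assumed to be 100, the common value on Linux.
USER_HZ = 100

def to_exposition(cpu, jiffies):
    lines = [
        "# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.",
        "# TYPE node_cpu_seconds_total counter",
    ]
    for mode, count in jiffies.items():
        lines.append(
            f'node_cpu_seconds_total{{cpu="{cpu}",mode="{mode}"}} {count / USER_HZ}')
    return "\n".join(lines)

text = to_exposition("0", {"user": 194342, "idle": 176838})
```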

Metric Format

Prometheus metrics follow this format:

metric_name{label1="value1",label2="value2"} numeric_value timestamp

Example:

node_cpu_seconds_total{cpu="0",mode="idle"} 176838 1706542123000

Breaking it down:

  • Metric name: node_cpu_seconds_total
  • Labels: cpu="0", mode="idle" (dimensions)
  • Value: 176838 (CPU seconds in idle mode)
  • Timestamp: 1706542123000 (Unix timestamp in milliseconds)
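A simplified parser for this format (it ignores escaping and other edge cases the real exposition grammar allows - a sketch, not a spec-complete implementation):

```python
import re

# name, optional {labels}, value, optional timestamp
SAMPLE_RE = re.compile(
    r'(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>\d+))?$')

def parse_sample(line):
    m = SAMPLE_RE.match(line)
    labels = {}
    if m.group("labels"):
        for pair in m.group("labels").split(","):
            key, val = pair.split("=", 1)
            labels[key] = val.strip('"')
    return m.group("name"), labels, float(m.group("value")), m.group("ts")
```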

Collectors

node_exporter has many collectors, each responsible for specific metrics:

CPU Collector:

  • Reads /proc/stat
  • Exports node_cpu_seconds_total

Memory Collector:

  • Reads /proc/meminfo
  • Exports node_memory_* (MemTotal, MemFree, MemAvailable, etc.)

Filesystem Collector:

  • Reads /proc/mounts and filesystem stats
  • Exports node_filesystem_size_bytes, node_filesystem_avail_bytes

Network Collector:

  • Reads /proc/net/dev
  • Exports node_network_receive_bytes_total, node_network_transmit_bytes_total

Disk Collector:

  • Reads /proc/diskstats
  • Exports node_disk_read_bytes_total, node_disk_written_bytes_total

Hardware Monitoring Collector:

  • Reads /sys/class/hwmon
  • Exports node_hwmon_temp_celsius, node_hwmon_fan_rpm

You can check each collector's status in the metrics themselves:

curl http://localhost:9100/metrics | grep "node_scrape_collector_success"

Layer 3: Prometheus Scraping

Prometheus operates on a pull model - it reaches out to targets and pulls metrics.

The Scrape Configuration

In our setup, we use file-based service discovery:

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'file-sd'
    file_sd_configs:
      - files:
          - '/etc/prometheus/targets/*.yml'
        refresh_interval: 30s

This tells Prometheus: "Read YAML files in the targets directory, check for changes every 30 seconds."

Target Files

# targets/core.yml
- targets:
    - '192.168.68.58:9100'
  labels:
    job: 'raspberry-pi'
    instance: 'raspberry-pi'
    role: 'network-core'

- targets:
    - '192.168.68.11:9100'
  labels:
    job: 'proxmox-host'
    instance: 'proxmox-host'
    role: 'hypervisor'

Each entry defines:

  • Target address: Where to scrape (host:port)
  • Labels: Metadata attached to all metrics from this target

The Scrape Process

Every 15 seconds (the scrape_interval), for each target:

1. Prometheus makes HTTP GET request:

GET http://192.168.68.58:9100/metrics

2. node_exporter responds with metrics:

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 176838.45
node_cpu_seconds_total{cpu="0",mode="user"} 194342.12
...

3. Prometheus parses the response:

  • Extracts metric names
  • Extracts labels from curly braces
  • Extracts values
  • Records current timestamp

4. Prometheus adds configured labels:

Original:
node_cpu_seconds_total{cpu="0",mode="idle"} 176838.45

After adding target labels:
node_cpu_seconds_total{cpu="0",mode="idle",job="raspberry-pi",instance="raspberry-pi",role="network-core"} 176838.45

5. Prometheus stores in time-series database
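The label-attachment step can be sketched as a dictionary merge (simplified: real Prometheus resolves conflicts via honor_labels and by renaming clashing scraped labels to exported_<name>):

```python
def attach_target_labels(sample_labels, target_labels):
    """Merge target labels into a scraped sample's label set.

    Simplified sketch - conflict handling (honor_labels, the
    exported_ prefix) is omitted.
    """
    return {**sample_labels, **target_labels}

merged = attach_target_labels(
    {"cpu": "0", "mode": "idle"},
    {"job": "raspberry-pi", "instance": "raspberry-pi", "role": "network-core"})
```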

Scrape Success Tracking

Prometheus automatically creates a metric for each scrape:

up{job="raspberry-pi",instance="raspberry-pi"}

Values:

  • 1 = scrape succeeded
  • 0 = scrape failed (target down or unreachable)

You can check if hosts are up:

up == 1

Layer 4: Time-Series Storage

Prometheus stores metrics as time-series - sequences of values over time.

What is a Time-Series?

A time-series is identified by:

  • Metric name: node_cpu_seconds_total
  • Label set: {cpu="0",mode="idle",instance="raspberry-pi"}

Every unique combination creates a separate series:

Series 1: node_cpu_seconds_total{instance="raspberry-pi",cpu="0",mode="idle"}
  1706542123 → 176838.45
  1706542138 → 176842.78
  1706542153 → 176847.23
  1706542168 → 176851.67
  ...

Series 2: node_cpu_seconds_total{instance="raspberry-pi",cpu="0",mode="user"}
  1706542123 → 194342.12
  1706542138 → 194345.89
  1706542153 → 194349.56
  ...

Series 3: node_cpu_seconds_total{instance="proxmox-host",cpu="0",mode="idle"}
  1706542123 → 583421.34
  1706542138 → 583426.12
  ...

Storage Efficiency

With our setup:

  • Scrape interval: 15 seconds
  • Data points per minute: 4
  • Data points per hour: 240
  • Data points per day: 5,760
  • Data points per month (30-day retention): 172,800

For 100 metrics across 10 hosts:

  • Total series: 1,000
  • Total data points/month: 172,800,000 (173 million!)

Prometheus handles this through:

Compression:

  • Delta encoding (store differences, not absolute values)
  • Varbit encoding (use fewer bits for small numbers)
  • Typical compression: 1.3 bytes per sample

Chunking:

  • Groups samples into 2-hour chunks
  • Compresses entire chunks
  • Stores chunks on disk

Indexing:

  • Builds inverted indexes on labels
  • Fast lookups: "give me all series where instance=raspberry-pi"
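Delta encoding itself is simple to sketch (Prometheus's actual chunk encoding, based on Gorilla-style XOR compression, is more elaborate, but the intuition is the same: regular 15-second timestamps become tiny, repetitive deltas):

```python
def delta_encode(values):
    """Store the first value, then only differences between neighbors."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    out = deltas[:1]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

timestamps = [1706542123, 1706542138, 1706542153, 1706542168]
encoded = delta_encode(timestamps)  # first timestamp, then small deltas of 15
```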

Layer 5: Querying with PromQL

When Grafana needs data, it queries Prometheus using PromQL.

Example Query Flow

Grafana panel query:

100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

What Prometheus does:

  1. Parse the query - understand the structure
  2. Select series - find all series matching node_cpu_seconds_total{mode="idle"}
  3. Fetch data - get last 5 minutes of samples for those series
  4. Calculate irate - compute instant rate for each series
  5. Aggregate - average by instance
  6. Apply math - multiply by 100, subtract from 100
  7. Return result - send to Grafana

Result Format

Prometheus returns JSON:

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "instance": "raspberry-pi"
        },
        "value": [1706542123, "5.5"]
      },
      {
        "metric": {
          "instance": "proxmox-host"
        },
        "value": [1706542123, "15.2"]
      }
    ]
  }
}

Each result has:

  • metric: Labels identifying the series
  • value: [timestamp, value] pair
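Pulling values out of such a response is a small transformation (helper name is mine; this assumes the instant-vector shape shown above):

```python
def vector_to_dict(response):
    """Flatten an instant-vector query result to {instance: value}."""
    assert response["status"] == "success"
    return {r["metric"].get("instance", ""): float(r["value"][1])
            for r in response["data"]["result"]}

response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "raspberry-pi"}, "value": [1706542123, "5.5"]},
            {"metric": {"instance": "proxmox-host"}, "value": [1706542123, "15.2"]},
        ],
    },
}
```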

Layer 6: Grafana Visualization

Grafana takes Prometheus query results and renders them.

Dashboard Panel Configuration

A Grafana panel has:

1. Query configuration:

{
  "expr": "100 - (avg by (instance) (irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
  "legendFormat": ""
}

2. Visualization settings:

  • Panel type: Timeseries (line chart)
  • Y-axis unit: Percent
  • Min: 0, Max: 100

3. Refresh interval:

  • Auto-refresh every 30 seconds
  • Queries Prometheus on each refresh

The Render Process

Every 30 seconds, Grafana:

  1. Sends query to Prometheus

    GET http://prometheus:9090/api/v1/query_range?query=...&start=...&end=...&step=15s
    
  2. Receives time-series data

    {
      "raspberry-pi": [[t1, v1], [t2, v2], [t3, v3], ...],
      "proxmox-host": [[t1, v1], [t2, v2], [t3, v3], ...]
    }
    
  3. Renders the graph

    • X-axis: Time
    • Y-axis: CPU usage percentage
    • One line per instance
  4. Applies styling

    • Colors from palette
    • Thresholds (green < 60%, yellow < 80%, red > 80%)
    • Legend with current/avg/max values

Real-World Example: CPU Usage Journey

Let's trace a single CPU usage measurement from hardware to dashboard.

T=0: Hardware Event

Your Raspberry Pi's CPU has no runnable work for this tick - it sits idle.

T=0.01s: Kernel Accounting

The Linux kernel increments counters in /proc/stat:

Before: cpu0 ... 44209 (idle time)
After:  cpu0 ... 44210 (idle time)

One more tick (1/USER_HZ of a second) of idle time recorded.

T=15s: Prometheus Scrapes

Prometheus makes HTTP request:

GET http://192.168.68.58:9100/metrics

T=15.1s: node_exporter Reads /proc

# node_exporter internally does:
open("/proc/stat", O_RDONLY)
read(fd, buffer, 4096)
# buffer now contains: "cpu0 ... 44210 ..."
parse(buffer)

T=15.2s: node_exporter Returns Metrics

HTTP/1.1 200 OK
Content-Type: text/plain

node_cpu_seconds_total{cpu="0",mode="idle"} 176838.45
node_cpu_seconds_total{cpu="0",mode="user"} 194342.12
...

T=15.3s: Prometheus Stores

Prometheus writes to time-series database:

Series: node_cpu_seconds_total{instance="raspberry-pi",cpu="0",mode="idle"}
Sample: (timestamp=1706542123, value=176838.45)

This gets appended to the series.

T=30s: Next Scrape

Same process, new value:

Sample: (timestamp=1706542138, value=176842.78)

Now we have two points - we can calculate a rate!

T=60s: Grafana Queries

User opens dashboard. Grafana sends query:

irate(node_cpu_seconds_total{instance="raspberry-pi",mode="idle"}[5m])

T=60.1s: Prometheus Calculates

Last two samples:
  (1706542138, 176842.78)
  (1706542153, 176847.23)

Difference: 176847.23 - 176842.78 = 4.45 seconds
Time span: 1706542153 - 1706542138 = 15 seconds

Rate: 4.45 / 15 = 0.2967 (29.67% idle)
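The arithmetic above, as a sketch (the real irate() also detects counter resets, which this ignores):

```python
def irate(samples):
    """Instant rate from the last two (timestamp, value) samples.

    Simplified: real PromQL irate() also handles counter resets.
    """
    (t1, v1), (t2, v2) = samples[-2:]
    return (v2 - v1) / (t2 - t1)

idle_fraction = irate([(1706542138, 176842.78), (1706542153, 176847.23)])
cpu_usage_percent = 100 - idle_fraction * 100  # ~70.33%
```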

T=60.2s: Grafana Renders

Dashboard shows:

  • X-axis: 10:01:00
  • Y-axis: 70.33% CPU usage (100 - 29.67)
  • Line graph point plotted

You see the result on screen!


How I Create Dashboards

Now that you know the full pipeline, here's my process:

Step 1: Explore Available Metrics

# See what node_exporter exposes
curl http://192.168.68.58:9100/metrics | less

I look for:

  • Metric names (e.g., node_memory_MemTotal_bytes)
  • Labels (e.g., {instance="...", device="..."})
  • Metric types (counter vs gauge)

Step 2: Test Query in Prometheus

Open http://prometheus:9090/graph and experiment:

# Start simple
node_memory_MemTotal_bytes

# Add filters
node_memory_MemTotal_bytes{instance="raspberry-pi"}

# Calculate percentage
(node_memory_MemTotal_bytes - node_memory_MemFree_bytes) / node_memory_MemTotal_bytes * 100

Check the Console tab to see actual values.

Step 3: Build the Dashboard Panel

Create JSON structure:

{
  "type": "timeseries",
  "title": "Memory Usage",
  "targets": [
    {
      "expr": "memory usage query here",
      "legendFormat": ""
    }
  ],
  "fieldConfig": {
    "defaults": {
      "unit": "percent",
      "min": 0,
      "max": 100
    }
  }
}

Step 4: Refine Styling

  • Choose colors (palette-classic, thresholds)
  • Set units (percent, bytes, seconds)
  • Configure legend (show mean, max, current)
  • Add thresholds (green < 60%, yellow < 80%, red > 80%)

Step 5: Test and Iterate

  • Check with live data
  • Verify all hosts appear
  • Ensure values make sense
  • Adjust time ranges if needed

Common Questions

Why 15-second scrape interval?

Trade-offs:

  • Shorter (5s): More granular, higher storage, more CPU
  • 15s: Good balance (our choice)
  • Longer (60s): Less storage, might miss spikes

For most infrastructure, 15s is ideal.

Why pull model instead of push?

Prometheus pulls metrics instead of having exporters push them:

Advantages:

  • Prometheus controls scrape timing (no stampedes)
  • Failed scrapes are visible (up == 0)
  • Targets don't need to know about Prometheus
  • Easy to add/remove targets dynamically

Disadvantages:

  • Targets must be reachable from Prometheus
  • Short-lived jobs need special handling (Pushgateway)

How much storage does Prometheus use?

Formula:

Storage = samples/sec × retention × bytes/sample

Our setup:

  • 1,000 series × 4 samples/min ≈ 67 samples/sec
  • 30 days retention
  • ~1.3 bytes/sample (compressed)

67 × 30 × 24 × 60 × 60 × 1.3 bytes ≈ 226 MB

Pretty efficient!
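The formula is easy to turn into a quick estimator (the 1.3 bytes/sample figure is the typical compressed size mentioned above; exact results vary with data shape):

```python
def storage_bytes(series, scrape_interval_s, retention_days,
                  bytes_per_sample=1.3):
    """Back-of-envelope Prometheus storage estimate."""
    samples_per_second = series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return samples_per_second * retention_seconds * bytes_per_sample

estimate = storage_bytes(series=1000, scrape_interval_s=15, retention_days=30)
# roughly 225 MB, matching the back-of-envelope figure above
```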

Can Prometheus lose data?

Yes, in these scenarios:

  • Scrape fails (network issue, target down)
  • Prometheus crashes (recent data in memory lost)
  • Disk full (new samples rejected)

Mitigations:

  • High availability: Run multiple Prometheus servers
  • Remote storage: Send data to long-term storage
  • Local retention: Keep 30 days local, more remote

Key Takeaways

  1. The kernel is the source of truth - All metrics originate from /proc and /sys

  2. node_exporter is a translator - Converts kernel format to Prometheus format

  3. Prometheus is the database - Stores time-series efficiently, indexed by labels

  4. PromQL is the query language - Aggregates and transforms raw data into insights

  5. Grafana is the visualization layer - Renders queries as graphs and gauges

  6. Labels are crucial - They create dimensions for slicing data

  7. Scraping is pull-based - Prometheus controls when and how often to collect

  8. Time-series storage is efficient - Compression and chunking handle millions of samples

The beauty of this architecture: Each layer does one thing well. node_exporter knows hardware, Prometheus knows time-series, Grafana knows visualization. Simple, composable, powerful.


Further Exploration

To understand metrics better:

# Explore what the kernel exposes
cat /proc/stat
cat /proc/meminfo
cat /proc/diskstats

# See what node_exporter exposes
curl http://localhost:9100/metrics | less

# Check Prometheus storage
du -sh /opt/monitoring/prometheus-data

To debug scraping issues:

# Check Prometheus targets
curl http://localhost:9090/api/v1/targets

# See what Prometheus is actually storing
curl 'http://localhost:9090/api/v1/query?query=up'

To learn more about metrics:

  • Read /proc documentation: man proc
  • Explore node_exporter's collector list in its README on GitHub
  • Study the Prometheus storage and architecture documentation

This guide is based on hands-on experience building production monitoring infrastructure. Every step described here is actually happening in your homelab right now.