Observability Diagram Library and Telemetry Maps

Reusable Mermaid diagrams for telemetry flow, tracing, SLO feedback loops, alert routing, workflow observability, and governance boundaries.

This appendix collects the diagram patterns used across the guide into one reusable visual reference. Use these diagrams in architecture docs, post-incident reviews, design proposals, and onboarding notes when a team needs a fast way to explain how observability is supposed to work.

The goal is not to make every document more decorative. The goal is to make complex telemetry relationships easier to reason about before an incident exposes the missing connection.

End-To-End Telemetry Flow

Use this when explaining how raw telemetry becomes something operators can actually act on.

    flowchart LR
	    A["Application or Service"] --> B["Logs"]
	    A --> C["Metrics"]
	    A --> D["Traces"]
	    A --> E["Domain Events"]
	    B --> F["Search and Investigation"]
	    C --> G["Dashboards and SLOs"]
	    D --> H["Trace Analysis"]
	    E --> I["Workflow or Business Monitoring"]
	    G --> J["Alerts and Incident Response"]
	    H --> J
	    F --> J
	    I --> J

What to notice:

  • Different signals answer different questions.
  • Alerting and incident response usually depend on several signals, not one.
  • Storage alone is not the point; decision support is the point.

Request Path And Trace Breakdown

Use this when a team needs to see where latency or failure is accumulating across a request path.

    sequenceDiagram
	    participant User
	    participant Gateway
	    participant API
	    participant Worker
	    participant DB
	    participant Vendor
	
	    User->>Gateway: Request
	    Gateway->>API: Forward with trace context
	    API->>DB: Read account state
	    API->>Vendor: Call payment provider
	    API->>Worker: Publish async follow-up task
	    Worker->>DB: Persist result
	    API-->>Gateway: Response
	    Gateway-->>User: Response

What to notice:

  • A single user operation may span synchronous and asynchronous work.
  • Missing trace context at any boundary breaks the end-to-end picture.
  • Trace structure is useful only when spans map to meaningful dependencies.

Context Propagation Across Async Boundaries

Use this when a team understands tracing for HTTP calls but loses request identity in queues and workers.

    flowchart LR
	    A["Ingress Request"] --> B["API Service"]
	    B --> C["Message with trace_id, request_id, tenant_id"]
	    C --> D["Queue"]
	    D --> E["Worker"]
	    E --> F["Downstream Service"]
	    B --> G["Structured Logs"]
	    E --> H["Worker Logs and Spans"]
	    F --> I["Downstream Logs and Spans"]

What to notice:

  • Async observability requires metadata to travel with the message, not only with the original request.
  • Request identity, tenant identity, and operation context often have different governance needs.
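To make the first point concrete, here is a minimal sketch of a message envelope that carries context across a queue. It is an illustration only: the `RequestContext` class, the `publish` and `consume` helpers, and the `tenant-42` value are hypothetical, the field names simply mirror the labels in the diagram, and an in-memory `queue.Queue` stands in for a real broker.

```python
import json
import queue
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical context fields, named after the labels in the diagram."""
    trace_id: str
    request_id: str
    tenant_id: str


def publish(q: queue.Queue, ctx: RequestContext, payload: dict) -> None:
    # The context travels inside the message, not in process-local state.
    q.put(json.dumps({"context": asdict(ctx), "payload": payload}))


def consume(q: queue.Queue) -> tuple[RequestContext, dict]:
    message = json.loads(q.get_nowait())
    # The worker rebuilds the context before it logs or opens spans.
    return RequestContext(**message["context"]), message["payload"]


work_queue: queue.Queue = queue.Queue()
ingress_ctx = RequestContext(
    trace_id=uuid.uuid4().hex,
    request_id=uuid.uuid4().hex,
    tenant_id="tenant-42",  # assumed example value
)
publish(work_queue, ingress_ctx, {"task": "send_receipt"})
worker_ctx, task = consume(work_queue)
```

Because the worker reconstructs the same `trace_id`, `request_id`, and `tenant_id`, its logs and spans can be joined to the ingress request. A real system would more likely carry these values in message headers and might redact or hash `tenant_id`, depending on its governance rules.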

Dashboard Drill-Down Layers

Use this when explaining why one dashboard is never enough.

    flowchart TD
	    A["Fleet Overview"] --> B["Service Health Dashboard"]
	    B --> C["Dependency and Saturation View"]
	    C --> D["Trace or Log Investigation"]

What to notice:

  • Each layer narrows the operational question.
  • Dashboards should support movement from symptom to evidence, not trap people in chart browsing.

SLO Feedback Loop

Use this in reliability planning, production-readiness reviews, or leadership discussions.

    flowchart LR
	    A["User Journey"] --> B["SLI"]
	    B --> C["SLO"]
	    C --> D["Error Budget"]
	    D --> E["Engineering and Release Decisions"]
	    E --> F["Instrumentation and Alert Tuning"]
	    F --> B

What to notice:

  • SLOs are not just numbers; they are policy inputs.
  • Instrumentation quality affects whether the feedback loop is trustworthy.
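The step from SLO to error budget to release decision can be sketched as simple arithmetic. This is a rough illustration under stated assumptions: a request-based availability SLI, a hypothetical `release_decision` policy, and a made-up freeze threshold of 10% of the budget remaining. None of these values come from the guide.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        return 1.0  # no traffic observed, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures


def release_decision(remaining: float, freeze_below: float = 0.10) -> str:
    # Hypothetical policy: freeze risky releases when little budget is left.
    return "freeze" if remaining < freeze_below else "ship"


# 99.9% target over 1,000,000 requests allows about 1,000 failures.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
decision = release_decision(remaining)
```

With 250 failures, roughly three quarters of the budget is left and the sketch returns `"ship"`. If the SLI undercounts failures because of poor instrumentation, the same arithmetic produces a confident but wrong answer, which is why instrumentation quality sits inside the loop.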

Alert Routing And Escalation

Use this when clarifying how detectors connect to human response.

    flowchart LR
	    A["Detector or Alert Rule"] --> B["Severity Decision"]
	    B --> C["Owning Team"]
	    B --> D["Notification Channel"]
	    C --> E["Responder Triage"]
	    E --> F["Escalation or Incident Command"]
	    F --> G["Status Updates and Recovery Work"]

What to notice:

  • Good alerting includes routing and escalation design, not only detectors.
  • Human-response clarity is part of observability design.
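As a small sketch of the first two hops in the diagram, the code below turns an alert into a severity, an owning team, and a notification channel. Everything here is assumed for illustration: the `Alert` fields, the burn-rate threshold, the team names, and the channel names are hypothetical and not taken from any real alerting product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    rule: str
    service: str
    burn_rate: float  # hypothetical severity input


# Hypothetical ownership and channel tables.
OWNERS = {"checkout": "payments-team", "search": "discovery-team"}
CHANNELS = {"page": "pager", "ticket": "ticket-queue"}
FALLBACK_OWNER = "platform-on-call"


def decide_severity(alert: Alert) -> str:
    # Assumed threshold: a fast budget burn pages a human, anything else files a ticket.
    return "page" if alert.burn_rate >= 10.0 else "ticket"


def route(alert: Alert) -> dict:
    severity = decide_severity(alert)
    return {
        "severity": severity,
        # An unowned service still routes somewhere instead of being dropped.
        "owner": OWNERS.get(alert.service, FALLBACK_OWNER),
        "channel": CHANNELS[severity],
    }


routed = route(Alert(rule="high_error_rate", service="checkout", burn_rate=14.0))
```

The fallback owner is the design choice worth noting: a detector with no routing target is a gap in the response path, not just a configuration detail.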

Workflow And Saga Observability

Use this for long-running processes where single-request visibility is not enough.

    stateDiagram-v2
	    [*] --> Accepted
	    Accepted --> Validating
	    Validating --> Approved
	    Validating --> Rejected
	    Approved --> FulfillmentStarted
	    FulfillmentStarted --> AwaitingExternalConfirmation
	    AwaitingExternalConfirmation --> Completed
	    AwaitingExternalConfirmation --> CompensationTriggered
	    CompensationTriggered --> Failed
	    Completed --> [*]
	    Rejected --> [*]
	    Failed --> [*]

What to notice:

  • Workflow observability needs state visibility, timing, and transition evidence.
  • Logs and traces help, but workflow state is often the operator’s primary model.
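One way to capture state visibility, timing, and transition evidence together is to record each transition with a timestamp and reject transitions the model does not allow. The sketch below assumes the states from the diagram above; the `WorkflowRecord` class and its methods are hypothetical names, not part of any workflow engine.

```python
import time

# Allowed transitions, copied from the state diagram above.
TRANSITIONS = {
    "Accepted": {"Validating"},
    "Validating": {"Approved", "Rejected"},
    "Approved": {"FulfillmentStarted"},
    "FulfillmentStarted": {"AwaitingExternalConfirmation"},
    "AwaitingExternalConfirmation": {"Completed", "CompensationTriggered"},
    "CompensationTriggered": {"Failed"},
    "Completed": set(),
    "Rejected": set(),
    "Failed": set(),
}


class WorkflowRecord:
    """Hypothetical per-workflow record of state, timing, and transitions."""

    def __init__(self, workflow_id, clock=time.monotonic):
        self.workflow_id = workflow_id
        self.state = "Accepted"
        self._clock = clock
        self.history = [("Accepted", clock())]  # (state, time entered)

    def advance(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            # An illegal transition is itself a signal worth surfacing.
            raise ValueError(f"{self.state} -> {new_state} is not allowed")
        self.state = new_state
        self.history.append((new_state, self._clock()))

    def time_in_states(self):
        # Duration spent in each state that has already been left.
        return {
            state: left - entered
            for (state, entered), (_, left) in zip(self.history, self.history[1:])
        }


order = WorkflowRecord("order-1001")
order.advance("Validating")
order.advance("Approved")
```

The `history` list answers "where is this workflow and how long has it been there", which is usually the operator's first question for a stuck saga.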

Data Pipeline Freshness Map

Use this when explaining analytics and data-product observability.

    flowchart LR
	    A["Source Systems"] --> B["Ingestion Jobs"]
	    B --> C["Raw Storage"]
	    C --> D["Transformations"]
	    D --> E["Curated Tables"]
	    E --> F["Dashboards and Models"]
	    B --> G["Freshness Signals"]
	    D --> H["Quality Checks"]
	    E --> I["Consumer Incident Signals"]

What to notice:

  • Pipeline observability is about freshness and semantic trust, not only runtime success.
  • Consumer-facing dashboards often show the symptom of an upstream pipeline issue.
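The difference between runtime success and freshness can be shown with a small check. This is a sketch under assumptions: the `freshness_status` helper, the two-hour threshold, and the timestamps are all hypothetical, chosen only to show a job that succeeded while its data went stale.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(last_loaded_at, max_age, now=None):
    """Compare the age of the newest loaded data against an allowed maximum."""
    now = now or datetime.now(timezone.utc)
    age = now - last_loaded_at
    return {"age_seconds": age.total_seconds(), "stale": age > max_age}


# Assumed scenario: the ingestion job exited cleanly, but the source stopped
# sending new rows, so the newest loaded data is five hours old.
job_succeeded = True
check_time = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
newest_row_time = check_time - timedelta(hours=5)

status = freshness_status(newest_row_time, timedelta(hours=2), now=check_time)
needs_attention = job_succeeded and status["stale"]
```

Here `needs_attention` is true even though the job reported success, which is the case a runtime-only monitor would miss.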

Governance Boundary Map

Use this when ownership is unclear between service teams and the shared observability platform.

    flowchart LR
	    A["Service Teams"] --> B["Emit Signals"]
	    A --> C["Own Service Dashboards and Alerts"]
	    D["Observability Platform Team"] --> E["Schema Standards"]
	    D --> F["Storage, Routing, and Retention Defaults"]
	    D --> G["Access and Governance Controls"]
	    C --> H["Incident Responders"]
	    E --> A
	    F --> A
	    G --> H

What to notice:

  • Shared platforms need clear defaults, but service teams still own local usefulness.
  • Governance without ownership clarity creates both signal drift and incident friction.

Choosing The Right Diagram

  • Use flowchart for system boundaries, operational paths, and data movement.
  • Use sequenceDiagram when the timing and order of calls matter.
  • Use stateDiagram-v2 when long-running state transitions are central to the problem.
  • Prefer one small useful diagram over one giant diagram that nobody can maintain.

Revised on Thursday, April 23, 2026