Sruja

Architecture-as-code for the AI SDLC process. Define architecture in .sruja files; validate, document, and keep it in sync with your workflow. We're a tool for the lifecycle—not a diagramming product.

Why Sruja?

The Problem

Most architecture tools make you choose:

  • Visual-only tools (Draw.io) – no code, no version control, hard to maintain
  • Code-only tools (Mermaid, PlantUML) – no visual editing, steep learning curve
  • One-way sync (Structurizr) – code → view only, no visual editing
  • Stale diagrams – architecture drifts from reality, documentation gets outdated

Our Solution

Sruja gives you both visual editing and code-backed architecture:

Feature | What you get
Bidirectional sync | Edit visually → code updates instantly; edit code → diagrams update automatically
Version-controlled | .sruja files in Git, with proper code review workflows
Team-friendly | Designers, developers, PMs all work in their preferred way
Built-in validation | Catch architecture issues before they reach production
Multiple exports | JSON, Markdown, Mermaid – integrate into your existing toolchain

Who It's For

  • Engineering teams who need architecture as part of their SDLC
  • Tech leads who want to enforce architectural standards
  • Platform engineers building guardrails for distributed teams
  • AI agents that need to reason about system architecture

How We Work

  1. Define your architecture in .sruja files (or use the visual designer)
  2. Validate with built-in checks (cycles, orphans, unique IDs)
  3. Export to JSON, Markdown, or diagrams
  4. Integrate into CI/CD, docs, and your IDE workflow
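
Two of the checks named in step 2 can be pictured with a short Python sketch. This is not Sruja's implementation: the data layout (a list of element IDs plus a list of `(source, target)` relations) and both function names are assumptions made for illustration, and "orphan" is taken here to mean an element that takes part in no relation.

```python
from collections import Counter

def find_duplicate_ids(element_ids):
    """Return IDs that are declared more than once, in sorted order."""
    counts = Counter(element_ids)
    return sorted(id_ for id_, n in counts.items() if n > 1)

def find_orphans(element_ids, relations):
    """Return elements that appear in no relation, in sorted order."""
    connected = set()
    for source, target in relations:
        connected.add(source)
        connected.add(target)
    return sorted(set(element_ids) - connected)

# Toy model (hypothetical): "app.web" is declared twice, "app.db" is never connected.
elements = ["user", "app.web", "app.db", "app.web"]
relations = [("user", "app.web")]

duplicates = find_duplicate_ids(elements)
orphans = find_orphans(elements, relations)
print(duplicates)  # ['app.web']
print(orphans)     # ['app.db']
```

Keeping each check a pure function over plain lists makes it easy to run them one after another and report every problem in a single pass.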

We're ultra simple – minimal surface area, no unnecessary apps or frameworks – and highly functional – what we ship works reliably for its scope.

Stack

  • Rust – CLI, engine, LSP, WASM (single language for core)
  • VS Code extension – Edit .sruja, diagnostics, optional diagram preview
  • Docs – This book (mdBook, Rust-based; no TypeScript/Node)

New here? Do Quick start (about 5 min), then the Beginner path (2–3 hours).

See Quick start to install the CLI and create your first .sruja file. For a single entry point to docs, tutorials, and courses, use Navigate. The left sidebar lists everything; press / or S to search.

Sruja "Show diagram" in code blocks: Run make wasm from the repo root once, then run make book-serve (or ./serve.sh from the book directory) so the WASM files are copied into the book output.

Navigate

New here? Do Quick start first (about 5 min), then the Beginner path (2–3 hours). Everything else is below.

Use one entry point below. The sidebar always lists the full structure.

Where | Link | What you get
Documentation | Docs → | Concepts, reference, how Sruja works, adoption guides
Tutorials | Tutorials → | Step-by-step: CLI, DSL, validation, export, CI/CD, more
Courses | Courses → | Structured courses: systems thinking, system design, ecommerce, production, AI

Quick start

Install CLI

curl -fsSL https://raw.githubusercontent.com/sruja-ai/sruja/main/scripts/install.sh | bash

Or build from source (requires Rust):

git clone https://github.com/sruja-ai/sruja.git && cd sruja
make build

Create a .sruja file

This is the minimal style (explicit kinds, no import). Getting started (full) uses import { * } from 'sruja.ai/stdlib' for less boilerplate; both work.

person = kind "Person"
system = kind "System"
container = kind "Container"

user = person "User" {}
app = system "My App" {
  web = container "Web Server" { technology "Node.js" }
}
user -> app.web "visits"

Validate and export

sruja lint example.sruja
sruja export json example.sruja
sruja export markdown example.sruja

VS Code

Install the Sruja extension for syntax, diagnostics, and optional diagram preview in the editor.


Next: Beginner path builds on this in 7 steps (2–3 hours). For a longer "first architecture" walkthrough with a view and stdlib import, see Getting started (full).

VS Code extension

The Sruja VS Code extension provides:

  • Syntax highlighting for .sruja files
  • Diagnostics (errors, warnings) via the language server
  • WASM-based diagram preview – render diagrams from your DSL in the editor (no web server)

Install from the VS Code Marketplace (search for "Sruja") or build from source in extension/.

Introduction



Sruja is an open source architecture-as-code tool for the AI SDLC process. It helps teams define, validate, and evolve software architecture in code, so architecture stays in sync with design, review, CI, and docs. We are not a diagramming tool; diagrams are one output, not the product.

New here? Do Quick start (about 5 min), then the Beginner path (2–3 hours).

Why Sruja?

Most teams document architecture in static diagrams (Miro, Lucidchart, Visio) or inconsistent wiki pages. These suffer from:

  1. Drift: The code changes, but the diagram doesn't.
  2. Inconsistency: Every architect draws "boxes and arrows" differently.
  3. No Validation: You can't "test" a PNG image for broken dependencies.

Sruja treats Architecture like Code:

  • Version Control: Commit your architecture to Git.
  • Validation: CI/CD checks for circular dependencies and rule violations.
  • Consistency: Based on the C4 Model for clear, hierarchical abstractions. (See Glossary for definitions of key terms.)

Who is Sruja For?

Students & Learners

  • Learn system design with production-ready examples from fintech, healthcare, and e-commerce
  • Hands-on courses covering fundamentals to advanced patterns
  • Real-world scenarios that prepare you for interviews and real projects

Software Architects

  • Enforce architectural standards with policy-as-code
  • Prevent architectural drift through automated validation
  • Scale governance across multiple teams without manual reviews
  • Document decisions with ADRs (Architecture Decision Records)

Product Teams

  • Link requirements to architecture - see how features map to technical components
  • Track SLOs and metrics alongside your architecture
  • Align technical decisions with business goals and user needs
  • Communicate architecture to stakeholders (export to Markdown/Mermaid when needed)

DevOps Engineers

  • Integrate into CI/CD - validate architecture on every commit
  • Automate documentation generation from architecture files
  • Model deployments - Blue/Green, Canary, multi-region strategies
  • Track infrastructure - map logical architecture to physical deployment

Example

Here's a simple example to get you started:

import { * } from 'sruja.ai/stdlib'

App = system "My App" {
    Web = container "Web Server"
    DB = database "Database"
}

User = person "User"

User -> App.Web "Visits"
App.Web -> App.DB "Reads/Writes"

view index {
    include *
}

For production-ready examples with real-world patterns, see our Examples page featuring:

  • Banking systems (fintech)
  • E-commerce platforms
  • Healthcare platforms (HIPAA-compliant)
  • Multi-tenant SaaS platforms

Getting started (full)



Your First Architecture

Welcome to the future of system design.

Sruja allows you to define your software architecture as code. No more dragging boxes around. No more outdated PNGs on a wiki. You write code; Sruja draws the maps.

1. Installation

Install the Sruja CLI to compile, validate, and export your diagrams.

Mac / Linux

curl -fsSL https://raw.githubusercontent.com/sruja-ai/sruja/main/scripts/install.sh | bash

From Source (Rust)

git clone https://github.com/sruja-ai/sruja.git && cd sruja
make build

Verify installation:

sruja --version
# Should output something like: sruja version v0.2.0

2. Hello, World!

Let's model a simple web application. Create a file named hello.sruja.

This page uses stdlib import (same style as the rest of the book). The Quick start uses explicit kinds (person = kind "Person", etc.) with no import — both are valid; use whichever you prefer.

The Code

Copy and paste this into your file:

// hello.sruja
import { * } from 'sruja.ai/stdlib'

// 1. Define the System
webApp = system "My Cool Startup" {
    description "The next big thing."

    frontend = container "React App"
    api = container "Rust Service"
    db = database "PostgreSQL"

    // 2. Define Connections
    frontend -> api "Requests Data"
    api -> db "Reads/Writes"
}

// 3. Define Users
user = person "Early Adopter"

// 4. Connect User to System
user -> webApp.frontend "Visits Website"

view index {
    include *
}

Tip

Using stdlib imports: The import { * } from 'sruja.ai/stdlib' line provides all standard kinds (person, system, container, database, etc.) so you don't declare them manually. For the minimal style with explicit kinds, see Quick start.

3. Generate the Diagram

Run this command in your terminal:

sruja export mermaid hello.sruja > diagram.mmd

You have just created a Diagram-as-Code artifact! You can paste the content of diagram.mmd into Mermaid Live Editor to see it, or use the VS Code extension to preview it instantly.

What you'll get: A beautiful C4 diagram showing:

  • The user (person) on the left
  • Your system with containers (React App, Rust Service, PostgreSQL) in the middle
  • Relationships (arrows) showing how they connect


4. Understanding the Basics

Let's break down what just happened.

  1. Import Standard Kinds: The import { * } from 'sruja.ai/stdlib' line gives you all standard element types (person, system, container, database, etc.) without needing to declare them manually.
  2. Element Creation: You create elements using the = operator (e.g., webApp = system "My Cool Startup").
  3. Nested Elements: Containers and components can be defined inside a system block { ... }, or referred to using dot notation (e.g., webApp.frontend).
  4. Relationships: The -> operator defines how things connect. Relationships can be defined anywhere in the file.
  5. Views: The view index { include * } block tells Sruja to generate a diagram showing all elements. Sruja automatically generates C4 diagrams for you.
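  6. Flat Syntax: Sruja uses a flat file structure. Elements, relationships, and views are declared directly at the top level of the file, with no wrapper block required around the model; containers and components still nest inside their parent's { ... } block.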

Note

Custom Kinds: If you need custom element types (like microservice or serverless), you can declare them manually: microservice = kind "Microservice". For most use cases, the standard kinds from stdlib are sufficient.


What Now?

You have the tools. Now get the skills.

  • 🎓 Learn the Core: Take the System Design 101 course to move beyond "Hello World".
  • 🏗 See Real Patterns: Copy production-ready code from Examples.
  • 🛠 Master the CLI: Learn how to validate constraints in CLI Basics.

How Sruja works



Sruja is built to be a tool for the AI SDLC process: architecture in code that fits into your lifecycle—IDE, CI/CD, and documentation. We are not a diagramming product; we provide parse, validate, export, and optional preview.

The Sruja Platform

The platform consists of several key components working together:

  1. Parser & engine: Rust crates for parsing, validation, and export (sruja-language, sruja-engine, sruja-export).
  2. CLI: Command-line interface for local development and CI/CD (sruja-cli).
  3. WASM: Rust core compiled to WebAssembly for the docs book and VS Code (sruja-wasm).
  4. LSP: Language server for VS Code (sruja-lsp).
  5. Docs: This site—built with mdBook from the book/ directory.

Architecture Diagram

Explore the Sruja architecture itself using the interactive viewer below. This diagram is defined in Sruja DSL!

import { * } from 'sruja.ai/stdlib'


RootSystem = system "The Sruja Platform" {
  tags ["root"]
}

User = person "Architect/Developer" {
	description "Uses Sruja to design and document systems"
}

Sruja = system "Sruja Platform" {
	description "Tools for defining, visualizing, and analyzing software architecture"

	CLI = container "Sruja CLI" {
		technology "Rust"
		description "Command-line interface (crates/sruja-cli)"
	}

	Engine = container "Core Engine" {
		technology "Rust"
		description "Validation and export (crates/sruja-engine, sruja-export)"

		Validation = component "Validation Engine" {
			technology "Rust"
			description "Validates AST against rules (crates/sruja-core/src/engine/rules)"
		}

		Scorer = component "Scoring Engine" {
			technology "Rust"
			description "Calculates architecture health score (crates/sruja-core/src/engine/scorer)"
		}

		Policy = component "Policy Engine" {
			technology "Rust"
			description "Enforces custom policies (future: OPA/Rego)"
		}

		Scorer -> Validation "uses results from"
		Validation -> Policy "checks against"
	}

	Language = container "Language Service" {
		technology "Rust"
		description "Parser and AST (crates/sruja-language); LSP (crates/sruja-lsp)"
	}

	WASM = container "WASM Module" {
		technology "Rust/WASM"
		description "WebAssembly build (crates/sruja-wasm)"
	}

	VSCode = container "VS Code Extension" {
		technology "TypeScript"
		description "Editor extension (extension/)"
	}

	Book = container "Documentation" {
		technology "mdBook"
		description "This site (book/)"
	}

	// Internal Dependencies
	CLI -> Language "parses DSL using"
	CLI -> Engine "validates using"
	CLI -> WASM "builds"

	WASM -> Language "embeds"
	WASM -> Engine "embeds"

	VSCode -> Language "uses LSP"
	VSCode -> WASM "uses for LSP and preview"

	Book -> WASM "uses for diagram blocks"
}

User -> Sruja.CLI "runs commands"
User -> Sruja.VSCode "writes DSL"
User -> Sruja.Book "reads docs"

BrowserSystem = system "Web Browser" {
	description "User's web browser environment"
  tags ["external"]
	LocalStore = database "Local Storage"
}

// ADRs
ADR001 = adr "Use WASM for Client-Side Execution" {
	status "Accepted"
	context "We need to run validation and parsing in the browser and VS Code without a backend server."
	decision "Compile the Rust core engine to WebAssembly."
	consequences "Ensures consistent logic across all platforms but increases build complexity."
}

// Deployment
deployment Production "Production Environment" {
  node GitHubPages "GitHub Pages" {
    containerInstance RootSystem
  }
}

GitHubSystem = system "GitHub Platform" {
  description "Source control, CI/CD, and hosting"
  Actions = container "GitHub Actions" {
    technology "YAML/Node"
    description "CI/CD workflows"
  }
  Pages = container "GitHub Pages" {
    technology "Static Hosting"
    description "Hosts documentation site"
  }
  Releases = container "GitHub Releases" {
    technology "File Hosting"
    description "Hosts CLI binaries"
  }
  Actions -> Pages "deploys to"
  Actions -> Releases "publishes to"
}


User -> GitHubSystem "pushes code to"


// Component Stories
CLIStory = story "Using the CLI" {
  User -> Sruja.CLI "runs validate"
  Sruja.CLI -> Sruja.Language "parses DSL"
  Sruja.CLI -> Sruja.Engine "validates"
  Sruja.CLI -> User "reports diagnostics"
}

VSCodeStory = story "Using VS Code" {
  User -> Sruja.VSCode "edits .sruja file"
  Sruja.VSCode -> Sruja.WASM "LSP and preview"
  Sruja.WASM -> Sruja.VSCode "diagnostics and diagram"
  Sruja.VSCode -> User "shows errors and preview"
}

CIDev = scenario "Continuous Integration (Dev)" {
  User -> GitHubSystem "pushes to main"
  GitHubSystem -> GitHubSystem.Actions "triggers CI"
  GitHubSystem.Actions -> Sruja "builds & tests"
  GitHubSystem.Actions -> GitHubSystem.Pages "deploys dev site"
}

ReleaseProd = scenario "Production Release" {
  User -> GitHubSystem "merges PR to prod"
  GitHubSystem -> GitHubSystem.Actions "triggers release"
  GitHubSystem.Actions -> GitHubSystem.Pages "deploys prod site"
  GitHubSystem.Actions -> Sruja.VSCode "publishes extension"
  GitHubSystem.Actions -> GitHubSystem.Releases "publishes CLI binaries"
}

view index {
  title "Complete System View"
  include *
}

Key Components

Core Engine (Rust)

The sruja-language and sruja-engine crates form the foundation. They define the DSL grammar, parse input files into an AST (Abstract Syntax Tree), and run validation rules (like cycle detection and layer enforcement).
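
Cycle detection, one of the rules mentioned above, is commonly done with a depth-first search over the relation graph. The Python sketch below shows that general technique only; it is not taken from the Sruja crates, and the function name and the `(source, target)` tuple format are assumptions.

```python
def find_cycle(relations):
    """Return one dependency cycle as a list of IDs, or None if there is none."""
    graph = {}
    for source, target in relations:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

acyclic = [("Web", "API"), ("API", "DB")]
cyclic = [("Web", "API"), ("API", "DB"), ("DB", "API")]
print(find_cycle(acyclic))  # None
print(find_cycle(cyclic))   # ['API', 'DB', 'API']
```

Returning the offending path, rather than a bare yes/no, is what makes a diagnostic actionable: the author can see which relation closes the loop.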

WebAssembly (WASM)

The Rust core is compiled to WebAssembly (sruja-wasm). The same parsing and validation logic runs in:

  • VS Code Extension: For local preview without needing a CLI binary.
  • Documentation site: For "Show diagram" in code blocks (like the one above).

CLI & CI/CD

The sruja CLI (sruja-cli) is a static binary that wraps the core engine. It supports:

  • Local development: sruja fmt, sruja lint, sruja export.
  • CI/CD: Validate and export architecture in pipelines.
  • Export: sruja export json, sruja export markdown, sruja export dot, and more.
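
For the CI/CD case, a pipeline step can be as small as a loop over the repository's .sruja files. The Python sketch below wraps the `sruja lint <file>` command shown in Quick start; it assumes that a non-zero exit status signals lint problems, which this page does not state. The `lint_all` helper and the stand-in `fake_run` are hypothetical names, and the stand-in exists only so the sketch runs without the CLI installed.

```python
import subprocess
import tempfile
from pathlib import Path
from types import SimpleNamespace

def lint_all(root, run=subprocess.run):
    """Run `sruja lint` on every .sruja file under root; return the files that failed."""
    failed = []
    for path in sorted(Path(root).rglob("*.sruja")):
        # Assumption: a non-zero exit status means lint reported problems.
        result = run(["sruja", "lint", str(path)])
        if result.returncode != 0:
            failed.append(str(path))
    return failed

def fake_run(cmd):
    """Stand-in for subprocess.run: pretend files named bad* fail lint."""
    name = Path(cmd[-1]).name
    return SimpleNamespace(returncode=1 if name.startswith("bad") else 0)

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "good.sruja").write_text("// placeholder\n")
    (Path(tmp) / "bad.sruja").write_text("// placeholder\n")
    failed = [Path(p).name for p in lint_all(tmp, run=fake_run)]

print(failed)  # ['bad.sruja']
```

In a real pipeline you would call `lint_all(".")` with the default runner and fail the job when the returned list is non-empty.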

Examples



Examples & Patterns

Theory is good, but code is better. Below are production-grade Sruja models that you can copy, paste, and adapt.

Every example here follows our "FAANG-level" quality standards:

  1. Clear Requirements: Functional & Non-functional.
  2. Proper Hierarchies: Context -> Container -> Component.
  3. Real Tech Stacks: No generic "Database" boxes.

1. Banking System (Fintech)

Note

Ideally Suited For: Highly regulated industries requiring audit trails, security policies, and strict latency SLAs.

Scenario: A regional bank needs to modernize its legacy mainframe interactions while providing a slick mobile experience.

Why review this example?

  • Security: Uses policy blocks for PCI-DSS.
  • Hybrid Cloud: Connects modern Cloud Containers to an on-premise "Mainframe" System.
  • Complexity: Models the "Legacy Core" vs "Modern Interface" pattern often seen in enterprise.

import { * } from 'sruja.ai/stdlib'


// --- REQUIREMENTS ---
// We start with the 'Why'. These drive the architecture.
R1 = requirement functional "Customers must be able to view balances"
R2 = requirement functional "Customers can transfer money internally"
R3 = requirement security "All PII must be encrypted at rest (PCI-DSS)"
R4 = requirement stability "99.99% Availability (Target: <53m downtime/year)"

// --- ACTORS ---
Customer = person "Banking Customer" {
    description "A holder of one or more accounts"
}

// --- SYSTEMS ---
BankingSystem = system "Internet Banking Platform" {
    description "Allows customers to view information and make payments."

    // Containers (Deployable units)
    WebApp = container "Single Page App" {
        technology "React / TypeScript"
    }

    MobileApp = container "Mobile App" {
        technology "Flutter"
    }

    API = container "Main API Gateway" {
        technology "Java / Spring Boot"
        description "Orchestrates calls to core services"
    }

    Database = container "Main RDBMS" {
        technology "PostgreSQL"
        tags ["database", "storage"]
    }

    // Relationships
    WebApp -> API "Uses (JSON/HTTPS)"
    MobileApp -> API "Uses (JSON/HTTPS)"
    API -> Database "Reads/Writes (JDBC)"
}

// --- EXTERNAL SYSTEMS ---
Mainframe = system "Legacy Core Banking" {
    tags ["external"] // This is outside our scope of control
    description "The heavy iron that stores the actual money."
}

EmailSystem = system "Email Service" {
    tags ["external"]
    description "SendGrid / AWS SES"
}

// --- INTEGRATIONS ---
Customer -> BankingSystem.WebApp "Views dashboard"
BankingSystem.API -> Mainframe "Syncs transactions (XML/SOAP)"
BankingSystem.API -> EmailSystem "Sends alerts"

view index {
    include *
}
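
Requirement R4 pairs an availability percentage with a yearly downtime budget. The conversion is simple arithmetic, sketched here in Python for a 365-day year; the helper name is illustrative and nothing in this block is part of Sruja.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def downtime_budget_minutes(availability_percent):
    """Minutes of allowed downtime per year for a given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

budget = downtime_budget_minutes(99.99)
print(round(budget, 2))  # 52.56
```

So a 99.99% target leaves roughly 52.6 minutes of downtime per year, while 99.9% would leave about 8.8 hours.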

👉 Deep Dive this Architecture using our Course


2. Global E-Commerce Platform

Note

Ideally Suited For: High-scale B2C applications. Focuses on caching, asynchronous processing, and eventual consistency.

Scenario: An Amazon-like store preparing for Black Friday traffic spikes.

Why review this example?

  • Scalability: Explains how to handle high reads (Product Catalog) vs transactional writes (Checkout).
  • Async Messaging: Shows the use of queues and topics (Apache Kafka) to decouple services.
  • Caching: Strategic placement of Redis caches.

import { * } from 'sruja.ai/stdlib'


R1 = requirement scale "Handle 100k concurrent users"
R2 = requirement performance "Product pages load in <100ms"

ShopScale = system "E-Commerce Platform" {

    // --- EDGE LAYER ---
    CDN = container "Content Delivery Network" {
        technology "Cloudflare"
        description "Caches static assets and product images"
    }

    LoadBalancer = container "Load Balancer" {
        technology "NGINX"
    }

    // --- SERVICE LAYER ---
    Storefront = container "Storefront Service" {
        technology "Node.js"
        description "SSR for SEO-friendly product pages"
    }

    Checkout = container "Checkout Service" {
        technology "Rust"
        description "Handles payments and inventory locking"
    }

    // --- DATA LAYER ---
    ProductCache = container "Product Cache" {
        technology "Redis Cluster"
        description "Stores hot product data"
    }

    MainDB = database "Product Database" {
        technology "MongoDB"
        description "Flexible schema for diverse product attributes"
    }

    OrderQueue = queue "Order Events" {
        technology "Kafka"
        description "Async order processing pipeline"
    }

    // --- FLOWS ---
    CDN -> LoadBalancer "Forwards dynamic requests"
    LoadBalancer -> Storefront "Routes traffic"
    Storefront -> ProductCache "Read-through cache"
    Storefront -> MainDB "Cache miss / heavy query"

    // The Checkout Flow
    Checkout -> OrderQueue "Publishes 'OrderCreated'"
}

view index {
    include *
}

Beginner path



Beginner Path: Learn Sruja Without Overwhelm

If you just did Quick start, you already have your first diagram — the 7 steps below build on that and take about 2–3 hours total. Each step has a clear outcome, takes 10–30 minutes, and gives immediate feedback.

Tip

Track your progress: Check off each step as you complete it. This path takes approximately 2–3 hours total.

Step 1: Getting Started ⏱️ 20–30 min

  • Do: Quick start (minimal style). Optional: Getting started (full) (stdlib import).
  • Outcome: CLI installed, first model created
  • What you'll do: Install Sruja CLI and create your first architecture file
  • Success check: You can run sruja lint and sruja export mermaid on your file and get a diagram

Step 2: CLI Basics ⏱️ 20 min

  • Tutorial: CLI basics
  • Outcome: Run lint, fmt, tree, export commands confidently
  • What you'll do: Learn essential CLI commands for working with Sruja files
  • Success check: You can validate, format, and export your architecture

Step 3: DSL Basics ⏱️ 25–30 min

  • Tutorial: DSL basics
  • Quiz: Optional; add exercises in the book if desired
  • Outcome: Understand systems, containers, components, relations
  • What you'll do: Learn the core DSL syntax and concepts
  • Success check: You can read and write basic Sruja DSL code

Step 4: Validation & Linting ⏱️ 15–20 min

  • Tutorial: Validation & linting
  • Challenge: Missing relations
  • Outcome: Fix common modeling errors
  • What you'll do: Learn how Sruja validates your architecture and catches errors
  • Success check: You can identify and fix validation errors

Step 5: Export Diagrams ⏱️ 15–20 min

  • Tutorial: Export diagrams
  • Outcome: Generate diagrams (D2, SVG) and Markdown
  • What you'll do: Learn to export your architecture in different formats
  • Success check: You can export diagrams in multiple formats

Step 6: Practice Micro‑Challenges ⏱️ 20–30 min

  • Challenge: Add component
  • Challenge: Fix relations
  • Challenge: Queue worker
  • Outcome: Build confidence with small tasks
  • What you'll do: Practice with real-world scenarios
  • Success check: You can complete challenges without looking at solutions

Step 7: Systems Thinking ⏱️ 20 min

  • Tutorial: Systems thinking
  • Outcome: Think in flows, dependencies, and constraints
  • What you'll do: Learn to model complex systems and relationships
  • Success check: You can model a multi-system architecture

Tips to Avoid Overwhelm

  • Small steps: limit sessions to 20–30 minutes
  • Visible outcomes: run a command or export a diagram every step
  • One concept at a time: model, then lint, then export
  • Use checklists: follow the repo docs style when writing
  • Ask for help: Discord and Discussions links in README

What’s Next


Note: Sruja is free and open source (Apache 2.0 licensed). Join the community on Discord or GitHub Discussions for help and to contribute.

Concepts



Overview

Use overview to provide a concise system description shown in docs/exports.

Syntax

import { * } from 'sruja.ai/stdlib'


overview {
    title "E‑Commerce Platform"
    summary "Web, API, and DB supporting browse, cart, and checkout"
}

view index {
    include *
}

Guidance

  • Keep summary short and practical; avoid marketing language.
  • Use overview at architecture root; prefer description inside elements for details.

Architecture



The architecture block is the root element of a Sruja model. It represents the entire scope of what you are modeling.

Syntax

import { * } from 'sruja.ai/stdlib'


// ... define systems, persons, etc. here

view index {
    include *
}

Minimal Example

For simple examples, you can use a minimal structure:

import { * } from 'sruja.ai/stdlib'


MySystem = system "My System"
User = person "User"

Purpose

  • Scope Boundary: Everything inside is part of the model.
  • Naming: Gives a name to the overall architecture.

C4 model



Sruja is built on the C4 model, a hierarchical approach to software architecture diagrams. If you are new to architecture-as-code, it helps to understand these four levels of abstraction.

Think of it like Google Maps for your code: you can zoom out to see the whole world (System Context), or zoom in to see individual streets (Code).

The 4 Levels

1. System Context (Level 1)

"The Big Picture"

This is the highest level of abstraction. It shows your software system as a single box, and how it interacts with users and other systems (like functional dependencies, email systems, or payment gateways).

  • Goal: What is the system, who uses it, and how does it fit into the existing IT landscape?
  • Audience: Everyone (Technical & Non-Technical).

import { * } from 'sruja.ai/stdlib'


App = system "My App"
User = person "Customer"
Stripe = system "Payment Gateway"

User -> App "Uses"
App -> Stripe "Process Payments"

2. Container (Level 2)

"The High-Level Technical Building Blocks"

Note: In C4, a "Container" is NOT a Docker container. It represents a deployable unit, something that runs separately. Examples include:

  • A Single-Page Application (SPA)
  • A Mobile App
  • A Server-side API application
  • A Database
  • A File System

  • Goal: What are the major technical choices? How do they communicate?
  • Audience: Architects, Developers, Ops.

import { * } from 'sruja.ai/stdlib'


App = system "My App" {
    Web = container "React App"
    API = container "Rust Service"
    DB = database "PostgreSQL"
}

3. Component (Level 3)

"The Internals"

Zooming into a Container to see the major structural building blocks. In an API, these might be your controllers, services, or repositories.

  • Goal: How is the container structured?
  • Audience: Developers.

4. Code (Level 4)

"The Details"

The actual classes, interfaces, and functions. Sruja focuses mainly on Levels 1, 2, and 3, as Level 4 is best managed by your IDE.

Key Relationships

The power of C4 lies in its hierarchical nature.

  • A System defines the boundary.
  • Containers live inside a System.
  • Components live inside a Container.

When you define a relationship at a lower level (e.g., API -> DB), Sruja automatically understands the relationship at higher levels (e.g., App -> DB is implied).
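
One way to picture this is to "lift" the source of an explicit relation to each of its enclosing elements. The Python sketch below does only that, mirroring the API -> DB example; it is an illustrative assumption, not Sruja's algorithm, and whether Sruja also lifts the target side is not covered here. The dotted-ID convention and both function names are hypothetical.

```python
def enclosing_elements(element_id):
    """'App.API' -> ['App'] (each enclosing parent, innermost first)."""
    parts = element_id.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]

def implied_relations(source, target):
    """Lift the source of an explicit relation to each of its parents."""
    return [(parent, target) for parent in enclosing_elements(source) if parent != target]

implied = implied_relations("App.API", "App.DB")
print(implied)  # [('App', 'App.DB')]
```

A relation declared on a component two levels down would, under the same rule, be attributed to its container and then to its system.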

Why use C4?

  1. Shared Vocabulary: "Component" and "Service" often mean different things to different teams. C4 standardizes this.
  2. Zoom Levels: Avoids the "one giant messy diagram" problem. You can view the system at the level of detail relevant to you.

System



A System represents a software system, which is the highest level of abstraction in the C4 model. A system delivers value to its users, whether they are human or other systems.

Syntax

import { * } from 'sruja.ai/stdlib'


ID = system "Label/Name" {
    description "Optional description"

    // Link to ADRs
    adr ADR001

    // ... contains containers
}

Example

import { * } from 'sruja.ai/stdlib'


BankingSystem = system "Internet Banking System" {
    description "Allows customers to view accounts and make payments."
}

Container



A Container represents an application or a data store. It is something that needs to be running in order for the overall software system to work.

Note: In C4, "Container" does not mean a Docker container. It means a deployable unit like:

  • Server-side web application (e.g., Java Spring, ASP.NET Core)
  • Client-side web application (e.g., React, Angular)
  • Mobile app
  • Database schema
  • File system

Syntax

import { * } from 'sruja.ai/stdlib'


ID = container "Label/Name" {
    technology "Technology Stack"
    tags ["tag1", "tag2"]
    // ... contains components
}

Example

import { * } from 'sruja.ai/stdlib'


BankingSystem = system "Internet Banking System" {
    WebApp = container "Web Application" {
        technology "Java and Spring MVC"
        tags ["web", "frontend"]
    }
}

Scaling Configuration

Containers can define horizontal scaling properties using the scale block:

import { * } from 'sruja.ai/stdlib'


API = container "API Service" {
    technology "Rust, Axum"
    scale {
        min 3
        max 10
        metric "cpu > 80%"
    }
}

Scale Block Fields

  • min (optional): Minimum number of replicas
  • max (optional): Maximum number of replicas
  • metric (optional): Scaling metric trigger (e.g., "cpu > 80%", "memory > 90%")

This helps document your auto-scaling strategy and can be used by deployment tools.
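
As a rough illustration of how a deployment tool might consume these fields, the Python sketch below mirrors the scale block in a small data class, checks that the bounds are consistent, and clamps a requested replica count. The class, field, and function names are hypothetical, and nothing here implies that Sruja itself performs these checks.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Scale:
    """Mirror of the scale block; every field is optional, as in the list above."""
    min_replicas: Optional[int] = None
    max_replicas: Optional[int] = None
    metric: Optional[str] = None

def check_scale(scale):
    """Return a list of problems; an empty list means the bounds look consistent."""
    problems = []
    if scale.min_replicas is not None and scale.min_replicas < 0:
        problems.append("min must not be negative")
    if (scale.min_replicas is not None and scale.max_replicas is not None
            and scale.min_replicas > scale.max_replicas):
        problems.append("min must not exceed max")
    return problems

def clamp_replicas(scale, desired):
    """Keep a desired replica count within the documented bounds."""
    if scale.min_replicas is not None:
        desired = max(desired, scale.min_replicas)
    if scale.max_replicas is not None:
        desired = min(desired, scale.max_replicas)
    return desired

api_scale = Scale(min_replicas=3, max_replicas=10, metric="cpu > 80%")
print(check_scale(api_scale))         # []
print(clamp_replicas(api_scale, 1))   # 3
print(clamp_replicas(api_scale, 25))  # 10
```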

Component



A Component is a grouping of related functionality encapsulated behind a well-defined interface. Components reside inside Containers.

Syntax

import { * } from 'sruja.ai/stdlib'


ID = component "Label/Name" {
    technology "Technology"
    // ... items
}

Example

import { * } from 'sruja.ai/stdlib'


AuthController = component "Authentication Controller" {
    technology "Spring MVC Rest Controller"
    description "Handles user login and registration."
}

Person



A Person represents a human user of your software system (e.g., "Customer", "Admin", "Employee").

Syntax

import { * } from 'sruja.ai/stdlib'


ID = person "Label" {
    description "Optional description"
    tags ["tag1", "tag2"]
}

Example

import { * } from 'sruja.ai/stdlib'


Customer = person "Bank Customer" {
    description "A customer of the bank with personal accounts."
}

Relations



Relations describe how elements interact with each other. They are the lines connecting the boxes in your diagram.

Syntax

import { * } from 'sruja.ai/stdlib'


// Relations use element IDs
Source -> Destination "Label"
// When referring to nested elements, use fully qualified names:
System.Container -> System.Container.Component "Label"

Or with a technology/protocol:

Source -> Destination "Label" {
  technology "HTTPS/JSON"
}

Example

import { * } from 'sruja.ai/stdlib'


BankingSystem = system "Internet Banking System" {
  WebApp = container "Web Application"
  DB = database "Database"
}

User = person "User"

User -> BankingSystem.WebApp "Visits"
BankingSystem.WebApp -> BankingSystem.DB "Reads Data"

Use clear, unique IDs to reference relation endpoints.
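To make the role of fully qualified names concrete, here is a minimal Python sketch that flattens a nested model into qualified IDs and reports relation endpoints that do not resolve. The dictionary representation and function names are hypothetical; the sketch only illustrates the idea and does not describe how the Sruja engine resolves references.

```python
def collect_ids(model: dict, prefix: str = "") -> set:
    """Return every fully qualified ID in a nested {id: children} model."""
    ids = set()
    for name, children in model.items():
        qualified = f"{prefix}.{name}" if prefix else name
        ids.add(qualified)
        ids |= collect_ids(children, qualified)
    return ids

def unresolved_endpoints(model: dict, relations: list) -> list:
    """List relation endpoints that are not known qualified IDs."""
    known = collect_ids(model)
    return [end for src, dst, _label in relations
            for end in (src, dst) if end not in known]

# Mirrors the example above, plus one deliberately unqualified endpoint.
model = {"BankingSystem": {"WebApp": {}, "DB": {}}, "User": {}}
relations = [
    ("User", "BankingSystem.WebApp", "Visits"),
    ("BankingSystem.WebApp", "BankingSystem.DB", "Reads Data"),
    ("User", "WebApp", "Visits"),  # nested ID without its system prefix
]
print(unresolved_endpoints(model, relations))  # ['WebApp']
```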

Views

Define views to customize what elements appear and how they render.

Syntax

person = kind "Person"
system = kind "System"
container = kind "Container"
database = kind "Database"

App = system "Application" {
  WebApp = container "Web App"
  API = container "API"
  DB = database "Database"
}

User = person "User"

User -> App.WebApp "Uses"
App.WebApp -> App.API "Calls"
App.API -> App.DB "Reads/Writes"

view api_focus of App {
  title "API Focus"
  include App.API App.DB
  exclude App.WebApp
}

styles {
  element "Database" { shape "cylinder" color "#3b82f6" }
  relationship "Calls" { color "#ef4444" }
}

view index {
  include *
}
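The include and exclude lines in the view above can be read as a simple set filter. The Python sketch below shows that reading under stated assumptions: the wildcard selects every element, and exclusions are applied after inclusions. The function name and these semantics are illustrative and are not taken from the Sruja implementation.

```python
def resolve_view(all_ids, include, exclude=()):
    """Return the IDs visible in a view: included IDs minus excluded ones.

    "*" in include selects every element (assumed wildcard semantics).
    """
    selected = set(all_ids) if "*" in include else set(include) & set(all_ids)
    return sorted(selected - set(exclude))

elements = ["User", "App", "App.WebApp", "App.API", "App.DB"]

# Equivalent of the api_focus view.
print(resolve_view(elements, include=["App.API", "App.DB"], exclude=["App.WebApp"]))
# ['App.API', 'App.DB']

# Equivalent of the index view.
print(resolve_view(elements, include=["*"]))
# ['App', 'App.API', 'App.DB', 'App.WebApp', 'User']
```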

Guidance

  • Use include to spotlight critical paths; use exclude to reduce noise.
  • Keep view names descriptive (e.g., "API Focus", "Data Flow").
  • Use view styles for legibility: color important relations, reshape data stores.
  • Related constructs: relations for edges; the styles block for global defaults.

Validation

Sruja validates your model to catch issues early.

Common Checks

  • Unique IDs within scope
  • Valid references (relations connect existing elements)
  • Cycles (informational; feedback loops are valid)
  • Layering violations (dependencies must flow downward)
  • External boundary checks
  • Simplicity guidance (non‑blocking)
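Two of the checks listed above, unique IDs and cycle reporting, can be sketched in a few lines of Python. This is a generic illustration (a counter for duplicates and a depth-first search for cycles) over a hypothetical list-of-pairs representation; it is not the algorithm used by the Sruja engine.

```python
from collections import Counter

def duplicate_ids(ids):
    """Return IDs that appear more than once."""
    return sorted(name for name, count in Counter(ids).items() if count > 1)

def find_cycle(relations):
    """Return one cycle as a list of IDs, or None if the graph is acyclic."""
    graph = {}
    for src, dst in relations:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    state = {}  # node -> "visiting" | "done"
    path = []

    def visit(node):
        state[node] = "visiting"
        path.append(node)
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                # Back edge: the cycle is the path from nxt to here.
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = "done"
        return None

    for node in graph:
        if node not in state:
            found = visit(node)
            if found:
                return found
    return None

print(duplicate_ids(["App.API", "App.DB", "App.API"]))  # ['App.API']
print(find_cycle([("User", "App.WebApp"), ("App.WebApp", "App.API"), ("App.API", "App.DB")]))  # None
print(find_cycle([("A", "B"), ("B", "C"), ("C", "A")]))  # ['A', 'B', 'C', 'A']
```

Because cycles are informational rather than errors, a tool built this way would report the cycle without failing the run.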

Example

import { * } from 'sruja.ai/stdlib'


User = person "User"
App = system "App" {
  WebApp = container "Web App"
  API = container "API"
  DB = database "Database"
}

// Valid relations (qualified cross-scope)
User -> App.WebApp "Uses"
App.WebApp -> App.API "Calls"
App.API -> App.DB "Reads/Writes"

view index {
  include *
}

Run sruja validate locally or in CI to enforce these rules.

Deployment

The Deployment view allows you to map your software containers to infrastructure. This corresponds to the C4 Deployment Diagram.

Deployment Node

A Deployment Node represents a place where software runs: physical hardware, a virtual machine, a Docker container, a Kubernetes pod, and so on. Nodes can be nested.

Syntax

deployment "Environment" {
    node "Node Name" {
        // ...
    }
}

Infrastructure Node

An Infrastructure Node represents infrastructure software that isn't one of your containers (e.g., DNS, Load Balancer, External Database Service).

Syntax

node "App Server" {
    containerInstance WebApp
}

Container Instance

A Container Instance represents a runtime instance of one of your defined Containers running on a Deployment Node.

Syntax

containerInstance ContainerID {
    instanceId 1 // Optional
}

Example

deployment "Production" {
    node "AWS" {
        node "US-East-1" {
            node "App Server" {
                containerInstance WebApp
            }
            node "Database Server" {
                containerInstance DB
            }
        }
    }
}
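Nested nodes form a tree, so the placement of each container instance is the path from the outermost node to the node that hosts it. The Python sketch below walks a hand-written dictionary version of the Production example; the dictionary shape and the function name are assumptions for illustration, not a Sruja export format.

```python
def placements(node: dict, path=()):
    """Yield (container_id, node_path) for every container instance in a node tree."""
    here = path + (node["name"],)
    for container in node.get("instances", []):
        yield container, " / ".join(here)
    for child in node.get("nodes", []):
        yield from placements(child, here)

# Hand-written equivalent of the "Production" deployment above.
production = {
    "name": "AWS",
    "nodes": [{
        "name": "US-East-1",
        "nodes": [
            {"name": "App Server", "instances": ["WebApp"]},
            {"name": "Database Server", "instances": ["DB"]},
        ],
    }],
}

for container, where in placements(production):
    print(f"{container}: {where}")
# WebApp: AWS / US-East-1 / App Server
# DB: AWS / US-East-1 / Database Server
```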

Requirements

Use requirement to capture functional, performance, security, and constraint requirements. Requirements are declared at the architecture root only.

Syntax

import { * } from 'sruja.ai/stdlib'


// Requirements using flat syntax
R1 = requirement functional "Support 10k concurrent users"
R2 = requirement performance "p95 < 200ms for /checkout"
R3 = requirement security "PII encrypted at rest"
R4 = requirement constraint "Only PostgreSQL managed service"
R5 = requirement nonfunctional "System must be maintainable"

view index {
  include *
}

Guidance

  • Keep requirement titles concise and testable.
  • Reference requirements in ADRs and scenarios where relevant.
  • Validate with sruja lint to surface unmet or conflicting requirements.
  • Declarations at system/container/component level are deprecated and ignored by exporters and UI.
  • Related constructs: scenario for behavior walkthroughs, slo for targets and windows, adr for decision records.

Scenario

Scenarios describe behavioral flows as ordered steps. They focus on interactions rather than data pipelines.

Syntax

import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API"
  DB = database "Database"
}

// Scenarios using flat syntax
CheckoutFlow = scenario "User Checkout" {
  step Customer -> Shop.WebApp "adds items to cart"
  step Shop.WebApp -> Shop.API "submits cart"
  step Shop.API -> Shop.DB "validates and reserves stock"
  step Shop.API -> Shop.WebApp "returns confirmation"
  step Shop.WebApp -> Customer "displays success"
}

// 'story' is an alias for 'scenario'
LoginStory = story "User Login" {
  step Customer -> Shop.WebApp "enters credentials"
  step Shop.WebApp -> Shop.API "validates user"
}

view index {
  include *
}

Aliases & Semantics

Sruja provides three keywords that are structurally identical (sharing the same underlying AST definition and syntax) but convey different semantic intent:

  • scenario: Models behavioral flows (e.g., Use Cases, User Journeys).
  • story: An alias for scenario (e.g., User Stories).
  • flow: Models data movement (e.g., Data Flow Diagrams).

While the syntax is the same, using the appropriate keyword helps readers understand the nature of the interaction being modeled.

import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API"
  DB = database "Database"
}

// Scenario: User behavior
Checkout = scenario "User Checkout" {
  Customer -> Shop.WebApp "adds items to cart"
  Shop.WebApp -> Shop.API "submits cart"
}

// Flow: Data flow
OrderProcess = flow "Order Processing" {
  Customer -> Shop "Order Details"
  Shop -> Shop.API "Processes"
  Shop.API -> Shop.DB "Save Order"
}

When to use:

  • Use scenario for user journeys, business processes, and behavioral flows
  • Use flow for data pipelines, ETL processes, and system-to-system data flows

Tips

  • Keep step labels short and action‑oriented
  • Use fully qualified names when referring outside the current context
  • Use scenario or story for behavior; use flow for data flows; use relations for structure

Architecture Decision Records (ADR)

Sruja allows you to capture Architecture Decision Records (ADRs) directly within your architecture model. This keeps the "why" close to the "what".

Syntax

Defining an ADR

You can define an ADR with a full body describing the context, decision, and consequences.

import { * } from 'sruja.ai/stdlib'


ADR001 = adr "Use PostgreSQL" {
  status "Accepted"
  context "We need a relational database with strong consistency guarantees."
  decision "We will use PostgreSQL 15."
  consequences "Good ecosystem support, but requires managing migrations."
}

Linking ADRs

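You can link an ADR to the elements it affects (System, Container, Component) by referencing its ID inside the element's block. In the example below the link is only a placeholder comment: linking through metadata is planned for a future release.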

import { * } from 'sruja.ai/stdlib'


Backend = system "Backend API" {
  // Link to the ADR (via metadata in future)
}

Optional Body

The body is optional: you can declare an ADR with only an ID and a title, then add the status, context, decision, and consequences later.

adr = kind "ADR"
ADR003 = adr "Deferred Decision"

Fields

  • ID: Unique identifier (e.g., ADR001).
  • Title: Short summary of the decision.
  • Status: Current status (e.g., Proposed, Accepted, Deprecated).
  • Context: The problem statement and background.
  • Decision: The choice made.
  • Consequences: The pros, cons, and implications of the decision.

Policy

Policies define architectural rules, standards, and constraints that your system must follow. They help enforce best practices, compliance requirements, and organizational standards directly in your architecture model.

Syntax

PolicyID = policy "Description" {
  category "category-name"
  enforcement "required" // "required" | "recommended" | "optional"
  description "Detailed description"
  metadata {
    // Additional metadata
  }
}

Simple Policy

import { * } from 'sruja.ai/stdlib'


SecurityPolicy = policy "Enforce TLS 1.3 for all external communications"

view index {
  include *
}

Policy Fields

  • ID: Unique identifier for the policy (e.g., SecurityPolicy, GDPR_Compliance)
  • Description: Human-readable description of the policy
  • category (optional): Policy category (e.g., "security", "compliance", "performance")
  • enforcement (optional): Enforcement level ("required", "recommended", "optional")
  • description (optional): Detailed description within the policy body
  • metadata (optional): Additional metadata key-value pairs

Example: Security Policies

import { * } from 'sruja.ai/stdlib'


TLSEnforcement = policy "All external communications must use TLS 1.3" {
  category "security"
  enforcement "required"
}

EncryptionAtRest = policy "Sensitive data must be encrypted at rest" {
  category "security"
  enforcement "required"
}

BankingApp = system "Banking App" {
  API = container "API Service"
  CustomerDB = database "Customer Database"
}

view index {
  include *
}

Example: Compliance Policies

import { * } from 'sruja.ai/stdlib'


HIPAACompliance = policy "Must comply with HIPAA regulations" {
  category "compliance"
  enforcement "required"
  description "All patient data must be encrypted and access logged"
}

DataRetention = policy "Medical records retained for 10 years" {
  category "compliance"
  enforcement "required"
}

view index {
  include *
}

Policy Categories

Common policy categories include:

  • security: Security standards and practices
  • compliance: Regulatory and legal requirements
  • performance: Performance standards and SLAs
  • observability: Monitoring, logging, and metrics requirements
  • architecture: Architectural patterns and principles
  • data: Data handling and privacy requirements

Enforcement Levels

  • required: Policy must be followed (non-negotiable)
  • recommended: Policy should be followed (best practice)
  • optional: Policy is a guideline (suggested)

Benefits

  • Documentation: Policies are part of your architecture, not separate documents
  • Validation: Can be validated against actual implementations
  • Communication: Clear standards for development teams
  • Compliance: Track regulatory and organizational requirements
  • Governance: Enforce architectural decisions and patterns

Note on Rules

The rule keyword inside policies is not yet implemented. For now, policies serve as documentation and can be validated manually or through external tooling.

Syntax Reference

Elements

import { * } from 'sruja.ai/stdlib'


ID = person "Label"
ID = system "Label" { ... }
ID = container "Label" { ... }
ID = database "Label" { ... }
ID = queue "Label" { ... }
ID = component "Label" { ... }

Relations

Source -> Target "Label"
// Use fully qualified names when referring to nested elements:
System.Container -> System.API "Label"
System.Container.Component -> System.API.Component "Label"

Metadata

overview {
  summary "Syntax Reference Overview"
}

MySystem = system "MySystem" {
  metadata {
    team "Platform"
    tier "critical"
  }
}

Deployment

deployment Prod {
  node Cloud {
    node Region {
      node Service {
        containerInstance Web
      }
    }
  }
}

Architecture Patterns

Request/Response

import { * } from 'sruja.ai/stdlib'


App = system "App" {
  Web = container "Web"
  API = container "API"
  DB = database "Database"
}

App.Web -> App.API "Calls"
App.API -> App.DB "Reads/Writes"

view index {
  include *
}

Event-Driven

import { * } from 'sruja.ai/stdlib'


Orders = system "Order System" {
  OrderSvc = container "Order Service"
  PaymentSvc = container "Payment Service"
}

Orders.OrderSvc -> Orders.PaymentSvc "OrderCreated event"
Orders.PaymentSvc -> Orders.OrderSvc "PaymentConfirmed event"

view index {
  include *
}

Saga

import { * } from 'sruja.ai/stdlib'


Orders = system "Order System" {
  OrderSvc = container "Order Service"
  InventorySvc = container "Inventory Service"
  PaymentSvc = container "Payment Service"
}

CreateOrderSaga = scenario "Order Creation Saga" {
  Orders.OrderSvc -> Orders.InventorySvc "Reserves stock"
  Orders.InventorySvc -> Orders.OrderSvc "Confirms reserved"
  Orders.OrderSvc -> Orders.PaymentSvc "Charges payment"
  Orders.PaymentSvc -> Orders.OrderSvc "Confirms charged"
}

view index {
  include *
}

CQRS

import { * } from 'sruja.ai/stdlib'


App = system "App" {
  CommandAPI = container "Command API"
  QueryAPI = container "Query API"
  ReadDB = database "Read Database"
  WriteDB = database "Write Database"
}

App.CommandAPI -> App.WriteDB "Writes"
App.QueryAPI -> App.ReadDB "Reads"

view index {
  include *
}

RAG (Retrieval-Augmented Generation)

import { * } from 'sruja.ai/stdlib'


AIQA = system "AI Q&A" {
  Indexer = container "Indexer"
  Retriever = container "Retriever"
  Generator = container "Generator"
  VectorDB = database "Vector Store"
}

AIQA.Indexer -> AIQA.VectorDB "Writes embeddings"
AIQA.Retriever -> AIQA.VectorDB "Searches"
AIQA.Generator -> AIQA.Retriever "Fetches contexts"

See examples/pattern_rag_pipeline.sruja for a production-ready model.

Agentic Orchestration

import { * } from 'sruja.ai/stdlib'


AgentSystem = system "Agent System" {
  Orchestrator = container "Agent Orchestrator"
  Planner = container "Planner"
  Executor = container "Executor"
  Tools = container "Tooling API"
  Memory = database "Long-Term Memory"
}

AgentSystem.Orchestrator -> AgentSystem.Planner "Plans tasks"
AgentSystem.Orchestrator -> AgentSystem.Executor "Executes steps"
AgentSystem.Executor -> AgentSystem.Tools "Calls tools"
AgentSystem.Executor -> AgentSystem.Memory "Updates state"

view index {
  include *
}

See examples/pattern_agentic_ai.sruja for a complete agent graph.

Sruja Cheatsheet

Elements

import { * } from 'sruja.ai/stdlib'

User = person "User"
App = system "App" {
    Web = container "Web"
    API = container "API"
    DB = database "DB"
}
User -> App.Web "Uses"
App.Web -> App.API "Calls"
App.API -> App.DB "Reads/Writes"

view index {
    include *
}

Tip

The import { * } from 'sruja.ai/stdlib' line provides all standard kinds. You can also declare kinds manually if needed: person = kind "Person", system = kind "System", etc.

Component

import { * } from 'sruja.ai/stdlib'

App = system "App" {
    Web = container "Web" {
        Cart = component "Cart"
    }
}

Scenario

import { * } from 'sruja.ai/stdlib'

User = person "User"

App = system "App" {
    Web = container "Web"
    API = container "API"
    DB = database "Database"
}

Checkout = scenario "Checkout Flow" {
    User -> App.Web "adds items"
    App.Web -> App.API "validates"
    App.API -> App.DB "stores order"
}

Deployment

deployment Prod {
  node Cloud {
    node Region {
      node Service {
        containerInstance App.Web
      }
    }
  }
}

Try it

Use the VS Code extension to paste these snippets into a .sruja file and see the diagram preview.

Sruja Adoption Guide

Using Sruja in your repo

For a short, practical guide (install CLI, add to your project, CI, AI, multi-repo), see Using Sruja in your project. The rest of this adoption guide helps you evaluate fit and plan rollout.

Is Sruja Right for Your Organization?

Quick Self-Assessment

Answer these questions to determine if Sruja addresses your needs:

Architecture & Documentation Pain Points

  • Do your architecture diagrams become outdated within weeks?
  • Do engineers spend significant time maintaining documentation?
  • Is there confusion about "the latest architecture diagram"?
  • Do new engineers struggle to understand system architecture?
  • Are architectural decisions lost when senior engineers leave?

If 3+ are "Yes" → Sruja can help

Compliance & Governance Needs

  • Do you need to comply with regulations (HIPAA, SOC2, PCI-DSS, GDPR)?
  • Are compliance audits time-consuming and risky?
  • Do you struggle to prove architectural controls meet requirements?
  • Are security policies documented but not enforced?
  • Do you need to demonstrate compliance to auditors?

If 2+ are "Yes" → Sruja's policy-as-code is valuable

Technical Architecture Challenges

  • Do you have microservices that need governance?
  • Are you experiencing architectural drift (implementation vs. design)?
  • Do you need to enforce service boundaries and dependencies?
  • Are circular dependencies causing issues?
  • Do you need to generate infrastructure from architecture?

If 2+ are "Yes" → Sruja's validation and enforcement help

DevOps & Engineering Culture

  • Do you use Git/GitOps workflows?
  • Do you have CI/CD pipelines?
  • Do you value "everything as code" (IaC, GitOps)?
  • Do you want architecture changes in PR reviews?
  • Do you need architecture to integrate with Terraform/Istio/etc.?

If 3+ are "Yes" → Sruja fits your workflow

Organization Size & Maturity

Sruja is ideal for:

  • Startups (10-50 engineers): Fast scaling, need consistency
  • Scale-ups (50-200 engineers): Managing complexity, compliance needs
  • Enterprises (200+ engineers): Governance, compliance, knowledge management

Sruja may not be ideal if:

  • ❌ You have < 5 engineers (overhead may outweigh benefits)
  • ❌ You don't use version control or CI/CD
  • ❌ You prefer visual-only tools (no code/DSL)
  • ❌ You have no compliance or governance requirements

Decision Framework

Step 1: Define Your Goals

What problem are you trying to solve?

Goal | Sruja Benefit | Priority
Reduce documentation overhead | Architecture-as-code stays current | High
Ensure compliance | Policy-as-code with automated validation | High
Prevent architectural drift | Automated validation in CI/CD | Medium
Faster onboarding | Living documentation in codebase | Medium
Enforce service boundaries | Layer and dependency validation | Medium
Generate infrastructure | Terraform/OpenTofu generation (roadmap) | Low

Action: Rank your top 3 goals. Sruja should address at least 2.

Step 2: Calculate Value & ROI

Note: Sruja is free and open source. This ROI calculation measures time savings and value, not purchase cost.

Quick Value Calculator:

Time Savings = (Engineers × Hours/Week × 0.7) × 50 weeks × $100/hour
Onboarding Savings = (New Engineers/Year × 2 weeks × 0.5) × $150k/year ÷ 50
Risk Reduction = Compliance Failures Avoided × $100k

Total Value = Time Savings + Onboarding + Risk Reduction

Example (10 senior engineers, 20 new engineers/year):

  • Time: 10 × 4 hours × 0.7 × 50 × $100 = $140k/year
  • Onboarding: 20 × 2 × 0.5 × $150k ÷ 50 = $60k/year
  • Risk: 1 failure avoided = $100k (one-time)
  • Total Value: $200k+ per year
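The arithmetic of the example can be reproduced with a short Python sketch. The function names are made up for this illustration; the constants (0.7, 50 weeks, $100/hour, $150k/year, $100k per failure) come from the Quick Value Calculator above. The one-time risk value is kept separate from the recurring yearly total.

```python
def time_savings(engineers, hours_per_week, factor=0.7, weeks=50, hourly_rate=100):
    """Time Savings = (Engineers x Hours/Week x 0.7) x 50 weeks x $100/hour."""
    return engineers * hours_per_week * factor * weeks * hourly_rate

def onboarding_savings(new_engineers, weeks_saved=2, factor=0.5,
                       salary=150_000, weeks_per_year=50):
    """Onboarding Savings = (New Engineers x 2 weeks x 0.5) x $150k / 50."""
    return new_engineers * weeks_saved * factor * salary / weeks_per_year

def risk_reduction(failures_avoided, cost_per_failure=100_000):
    """Risk Reduction = Compliance Failures Avoided x $100k."""
    return failures_avoided * cost_per_failure

time_value = time_savings(engineers=10, hours_per_week=4)
onboarding_value = onboarding_savings(new_engineers=20)
risk_value = risk_reduction(failures_avoided=1)

recurring = time_value + onboarding_value
print(f"Time savings:       ${time_value:,.0f}/year")        # $140,000/year
print(f"Onboarding savings: ${onboarding_value:,.0f}/year")  # $60,000/year
print(f"Recurring total:    ${recurring:,.0f}/year")         # $200,000/year
print(f"One-time risk:      ${risk_value:,.0f}")             # $100,000
```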

ROI: Sruja has no license cost, so the only investment is the engineering time spent on adoption. Compare that time against the value calculated above.

Step 3: Assess Technical Fit

Evaluate your technical stack:

Technology | Sruja Integration | Status
Git/GitHub/GitLab | Native integration | ✅ Available
CI/CD (GitHub Actions, GitLab CI) | Validation in pipelines | ✅ Available
Terraform/OpenTofu | Infrastructure generation | 🚧 Roadmap (Phase 2)
Kubernetes/Istio | Service mesh config generation | 🚧 Roadmap (Phase 3)
API Gateways (Kong, Apigee) | Config generation | 🚧 Roadmap (Phase 3)
OPA (Open Policy Agent) | Policy integration | 🚧 Roadmap (Phase 2)

Action:

  • If you need Git/CI/CD integration → ✅ Ready now
  • If you need Terraform/Istio/OPA → 🚧 On roadmap (see Roadmap Discussions) — you can pilot with current features now

Evaluation Process

Phase 1: Discovery (Week 1)

Activities:

  1. Review Sruja documentation
  2. Try Sruja Designer online
  3. Install CLI: curl -fsSL https://raw.githubusercontent.com/sruja-ai/sruja/main/scripts/install.sh | bash
  4. Model a simple existing system

Deliverable: Understanding of Sruja capabilities

Phase 2: Proof of Concept (Weeks 2-4)

Activities:

  1. Model 1-2 real systems in Sruja
  2. Integrate validation into CI/CD
  3. Document architecture decisions as ADRs
  4. Measure time savings

Success Criteria:

  • Can model systems accurately
  • Validation catches real issues
  • Team sees value
  • Time savings measurable

Deliverable: PoC report with value estimate

Phase 3: Pilot (Months 2-3)

Activities:

  1. Roll out to 1-2 teams
  2. Establish best practices
  3. Create internal documentation
  4. Measure compliance improvements

Success Criteria:

  • Architecture stays current
  • Compliance validation working
  • Team adoption > 80%
  • Positive value demonstrated

Deliverable: Pilot report with go/no-go recommendation

Decision Checklist

Must-Have Requirements

  • Problem Fit: Sruja addresses 2+ of your top goals
  • Value Positive: Calculated value > $100k/year (or equivalent time savings)
  • Technical Fit: Git/CI/CD integration available (or roadmap acceptable)
  • Team Readiness: Team comfortable with code-based tools
  • Leadership Support: Time allocated for adoption (no budget needed - Sruja is free)

Nice-to-Have Requirements

  • Advanced features needed (Terraform, Istio, OPA)
  • Compliance requirements (HIPAA, SOC2, PCI-DSS)
  • Large team (100+ engineers)
  • Microservices architecture

Decision Matrix

Criteria | Weight | Your Score (1-5) | Weighted Score
Problem fit | 30% | ___ | ___
Value/ROI | 25% | ___ | ___
Technical fit | 20% | ___ | ___
Team readiness | 15% | ___ | ___
Leadership support | 10% | ___ | ___
Total | 100% | | ___/5.0

Decision Rule:

  • > 4.0: Strong fit → Proceed with pilot
  • 3.5-4.0: Good fit → Consider pilot
  • < 3.5: Weak fit → Reassess or wait
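The matrix and decision rule amount to a weighted average followed by two thresholds. The Python sketch below applies them; the weights and thresholds are the ones given above, while the key names and the sample scores are invented for illustration.

```python
# Weights from the decision matrix (they sum to 100%).
WEIGHTS = {
    "problem_fit": 0.30,
    "value_roi": 0.25,
    "technical_fit": 0.20,
    "team_readiness": 0.15,
    "leadership_support": 0.10,
}

def weighted_total(scores):
    """Weighted score out of 5.0, given a 1-5 score per criterion."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

def decision(total):
    """Apply the decision rule: > 4.0, 3.5-4.0, < 3.5."""
    if total > 4.0:
        return "Strong fit: proceed with pilot"
    if total >= 3.5:
        return "Good fit: consider pilot"
    return "Weak fit: reassess or wait"

# Hypothetical scores for one organization.
scores = {
    "problem_fit": 5,
    "value_roi": 4,
    "technical_fit": 4,
    "team_readiness": 3,
    "leadership_support": 4,
}
total = weighted_total(scores)
print(f"{total:.2f} -> {decision(total)}")  # 4.15 -> Strong fit: proceed with pilot
```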

Common Concerns & Objections

"We already have architecture documentation"

Response: Sruja doesn't replace documentation — it makes it executable. Your documentation becomes code that:

  • Stays current (version-controlled)
  • Validates automatically
  • Enforces policies
  • Integrates with DevOps

"Our team isn't technical enough for a DSL"

Response: Sruja's DSL is designed for all developers:

  • First-year CS students can be productive in 10 minutes
  • Progressive disclosure (simple → advanced)
  • Rich error messages guide users
  • VS Code extension with full LSP support (autocomplete, go-to-definition, rename, find references, and more) - see VS Code Extension Guide

"We don't have compliance requirements"

Response: Sruja provides value beyond compliance:

  • Faster onboarding (50% reduction)
  • Reduced documentation time (20-30%)
  • Architectural validation (prevents drift)
  • Knowledge preservation

"The roadmap features we need aren't ready"

Response:

  • Core features (validation, CI/CD) are available now
  • Roadmap features (Terraform, Istio, OPA) are planned for Phase 2-3 (see Roadmap Discussions)
  • You can start with core features and add advanced later
  • Early adoption gives you influence on roadmap priorities

Success Metrics

Track These KPIs

Metric | Baseline | Target (3 months) | Target (6 months)
Documentation time | X hours/week | X × 0.7 hours/week | X × 0.5 hours/week
Onboarding time | X weeks | X × 0.7 weeks | X × 0.5 weeks
Architecture freshness | X% outdated | < 10% outdated | < 5% outdated
Compliance violations | X per quarter | X × 0.5 per quarter | 0 per quarter
Architectural issues caught | X in production | X × 0.3 in production | X × 0.1 in production
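Once you have measured your own baselines, the relative targets follow by multiplication. The Python sketch below does this for the metrics whose targets are expressed as a factor of the baseline (architecture freshness uses absolute targets and is left out). The metric keys and baseline numbers are hypothetical.

```python
# Reduction factors copied from the KPI table: (3-month factor, 6-month factor).
FACTORS = {
    "documentation_hours_per_week": (0.7, 0.5),
    "onboarding_weeks": (0.7, 0.5),
    "compliance_violations_per_quarter": (0.5, 0.0),
    "architectural_issues_in_production": (0.3, 0.1),
}

def kpi_targets(baselines):
    """Return {metric: (target_3_months, target_6_months)}, rounded to one decimal."""
    return {
        metric: tuple(round(value * factor, 1) for factor in FACTORS[metric])
        for metric, value in baselines.items()
    }

# Hypothetical baselines, for illustration only.
baselines = {
    "documentation_hours_per_week": 10,
    "onboarding_weeks": 6,
    "compliance_violations_per_quarter": 4,
}
for metric, (three_months, six_months) in kpi_targets(baselines).items():
    print(f"{metric}: {three_months} after 3 months, {six_months} after 6 months")
```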

Next Steps

Immediate Actions

  1. Complete Self-Assessment (above)
  2. Calculate Value (Step 2)
  3. Try Sruja (see Getting Started)
  4. Join Community (GitHub Discussions, Discord, etc.)

Decision Timeline

  • Week 1: Self-assessment and value calculation
  • Week 2-4: Proof of concept
  • Month 2-3: Pilot program
  • Month 4+: Full rollout (if successful)

Resources

Open Source & Community Support

Sruja is free and open source (Apache 2.0 licensed), developed by and for the community. You can:

  • Use it freely: No licensing fees or restrictions
  • Contribute: Submit PRs, report issues, suggest features
  • Extend it: Build custom validators, exporters, and integrations
  • Join the community: Participate in GitHub Discussions, share use cases, and learn from others

Professional Services

While Sruja is open source and free to use, professional consulting services are available for organizations that need:

  • Implementation support: Help rolling out Sruja across teams and systems
  • Best practices guidance: Establish architectural governance patterns and workflows
  • Custom integrations: Integrate Sruja with existing CI/CD, infrastructure, and monitoring tools
  • Training: Team training on Sruja DSL, validation patterns, and architectural modeling
  • Custom development: Build custom validators, exporters, or platform integrations

Contact the team through GitHub Discussions to discuss your needs.

Future Platform Vision

Sruja is designed to evolve into a comprehensive platform for architectural governance:

  • Live System Review: Compare actual runtime behavior against architectural models to detect drift and violations.
  • Gap Analysis: Automatically identify missing components, undocumented dependencies, and architectural gaps.
  • Continuous Validation: Monitor production systems against architectural policies and constraints in real-time.
  • Compliance Monitoring: Track and report on architectural compliance across services and deployments.

These capabilities are planned for future releases. The current open source foundation provides the building blocks for this evolution, and community feedback helps shape the roadmap.


Note: This guide helps you evaluate whether Sruja is the right fit for your organization and how to adopt it successfully.

Ready to evaluate Sruja? Start with the Self-Assessment above.

Adoption Playbook

Week 1: Baseline & CI

  • Create a minimal architecture.sruja covering core systems.
  • Add sruja fmt and sruja lint to CI; fail on violations.
  • Export docs: sruja export markdown architecture.sruja.

Week 2: Targets & Guardrails

  • Add slo and scale for critical paths.
  • Encode constraints and conventions; publish to teams.
  • Introduce views for API/Data/Auth focus.

Week 3: Governance & Evolution

  • Add policy pages for security/operability.
  • Document decisions with adr blocks; track evolution with slo values (target vs current).
  • Use Git for automatic change tracking (git log, git diff, version tags).
  • Wire linting to PR checks; require green builds.

CI Example (GitHub Actions)

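name: sruja
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # The runner does not ship with the Sruja CLI, so install it first.
      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Install Sruja CLI
        run: cargo install sruja-cli --git https://github.com/sruja-ai/sruja --locked
      - name: Lint DSL
        run: sruja lint architecture.sruja
      - name: Export Docs
        run: sruja export markdown architecture.sruja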

Success Metrics

  • Review cycle time ↓
  • Incident rate for architecture errors ↓
  • Consistency across services ↑

Note: Sruja is free and open source (Apache 2.0 licensed). Need help with implementation? Professional consulting services are available. Contact the team through GitHub Discussions to learn more.

Using Sruja in Your Project

This guide is for teams and organizations that want to use Sruja in their own repositories to enhance their code: architecture-as-code, validation in CI, and AI-assisted generation with consistent rules.

What you get

  • Architecture as code – .sruja files in Git; no separate diagram tool to keep in sync.
  • Validation – sruja lint catches undefined refs, circular dependencies, missing fields, orphans.
  • AI-friendly – Rules and skills so Cursor, Copilot, etc. generate valid Sruja and better architecture.
  • CI – Fail PRs when architecture is invalid; optional export to Markdown/JSON/Mermaid for docs.

1. Install (your machine and/or CI)

CLI

Option A – from Git (recommended for now):

cargo install sruja-cli --git https://github.com/sruja-ai/sruja

Option B – install script (if available in repo):

curl -fsSL https://raw.githubusercontent.com/sruja-ai/sruja/main/scripts/install.sh | bash

Check:

sruja --help
sruja lint --help

VS Code extension

Install Sruja Language Support from the VS Code Marketplace (or Open VSX). You get syntax highlighting, LSP diagnostics, and optional diagram preview for .sruja files.


2. Add Sruja to your repo (5 minutes)

Step 1: Create or add architecture

# From your repo root
sruja init my-service
# Creates: my-service.sruja, .cursorrules, .copilot-instructions.md, .architecture-skill.md

Or add a single file, e.g. architecture.sruja or docs/architecture.sruja, and define your systems/containers/relationships (see Language specification and examples).

Step 2: AI editor integration (so AI-generated code follows rules)

The files created by sruja init are enough for most teams:

  • .cursorrules – Cursor uses this for Sruja DSL rules.
  • .copilot-instructions.md – GitHub Copilot uses this.
  • .architecture-skill.md – Short pointer; optional full skill: npx skills add sruja-ai/sruja --skill sruja-architecture.

Commit these so everyone (and CI) has the same setup. See AI editor integration in the repo for details.

Step 3: Validate in CI

Your repository does not contain the Sruja source, so the CI job installs the CLI from Git and then runs lint.

GitHub Actions example:

name: Validate Sruja

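on:
  push:
    branches: [main, develop]
    # paths is a filter of each event, so it is nested under push and pull_request
    paths:
      - '**/*.sruja'
  pull_request:
    branches: [main, develop]
    paths:
      - '**/*.sruja'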

jobs:
  sruja:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Install Sruja CLI
        run: cargo install sruja-cli --git https://github.com/sruja-ai/sruja --locked

      - name: Lint all .sruja files
        run: |
          find . -name '*.sruja' -not -path './target/*' | while read f; do
            echo "Linting $f"
            sruja lint "$f"
          done

Use --locked so the install matches the lockfile in the Sruja repo for reproducible CI.
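If you prefer a script over the shell loop in the workflow above, the same logic can be sketched in Python: find every .sruja file outside ./target, run sruja lint on each, and collect the failures. The script and function names are hypothetical. The runner is injectable so the demo at the bottom uses a stub and runs without the CLI installed; in CI you would call lint_all(".") with the default runner.

```python
import subprocess
import tempfile
from pathlib import Path
from types import SimpleNamespace

def find_sruja_files(root):
    """Return all .sruja files under root, skipping the top-level target directory."""
    root = Path(root)
    return sorted(
        path for path in root.rglob("*.sruja")
        if path.relative_to(root).parts[0] != "target"
    )

def lint_all(root, run=subprocess.run):
    """Run `sruja lint <file>` for each file and return the files that failed."""
    failed = []
    for path in find_sruja_files(root):
        print(f"Linting {path}")
        result = run(["sruja", "lint", str(path)])
        if result.returncode != 0:
            failed.append(path)
    return failed

# Demo with a stub runner: files named broken*.sruja "fail" lint.
def stub_run(cmd):
    failing = Path(cmd[-1]).name.startswith("broken")
    return SimpleNamespace(returncode=1 if failing else 0)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "target").mkdir()
    (root / "architecture.sruja").write_text("")
    (root / "docs" / "broken.sruja").write_text("")
    (root / "target" / "ignored.sruja").write_text("")
    failed = lint_all(root, run=stub_run)

print([path.name for path in failed])  # ['broken.sruja']
```

Unlike a loop that stops at the first error, this variant lints every file and reports all failures together; exit with a non-zero status when the returned list is not empty.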

Optional – export docs in CI:

      - name: Export architecture to Markdown
        run: |
          for f in $(find . -name '*.sruja' -not -path './target/*'); do
            # Write architecture.sruja -> architecture.sruja.md so the upload glob below matches
            out="${f}.md"
            sruja export markdown "$f" > "$out" || true
          done
      - name: Upload architecture docs
        uses: actions/upload-artifact@v4
        with:
          name: architecture-docs
          path: '**/*.sruja.md'

3. How this enhances your code

Practice | How Sruja helps
PR reviews | CI fails if .sruja is invalid; reviewers see architecture changes in the diff.
Onboarding | New devs read .sruja and exported docs instead of hunting for "the" diagram.
AI-generated code | .cursorrules and Copilot instructions steer AI to valid DSL; sruja lint catches mistakes.
Compliance / governance | Policies and constraints in the DSL; lint enforces structure; export for auditors.
Multi-repo | Each repo can have its own architecture.sruja (or one per service); same CLI and CI pattern.

4. Using Sruja across multiple repos

  • Per-repo – Each repository that owns a service or app can have its own .sruja file(s). Add the same CI job (install CLI from Git + sruja lint) and the same AI files (e.g. copy .cursorrules and .copilot-instructions.md from a template or run sruja init once and commit).
  • Central docs repo – Some teams keep a single "docs" or "architecture" repo with one or more .sruja files and run Sruja CI there; link to exported Markdown/JSON from other repos. Other repos don't need the CLI unless they also own architecture files.
  • Shared rules – Use the same sruja-architecture skill (npx skills add sruja-ai/sruja --skill sruja-architecture) across repos so AI and humans share the same patterns and trade-offs.

5. Where to go next

Sruja is open source. To report issues or suggest improvements, use GitHub Issues or Discussions.

Sruja Design Philosophy: The Unified Intuitive DSL

Objective

Create a modeling language that empowers all developers - from students to enterprise architects - to design systems with confidence, while naturally guiding them toward simplicity and preventing over-engineering.

Core Principles:

  1. Start simple, stay simple: A 1st-year CS student should be productive in 10 minutes, and advanced developers should be guided away from unnecessary complexity.
  2. Empower, don't restrict: The language should enable all developers, not limit them, but guide them toward good design.
  3. Approachability first: Complex concepts should be available but not encouraged unless truly needed.
  4. Prevent over-engineering: The language itself should make simple designs easier than complex ones.
  5. Systems thinking made simple: Enable holistic system understanding through intuitive syntax, without requiring complex theory.

Methodology Analysis

| Methodology | Core Concepts | Jargon Level | Student Intuition | Sruja Mapping |
| --- | --- | --- | --- | --- |
| C4 | System, Container, Component | Low | "Boxes and lines" - Easy to grasp. | system, container, component |
| DDD | Bounded Context, Aggregate, Entity, Value Object, Domain Event | High | "Aggregate Root" is confusing. "Value Object" is abstract. | Not currently supported |
| ER (DB) | Entity, Attribute, Relationship, Table, Column | Medium | "Entity" is standard. "Relationship" is clear. | data, datastore, -> (relation) |
| API (OpenAPI) | Path, Method, Schema, Property | Medium | "Endpoint" is clear. "Schema" is clear. | api, data (as schema) |
| DOD | Data, Struct, Array, Transform | Low | "Data" and "Struct" are very familiar to coders. | data, [] (arrays) |

The "Unified" Proposal

We need a set of keywords that map to these concepts without forcing the user to learn the specific theory first. The language should support progressive disclosure: simple concepts first, advanced concepts when needed.

1. Grouping (The "Container")

Problem: Different methodologies use different terms for logical boundaries.

| Methodology | Term | Sruja Keyword | When to Use |
| --- | --- | --- | --- |
| C4 | Container | container | Technical deployment boundary (e.g., "Web Server", "Database") |
| General | Grouping | module | Generic logical grouping (most intuitive) |

Decision:

  • module: Primary keyword for logical grouping. Familiar to Python/JS/Go developers.
  • container: For C4-style technical containers (deployment units).

Rationale: module is the most universal term. Students learn "modules" in their first programming course.

2. The "Thing" (Data Structure)

Problem: Entity vs Value Object vs Table vs Struct - all represent "data" but with different semantics.

| Methodology | Term | Sruja Keyword | Semantics |
| --- | --- | --- | --- |
| General | Entity/Struct | data (with id) | Has identity, mutable |
| General | Value/Struct | data (no id) | Immutable, defined by values |
| ER | Table/Entity | data or datastore | Persistent storage |
| DOD | Struct | data | In-memory structure |
| API | Schema | data | Request/response structure |

Decision: data is the unified keyword.

Rules:

  • If data has an id field → Implicitly an Entity (has identity)
  • If data has no id → Implicitly a Value Object (value-based)
  • If data is in a datastore → Implicitly a database table
  • If data is in an api → Implicitly a request/response schema

Rationale: Students understand "data" immediately. The semantics emerge from context, not explicit keywords.
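The rules above can be read as a small decision function. The following Python sketch is illustrative only: classify_data is a hypothetical name, and the way Sruja actually derives these semantics is not specified here. It assumes a data block is described by its field names plus the keyword of the enclosing block.

```python
def classify_data(fields, context=None):
    """Describe a `data` block from its fields and enclosing context.

    fields:  iterable of field names declared in the block
    context: keyword of the enclosing block, e.g. "datastore", "api", or None
    Returns (identity, role).
    """
    # Identity rule: an `id` field marks an Entity, otherwise a Value Object.
    identity = "entity" if "id" in set(fields) else "value object"

    # Context rule: the enclosing block refines what the data represents.
    roles = {
        "datastore": "database table",
        "api": "request/response schema",
    }
    role = roles.get(context, "data structure")
    return identity, role
```

The two rules are independent: a data block inside a datastore can be both an entity (it has an id) and a database table.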

3. The "Action" (Behavior/Event)

Problem: Different types of actions need different modeling approaches.

| Type | Sruja Keyword | Purpose | Example |
| --- | --- | --- | --- |
| API Endpoint | api | External interface | REST endpoint, GraphQL query |
| Event | event | Something that happened | OrderPlaced, PaymentProcessed |
| Function/Method | (implicit in component) | Internal behavior | Business logic in components |

Decision:

  • api: Explicit API endpoints (students understand "API")
  • event: Events (something that happened)
  • Component behavior: Implicit (components contain behavior)

Rationale: Students learn APIs early. Events are intuitive ("something happened").

4. Relationships

Problem: How to model connections between elements?

Decision: Use arrow syntax -> for relationships.

User -> ShopAPI.WebApp "Uses"
ShopAPI.WebApp -> ShopAPI.Database "Reads/Writes"
Order -> Payment "Triggers"

Rationale: Arrows are universal. Everyone understands "A -> B" means "A relates to B".
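To show how little structure a relationship line carries, here is a minimal Python sketch that splits one into source, target, and label. It is not Sruja's parser: parse_relation and the regular expression are assumptions that cover only the single-line form shown above (dotted identifiers, an optional quoted label).

```python
import re

# source -> target "label"; identifiers may be dotted (e.g. ShopAPI.WebApp).
RELATION = re.compile(r'^\s*([\w.]+)\s*->\s*([\w.]+)(?:\s+"([^"]*)")?\s*$')


def parse_relation(line):
    """Return (source, target, label) for a relationship line, else None."""
    match = RELATION.match(line)
    if match is None:
        return None
    source, target, label = match.groups()
    return source, target, label
```

A line without a label yields None in the third position, and any line that is not a relationship yields None overall.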

Proposed Syntax: "The Universal Model"

Level 1: Beginner (C4 Style)

// Element kinds
person = kind "Person"
system = kind "System"
container = kind "Container"

// Elements
user = person "End User"

shop = system "Shop API" {
    webApp = container "Web Application" {
        technology "React"
    }

    db = container "PostgreSQL Database" {
        technology "PostgreSQL 14"
    }
}

// Relationships
user -> shop.webApp "Uses"
shop.webApp -> shop.db "Reads/Writes"

Level 2: Intermediate (Detailed Architecture)

// Element kinds
system = kind "System"
container = kind "Container"
component = kind "Component"
database = kind "Database"

// Architecture
ShopAPI = system "Shop API" {
    WebApp = container "Web Application" {
        technology "React"
        Cart = component "Shopping Cart"
        Checkout = component "Checkout Service"
    }

    API = container "API Gateway" {
        technology "Node.js"
    }

    DB = database "PostgreSQL Database" {
        technology "PostgreSQL 14"
    }
}

// Relationships
ShopAPI.WebApp -> ShopAPI.API "Calls"
ShopAPI.API -> ShopAPI.DB "Reads/Writes"

Level 3: Advanced (Governance + Operations)

// Element kinds
person = kind "Person"
system = kind "System"
container = kind "Container"
database = kind "Database"

// Elements
Customer = person "Customer"

Shop = system "E-commerce Shop" {
    description "High-performance e-commerce platform"

    API = container "API Gateway" {
        technology "Node.js"
        slo {
            latency {
                p95 "200ms"
                p99 "500ms"
            }
        }
        scale {
            min 3
            max 10
        }
    }

    DB = database "PostgreSQL" {
        technology "PostgreSQL 14"
    }
}

// Governance
R1 = requirement functional "Must support 10k concurrent users"
SecurityPolicy = policy "Encrypt all data" category "security" enforcement "required"

// Relationships
Customer -> Shop.API "Uses"
Shop.API -> Shop.DB "Reads/Writes"

Key Design Decisions

1. Progressive Disclosure

  • Beginner: Start with system, container, component (C4)
  • Intermediate: Add module, data, api, event (Unified)
  • Advanced: Use all features together for complex architectures

Rationale: Students can start simple and learn advanced concepts when needed.

2. Arrays: DOD-Style Syntax

Support [] syntax (e.g., items OrderItem[]) instead of just implied relationships.

Rationale: Very familiar to programmers. Makes data structures explicit.

3. Unified data Keyword

The data keyword represents data structures. The presence of an id field indicates an entity with identity.

Rationale: Reduces cognitive load. Students can model data structures without learning complex theory.

4. Explicit api Keyword

Model APIs alongside data to connect "Backend" to "Database".

Rationale: Students understand "APIs". This bridges the gap between data modeling and API design.

5. Context-Aware Semantics

The same keyword (data) means different things in different contexts:

  • In a module: Domain model
  • In a datastore: Database table
  • In an api: Request/response schema
  • In a component: Internal data structure

Rationale: One keyword, multiple interpretations based on context. Reduces vocabulary size.

Preventing Over-Engineering: Simplicity by Design

How Sruja Guides Toward Simplicity

1. Start Simple

  • Use system for technical/deployment modeling (C4 style)
  • Use module for logical grouping when needed
  • Keep it simple - don't add complexity unless necessary

2. Progressive Disclosure

  • Start with basic C4 concepts: system, container, component
  • Add module, data, api, event when you need more detail
  • Use only what you need for your use case

3. Natural Constraints

  • system syntax is straightforward for deployment modeling
  • The language guides you to use the right level of detail
  • Simple designs are easier to write than complex ones

4. Validation & Guidance (Future)

  • Warn if over-engineering simple systems
  • Help users choose the right level of detail
  • Guide toward simplicity

5. Clear Mental Models

  • system = "How is this deployed?" (Physical/Technical)
  • module = "How is this organized?" (Logical grouping)
  • Keep it focused on what you're actually modeling

Missing Concepts & Future Considerations

Currently Missing (but important):

  1. Constraints/Validation: How to express "email must be valid", "age > 0"?
  2. Relationships with Cardinality: User -> Order[1:*] (one-to-many)?
  3. Inheritance/Polymorphism: How to model "Payment extends Transaction"?
  4. Enums: status OrderStatus where OrderStatus = [PENDING, COMPLETED, CANCELLED]
  5. Optional Fields: email? string vs email string
  6. Defaults: status string = "PENDING"
  7. Computed Fields: total float = items.sum(price * qty)

Recommendations:

  • Add enum keyword for enumerations
  • Support ? for optional fields: email? string
  • Support = for defaults: status string = "PENDING"
  • Consider constraint keyword for validation rules
  • Consider relationship syntax: User -> Order[1:*]
  • flow and scenario/story already implemented for flow thinking (DFD and BDD-style)
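To make the recommended field forms concrete, the sketch below parses them in Python. The forms (email? string, status string = "PENDING", items OrderItem[]) are proposals from this section and are not confirmed as implemented; Field and parse_field are hypothetical names, and trailing comments are not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# name[?] type[[]] [= default]
FIELD = re.compile(
    r'^\s*(?P<name>\w+)(?P<optional>\?)?\s+'
    r'(?P<type>\w+)(?P<array>\[\])?'
    r'(?:\s*=\s*(?P<default>.+?))?\s*$'
)


@dataclass
class Field:
    name: str
    type: str
    optional: bool = False
    is_array: bool = False
    default: Optional[str] = None  # raw default text, quotes included


def parse_field(line):
    """Parse one proposed field declaration into a Field."""
    match = FIELD.match(line)
    if match is None:
        raise ValueError(f"not a field declaration: {line!r}")
    return Field(
        name=match["name"],
        type=match["type"],
        optional=match["optional"] is not None,
        is_array=match["array"] is not None,
        default=match["default"],
    )
```

Keeping the name before the type, with ? and = as suffixes, means an existing declaration such as id string stays valid unchanged.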

Migration Path

From C4 to Sruja:

// C4: System Context
system "E-Commerce System" {
    // C4: Container
    container "Web Application" {
        // C4: Component
        component "Order Controller"
    }
}

From Data Modeling to Sruja:

// Data structures
module Orders {
    data Order {
        id string
        items OrderItem[]
    }

    data OrderItem {
        product_id string
        qty int
    }

    data ShippingAddress {
        street string
        city string
    }
}

From ER to Sruja:

datastore Database {
    data User {
        id string
        email string
    }

    data Order {
        id string
        user_id string  // Foreign key relationship
    }
}

Systems Thinking: Simple and Intuitive

Goal: Empower developers to think about systems holistically - understanding how parts interact, boundaries, and emergent behavior - without requiring complex theory.

Core Systems Thinking Concepts (Simplified)

Systems thinking is about understanding:

  1. Parts and Relationships: How components connect and interact
  2. Boundaries: What's inside vs outside the system
  3. Flows: How information/data moves through the system
  4. Feedback Loops: How actions create reactions
  5. Context: The environment the system operates in

How Sruja Makes Systems Thinking Simple

1. Parts and Relationships (Already Built-In)

system ShopAPI {
    container WebApp
    container Database
}

ShopAPI.WebApp -> ShopAPI.Database "Reads/Writes"

Simple Insight: Just draw boxes and connect them with arrows. The relationships show how parts interact.

2. Boundaries (Natural in Sruja)

system ShopAPI {  // Inside boundary
    container WebApp
}

person User  // Outside boundary
User -> ShopAPI "Uses"

Simple Insight: system defines the boundary. person and external systems are outside. Clear and intuitive.
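Because element IDs are qualified by their system (ShopAPI.WebApp), a boundary check reduces to a prefix test. This Python sketch is only an illustration under that assumption; inside and crosses_boundary are hypothetical helpers, not Sruja functions.

```python
def inside(element, system):
    """True if a qualified element ID is the system or lives inside it."""
    return element == system or element.startswith(system + ".")


def crosses_boundary(source, target, system):
    """True if exactly one end of a relationship is inside the system."""
    return inside(source, system) != inside(target, system)
```

With the example above, User -> ShopAPI crosses the ShopAPI boundary, while a relationship between two ShopAPI containers does not.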

3. Flows (Built-In Flow Syntax)

// Data Flow Diagram (DFD) style — use scenario
scenario OrderProcess "Order Processing" {
    Customer -> Shop.WebApp "Order Details"
    Shop.WebApp -> Shop.Database "Save Order"
    Shop.Database -> Shop.WebApp "Confirmation"
}

// User Story/Scenario style
story Checkout "User Checkout Flow" {
    User -> ECommerce.CartPage "adds item to cart"
    ECommerce.CartPage -> ECommerce "clicks checkout"
    ECommerce -> Inventory "Check Stock"
}

// Or using simple qualified relationships
Customer -> Shop.WebApp "Submits Order"
Shop.WebApp -> Shop.OrderService "Processes"
Shop.OrderService -> Shop.PaymentService "Charges"

Simple Insight: Use flow for data flows (DFD), story/scenario for user stories, or simple relationships for basic flows. Events show what happens: event OrderPlaced.

4. Feedback Loops (Cycles in Relationships)

// Simple feedback: User action triggers system response
User -> System "Requests"
System -> User "Responds"

// System feedback: Component A affects Component B, which affects A
ComponentA -> ComponentB "Updates"
ComponentB -> ComponentA "Notifies"

// Event-driven cycles: Service A triggers Service B, which triggers A
ServiceA -> ServiceB "Sends Event"
ServiceB -> ServiceA "Responds with Event"

// Mutual dependencies: Microservices that call each other
OrderService -> PaymentService "Charges Payment"
PaymentService -> OrderService "Confirms Payment"

Simple Insight: When arrows form a cycle, that's a feedback loop. The system responds to itself. Cycles are valid in many architectures:

  • Feedback loops: User interactions, system responses
  • Event-driven patterns: Services triggering each other via events
  • Mutual dependencies: Microservices that need to communicate bidirectionally
  • Bidirectional flows: API <-> Database (read/write operations)

Note: Sruja allows cycles - they're a natural part of system design. The validator will inform you about cycles but won't block them, as they're often intentional architectural patterns.
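One way a validator could report cycles without rejecting them is a depth-first search that records each back edge. The Python sketch below illustrates that technique; it is an assumption about the approach, not Sruja's implementation, and find_cycles is a hypothetical name.

```python
from collections import defaultdict


def find_cycles(relations):
    """Return one representative cycle per back edge found by DFS.

    relations: iterable of (source, target) pairs.
    Each cycle is a list of element IDs that starts and ends on the same ID.
    """
    graph = defaultdict(list)
    for source, target in relations:
        graph[source].append(target)

    cycles = []
    state = {}  # node -> "visiting" | "done"
    path = []   # current DFS path

    def visit(node):
        state[node] = "visiting"
        path.append(node)
        for nxt in graph.get(node, []):
            if state.get(nxt) == "visiting":
                # Back edge: the cycle is the path from nxt to node, closed.
                cycles.append(path[path.index(nxt):] + [nxt])
            elif nxt not in state:
                visit(nxt)
        path.pop()
        state[node] = "done"

    for node in list(graph):
        if node not in state:
            visit(node)
    return cycles
```

The result is informational: a caller can print each cycle as a notice and still exit successfully, which matches the "inform but don't block" behavior described above.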

5. Context (Persons and External Systems)

person Customer "End User"
person Admin "System Administrator"

system PaymentGateway "Third-party service" {
  tags ["external"]
}

Customer -> ShopAPI "Uses"
ShopAPI -> PaymentGateway "Processes payments"

Simple Insight: person elements and systems tagged "external" show the context - who/what the system interacts with.

Progressive Systems Thinking

Beginner: Just model the parts and connections

// Element kinds
system = kind "System"
container = kind "Container"

// Elements
myApp = system "MyApp" {
    frontend = container "Frontend"
    backend = container "Backend"
}

// Relationships
myApp.frontend -> myApp.backend "Calls"

Intermediate: Add flows and events

// Simple qualified relationships
user -> myApp.frontend "Clicks"
myApp.frontend -> myApp.backend "Sends request"
myApp.backend -> myApp.database "Saves"

// DFD-style — use scenario
scenario OrderFlow "Order Processing" {
    user -> myApp.frontend "Submits"
    myApp.frontend -> myApp.backend "Processes"
    myApp.backend -> myApp.database "Stores"
}

Advanced: Model feedback loops and system behavior

// Feedback loop: User action -> System response -> User sees result
story CompleteOrder "Order Completion Flow" {
    user -> shop.system "Submits"
    shop.system -> shop.database "Stores"
    shop.system -> user "Confirms"
}

// Complex flow with multiple steps — use scenario
scenario PaymentFlow "Payment Processing" {
    orders.orderService -> orders.paymentGateway "Charge"
    orders.paymentGateway -> orders.orderService "Confirms"
    orders.orderService -> user "Notifies"
}

Key Principle: No Jargon Required

  • Don't say: "Model the feedback loop using systems thinking principles"
  • Do say: "Use flow or story to show how data/actions move through the system"
  • Don't say: "Define the system boundary using context mapping"
  • Do say: "Use system to show what's inside, person to show who uses it"
  • Don't say: "Create a DFD (Data Flow Diagram)"
  • Do say: "Use flow to show how data moves between components"

Result: Developers naturally think in systems without learning theory first. The syntax guides them to see:

  • How parts connect (relationships: ->)
  • What's inside vs outside (boundaries: system vs person/external)
  • How things flow (flow for data flows, story/scenario for user stories, or simple -> relationships)
  • How actions create reactions (cycles in relationships, feedback in flows)

Additional Design Philosophy Assessment

After assessing various design philosophies (Event-Driven Architecture, Hexagonal Architecture, CQRS, BDD, Reactive Systems, etc.) through a strict lens of "does this help developers learn system design?", we found:

✅ Accepted: Simple & Valuable

  1. Flows and Scenarios: Already implemented! flow for data flows (DFD), scenario/story for user stories (BDD-style Given-When-Then)
  2. Optional Fields: Practical data modeling (email? string)
  3. Enums: Practical data modeling (status OrderStatus)

❌ Rejected: Too Complex or Unnecessary

  • Hexagonal Architecture (Ports & Adapters) - Too abstract
  • Clean Architecture / Layers - Too theoretical
  • CQRS - Too specialized, can use existing api
  • Advanced Event-Driven - Current event is sufficient
  • Reactive Systems - Too complex
  • Actor Model - Too specialized
  • GraphQL/Protocol Buffers - Technology-specific
  • Semantic Web - Overkill
  • SOLID (as syntax) - Principles, not syntax

Note: Systems thinking is accepted - but implemented simply through existing syntax (relationships, boundaries, flows). No new keywords needed.

Key Finding: Most "advanced" concepts should be rejected. Only 3 simple additions are recommended, and everything else can wait until developers master the basics. Systems thinking is naturally supported through intuitive syntax.

Conclusion

By using system, module, data, api, and event, we cover 90% of use cases with words that a 1st-year CS student already knows.

Key Success Metrics:

  • ✅ Can a beginner model a simple system in 10 minutes? Yes (C4 style)
  • ✅ Can an intermediate model data + APIs? Yes (Unified style)
  • ✅ Can an advanced user model complex architectures? Yes (Extended features)
  • ✅ Does it prevent over-engineering? Yes (simplicity by design)
  • ✅ Is it approachable for all developers? Yes (progressive disclosure)

Next Steps:

  1. Add enum support
  2. Add optional fields (? syntax)
  3. Add relationship cardinality
  4. Add constraint/validation syntax
  5. flow and scenario/story already implemented - enhance documentation and examples
  6. Improve error messages for beginners
  7. Add validation rules to guide simplicity

Key Principle: Less is more. Don't add complexity unless it clearly helps developers learn system design better. The goal is to build confidence through simplicity, not complexity through features.

Glossary



Quick reference for technical terms and concepts used throughout Sruja documentation.

A

ADR (Architecture Decision Record)

A document that captures an important architectural decision made along with its context and consequences. In Sruja, ADRs are defined using the adr keyword and can be linked to specific architecture elements.

Example:

ADR001 = adr "Use Microservices" {
  status "Accepted"
  context "Need to scale independently"
  decision "Adopt microservices architecture"
  consequences "Increased complexity but better scalability"
}

Architecture-as-Code

The practice of defining software architecture using code (text-based DSL) instead of static diagrams. This enables version control, validation, and automated documentation generation.

C

C4 Model

A hierarchical model for visualizing software architecture, created by Simon Brown. It consists of four levels:

  • Level 1 (Context): System context diagram showing the system and its users
  • Level 2 (Container): Container diagram showing high-level technical building blocks
  • Level 3 (Component): Component diagram showing internal structure of containers
  • Level 4 (Code): Code-level diagram (typically managed by IDEs, not Sruja)

Sruja is based on the C4 model and automatically generates C4-compliant diagrams.

Component

A structural element within a container that represents a major building block. Components are optional and provide additional detail when needed.

Example:

App = system "App" {
  API = container "API" {
    Auth = component "Authentication"
    Payment = component "Payment Processing"
  }
}

Container

A deployable unit within a system. In C4 terminology, a container is NOT a Docker container, but rather any separately deployable unit like:

  • Web applications
  • Mobile apps
  • Server-side applications
  • Databases
  • File systems

Example:

App = system "E-commerce" {
  Web = container "React App"
  API = container "Node.js API"
  DB = database "PostgreSQL"
}

D

Database

A type of container that represents a data store. In Sruja, databases are defined using the database keyword.

Example:

DB = database "PostgreSQL"

Note: Sruja also supports datastore as an alias, but database is the recommended term.

DSL (Domain-Specific Language)

A programming language specialized for a particular domain. Sruja DSL is a text-based language specifically designed for defining software architecture.

E

Element

Any architectural construct in Sruja: person, system, container, component, database, queue, etc. Elements are the building blocks of your architecture model.

K

Kind

A type definition for elements. Kinds can be imported from the standard library (import { * } from 'sruja.ai/stdlib') or declared manually for custom types.

Example:

// Using stdlib (recommended)
import { * } from 'sruja.ai/stdlib'

// Or declaring manually
microservice = kind "Microservice"

M

Metadata

Additional information attached to elements, such as team ownership, tier, or custom tags. Metadata helps with governance and organization.

Example:

App = system "My App" {
  metadata {
    team "Platform"
    tier "critical"
  }
}

P

Person

An actor or user of the system. Persons are defined at the context level and represent external users or roles.

Example:

User = person "Customer"
Admin = person "Administrator"

Q

Queue

A message queue or event stream used for asynchronous communication between containers.

Example:

EventQueue = queue "Kafka Topic"

R

Relation

A connection between two elements, showing how they interact. Relations are defined using the -> operator.

Example:

User -> App.Web "Visits"
App.Web -> App.API "Calls"

Requirement

A functional or non-functional requirement that can be linked to architecture elements using tags. Requirements help trace business needs to technical implementation.

Example:

R001 = requirement "User Authentication" {
  description "Users must be able to log in securely"
  tags ["App.Web", "App.API"]
}
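Since requirement tags name element IDs, traceability amounts to inverting that mapping. The Python sketch below shows the idea on plain dictionaries; trace_requirements and untraced are hypothetical helpers, and R002 in the usage note is made up for illustration.

```python
from collections import defaultdict


def trace_requirements(requirements):
    """Map each element ID to the requirement IDs whose tags reference it.

    requirements: dict of requirement ID -> list of tags (element IDs).
    """
    by_element = defaultdict(list)
    for req_id, tags in requirements.items():
        for element in tags:
            by_element[element].append(req_id)
    return dict(by_element)


def untraced(elements, requirements):
    """Return the elements that no requirement references."""
    traced = trace_requirements(requirements)
    return [element for element in elements if element not in traced]
```

For example, with R001 tagged ["App.Web", "App.API"] and a hypothetical R002 tagged ["App.API"], App.API traces back to both requirements, and an element such as App.DB would be reported as untraced.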

S

Scenario

A sequence of interactions that describe how users or systems interact to accomplish a goal. Scenarios help document user flows and system behavior.

Example:

scenario Checkout "Checkout Flow" {
  User -> App.Web "adds items to cart"
  App.Web -> App.API "validates cart"
  App.API -> App.DB "stores order"
}

System

The highest-level element in C4 Level 1. A system represents a software system that delivers value to its users.

Example:

ECommerce = system "E-commerce Platform" {
  description "Online shopping platform"
}

stdlib (Standard Library)

The Sruja standard library that provides common element kinds (person, system, container, database, etc.). Importing from stdlib is the recommended way to use standard types.

Example:

import { * } from 'sruja.ai/stdlib'

T

Tag

A label that can be attached to elements, requirements, or ADRs to enable filtering, grouping, and traceability.

Example:

App = system "My App" {
  tags ["production", "critical"]
}

V

View

A diagram perspective that shows a subset of the architecture. Views can be explicit (defined with view blocks) or implicit (automatically generated by Sruja).

Example:

view index {
  title "Complete System View"
  include *
}

Style guide



Documentation Style Guide

Goals

  • Align with the Diátaxis framework: Tutorials, How‑to Guides, Reference, Explanation
  • Improve clarity, consistency, and task‑orientation
  • Raise quality to industry standards (Stripe, React, Kubernetes, MDN)

Front Matter

  • Required: title, summary
  • Recommended: prerequisites, learning_objectives, estimated_time, difficulty, tags, last_updated
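The required keys can be checked mechanically. The Python sketch below assumes front matter is a block of top-level key: value lines between two --- delimiter lines at the start of the file; that delimiter convention and the helper names are assumptions, and a real check would likely use a YAML parser.

```python
REQUIRED = ("title", "summary")


def front_matter_keys(text):
    """Return top-level keys of a leading '---' delimited front matter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Top-level keys only: unindented "key: value" lines.
        if line and not line[0].isspace() and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return []  # no closing delimiter: treat as missing front matter


def missing_front_matter(text, required=REQUIRED):
    """Return the required keys that the page does not define."""
    present = set(front_matter_keys(text))
    return [key for key in required if key not in present]
```

An empty result means the page passes the "Front matter present and complete" item for required keys; recommended keys could be reported the same way as warnings.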

Headings

  • Use Title Case for H1/H2/H3
  • Keep headings unique; avoid duplicates within a page

Code Blocks

  • Always specify language fences: bash, sh, json, yaml, go, ts, tsx, md, sruja
  • Prefer copy‑ready commands; avoid interactive prompts where possible
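The language-tag rule is easy to lint. This Python sketch reports opening fences that have no language tag; it is a simplified illustration (it assumes three-backtick fences only, ignores tilde fences and nested fences), and fences_without_language is a hypothetical name.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def fences_without_language(markdown):
    """Return 1-based line numbers of opening code fences with no language tag."""
    missing = []
    in_block = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if in_block:
            in_block = False  # this fence closes the current block
        else:
            in_block = True   # this fence opens a new block
            if stripped[len(FENCE):].strip() == "":
                missing.append(number)
    return missing
```

Tracking open and closed state matters because a closing fence never carries a tag and would otherwise be reported as a violation.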

Admonitions

  • Use standard callouts: Note, Tip, Warning
  • Keep callouts short and action‑oriented

Links

  • Prefer descriptive link text (not raw URLs)
  • Cross‑link to Reference and Examples when teaching a concept or task

Images & Diagrams

  • Include small screenshots or diagram previews for expected outcomes
  • Use alt text that describes the intent and context

Tutorials

  • Structure: Overview → Prerequisites → Steps → Outcome → Troubleshooting → Next Steps
  • Include at least one end‑to‑end task with an expected output

How‑to Guides

  • Task‑oriented and concise
  • Structure: Purpose → Steps → Validation → References

Reference

  • Precise, complete, and skimmable tables/lists
  • Avoid narrative; link outward to tutorials for workflows

Explanation

  • Conceptual background, rationale, trade‑offs
  • Link to reference for details and to tutorials for practice

Quality Gates

  • Markdown lint for headings, lists, links
  • Link checking for external and internal links
  • Optional accessibility lint (alt text, heading levels)

Review Checklist

  • Front matter present and complete
  • Headings consistent and unique
  • Code fences have language tags
  • Cross‑links added to relevant Reference/Examples
  • Outcome preview or screenshot included where appropriate

Community



Sruja Community

Welcome to the Sruja community! Sruja is an open source project built by and for developers who care about software architecture. Whether you're here to learn, contribute, or get help, we're glad you're here.

Join the Conversation

💬 Discord

Join our Discord server for real-time chat, quick questions, and community discussions:

Join Discord

Discord is great for:

  • Getting quick help with questions
  • Discussing ideas and use cases
  • Sharing your Sruja projects
  • Connecting with other community members

💬 GitHub Discussions

For longer-form discussions, feature requests, and Q&A:

GitHub Discussions

GitHub Discussions is ideal for:

  • Feature proposals and RFCs
  • Technical discussions
  • Sharing tutorials and examples
  • Asking detailed questions

🐛 GitHub Issues

Found a bug or have a feature request?

Open an Issue

Ways to Contribute

Sruja is an open source project, and we welcome contributions of all sizes! There are many ways to contribute, even if you're not a developer.

No Code Required

Documentation

  • Fix typos or improve clarity
  • Add examples and tutorials
  • Translate documentation
  • Write blog posts or courses

Testing & Feedback

  • Test new features and report bugs
  • Share your use cases
  • Provide feedback on design decisions
  • Help improve error messages

Community

  • Answer questions in Discord or Discussions
  • Help newcomers get started
  • Share your Sruja projects and experiences

Beginner-Friendly Code

Small Improvements

  • Add test cases
  • Fix small bugs
  • Improve error messages
  • Add examples to the examples/ directory
  • Improve CLI help text

Documentation Code

  • Add code examples
  • Update API documentation
  • Create tutorials

More Advanced Contributions

Features & Enhancements

  • Implement new features
  • Add new export formats
  • Add validation rules
  • Improve tooling and developer experience

Core Development

  • Work on the language parser
  • Enhance the validation engine
  • Build platform integrations
  • Develop plugins and extensions

Getting Started with Contributions

🎯 First Time Contributing?

Start here: Contribution Guide

This step-by-step guide walks you through:

  • Finding your first issue
  • Setting up your development environment
  • Making and submitting changes
  • Getting help when stuck

Contribution Workflow

  1. Fork and Branch: Fork the repo and create a topic branch
  2. Implement: Make your changes and test locally
  3. Commit: Follow Conventional Commits
  4. Pull Request: Open a PR with a clear description
  5. Review: Address feedback and iterate
  6. Merge: Once approved, your contribution is merged!

For detailed instructions, see the Contribution Guide.
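For the commit step, a Conventional Commits header has the shape type(scope)!: description, where the scope and the ! breaking-change marker are optional. The Python sketch below checks that shape only. It is an illustration, not a project tool: parse_commit_header is a hypothetical name, it does not restrict the set of types, and it accepts lowercase types only, which is stricter than the specification requires.

```python
import re

# <type>[(scope)][!]: <description>
HEADER = re.compile(
    r'^(?P<type>[a-z]+)'
    r'(\((?P<scope>[^()\s]+)\))?'
    r'(?P<breaking>!)?'
    r': (?P<description>\S.*)$'
)


def parse_commit_header(header):
    """Return the parts of a Conventional Commits header, or None if invalid."""
    match = HEADER.match(header)
    if match is None:
        return None
    return {
        "type": match["type"],
        "scope": match["scope"],
        "breaking": match["breaking"] is not None,
        "description": match["description"],
    }
```

A header such as feat(parser): add enum support parses into its parts, while a free-form message such as Updated docs returns None.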

Roadmap & Transparency

Sruja is developed transparently with community input. Our roadmap is public and open for discussion.

Current Roadmap

View Roadmap Discussions

The roadmap outlines our path to v1.0, including:

  • Advanced Governance & Compliance: Policy as code, architectural guardrails
  • Production Reality & Data Flow: Service mesh integration, runtime verification
  • Extensibility & Ecosystem: Plugin system, DevOps integrations
  • Platform Evolution: Live system review, gap detection, violation monitoring

Shaping the Roadmap

Your feedback shapes the roadmap! We welcome:

  • Feature requests via GitHub Discussions
  • Use case sharing to prioritize features
  • RFCs (Request for Comments) for major changes
  • Community voting on priorities

Community Expectations

We're committed to maintaining a welcoming and respectful community. When participating:

  • Be respectful and constructive: Treat everyone with kindness
  • Provide actionable feedback: Help others improve their contributions
  • Prefer documented decisions: Link to ADRs or issues when relevant
  • Start small: You can always contribute more later!

Recognition

We value all contributions, big and small. Contributors are recognized through:

  • GitHub contributor list
  • Release notes (for significant contributions)
  • Community highlights in discussions

Professional Services

While Sruja is open source and free to use, professional consulting services are available for organizations that need:

  • Implementation support: Help rolling out Sruja across teams
  • Best practices guidance: Establish architectural governance patterns
  • Custom integrations: Integrate with existing CI/CD and infrastructure
  • Training: Team training on Sruja DSL and architectural modeling
  • Custom development: Build custom validators, exporters, or integrations

Contact the team through GitHub Discussions to discuss your needs.

Resources

Documentation

Development

Community

Get Involved Today

Ready to contribute? Here are some quick ways to get started:

  1. Join Discord and introduce yourself
  2. Star the repository on GitHub to show your support
  3. Fix a typo in the documentation
  4. Add an example to the examples/ directory
  5. Share your use case in GitHub Discussions

Every contribution, no matter how small, helps make Sruja better for everyone. Thank you for being part of the community!


Questions? Reach out on Discord or GitHub Discussions. We're here to help!

Courses

Structured courses to learn architecture-as-code with Sruja, from fundamentals to production patterns.

Course                   Description
Systems Thinking 101     Fundamentals, parts & relationships, boundaries, flows, feedback loops, context
System Design 101        Fundamentals, building blocks, advanced modeling, production readiness
System Design 201        High throughput, real-time, data-intensive, consistency
Ecommerce Platform       Vision, architecture, SDLC, ops, evolution, governance
Production Architecture  Performance, modular design, governance
Agentic AI               Fundamentals, patterns, modeling for AI systems
Advanced Architects      Policy as code and advanced topics

Start with Systems Thinking 101 or System Design 101 if you're new; use the Beginner path to combine courses with tutorials and challenges.

Systems Thinking 101

Learn to model systems holistically with Sruja. Master the five core systems thinking concepts: parts and relationships, boundaries, flows, feedback loops, and context.

Course Overview

Systems thinking helps you understand how components interact as part of a whole. This course teaches you to model systems using Sruja's architecture-as-code approach, enabling you to visualize and validate complex system interactions.

What You'll Learn

  • Module 1: Fundamentals - Core systems thinking concepts and why they matter
  • Module 2: Parts and Relationships - Model components and their interactions
  • Module 3: Boundaries - Define what's inside vs. outside your system
  • Module 4: Flows - Visualize data and information movement through the system
  • Module 5: Feedback Loops - Model cycles and reactive behaviors
  • Module 6: Context - Capture the environment, dependencies, and stakeholders

Prerequisites

Learning Path

Each module contains hands-on examples with Sruja syntax. You'll write .sruja files, validate them with sruja lint, and export to Mermaid diagrams with sruja export mermaid.

Why Systems Thinking?

  • Holistic understanding: See the whole system, not just parts
  • Natural patterns: Model real-world interactions and feedback
  • Clear boundaries: Understand what's in scope vs. context
  • Flow visualization: See how data and information move
  • Valid cycles: Feedback loops are natural, not errors

Course Duration

Approximately 6-8 hours to complete all modules and exercises.

Next Steps

Start with Module 1: Fundamentals or review the Beginner path for a complete learning journey.

Module 1: Fundamentals

Overview

In this module, you'll learn the core concepts of systems thinking and how they apply to software architecture modeling with Sruja.

Learning Objectives

By the end of this module, you'll be able to:

  • Define what systems thinking is and why it matters for software architecture
  • Identify the five core systems thinking concepts
  • Understand how Sruja supports systems thinking principles
  • Recognize when to use systems thinking in your architecture work

Lessons

Prerequisites

  • Basic understanding of software architecture concepts
  • Familiarity with Sruja DSL basics

Time Investment

Approximately 1-1.5 hours to complete all lessons.

What's Next

After completing this module, you'll dive into specific concepts starting with Module 2: Parts and Relationships.

Lesson 1: What is Systems Thinking?

Learning Goals

  • Understand what systems thinking is
  • Recognize systems in everyday life and software
  • See the connection between systems thinking and software architecture

What is Systems Thinking?

Systems thinking is a holistic approach to understanding how components interact as part of a whole. Instead of looking at individual parts in isolation, systems thinking focuses on the relationships, patterns, and emergent behaviors that arise when components work together.

A Simple Example

Consider a coffee shop:

Isolated view (reductionist):

  • Coffee machine
  • Barista
  • Cups
  • Beans
  • Customers

Systems thinking view:

  • Customer orders coffee → Barista uses machine → Machine produces coffee → Customer receives coffee → Customer might return
  • The coffee machine needs beans (supply chain)
  • The barista needs training (human systems)
  • The shop needs location (infrastructure)
  • Customer satisfaction affects future visits (feedback loop)

Systems in Software Architecture

Every software system is a system of systems:

User Interface
    ↓
Application Logic
    ↓
Data Layer
    ↓
Infrastructure

But there's more:

  • Dependencies: External APIs, libraries, services
  • People: Users, developers, stakeholders
  • Processes: Development, deployment, operations
  • Data: Information flows, state, transactions
  • Feedback: Monitoring, logs, user behavior

The Iceberg Model

Systems thinking uses the "iceberg model" to understand systems:

Events (what you see)
    ↓
Patterns (what's happening over time)
    ↓
Structures (what's causing the patterns)
    ↓
Mental Models (what's shaping the structures)

In software architecture:

  • Events: A user reports a bug
  • Patterns: Similar bugs occur repeatedly
  • Structures: Tightly coupled components, lack of testing
  • Mental Models: "We need to ship fast, quality can wait"

Why Systems Thinking in Architecture?

Traditional architecture often focuses on:

  • Components and their functions
  • Technology choices
  • Implementation details

Systems thinking adds:

  • Relationships and interactions
  • Emergent behaviors
  • Feedback loops
  • Context and boundaries
  • Flow of information

This leads to architectures that:

  • Adapt to change more easily
  • Handle edge cases better
  • Scale more naturally
  • Align with business goals

Sruja and Systems Thinking

Sruja is built on systems thinking principles:

  • Elements as parts: person, system, container, component
  • Relationships as interactions: clear, labeled connections
  • Scenarios as flows: data and information movement
  • Views as perspectives: different angles on the same system
  • Validation as feedback: catch problems early

Key Takeaways

  1. Systems thinking looks at the whole, not just parts
  2. Relationships matter as much as components
  3. Patterns and emergent behaviors are key insights
  4. Context is critical - nothing exists in isolation
  5. Sruja supports systems thinking through its language and features

Exercise

Think of a system you work with daily (could be a codebase, a service, or a team). Identify:

  • The main components (parts)
  • How they interact (relationships)
  • Who depends on it (context)
  • Any feedback loops (how information flows back)

Next Lesson

In Lesson 2, we'll explore the five core systems thinking concepts you'll master in this course.

Lesson 2: The Five Core Concepts

Learning Goals

  • Learn the five core systems thinking concepts
  • Understand how they apply to software architecture
  • See how Sruja implements each concept

The Five Core Concepts

In this course, you'll master five fundamental systems thinking concepts:

┌─────────────────────────────────────────────┐
│           SYSTEMS THINKING 101              │
├─────────────────────────────────────────────┤
│  1. Parts & Relationships                   │
│  2. Boundaries                              │
│  3. Flows                                   │
│  4. Feedback Loops                          │
│  5. Context                                 │
└─────────────────────────────────────────────┘

1. Parts and Relationships

Definition: What the system contains and how those pieces connect.

In Sruja

import { * } from 'sruja.ai/stdlib'

Customer = person "End User"
Shop = system "Shop" {
  WebApp = container "Web Application"
  API = container "API Service"
  DB = database "Database"
}

// Relationships = how parts connect
Customer -> Shop.WebApp "Uses"
Shop.WebApp -> Shop.API "Calls"
Shop.API -> Shop.DB "Reads"

Why It Matters

  • Structure: Defines what exists in your system
  • Interactions: Shows how components communicate
  • Coupling: Reveals dependencies between parts

2. Boundaries

Definition: What's inside the system vs. what's outside (the environment).

In Sruja

// Inside: Your system
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API"
}

// Outside: External systems
PaymentGateway = system "Payment Service" {
  metadata { tags ["external"] }
}

// Crossing the boundary
Shop.API -> PaymentGateway "Processes payments"

Why It Matters

  • Scope: What you're responsible for vs. dependencies
  • Integration: How you connect with external systems
  • Security: What needs to be protected
  • Testing: What's in scope for your tests
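To make the boundary idea concrete, here is a minimal Python sketch (an illustration only, not how Sruja works internally) that lists the relationships crossing the system boundary. The relationship tuples are hand-copied from the examples above, and the helper names `top_level` and `boundary_crossings` are hypothetical.

```python
# Top-level systems tagged "external" in the example above.
EXTERNAL_SYSTEMS = {"PaymentGateway"}

# (source, target, label) tuples hand-copied from the Sruja snippets.
RELATIONSHIPS = [
    ("Shop.WebApp", "Shop.API", "Calls"),
    ("Shop.API", "PaymentGateway", "Processes payments"),
]


def top_level(element_id):
    """Return the outermost element of a dotted ID, e.g. 'Shop.API' -> 'Shop'."""
    return element_id.split(".", 1)[0]


def boundary_crossings(relationships, external_systems):
    """Keep relationships where exactly one end belongs to an external system."""
    crossings = []
    for source, target, label in relationships:
        source_external = top_level(source) in external_systems
        target_external = top_level(target) in external_systems
        if source_external != target_external:
            crossings.append((source, target, label))
    return crossings


for source, target, label in boundary_crossings(RELATIONSHIPS, EXTERNAL_SYSTEMS):
    print(f"{source} -> {target}: {label}")
```

A list like this is a reasonable starting point for deciding which interactions need integration tests and failure handling.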

3. Flows

Definition: How information, data, and actions move through the system.

In Sruja (Data Flow)

OrderFlow = scenario "Order Processing" {
  Customer -> Shop.WebApp "Submits order"
  Shop.WebApp -> Shop.API "Sends order data"
  Shop.API -> Shop.DB "Saves order"
  Shop.API -> PaymentGateway "Charges payment"
  Shop.API -> Shop.WebApp "Returns result"
  Shop.WebApp -> Customer "Shows confirmation"
}

Why It Matters

  • Data lineage: Where does information come from and go?
  • Process understanding: What's the sequence of actions?
  • Bottlenecks: Where do things slow down?
  • Error paths: What happens when something fails?

4. Feedback Loops

Definition: How actions create reactions that affect future actions.

In Sruja (Cycles)

// User feedback loop
User -> App.WebApp "Submits form"
App.WebApp -> App.API "Validates"
App.API -> App.WebApp "Returns result"
App.WebApp -> User "Shows feedback"

// System feedback loop
App.API -> App.DB "Updates inventory"
App.DB -> App.API "Notifies low stock"
App.API -> Admin "Sends alert"
Admin -> App.API "Adjusts inventory"

Why It Matters

  • Reactive behavior: How systems respond to changes
  • Stability: Does the system self-correct or spiral out of control?
  • Adaptation: How the system learns and evolves
  • Monitoring: What metrics indicate system health?
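To see why these relationships count as loops, the following Python sketch (illustrative only, not Sruja's validator) finds every element that can reach itself by following relationships. The edge list is hand-copied from the example above, and `elements_in_cycles` is a hypothetical helper name.

```python
from collections import defaultdict

# Relationships from the example above, as (source, target) pairs.
EDGES = [
    ("User", "App.WebApp"),
    ("App.WebApp", "App.API"),
    ("App.API", "App.WebApp"),
    ("App.WebApp", "User"),
    ("App.API", "App.DB"),
    ("App.DB", "App.API"),
    ("App.API", "Admin"),
    ("Admin", "App.API"),
]


def elements_in_cycles(edges):
    """Return the set of elements that can reach themselves via relationships."""
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)

    def reachable(start):
        # Iterative depth-first walk over everything downstream of `start`.
        seen, stack = set(), list(graph[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph[node])
        return seen

    nodes = {node for edge in edges for node in edge}
    return {node for node in nodes if node in reachable(node)}


print(sorted(elements_in_cycles(EDGES)))
```

Reporting the elements that take part in a loop, rather than rejecting the model, matches the idea that feedback cycles are information to review and not necessarily errors.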

5. Context

Definition: The environment the system operates in - stakeholders, dependencies, constraints.

In Sruja

// Your system
Shop = system "Shop"

// Context: Stakeholders
Customer = person "End User"
Admin = person "Administrator"
Support = person "Customer Support"

// Context: Dependencies
PaymentGateway = system "Payment Service"
EmailService = system "Email Service"
Analytics = system "Analytics"

// Context relationships
Customer -> Shop "Uses"
Shop -> PaymentGateway "Depends on"
Shop -> EmailService "Sends notifications"
Admin -> Shop "Manages"

Why It Matters

  • Stakeholders: Who cares about this system?
  • Dependencies: What external systems do you rely on?
  • Constraints: What limits your design choices?
  • Success criteria: What does "good" look like?

How They Work Together

A complete systems thinking view combines all five:

┌──────────────────────────────────────────┐
│              CONTEXT                     │
│  Stakeholders: Users, Admins, Support    │
│  Dependencies: External APIs, Services   │
│                                          │
│    ┌────────── BOUNDARY ───────────┐     │
│    │                               │     │
│    │   PARTS ↔ RELATIONSHIPS       │     │
│    │   [User] → [App] → [DB]       │     │
│    │         ↑       ↓             │     │
│    │         └── FEEDBACK ─────┘   │     │
│    │                               │     │
│    │   FLOWS: Order, Payment       │     │
│    └───────────────────────────────┘     │
└──────────────────────────────────────────┘

Sruja Mapping Summary

Systems Thinking Concept  Sruja Feature
Parts & Relationships     Elements (person, system, container) and relationships (->)
Boundaries                system definitions, metadata { tags ["external"] }
Flows                     scenario and flow blocks
Feedback Loops            Cycles in relationships (explicitly allowed)
Context                   External elements, overview block, requirements

Key Takeaways

  1. Five concepts form the foundation: Parts/relationships, boundaries, flows, feedback loops, context
  2. They interconnect: None exists in isolation
  3. Sruja supports all five: Built into the language
  4. Use them together: Complete architecture requires all perspectives
  5. Practice makes perfect: You'll work with each in depth

Exercise

Pick a system you know well. For each concept, write one observation:

  • Parts/Relationships: Name 3 components and how they connect
  • Boundaries: What's inside vs. outside the system?
  • Flows: What's a common data path through the system?
  • Feedback Loops: Any automatic responses to user actions?
  • Context: Who are the main stakeholders?

Next Lesson

In Lesson 3, we'll explore the practical benefits of systems thinking for software architecture.

Lesson 3: Why Systems Thinking Matters

Learning Goals

  • Understand the benefits of systems thinking for software architecture
  • Recognize problems that systems thinking prevents
  • Apply systems thinking to real architecture challenges

The Architecture Problem

Traditional architecture approaches often lead to:

  • Siloed thinking: Components designed in isolation
  • Hidden dependencies: Unforeseen coupling
  • Brittle systems: Break when one thing changes
  • Misaligned with business: Doesn't support actual user journeys
  • Hard to scale: Performance and reliability issues

Systems thinking addresses these root causes.

Benefit 1: Holistic Understanding

See the whole system, not just parts.

Example: E-Commerce Checkout

Component-focused view:

  • Shopping cart service
  • Payment service
  • Inventory service
  • Notification service

Systems thinking view:

User → Cart Service → Payment Service → Inventory → Notification
  ↑                                              ↓
  └────────────── Feedback Loop ─────────────────┘

In Sruja

CheckoutFlow = scenario "Complete Checkout" {
  User -> CartService "Items ready"
  CartService -> PaymentService "Process payment"
  PaymentService -> CartService "Payment result"
  CartService -> InventoryService "Reserve items"
  InventoryService -> CartService "Stock confirmed"
  CartService -> NotificationService "Send confirmation"
  NotificationService -> User "Order confirmed"
}

Impact

  • Identifies the full user journey
  • Reveals where things can fail
  • Shows data dependencies
  • Makes success criteria clear

Benefit 2: Natural Patterns

Model real-world interactions and feedback.

Example: Auto-Scaling

Component view: Configure scaling rules per service.

Systems thinking view: Understand the feedback loop.

import { * } from 'sruja.ai/stdlib'

App = system "Application" {
  API = container "API Service" {
    scale {
      min 2
      max 10
      metric "cpu > 80%"
    }
    slo {
      latency {
        p95 "200ms"
      }
    }
  }
}

Monitor = system "Monitoring System"

// Feedback loop
Monitor -> App.API "Observes load"
App.API -> Monitor "Reports metrics"
Monitor -> App.API "Triggers scale up/down"

Impact

  • Self-documenting: The diagram explains the system
  • Clear causality: Why do things happen?
  • Predictable behavior: What happens under load?

Benefit 3: Clear Boundaries

Understand what's in scope vs. context.

Example: Third-Party Integration

Without boundaries: Where does your system end and Stripe begin?

With boundaries in Sruja:

// Inside: Your responsibility
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API Service"
}

// Outside: External dependency
Stripe = system "Stripe" {
  metadata {
    tags ["external", "pci-compliant"]
  }
  description "Payment processing service"
}

// Boundary crossing
Shop.API -> Stripe "Process payment"
Stripe -> Shop.API "Payment result"

Impact

  • Clear ownership: Who's responsible for what?
  • Risk management: What external dependencies exist?
  • Testing boundaries: What needs integration tests vs. unit tests?
  • Failure modes: What if the external service goes down?

Benefit 4: Flow Visualization

See how data and information move.

Example: Data Lineage

Without flows: Where does this data come from?

With flows in Sruja:

DataPipeline = flow "Order Analytics" {
  User -> Shop.WebApp "Submits order"
  Shop.WebApp -> Shop.API "Order data"
  Shop.API -> Shop.Database "Persist order"
  Shop.Database -> Analytics "Extract"
  Analytics -> DataWarehouse "Transform & load"
  DataWarehouse -> Dashboard "Visualize"
  Dashboard -> BusinessTeam "Make decisions"
}

Impact

  • Traceability: Where does data originate?
  • Compliance: What data flows where?
  • Performance: Where are bottlenecks?
  • Security: Where is sensitive data exposed?
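The traceability question ("what data flows where?") can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a Sruja feature: the steps are hand-copied from the Order Analytics flow above, and `downstream_of` is a hypothetical helper.

```python
from collections import deque

# Steps of the "Order Analytics" flow above, as (source, target) pairs.
FLOW_STEPS = [
    ("User", "Shop.WebApp"),
    ("Shop.WebApp", "Shop.API"),
    ("Shop.API", "Shop.Database"),
    ("Shop.Database", "Analytics"),
    ("Analytics", "DataWarehouse"),
    ("DataWarehouse", "Dashboard"),
    ("Dashboard", "BusinessTeam"),
]


def downstream_of(start, steps):
    """Breadth-first walk: every element that data leaving `start` can reach."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        current = queue.popleft()
        for source, target in steps:
            if source == current and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Everything that eventually receives order data once it is persisted.
print(downstream_of("Shop.Database", FLOW_STEPS))
```

The same walk, run from an element holding sensitive data, gives a first list of places to review for compliance and exposure.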

Benefit 5: Valid Cycles

Feedback loops are natural, not errors.

Example: Real-Time Updates

Traditional view: Cycles are bad (circular dependency).

Systems thinking view: Cycles enable real-time behavior.

// Real-time collaboration
CollaborationSystem = system "Collaboration Tool" {
  Editor = container "Editor App"
  Sync = container "Sync Service"
  Database = database "Real-time DB"
}

// Valid feedback loop
Editor -> Sync "Local change"
Sync -> Database "Persist change"
Database -> Sync "Broadcast to others"
Sync -> Editor "Receive remote changes"
Editor -> Database "Acknowledge receipt"

Impact

  • Models real behavior: Systems often have natural cycles
  • Enables async patterns: Event-driven architectures
  • Simplifies mental model: Don't force acyclic where it doesn't belong
  • Better documentation: Shows the true system behavior

Practical Benefits

For Development

  • Better communication: Diagrams that everyone understands
  • Faster onboarding: New team members see the big picture quickly
  • Clear requirements: Scenarios define expected behavior
  • Easier testing: Flows guide test scenarios

For Operations

  • Better monitoring: Feedback loops show what to observe
  • Clearer incident response: Understand the failure path
  • Capacity planning: See data flows and bottlenecks
  • Change impact: What breaks if X changes?

For Business

  • Aligns with user journeys: Scenarios mirror real workflows
  • Clearer ownership: Boundaries show responsibility
  • Risk visibility: Dependencies and external factors visible
  • Better decisions: Full picture, not isolated features

When to Use Systems Thinking

Perfect for:

  • System design and architecture
  • Migration planning
  • Feature impact analysis
  • Root cause analysis
  • Cross-team coordination

Less critical for:

  • Simple CRUD applications
  • Individual feature implementation
  • Quick prototypes
  • Isolated bug fixes

Common Mistakes

Over-Engineering

Don't model everything. Start with the core system and add detail as needed.

Ignoring Context

Focusing only on technical components and missing stakeholders.

Missing Flows

Modeling components but not how data moves between them.

Forcing Cycles Out

Removing valid feedback loops because "cycles are bad."

Key Takeaways

  1. Systems thinking prevents common architecture problems
  2. Benefits span development, operations, and business
  3. Sruja provides concrete tools for each concept
  4. Use it appropriately - not everything needs full systems modeling
  5. Start simple, add detail incrementally

Exercise

Think of a recent architecture decision or problem you encountered. How would systems thinking have helped?

  • Could you have seen the full user journey?
  • Were there hidden dependencies?
  • Were external dependencies clear?
  • Did you understand the data flows?
  • Were there feedback loops you missed?

Module 1 Complete

You've completed the fundamentals module! You now understand:

  • What systems thinking is
  • The five core concepts
  • Why it matters for software architecture

Next: Dive into specific concepts starting with Module 2: Parts and Relationships.

Module 2: Parts and Relationships

Overview

In this module, you'll learn to identify and model the components (parts) of a system and how they interact (relationships).

Learning Objectives

By the end of this module, you'll be able to:

  • Identify the key parts of a software system
  • Model components using Sruja's element types
  • Define relationships with clear, meaningful labels
  • Use nesting to show hierarchical structure
  • Validate relationships for correctness

Lessons

Prerequisites

Time Investment

Approximately 1.5-2 hours to complete all lessons and exercises.

What's Next

After completing this module, you'll move on to Module 3: Boundaries.

Lesson 1: Identifying Parts

Learning Goals

  • Learn how to identify the key parts of a system
  • Understand the different types of components
  • Practice identifying parts from requirements

What Are Parts?

In systems thinking, "parts" are the distinct components that make up a system. In software architecture, these are the building blocks: users, systems, services, databases, queues, and more.

The C4 Model Hierarchy

Sruja follows the C4 model, which provides a clear hierarchy of parts:

Level 1: Person (Users, stakeholders)
  ↓
Level 2: System (Software systems)
  ↓
Level 3: Container (Applications, databases, services)
  ↓
Level 4: Component (Modules, classes, libraries)

Identifying Parts: Step by Step

Step 1: Start with People

Who interacts with the system?

Example Requirements:

"Customers can browse products, add to cart, and checkout. Administrators can manage inventory and view reports."

People identified:

  • Customer
  • Administrator

import { * } from 'sruja.ai/stdlib'

Customer = person "Customer"
Administrator = person "Administrator"

Step 2: Identify Systems

What software systems are involved?

From requirements:

  • E-commerce platform (the main system)
  • Payment gateway (external)
  • Email service (external)

ECommerce = system "E-Commerce Platform"
PaymentGateway = system "Payment Gateway"
EmailService = system "Email Service"

Step 3: Break Down Systems into Containers

What applications, services, and databases make up each system?

For E-Commerce:

  • Web Application (React frontend)
  • API Service (Node.js backend)
  • Database (PostgreSQL)
  • Cache (Redis)

ECommerce = system "E-Commerce Platform" {
  WebApp = container "Web Application"
  API = container "API Service"
  DB = database "PostgreSQL"
  Cache = datastore "Redis Cache"
}

Step 4: Break Down Containers into Components (Optional)

What modules or components make up each container?

For API Service:

  • Product Service
  • Cart Service
  • Order Service
  • Payment Service

API = container "API Service" {
  ProductService = component "Product Service"
  CartService = component "Cart Service"
  OrderService = component "Order Service"
  PaymentService = component "Payment Service"
}

Common Patterns

Pattern 1: Frontend-Backend-Database

App = system "Application" {
  Frontend = container "Web App" {
    technology "React"
  }
  Backend = container "API Service" {
    technology "Node.js"
  }
  Database = database "Database" {
    technology "PostgreSQL"
  }
}

Pattern 2: Microservices

App = system "Microservice Application" {
  APIGateway = container "API Gateway"
  UserService = container "User Service"
  OrderService = container "Order Service"
  NotificationService = container "Notification Service"
}

Pattern 3: Event-Driven

App = system "Event-Driven System" {
  Producer = container "Event Producer"
  Consumer = container "Event Consumer"
  MessageQueue = queue "Kafka Cluster"
}

Anti-Patterns to Avoid

Anti-Pattern 1: Oversimplification

Don't:

// Too simple, loses important detail
App = system "The App"

Do:

// Shows structure
App = system "The App" {
  Frontend = container "Frontend"
  Backend = container "Backend"
  Database = database "Database"
}

Anti-Pattern 2: Over-Engineering

Don't:

// Too detailed, hard to understand
App = system "The App" {
  Frontend = container "Frontend" {
    Header = component "Header"
    Body = component "Body"
    Footer = component "Footer"
  }
}

Do:

// Right level of detail
App = system "The App" {
  Frontend = container "Frontend"
  Backend = container "Backend"
}

Anti-Pattern 3: Mixing Levels

Don't:

// Inconsistent level of detail
App = system "The App" {
  Frontend = container "Frontend"
  UserService = component "User Service"  // Skips container level
  Database = database "Database"
}

Do:

// Consistent hierarchy
App = system "The App" {
  Frontend = container "Frontend"
  Backend = container "Backend" {
    UserService = component "User Service"
  }
  Database = database "Database"
}
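The Mixing Levels anti-pattern can also be checked mechanically. The Python sketch below is an illustration, not Sruja's own validation: the `ALLOWED_CHILDREN` table is an assumption made for this example, and elements are represented as hypothetical `(id, kind, children)` tuples.

```python
# Assumption for this sketch: which kinds may sit directly inside which parent.
ALLOWED_CHILDREN = {
    "system": {"container", "database", "queue", "datastore"},
    "container": {"component"},
}


def level_violations(element, parent_kind=None, path=""):
    """Return messages for elements nested at the wrong level of the hierarchy."""
    element_id, kind, children = element
    full_id = f"{path}.{element_id}" if path else element_id
    problems = []
    if parent_kind is not None and kind not in ALLOWED_CHILDREN.get(parent_kind, set()):
        problems.append(f"{full_id}: {kind} placed directly inside {parent_kind}")
    for child in children:
        problems.extend(level_violations(child, kind, full_id))
    return problems


# The "Don't" example: a component sits directly inside a system.
MIXED = ("App", "system", [
    ("Frontend", "container", []),
    ("UserService", "component", []),
    ("Database", "database", []),
])

# The "Do" example: the component is nested inside a container.
CONSISTENT = ("App", "system", [
    ("Frontend", "container", []),
    ("Backend", "container", [("UserService", "component", [])]),
    ("Database", "database", []),
])

print(level_violations(MIXED))
print(level_violations(CONSISTENT))
```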

Exercise: Identify Parts

Read these requirements and identify the parts:

"A project management tool allows team members to create tasks, assign them to others, and track progress. Managers can view reports and approve tasks. The system sends email notifications for task assignments and due dates. Task data is stored in a database. The system integrates with Slack for notifications."

Identify:

  1. People: _
  2. Systems: _
  3. Containers: _

Key Takeaways

  1. Start with people: Who uses the system?
  2. Use the C4 hierarchy: Person → System → Container → Component
  3. Match detail to audience: Not every diagram needs components
  4. Avoid anti-patterns: Don't oversimplify or over-engineer
  5. Be consistent: Use the same level of detail throughout

Next Lesson

In Lesson 2, you'll learn how to use Sruja's element types to model the parts you've identified.

Lesson 2: Sruja Elements

Learning Goals

  • Learn Sruja's core element types
  • Understand when to use each type
  • Model parts with proper Sruja syntax

Sruja Element Types

Sruja provides element types that map to the C4 model:

Element Type  Purpose                              C4 Level
person        Users, stakeholders                  Level 1
system        Software systems, services           Level 2
container     Applications, databases, queues      Level 3
component     Modules, services within containers  Level 4

Person

Use person to represent humans who interact with your system.

Basic Syntax

person = kind "Person"

User = person "End User"
Admin = person "Administrator"
Support = person "Customer Support"

With Details

Customer = person "Customer" {
  description "End users who purchase products"
  metadata {
    type ["external"]
    priority "high"
  }
}

When to Use

  • End users (customers, employees)
  • Stakeholders (managers, business owners)
  • Support teams
  • External users (API consumers)

Examples

// Internal users
Developer = person "Developer"
ProductManager = person "Product Manager"

// External users
APIConsumer = person "API Consumer"
Partner = person "Business Partner"

System

Use system to represent standalone software systems.

Basic Syntax

system = kind "System"

Shop = system "E-Commerce Platform"
APIGateway = system "API Gateway"
PaymentGateway = system "Payment Gateway"

With Details

ECommerce = system "E-Commerce Platform" {
  description "Platform for buying and selling products"
  metadata {
    version "2.0"
    team ["platform-team"]
  }
  slo {
    availability {
      target "99.9%"
    }
  }
}

When to Use

  • Main application you're building
  • External systems you depend on
  • Third-party services
  • Separate software products

Examples

// Your systems
Platform = system "Sruja Platform"
Dashboard = system "Analytics Dashboard"

// External systems
Stripe = system "Stripe"
AWS = system "Amazon Web Services"
GitHub = system "GitHub"

Container

Use container to represent applications, databases, and other deployable units within a system.

Basic Syntax

container = kind "Container"

WebApp = container "Web Application"
API = container "API Service"
DB = database "Database"
Queue = queue "Message Queue"

With Details

WebApp = container "Web Application" {
  technology "React"
  description "Single-page application"
  version "3.1.0"
  tags ["frontend", "typescript"]
  scale {
    min 2
    max 10
  }
  slo {
    latency {
      p95 "200ms"
    }
  }
}

When to Use

  • Web applications (React, Vue, Angular)
  • API services (Node.js, Rust, Python)
  • Databases (PostgreSQL, MongoDB)
  • Message queues (Kafka, RabbitMQ)
  • Caches (Redis, Memcached)

Database vs Datastore

// Database (relational)
UserDB = database "User Database" {
  technology "PostgreSQL"
}

// Datastore (non-relational, e.g. key-value store)
CacheDB = datastore "Redis Cache" {
  technology "Redis"
}

Component

Use component to represent modules or services within a container.

Basic Syntax

component = kind "Component"

AuthService = component "Authentication Service"
UserService = component "User Service"
OrderController = component "Order Controller"

With Details

AuthService = component "Authentication Service" {
  technology "Rust"
  description "Handles user login and registration"
  scale {
    min 1
    max 5
  }
}

When to Use

  • Service modules within a monolith
  • Controllers in MVC architecture
  • Domain services
  • Library components
  • Utilities and helpers

Complete Example

import { * } from 'sruja.ai/stdlib'

// People (Level 1)
Customer = person "Customer"
Administrator = person "Administrator"

// Systems (Level 2)
PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

ECommerce = system "E-Commerce Platform" {
  // Containers (Level 3)
  WebApp = container "Web Application" {
    technology "React"
  }

  API = container "API Service" {
    technology "Node.js"
    scale {
      min 3
      max 10
    }

    // Components (Level 4)
    ProductService = component "Product Service"
    CartService = component "Cart Service"
    OrderService = component "Order Service"
    PaymentService = component "Payment Service"
  }

  Database = database "PostgreSQL" {
    technology "PostgreSQL 14"
  }

  Cache = datastore "Redis" {
    technology "Redis 7"
  }
}

Naming Conventions

Element IDs

Use PascalCase for element IDs:

Good:
Customer = person "Customer"
WebApp = container "Web App"

Bad:
customer = person "Customer"  // Starts with a lowercase letter
WEBAPP = container "Web App"  // ALL CAPS

Display Names

Use descriptive names with proper capitalization:

Good:
User = person "End User"
API = container "API Service"

Bad:
User = person "user"  // Lowercase
API = container "api"  // Lowercase

Consistency

Be consistent across your architecture:

Good:
UserService = component "User Service"
OrderService = component "Order Service"
PaymentService = component "Payment Service"

Bad:
UserService = component "User Service"
Order = component "Order"
Payment = component "Payment Service"
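If you want to enforce the ID convention in review or CI, a small check is enough. The Python sketch below is one possible reading of the rule, not an official Sruja lint: it assumes short all-caps acronyms such as `API` and `DB` (used as IDs elsewhere in this course) are acceptable, and `is_valid_element_id` is a hypothetical helper.

```python
import re

# Must start with an uppercase letter and contain only letters and digits.
ID_SHAPE = re.compile(r"[A-Z][A-Za-z0-9]*")


def is_valid_element_id(element_id, max_acronym_len=3):
    """PascalCase check that tolerates short acronyms like API or DB (assumption)."""
    if not ID_SHAPE.fullmatch(element_id):
        return False
    has_lowercase = any(ch.islower() for ch in element_id)
    # All-caps IDs pass only when they are short enough to be an acronym.
    return has_lowercase or len(element_id) <= max_acronym_len


for candidate in ["Customer", "WebApp", "API", "customer", "WEBAPP", "user_service"]:
    verdict = "ok" if is_valid_element_id(candidate) else "rename"
    print(f"{candidate}: {verdict}")
```

The acronym threshold is a design choice; tighten or loosen `max_acronym_len` to match your team's convention.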

Adding Metadata

Use metadata to add context:

API = container "API Service" {
  technology "Rust"
  version "2.0.1"
  tags ["backend", "api"]
  metadata {
    team ["platform-team"]
    repository "github.com/company/api"
  }
}

Exercise

Model the following using Sruja elements:

"A blog platform has readers and writers. The main system contains a web frontend (Next.js), API backend (Python/FastAPI), and database (PostgreSQL). The API has user management, post management, and comment services."

Create elements for:

  • People
  • Systems
  • Containers
  • Components

Key Takeaways

  1. Four core element types: person, system, container, component
  2. Match type to C4 level: Each has a specific purpose
  3. Add details: technology, description, tags, scale, SLOs
  4. Follow naming conventions: Be consistent
  5. Use metadata: Add context for your team

Next Lesson

In Lesson 3, you'll learn how to connect parts with relationships.

Lesson 3: Defining Relationships

Learning Goals

  • Understand how parts connect
  • Learn Sruja relationship syntax
  • Write clear, meaningful relationship labels
  • Model different types of interactions

What Are Relationships?

Relationships describe how parts interact. They show:

  • Communication between components
  • Data flow
  • Dependencies
  • User actions

Basic Syntax

From -> To "Label"

  • From: Source element
  • To: Destination element
  • "Label": What the relationship represents (required)

Examples

Person to System

Customer -> Shop "Uses"
Administrator -> Shop "Manages"

Container to Container

Shop.WebApp -> Shop.API "Makes API calls"
Shop.API -> Shop.Database "Reads and writes"

Component to Component

Shop.API.ProductService -> Shop.API.CartService "Gets product details"
Shop.API.CartService -> Shop.API.OrderService "Creates order"

Relationship Labels

Labels should be clear, concise, and meaningful.

Good Labels

Customer -> Shop.WebApp "Browses products"
Shop.API -> Shop.Database "Queries data"
Shop.API -> PaymentGateway "Processes payment"

Bad Labels

Customer -> Shop.WebApp "Uses"  // Too generic
Shop.API -> Shop.Database "Connects"  // Doesn't describe what happens
Shop.API -> PaymentGateway "Integration"  // Technical, not behavioral

Label Guidelines

  • Use present tense verbs: "uses", "reads", "calls"
  • Be specific: "Processes payment" vs "uses"
  • Show direction: Clear who initiates the action
  • Keep it short: 2-5 words is ideal
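
The guidelines above lend themselves to a rough automated check. The sketch below uses Python rather than Sruja syntax; `check_label` and the `GENERIC_LABELS` set are hypothetical names chosen for illustration and are not part of Sruja's built-in validation.

```python
# Hypothetical label check; not part of Sruja's validator.
# The generic words are taken from the "Bad Labels" examples above.
GENERIC_LABELS = {"uses", "connects", "integration"}


def check_label(label: str) -> list[str]:
    """Return the guideline violations found in one relationship label."""
    words = label.split()
    if not words:
        return ["label is empty"]
    problems = []
    if label.strip().lower() in GENERIC_LABELS:
        problems.append("too generic")
    if len(words) > 5:
        problems.append("longer than 5 words")
    return problems


print(check_label("Processes payment"))
print(check_label("Uses"))
```

A check like this could run alongside the built-in validation, though the word list would need tuning for each team.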

Relationship Patterns

Pattern 1: User Interactions

User -> WebApp "Logs in"
User -> WebApp "Views products"
User -> WebApp "Adds to cart"
User -> WebApp "Checks out"

Pattern 2: Service Communication

WebApp -> API "Sends requests"
API -> Database "Persists data"
API -> Cache "Reads cache"
Cache -> API "Returns cached data"

Pattern 3: External Dependencies

API -> PaymentGateway "Processes payment"
API -> EmailService "Sends notifications"
API -> AnalyticsService "Tracks events"

Nested Element References

Use dot notation to reference nested elements:

// Direct child reference
Customer -> Shop.WebApp "Uses"

// Nested component reference
Shop.API.ProductService -> Shop.API.CartService "Gets product info"

// Cross-system reference
Shop.API -> PaymentGateway.ChargeService "Processes payment"
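
One way to think about dot notation is as a path walk through nested elements. The Python sketch below assumes a hypothetical in-memory model in which each element is a dictionary of its children; `model` and `resolve` are illustrative names, not Sruja APIs.

```python
# Hypothetical in-memory model: each element maps child names to child elements.
model = {
    "Customer": {},
    "Shop": {"WebApp": {}, "API": {"ProductService": {}, "CartService": {}}},
    "PaymentGateway": {"ChargeService": {}},
}


def resolve(path: str, root: dict) -> dict:
    """Walk a dotted reference such as 'Shop.API.CartService' from the top level."""
    node = root
    for name in path.split("."):
        if name not in node:
            raise KeyError(f"unknown element '{name}' in '{path}'")
        node = node[name]
    return node


# Resolving a system returns its children, so the container names are visible.
print(sorted(resolve("Shop", model)))
```

A reference fails as soon as one segment is missing, which is why a typo in the middle of a path invalidates the whole relationship.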

Relationship Tags

Add tags to categorize relationships:

From -> To "Label" [tag1, tag2]

// Example
Shop.API -> PaymentGateway "Processes payment" [critical, external]
Shop.WebApp -> Shop.API "Calls API" [http]

Common Tags

// Protocol
[http], [grpc], [websocket]

// Importance
[critical], [optional], [best-effort]

// Communication style
[synchronous], [asynchronous], [streaming]

// Security
[encrypted], [authenticated], [public]

// Scope
[internal], [external], [partner]

Multiple Relationships

Elements can have multiple relationships:

// WebApp has multiple relationships
Customer -> Shop.WebApp "Browses"
Shop.WebApp -> Shop.API "Queries products"
Shop.WebApp -> Shop.Cache "Reads cache"
Shop.WebApp -> Customer "Displays products"

// API has multiple relationships
Shop.WebApp -> Shop.API "Sends request"
Shop.API -> Shop.Database "Persists order"
Shop.API -> PaymentGateway "Processes payment"
Shop.API -> Shop.WebApp "Returns response"

Relationship Direction

One-Way

Customer -> Shop.WebApp "Uses"

Two-Way (Two separate relationships)

User -> App.API "Submits data"
App.API -> User "Returns result"

Feedback Loop (Cycle)

User -> App.WebApp "Submits form"
App.WebApp -> App.API "Validates"
App.API -> App.WebApp "Returns errors"
App.WebApp -> User "Shows errors"
// User resubmits (loop completes)
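
A feedback loop shows up as a directed cycle in the relationship graph. As an illustration only (this is not necessarily how Sruja's own cycle check works), the Python sketch below detects a cycle with a depth-first search; `has_cycle` is a hypothetical helper.

```python
def has_cycle(edges: list[tuple[str, str]]) -> bool:
    """Return True if the directed graph described by `edges` contains a cycle."""
    graph: dict[str, list[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}

    def visit(node: str) -> bool:
        state[node] = IN_PROGRESS
        for nxt in graph[node]:
            # Reaching a node that is still being explored closes a loop.
            if state[nxt] == IN_PROGRESS:
                return True
            if state[nxt] == UNVISITED and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[n] == UNVISITED and visit(n) for n in graph)


feedback_loop = [
    ("User", "App.WebApp"),
    ("App.WebApp", "App.API"),
    ("App.API", "App.WebApp"),
    ("App.WebApp", "User"),
]
print(has_cycle(feedback_loop))
```

Whether a detected cycle is an intentional feedback loop or a modeling mistake is a judgment the author still has to make.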

Relationships in Views

Relationships are automatically included in views:

view index {
  include *
}

view container_view of Shop {
  include Shop.*
}

Complete Example

import { * } from 'sruja.ai/stdlib'

// Elements
Customer = person "Customer"
Admin = person "Administrator"

Shop = system "Shop" {
  WebApp = container "Web Application"
  API = container "API Service"
  DB = database "Database"
}

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

// Relationships
Customer -> Shop.WebApp "Browses products" [public]
Customer -> Shop.WebApp "Adds to cart" [authenticated]
Customer -> Shop.WebApp "Checks out" [authenticated]

Shop.WebApp -> Shop.API "Sends requests" [http, encrypted]
Shop.API -> Shop.DB "Persists data" [critical]
Shop.API -> Shop.DB "Queries data" [cached]

Shop.API -> PaymentGateway "Processes payment" [critical, external]

Admin -> Shop.WebApp "Manages products" [authenticated]
Admin -> Shop.WebApp "Views reports" [authenticated]

view index {
  include *
}

Exercise

Add relationships to this architecture:

import { * } from 'sruja.ai/stdlib'

Reader = person "Reader"
Writer = person "Writer"

Blog = system "Blog Platform" {
  Frontend = container "Web App"
  Backend = container "API"
  Database = database "PostgreSQL"
}

EmailService = system "Email Service" {
  tags ["external"]
}

Define relationships for:

  • Reader interactions
  • Writer interactions
  • Backend-Database communication
  • Backend-EmailService integration

Key Takeaways

  1. Relationships show interactions: How parts communicate
  2. Labels must be meaningful: Clear, specific verbs
  3. Use dot notation: Reference nested elements
  4. Add tags: Categorize relationships
  5. Multiple relationships: Elements can have many connections

Next Lesson

In Lesson 4, you'll learn how to organize parts using hierarchy and nesting.

Lesson 4: Hierarchy and Nesting

Learning Goals

  • Understand how to organize parts hierarchically
  • Learn nesting patterns for containers and components
  • Model complex systems with clear structure
  • Balance detail and clarity

The C4 Hierarchy

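Sruja follows the C4 model hierarchy. The level numbers below are this course's shorthand for nesting depth, not the C4 diagram levels (System Context, Container, Component, Code):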

Person (Level 1)
  ↓
System (Level 2)
  ↓
Container (Level 3)
  ↓
Component (Level 4)

Nesting Containers in Systems

Systems contain containers:

import { * } from 'sruja.ai/stdlib'

Shop = system "E-Commerce Platform" {
  WebApp = container "Web Application" {
    technology "React"
  }

  API = container "API Service" {
    technology "Node.js"
  }

  DB = database "PostgreSQL" {
    technology "PostgreSQL 14"
  }
}

Benefits of Nesting

  • Clear ownership: Containers belong to systems
  • Logical grouping: Related containers stay together
  • Simpler references: Use dot notation
  • Better organization: Easier to navigate

Nesting Components in Containers

Containers contain components:

API = container "API Service" {
  AuthService = component "Authentication Service"
  ProductService = component "Product Service"
  CartService = component "Cart Service"
  OrderService = component "Order Service"
}

When to Add Components

Add components when:

  • Container is complex (>5 logical parts)
  • Different teams own different parts
  • Need to show internal architecture
  • Documenting for implementation

Skip components when:

  • Container is simple (monolith)
  • Too much detail for audience
  • Container level is sufficient

Reference Patterns

Level 1 to Level 2 (Person to System)

Customer -> Shop "Uses"

Level 1 to Level 3 (Person to Container)

Use dot notation:

Customer -> Shop.WebApp "Browses"
Customer -> Shop.API "Submits order"

Level 3 to Level 3 (Container to Container)

Shop.WebApp -> Shop.API "Sends requests"
Shop.API -> Shop.DB "Persists data"

Level 3 to Level 4 (Container to Component)

Shop.API -> Shop.API.ProductService "Get products"
Shop.API -> Shop.API.OrderService "Create order"

Level 4 to Level 4 (Component to Component)

Shop.API.ProductService -> Shop.API.CartService "Get product details"
Shop.API.CartService -> Shop.API.OrderService "Create order"

Multi-Level Systems

Complex Example

import { * } from 'sruja.ai/stdlib'

Customer = person "Customer"

ECommerce = system "E-Commerce Platform" {
  WebApp = container "Web Application" {
    technology "React"
  }

  API = container "API Service" {
    technology "Node.js"

    ProductService = component "Product Service"
    CartService = component "Cart Service"
    OrderService = component "Order Service"
    PaymentService = component "Payment Service"
    NotificationService = component "Notification Service"
  }

  Database = database "PostgreSQL" {
    technology "PostgreSQL 14"
  }

  Cache = database "Redis" {
    technology "Redis 7"
  }
}

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

EmailService = system "Email Service" {
  metadata {
    tags ["external"]
  }
}

// Relationships at different levels
Customer -> ECommerce.WebApp "Browses"
ECommerce.WebApp -> ECommerce.API "Makes requests"
ECommerce.API.ProductService -> ECommerce.Cache "Reads cache"
ECommerce.API.PaymentService -> PaymentGateway "Process payment"
ECommerce.API.NotificationService -> EmailService "Send notifications"

Hierarchy Best Practices

1. Be Consistent

Don't mix levels:

// Bad: Inconsistent nesting
App = system "App" {
  Frontend = container "Frontend"
  Backend = container "Backend" {
    API = component "API Service"
  }
  Database = database "Database"
}

// Good: Consistent nesting
App = system "App" {
  Frontend = container "Frontend" {
    UI = component "UI Components"
  }
  Backend = container "Backend" {
    API = component "API Service"
  }
  Database = database "Database"
}

2. Right Level of Detail

Match detail to audience:

// For business stakeholders
Platform = system "Platform" {
  WebApp = container "Web App"
  API = container "API"
}

// For developers
Platform = system "Platform" {
  WebApp = container "Web App" {
    Auth = component "Auth Module"
    Dashboard = component "Dashboard"
  }
  API = container "API" {
    UserService = component "User Service"
    ProductService = component "Product Service"
  }
}

3. Logical Grouping

Group related components:

// Good: Logical grouping
API = container "API" {
  UserService = component "User Service"
  ProductService = component "Product Service"
  OrderService = component "Order Service"
}

// Also Good: Domain-based grouping
API = container "API" {
  UserDomain = component "User Module"
  ProductDomain = component "Product Module"
  OrderDomain = component "Order Module"
}

Implied Relationships

Sruja automatically infers some relationships:

Customer -> Shop.WebApp "Uses"
// Implies: Customer -> Shop

This reduces boilerplate while maintaining accuracy.
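
To make the inference concrete, the Python sketch below lifts a relationship's target to each of its ancestors. It assumes that only the target side is lifted, as in the example above; Sruja's actual inference rules may differ, and `implied` is a hypothetical helper.

```python
def implied(source: str, target: str) -> list[tuple[str, str]]:
    """List the relationships implied by source -> target, nearest ancestor first."""
    parts = target.split(".")
    ancestors = [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
    # Skip ancestors that also contain the source: an element does not
    # gain a relationship to its own parent system.
    return [
        (source, ancestor)
        for ancestor in ancestors
        if ancestor != source and not source.startswith(ancestor + ".")
    ]


print(implied("Customer", "Shop.WebApp"))
```

Under this assumption, a relationship between two containers of the same system implies nothing new, because both ends already live inside that system.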

Views and Hierarchy

Create views at different hierarchy levels:

// Level 1 view (System Context)
view index {
  title "System Context"
  include *
}

// Level 2 view (System)
view system_view of Shop {
  title "Shop System"
  include Shop
}

// Level 3 view (Containers)
view container_view of Shop {
  title "Shop Containers"
  include Shop.*
  exclude Shop.DB
}

// Level 4 view (Components)
view component_view of Shop.API {
  title "API Components"
  include Shop.API.*
}

Anti-Patterns to Avoid

Anti-Pattern 1: Deep Nesting

// Bad: Too deep (hard to read)
App = system "App" {
  Frontend = container "Frontend" {
    Layout = component "Layout" {
      Header = component "Header" {
        Navigation = component "Navigation"
      }
    }
  }
}

Solution: Keep nesting to 3-4 levels max.
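
A depth limit like this is easy to check mechanically. The Python sketch below assumes a hypothetical tree representation in which each element has an optional `children` dictionary; `max_depth` is an illustrative helper, not a Sruja feature.

```python
def max_depth(element: dict) -> int:
    """Depth of a nested element tree; an element without children has depth 1."""
    children = element.get("children", {})
    return 1 + max((max_depth(child) for child in children.values()), default=0)


# The deeply nested "App" example from above, as a hypothetical tree.
app = {
    "children": {
        "Frontend": {
            "children": {
                "Layout": {
                    "children": {
                        "Header": {"children": {"Navigation": {}}},
                    }
                }
            }
        }
    }
}

depth = max_depth(app)
print(depth, "levels;", "too deep" if depth > 4 else "ok")
```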

Anti-Pattern 2: Orphaned Elements

// Bad: Component without container
Shop = system "Shop"
WebApp = container "Web App"
API = component "API Service"  // Should be in a container

Solution: Nest components properly.

Anti-Pattern 3: Flat Everything

// Bad: Everything at same level
Customer = person "Customer"
WebApp = container "Web App"
API = container "API"
Database = database "Database"
AuthService = component "Auth Service"

Solution: Show proper hierarchy.

Exercise

Create a nested architecture for:

"A social media platform with users, posts, and comments. The platform has a web frontend, mobile API, and backend services. The backend has user management, post management, and notification services. Data is stored in PostgreSQL and cached in Redis."

Create:

  • Person level
  • System level with nested containers
  • Component level for at least one container

Key Takeaways

  1. Follow C4 hierarchy: Person → System → Container → Component
  2. Nest logically: Group related parts together
  3. Use dot notation: Reference nested elements
  4. Right level of detail: Match audience needs
  5. Create multiple views: Show different hierarchy levels

Module 2 Complete

You've completed Parts and Relationships! You now understand:

  • How to identify system parts
  • Sruja's element types
  • Defining meaningful relationships
  • Organizing parts hierarchically

Next: Learn about Module 3: Boundaries.

Module 3: Boundaries

Overview

In this module, you'll learn to define what's inside your system vs. what's outside (the environment). Understanding boundaries is crucial for clear ownership, risk management, and integration planning.

Learning Objectives

By the end of this module, you'll be able to:

  • Define system boundaries clearly
  • Model internal vs. external components
  • Identify and document dependencies
  • Create bounded contexts for services
  • Plan integrations at boundaries

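Lessons

  • Lesson 1: Understanding Boundaries
  • Lesson 2: Internal vs External
  • Lesson 3: Crossing Boundaries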

Prerequisites

Time Investment

Approximately 1-1.5 hours to complete all lessons and exercises.

What's Next

After completing this module, you'll learn about Module 4: Flows.

Lesson 1: Understanding Boundaries

Learning Goals

  • Understand what boundaries are in software architecture
  • Recognize different types of boundaries
  • Learn why boundaries matter

What Are Boundaries?

A boundary is the line that separates what's inside your system (what you build, own, and maintain) from what's outside (the environment, external dependencies, and stakeholders).

┌─────────────────────────────────────┐
│          EXTERNAL                   │
│  (Environment, Dependencies)        │
│                                     │
│    ┌───────────────────────────┐    │
│    │     INTERNAL              │    │
│    │   (Your System)           │    │
│    │                           │    │
│    │  [Components]             │    │
│    │                           │    │
│    └───────────────────────────┘    │
│             ↑                       │
│     Boundary Line                   │
└─────────────────────────────────────┘

Why Boundaries Matter

1. Clear Ownership

Who's responsible for what?

// Inside: Your team owns this
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API Service"
}

// Outside: Another team/service owns this
PaymentGateway = system "Payment Gateway"

2. Risk Management

What external risks exist?

// External dependency = external risk
Shop.API -> PaymentGateway "Process payment"

// If PaymentGateway is down, what happens?
// What's your fallback? SLA?

3. Testing Scope

What needs integration tests vs. unit tests?

// Internal: Unit tests sufficient
Shop.WebApp -> Shop.API "Sends requests"

// External: Integration tests needed
Shop.API -> PaymentGateway "Processes payment"

4. Security

What needs protection?

// Inside boundary: Apply security controls
Shop.WebApp -> Shop.API "Sends requests"

// Crossing boundary: Validate, authenticate
Shop.API -> PaymentGateway "Processes payment"

Types of Boundaries

1. System Boundary

Your main application vs. the world:

// Inside
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API"
}

// Outside
Customer = person "Customer"

2. Team Boundary

What your team owns vs. what other teams own:

// Your team's system
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API"
}

// Another team's system
Analytics = system "Analytics Platform"

3. Organizational Boundary

Internal vs. external organizations:

// Your company's system
Shop = system "Shop"

// External vendor
Stripe = system "Stripe" {
  metadata {
    tags ["external", "vendor"]
  }
}

4. Deployment Boundary

What's deployed together:

// Same deployment
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API"
}

// Separate deployment
Database = system "Database Cluster"

5. Trust Boundary

Security and trust levels:

// Trusted: Internal network
InternalAPI = container "Internal API"

// Untrusted: Public internet
PublicAPI = container "Public API"

Boundary Examples

Example 1: E-Commerce Platform

// ┌──────────── EXTERNAL ────────────┐
// │                                  │
// │   Customer (person)              │
// │   Payment Gateway (system)       │
// │   Email Service (system)         │
// │                                  │
// │   ┌────── INTERNAL ─────────┐    │
// │   │ Shop (system)           │    │
// │   │   WebApp (container)    │    │
// │   │   API (container)       │    │
// │   │   Database (database)   │    │
// │   └─────────────────────────┘    │
// │                                  │
// └──────────────────────────────────┘

Customer -> Shop.WebApp "Uses"
Shop.WebApp -> Shop.API "Calls"
Shop.API -> PaymentGateway "Process payment"

Example 2: Microservices Architecture

// Internal boundaries (within organization)
OrderService = system "Order Service"
PaymentService = system "Payment Service"
InventoryService = system "Inventory Service"

OrderService -> PaymentService "Request payment"
PaymentService -> OrderService "Payment result"
OrderService -> InventoryService "Reserve items"

Boundary Anti-Patterns

Anti-Pattern 1: No Clear Boundary

// Bad: Everything looks internal
Customer = person "Customer"
Shop = system "Shop"
PaymentGateway = system "Payment Gateway"
EmailService = system "Email Service"

// Without tags, it's unclear what's external

Solution: Use metadata tags to mark external systems.

Anti-Pattern 2: Everything External

// Bad: Everything marked external, no ownership
Shop = system "Shop" {
  tags ["external"]
}
WebApp = container "Web App" {
  tags ["external"]
}

Solution: Mark only truly external systems.

Anti-Pattern 3: Too Many Boundaries

// Bad: Overly fragmented, hard to understand
System1 = system "System 1"
System2 = system "System 2"
System3 = system "System 3"
// ... many small systems

Solution: Group related functionality.

Defining Boundaries in Sruja

Use system for Main Boundary

Shop = system "Shop" {
  // Internal containers
  WebApp = container "Web App"
  API = container "API"
}

Use Metadata for External Systems

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external", "vendor"]
    owner "Third-party"
  }
}

Use person for External Actors

// Users are outside the system boundary
Customer = person "Customer"
Administrator = person "Administrator"

Exercise

Identify the boundaries in this scenario:

"A hospital scheduling system allows patients to book appointments, doctors to manage their schedules, and administrators to oversee operations. The system integrates with an external insurance API for coverage verification and sends SMS notifications through Twilio. Patient data is stored in the hospital's database."

Identify:

  1. Internal system: _
  2. External systems: _
  3. External actors: _
  4. Boundary crossings: _

Key Takeaways

  1. Boundaries define ownership: What you build vs. what you depend on
  2. Multiple boundary types: System, team, organization, deployment, trust
  3. Mark external systems: Use metadata tags
  4. Document crossings: Show how boundaries are crossed
  5. Avoid anti-patterns: Clear but not over-fragmented boundaries

Next Lesson

In Lesson 2, you'll learn how to differentiate and mark internal vs. external components.

Lesson 2: Internal vs External

Learning Goals

  • Learn how to mark components as internal or external
  • Use Sruja metadata to annotate boundary elements
  • Model team and organizational boundaries

Marking External Systems

Basic Pattern

// Internal: Your system
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API Service"
}

// External: Third-party system
PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

Metadata for External Systems

External Tag

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

External with Ownership

Stripe = system "Stripe" {
  metadata {
    tags ["external", "vendor"]
    owner "Stripe Inc."
  }
}

External with SLA

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
    sla "99.9% uptime"
  }
}

External with Compliance

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external", "pci-compliant"]
    compliance ["PCI-DSS Level 1"]
  }
}

Internal vs External Patterns

Pattern 1: Third-Party Services

// Your system
Shop = system "Shop"

// Third-party integrations
Stripe = system "Stripe" {
  metadata {
    tags ["external", "vendor", "pci-compliant"]
    owner "Stripe"
  }
}

Twilio = system "Twilio" {
  metadata {
    tags ["external", "vendor"]
    owner "Twilio"
  }
}

GoogleAnalytics = system "Google Analytics" {
  metadata {
    tags ["external", "vendor"]
    owner "Google"
  }
}

Pattern 2: Partner Integrations

// Your system
Shop = system "Shop"

// Partner systems
LogisticsPartner = system "Logistics Partner API" {
  metadata {
    tags ["external", "partner"]
    owner "FedEx"
  }
}

InventoryPartner = system "Inventory Partner" {
  metadata {
    tags ["external", "partner"]
    owner "Vendor X"
  }
}

Pattern 3: Internal Team Boundaries

// Your team's system
Shop = system "Shop"

// Another team's systems
UserPlatform = system "User Platform" {
  metadata {
    tags ["internal", "platform-team"]
    owner "Platform Team"
  }
}

AnalyticsPlatform = system "Analytics Platform" {
  metadata {
    tags ["internal", "data-team"]
    owner "Data Team"
  }
}

People: Always External

People are always outside the system boundary:

// External actors
Customer = person "Customer"
Administrator = person "Administrator"
Support = person "Customer Support"

// Your system
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API Service"
}

// People interact with internal components
Customer -> Shop.WebApp "Browses"
Administrator -> Shop.WebApp "Manages"
Support -> Shop.WebApp "Monitors"

Boundary Crossings

Crossing to External Systems

// External dependency
PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

// Crossing the boundary
Shop.API -> PaymentGateway "Process payment"
PaymentGateway -> Shop.API "Payment result"

Multiple Boundary Crossings

Customer = person "Customer"

// Internal
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API"
}

// External 1
PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

// External 2
EmailService = system "Email Service" {
  metadata {
    tags ["external"]
  }
}

// External 3
AnalyticsService = system "Analytics Service" {
  metadata {
    tags ["external"]
  }
}

// Multiple crossings
Customer -> Shop.WebApp "Places order"
Shop.WebApp -> Shop.API "Process order"
Shop.API -> PaymentGateway "Charge payment"
Shop.API -> EmailService "Send confirmation"
Shop.API -> AnalyticsService "Track event"

Teams and Boundaries

Single Team, One System

// Your team owns everything
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API"
  Database = database "Database"
}

Multiple Teams, Bounded Contexts

// Team A: Shop team
Shop = system "Shop" {
  metadata {
    tags ["internal", "shop-team"]
    owner "Shop Team"
  }
  WebApp = container "Web App"
  API = container "API"
}

// Team B: Payment team
Payment = system "Payment Service" {
  metadata {
    tags ["internal", "payment-team"]
    owner "Payment Team"
  }
  Processor = container "Payment Processor"
}

// Team C: Notification team
Notifications = system "Notification Service" {
  metadata {
    tags ["internal", "notification-team"]
    owner "Notification Team"
  }
  Sender = container "Notification Sender"
}

// Cross-team boundaries
Shop.API -> Payment.Processor "Process payment"
Shop.API -> Notifications.Sender "Send notification"

Complete Example

import { * } from 'sruja.ai/stdlib'

// External actors (always external)
Customer = person "Customer"
Administrator = person "Administrator"

// Internal system
Shop = system "Shop" {
  metadata {
    tags ["internal"]
    owner "Shop Team"
  }

  WebApp = container "Web Application"
  API = container "API Service"
  Database = database "PostgreSQL"
}

// External systems
PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external", "vendor", "pci-compliant"]
    owner "Stripe"
    sla "99.9% uptime"
  }
}

EmailService = system "Email Service" {
  metadata {
    tags ["external", "vendor"]
    owner "SendGrid"
  }
}

AnalyticsService = system "Analytics Service" {
  metadata {
    tags ["external", "vendor"]
    owner "Google"
  }
}

// Relationships
Customer -> Shop.WebApp "Uses"
Shop.WebApp -> Shop.API "Sends requests"
Shop.API -> Shop.Database "Persists data"

// Crossing boundaries
Shop.API -> PaymentGateway "Process payment"
PaymentGateway -> Shop.API "Payment result"
Shop.API -> EmailService "Send notifications"
Shop.API -> AnalyticsService "Track events"

view index {
  include *
}

Boundary Views

System Context View (Shows boundaries)

view index {
  title "System Context - Shows Internal vs External"
  include *
}

Internal-Only View

view internal_view of Shop {
  title "Internal Architecture"
  include Shop.*
  exclude external
}

Exercise

Mark internal and external components:

import { * } from 'sruja.ai/stdlib'

// Add metadata to mark external systems

Patient = person "Patient"
Doctor = person "Doctor"

Hospital = system "Hospital Scheduling" {
  WebApp = container "Web App"
  API = container "API Service"
  Database = database "Database"
}

InsuranceAPI = system "Insurance API"
SMSProvider = system "SMS Provider"

Add metadata to mark:

  • External systems
  • External dependencies
  • Owner information

Key Takeaways

  1. Use metadata tags: Mark external systems clearly
  2. People are external: Users are outside the system
  3. Document ownership: Who owns each system
  4. Show crossings: How boundaries are crossed
  5. Team boundaries: Internal boundaries between teams

Next Lesson

In Lesson 3, you'll learn how to model integrations and plan for boundary crossings.

Lesson 3: Crossing Boundaries

Learning Goals

  • Model integrations across boundaries
  • Plan for failures at boundaries
  • Document interface contracts
  • Design fallback strategies

Boundary Crossings

Every time a relationship crosses from internal to external, it's a boundary crossing:

// Internal → External = Boundary crossing
Shop.API -> PaymentGateway "Process payment"

// External → Internal = Boundary crossing
PaymentGateway -> Shop.API "Payment result"
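
Given the set of systems tagged as external, a crossing can be identified by comparing the top-level system on each side of a relationship. The Python sketch below assumes a hypothetical `EXTERNAL` set and helper names; it is not a Sruja API.

```python
# Hypothetical: names of the systems tagged "external" in the model.
EXTERNAL = {"PaymentGateway"}


def top_level(ref: str) -> str:
    """Return the system part of a dotted reference, e.g. 'Shop' for 'Shop.API'."""
    return ref.split(".")[0]


def crosses_boundary(src: str, dst: str, external: set[str] = EXTERNAL) -> bool:
    """A relationship crosses the boundary when exactly one side is external."""
    return (top_level(src) in external) != (top_level(dst) in external)


relationships = [
    ("Shop.WebApp", "Shop.API"),
    ("Shop.API", "PaymentGateway"),
    ("PaymentGateway", "Shop.API"),
]
for src, dst in relationships:
    kind = "crossing" if crosses_boundary(src, dst) else "internal"
    print(f"{src} -> {dst}: {kind}")
```

Listing crossings this way gives a starting inventory of the integration points that need contracts, timeouts, and fallbacks.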

Integration Patterns

Pattern 1: Request-Response

Shop.API -> PaymentGateway "Process payment"
PaymentGateway -> Shop.API "Payment result"

// Characteristics:
// - Synchronous
// - Real-time
// - Tight coupling

Pattern 2: Event-Driven

Shop.API -> EventQueue "Publish order event"
EventQueue -> PaymentProcessor "Consume order event"

// Characteristics:
// - Asynchronous
// - Decoupled
// - Resilient

Pattern 3: Polling

Shop.API -> ExternalAPI "Check status"
ExternalAPI -> Shop.API "Return status"

// Characteristics:
// - Periodic checks
// - No webhooks
// - Simpler but less efficient

Integration Considerations

1. Error Handling

What happens when an external service fails?

// Document expected failure modes
PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
    sla "99.9% uptime"
    failure_modes ["timeout", "service unavailable", "network error"]
  }
}

// Model fallbacks
Shop.API -> PaymentGateway "Process payment" [primary]
Shop.API -> PaymentGatewayBackup "Process payment" [fallback]

2. Timeouts and Latency

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
    timeout "30s"
    expected_latency "500ms"
    max_latency "5s"
  }
}

Shop.API = container "API Service" {
  slo {
    latency {
      p95 "200ms"
      p99 "500ms"
    }
  }
}

3. Data Consistency

// Document consistency guarantees
Shop.API -> PaymentGateway "Process payment"
// If payment succeeds but order save fails:
// - Idempotent payment calls
// - Compensating transactions
// - Eventual consistency
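
Of the three options, idempotent calls are the simplest to illustrate. The Python sketch below uses a hypothetical `FakeGateway` that deduplicates charges by an idempotency key; real payment providers expose this differently, so treat the class and its method as assumptions.

```python
class FakeGateway:
    """Hypothetical gateway that deduplicates charges by idempotency key."""

    def __init__(self):
        self.charges: dict[str, dict] = {}

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = {
                "amount_cents": amount_cents,
                "status": "charged",
            }
        return self.charges[idempotency_key]


gateway = FakeGateway()
first = gateway.charge("order-42", 1999)
# The order save failed, so the caller retries the whole checkout step.
second = gateway.charge("order-42", 1999)
print(len(gateway.charges), first is second)
```

Because the retry reuses the key, the customer is charged once even though the checkout step ran twice.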

4. Security

// Security at boundary
Shop.API -> PaymentGateway "Process payment" [encrypted, authenticated, tls1.3]

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external", "pci-compliant"]
    security ["mutual TLS", "API key authentication"]
  }
}

Documenting Interface Contracts

API Contract

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
    api_endpoint "https://api.payment.com/v1"
    authentication "API Key"
    rate_limit "1000 req/min"
  }
}

Shop.API = container "API Service" {
  metadata {
    api_consumer "Payment Gateway Client"
    retry_policy "3 retries with exponential backoff"
  }
}
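
A policy such as "3 retries with exponential backoff" could be implemented on the consumer side roughly as follows. This is a minimal Python sketch; `call_with_retries`, the 0.5 second base delay, and the injectable `sleep` parameter are assumptions made for illustration.

```python
import time


def call_with_retries(operation, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on failure retry up to `retries` times, doubling the delay."""
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt == retries:
                raise  # Retries exhausted: surface the last error.
            sleep(base_delay * (2 ** attempt))
            attempt += 1


# Usage with a hypothetical flaky call; delays are recorded instead of slept.
recorded_delays = []
outcomes = iter([ConnectionError("timeout"), ConnectionError("timeout"), "paid"])


def flaky_charge():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_retries(flaky_charge, sleep=recorded_delays.append)
print(result, recorded_delays)
```

Passing `sleep` as a parameter keeps the example testable without real waiting; production code would typically also add jitter and retry only on errors known to be transient.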

Data Format

PaymentGateway = system "Payment Gateway" {
  metadata {
    data_format "JSON"
    schema_version "v1.2"
    validation "Strict schema validation"
  }
}

SLA and Reliability

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
    sla "99.9% uptime"
    mttr "4 hours"
    support "24/7 enterprise support"
  }
}

Fallback Strategies

Strategy 1: Redundant Providers

// Primary provider
PrimaryPayment = system "Primary Payment Gateway" {
  metadata {
    tags ["external", "primary"]
  }
}

// Backup provider
BackupPayment = system "Backup Payment Gateway" {
  metadata {
    tags ["external", "backup"]
  }
}

// Try primary, fall back to backup
Shop.API -> PrimaryPayment "Process payment" [primary]
Shop.API -> BackupPayment "Process payment" [fallback]
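
In application code, the primary and fallback relationships could translate into an ordered list of providers. The Python sketch below is illustrative only; `process_payment` and the provider callables are hypothetical stand-ins for real gateway clients.

```python
def process_payment(amount, providers):
    """Try each (name, charge) pair in order; return the first success."""
    errors = []
    for name, charge in providers:
        try:
            return name, charge(amount)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary_charge(amount):
    # Hypothetical primary gateway that is currently unavailable.
    raise ConnectionError("service unavailable")


def backup_charge(amount):
    # Hypothetical backup gateway that accepts the charge.
    return {"charged": amount}


used, receipt = process_payment(
    100, [("primary", primary_charge), ("backup", backup_charge)]
)
print(used, receipt)
```

Falling back between payment providers is only safe when the failed primary call is known not to have charged the customer, which is where idempotent calls come in.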

Strategy 2: Circuit Breaker

Shop.API = container "API Service" {
  metadata {
    circuit_breaker {
      enabled true
      failure_threshold 5
      recovery_timeout "60s"
    }
  }
}
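
The metadata above only documents the policy; the behavior itself lives in application code. As a rough Python sketch (the class and its parameter names are hypothetical, mirroring the metadata keys), a circuit breaker with a failure threshold and a recovery timeout might look like this:

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: opens after repeated failures, retries after a timeout."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed.

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_timeout:
                raise RuntimeError("circuit open: call skipped")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = operation()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail fast instead of waiting on a dependency that is known to be down, which protects the API's own latency targets.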

Strategy 3: Degraded Mode

// If external analytics fails, continue operation
Shop.API -> AnalyticsService "Track events" [non_critical]

// If email fails, queue for later
Shop.API -> EmailService "Send notifications" [async_queue]

Strategy 4: Cache External Data

// Cache external API responses
Shop.API -> ExternalAPI "Get exchange rates"
Shop.Cache -> Shop.API "Return cached rates"

// Fallback to cache if external fails
Shop.API -> Shop.Cache "Get cached rates" [fallback]
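
The cache fallback could be written as a small wrapper around the external call. The Python sketch below is a simplified illustration; `get_rates`, the dictionary cache, and the `"rates"` key are hypothetical, and a real cache would also track how old the stored value is.

```python
def get_rates(fetch_external, cache: dict):
    """Prefer fresh data; fall back to the last cached value if the call fails."""
    try:
        rates = fetch_external()
    except Exception:
        if "rates" in cache:
            return cache["rates"]  # Possibly stale, but better than failing.
        raise
    cache["rates"] = rates
    return rates


def external_down():
    # Hypothetical outage of the external exchange-rate API.
    raise ConnectionError("external API unavailable")


cache = {}
fresh = get_rates(lambda: {"EUR": 0.9}, cache)
stale = get_rates(external_down, cache)
print(fresh, stale)
```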

Complete Integration Example

import { * } from 'sruja.ai/stdlib'

Customer = person "Customer"

Shop = system "Shop" {
  WebApp = container "Web Application"
  API = container "API Service" {
    metadata {
      timeout "30s"
      retry_policy "3 retries with exponential backoff"
      circuit_breaker {
        enabled true
        failure_threshold 5
        recovery_timeout "60s"
      }
    }
  }
  Cache = database "Redis Cache"
}

// Primary payment provider
Stripe = system "Stripe" {
  metadata {
    tags ["external", "primary", "pci-compliant"]
    owner "Stripe Inc."
    sla "99.99% uptime"
    api_endpoint "https://api.stripe.com/v1"
    authentication "API Key"
    rate_limit "1000 req/min"
    data_format "JSON"
  }
}

// Backup payment provider
PayPal = system "PayPal" {
  metadata {
    tags ["external", "backup"]
    owner "PayPal"
    sla "99.9% uptime"
  }
}

// Email service
SendGrid = system "SendGrid" {
  metadata {
    tags ["external"]
    sla "99.9% uptime"
    timeout "10s"
  }
}

// Integrations
Customer -> Shop.WebApp "Checkout"
Shop.WebApp -> Shop.API "Process order"

// Primary payment integration (encrypted, authenticated)
Shop.API -> Stripe "Process payment" [primary, encrypted, tls1.3]
Stripe -> Shop.API "Payment result"

// Fallback to backup
Shop.API -> PayPal "Process payment" [fallback, encrypted]

// Email (non-critical, can queue)
Shop.API -> SendGrid "Send confirmation" [non_critical, async_queue]

// Cache external API calls
Shop.Cache -> Shop.API "Return cached data"

view index {
  include *
}

Boundary Testing

What to Test at Boundaries

// Document test requirements
PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
    tests [
      "Happy path integration test",
      "Timeout handling",
      "Error response handling",
      "Rate limiting",
      "Authentication failure"
    ]
  }
}

Exercise

Design an integration for:

"A weather application displays current weather and forecasts. The app fetches weather data from OpenWeatherMap API and sends location-based ads to users. The ad service is provided by an external vendor."

Consider:

  1. Boundary crossings
  2. Error handling
  3. Caching strategy
  4. Fallbacks (if any)

Key Takeaways

  1. Every boundary crossing is an integration point
  2. Document interface contracts: API endpoints, data formats, SLAs
  3. Plan for failures: Timeouts, errors, service outages
  4. Design fallbacks: Redundant providers, degraded modes, caching
  5. Test boundaries: Integration tests, error scenarios

Module 3 Complete

You've completed Boundaries! You now understand:

  • What boundaries are and why they matter
  • How to mark internal vs. external components
  • How to model integrations and plan for boundary crossings

Next: Learn about Module 4: Flows.

Module 4: Flows

Overview

In this module, you'll learn to model how information, data, and actions move through your system. Flows help you understand data lineage, process sequences, and bottlenecks.

Learning Objectives

By the end of this module, you'll be able to:

  • Model data flows using Sruja scenarios
  • Document user journeys and workflows
  • Identify bottlenecks and performance issues
  • Differentiate between data flows and behavioral flows

Lessons

  • Lesson 1: Understanding Flows
  • Lesson 2: Data Flow Diagrams
  • Lesson 3: User Journeys

Prerequisites

Time Investment

Approximately 1.5-2 hours to complete all lessons and exercises.

What's Next

After completing this module, you'll learn about Module 5: Feedback Loops.

Lesson 1: Understanding Flows

Learning Goals

  • Understand what flows are in systems thinking
  • Learn when to use flows vs. static relationships
  • Recognize different types of flows

What Are Flows?

Flows show how information, data, and actions move through a system. Unlike static relationships (which show connections), flows show sequences and transformations.

Static Relationship vs. Flow

// Static relationship: Shows connection
Customer -> Shop.WebApp "Uses"

// Flow: Shows sequence
CheckoutFlow = scenario "Checkout" {
  Customer -> Shop.WebApp "Submits order"
  Shop.WebApp -> Shop.API "Sends order data"
  Shop.API -> Shop.Database "Saves order"
}

Why Flows Matter

1. Data Lineage

Where does data come from and where does it go?

DataFlow = flow "Order Analytics" {
  Customer -> Shop.WebApp "Order data"
  Shop.WebApp -> Shop.API "API request"
  Shop.API -> Shop.Database "Persist"
  Shop.Database -> Analytics "Extract"
  Analytics -> Dashboard "Visualize"
}

2. Process Understanding

What's the sequence of actions?

OrderProcess = scenario "Order Processing" {
  Customer -> Shop.WebApp "Submits order"
  Shop.WebApp -> Shop.API "Process order"
  Shop.API -> PaymentGateway "Charge payment"
  Shop.API -> InventoryService "Reserve items"
  Shop.API -> EmailService "Send confirmation"
}

3. Bottleneck Identification

Where can things slow down?

UploadFlow = scenario "File Upload" {
  User -> Frontend "Upload file"
  Frontend -> API "Send file data"
  API -> Storage "Store file"
  Storage -> Processing "Process file"  // Potential bottleneck
  Processing -> Notification "Notify user"
}

4. Error Paths

What happens when things fail?

OrderFlow = scenario "Order Processing" {
  Customer -> Shop.API "Submit order"
  Shop.API -> PaymentGateway "Charge payment"
  PaymentGateway -> Shop.API "Payment result"

  // Success path
  Shop.API -> Shop.Database "Save order"
  Shop.API -> EmailService "Send confirmation"

  // Failure path
  Shop.API -> Shop.WebApp "Return error"
  Shop.WebApp -> Customer "Display error"
}

Types of Flows

1. Data Flow (DFD Style)

Shows how data moves through the system:

OrderFlow = flow "Order Data Flow" {
  Customer -> Shop.WebApp "Order details"
  Shop.WebApp -> Shop.API "Order JSON"
  Shop.API -> Shop.Database "Order record"
  Shop.Database -> Analytics "Order event"
}

Use when: Modeling data lineage, ETL processes, analytics pipelines

2. User Journey / Scenario (BDD Style)

Shows user interactions and system responses:

Checkout = story "User Checkout Flow" {
  Customer -> Shop.WebApp "Clicks checkout"
  Shop.WebApp -> Shop.API "Validate cart"
  Shop.API -> PaymentGateway "Process payment"
  PaymentGateway -> Shop.API "Payment result"
  Shop.API -> Shop.WebApp "Order confirmation"
  Shop.WebApp -> Customer "Show success message"
}

Use when: Modeling user stories, test scenarios, requirements

3. Control Flow

Shows decision points and branches:

ApprovalFlow = scenario "Order Approval" {
  Order -> ApprovalService "Submit for approval"

  // Branch 1: Auto-approved
  ApprovalService -> Database "Save order"

  // Branch 2: Manual review
  ApprovalService -> Manager "Request approval"
  Manager -> ApprovalService "Approve/Reject"
  ApprovalService -> Database "Save order"
}

Use when: Modeling business logic, workflows

4. Event Flow

Shows events and their propagation:

EventFlow = flow "Order Event Flow" {
  API -> EventBus "Publish order created"
  EventBus -> NotificationService "Consume event"
  EventBus -> AnalyticsService "Consume event"
  EventBus -> EmailService "Consume event"
}

Use when: Event-driven architectures, pub/sub patterns
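To make the fan-out in an event flow concrete, here is a minimal in-memory sketch in Python. The `EventBus` class and the topic name are assumptions for illustration; a production bus would normally be asynchronous and durable, which this is not.

```python
from collections import defaultdict


class EventBus:
    """Tiny synchronous publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        """Deliver payload to every subscriber; return how many received it."""
        handlers = self.subscribers.get(topic, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
received = {"notification": [], "analytics": [], "email": []}

# One publisher, three independent consumers, as in the flow above.
bus.subscribe("order.created", received["notification"].append)
bus.subscribe("order.created", received["analytics"].append)
bus.subscribe("order.created", received["email"].append)

delivered = bus.publish("order.created", {"order_id": 1})
```

The publisher never references the consumers directly, which is why the flow shows `EventBus` as the only element that `API` talks to.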

Flow Characteristics

Linear Flow

LinearFlow = scenario "Simple Process" {
  Step1 -> Step2
  Step2 -> Step3
  Step3 -> Step4
}

Branching Flow

BranchingFlow = scenario "With Conditions" {
  Step1 -> Step2

  // Branch A
  Step2 -> Step3A
  Step3A -> Step4

  // Branch B
  Step2 -> Step3B
  Step3B -> Step4
}

Converging Flow

ConvergingFlow = scenario "Parallel then Merge" {
  Step1 -> Step2A
  Step1 -> Step2B
  Step2A -> Step3
  Step2B -> Step3
  Step3 -> Step4
}

Flows in Sruja

Using scenario

MyScenario = scenario "Scenario Title" {
  Step1 -> Step2 "Action"
  Step2 -> Step3 "Action"
}

Using story (alias)

MyStory = story "Story Title" {
  Step1 -> Step2 "Action"
  Step2 -> Step3 "Action"
}

Using flow

MyFlow = flow "Flow Title" {
  Step1 -> Step2 "Data"
  Step2 -> Step3 "Data"
}

When to Use Flows

Use Flows When

  • You need to show sequence (order matters)
  • Modeling data lineage or transformation
  • Documenting user journeys
  • Understanding process steps
  • Identifying bottlenecks

Don't Use Flows When

  • Showing general connections (use static relationships)
  • Giving a high-level overview (flows add too much detail)
  • Simple system (relationships suffice)

Flow Anti-Patterns

Anti-Pattern 1: Too Detailed

// Bad: Too much detail
Flow = scenario "Login" {
  User -> UI "Click login button"
  UI -> API "HTTP POST /login"
  API -> Database "SELECT * FROM users"
  Database -> API "User record"
  API -> Auth "Verify password"
  Auth -> API "Result"
  API -> UI "Response"
  UI -> User "Show dashboard"
}

Solution: Group related steps.

Anti-Pattern 2: Too Abstract

// Bad: Not useful
Flow = scenario "Process" {
  Start -> End "Process completes"
}

Solution: Add meaningful intermediate steps.

Anti-Pattern 3: Mixing Flows with Static Relationships

// Bad: Confusing to read
Customer -> Shop.WebApp "Uses"
OrderFlow = scenario "Checkout" {
  Customer -> Shop.WebApp "Submits order"
}

Solution: Keep flows separate from static architecture.

Exercise

Identify the type of flow for each scenario:

  1. "User clicks 'Buy Now', sees payment form, enters card details, sees success page"
  2. "Order data is sent to API, saved to database, extracted to analytics warehouse"
  3. "Order is created, event is published, multiple services process the event"
  4. "Order is submitted, if >$100 requires approval, otherwise auto-approved"

Key Takeaways

  1. Flows show sequences: How things happen, not just connections
  2. Multiple flow types: Data flows, user journeys, control flows, event flows
  3. Right level of detail: Not too detailed, not too abstract
  4. Use appropriately: When sequence matters
  5. Separate from static: Flows complement, don't replace, relationships

Next Lesson

In Lesson 2, you'll learn how to create data flow diagrams.

Lesson 2: Data Flow Diagrams

Learning Goals

  • Create DFD-style data flows in Sruja
  • Model data lineage and transformations
  • Document ETL and analytics pipelines

What Are Data Flow Diagrams?

Data Flow Diagrams (DFDs) show how data moves through a system, including:

  • Where data originates
  • Where it's stored
  • How it's transformed
  • Where it ultimately goes

DFD Elements in Sruja

flow for Data-Oriented Flows

OrderDataFlow = flow "Order Data Processing" {
  Customer -> Shop.WebApp "Order form data"
  Shop.WebApp -> Shop.API "Order JSON"
  Shop.API -> Shop.Database "Order record"
  Shop.Database -> Analytics "Order event"
  Analytics -> Dashboard "Aggregated metrics"
}

DFD Patterns

Pattern 1: ETL Pipeline

ETLPipeline = flow "Data ETL Pipeline" {
  SourceSystem -> DataCollector "Raw data"
  DataCollector -> MessageQueue "Data events"
  MessageQueue -> DataProcessor "Consumes data"
  DataProcessor -> DataWarehouse "Transformed data"
  DataWarehouse -> ReportingEngine "Query results"
  ReportingEngine -> Dashboard "Visualizations"
}

Pattern 2: Event Sourcing

EventFlow = flow "Event Sourcing Pipeline" {
  API -> EventStore "Persist events"
  EventStore -> ProjectorA "Project to read model A"
  EventStore -> ProjectorB "Project to read model B"
  ProjectorA -> ReadDatabaseA "Read model A"
  ProjectorB -> ReadDatabaseB "Read model B"
}

Pattern 3: Analytics Pipeline

AnalyticsFlow = flow "User Analytics Pipeline" {
  UserApp -> TrackingService "User actions"
  TrackingService -> EventStream "Raw events"
  EventStream -> BatchProcessor "Daily batch"
  BatchProcessor -> DataWarehouse "Aggregated data"
  DataWarehouse -> ReportingTool "Analytics queries"
  ReportingTool -> BusinessTeam "Insights"
}

Pattern 4: Real-Time Processing

RealTimeFlow = flow "Real-Time Fraud Detection" {
  TransactionAPI -> IngestService "Transaction data"
  IngestService -> KafkaStream "Event stream"
  KafkaStream -> FraudDetectionService "Consume events"
  FraudDetectionService -> AlertService "Fraud alerts"
  AlertService -> SecurityTeam "Notifications"
}

Documenting Data Transformations

Use Relationship Labels

DataFlow = flow "Data Transformation" {
  RawSource -> ETLService "Raw CSV data"
  ETLService -> CleanedData "Validated, normalized data"
  CleanedData -> Aggregator "Aggregated metrics"
  Aggregator -> DataWarehouse "Hourly aggregations"
}

Add Metadata for Details

ETLService = container "ETL Service" {
  metadata {
    transformations [
      "Remove invalid records",
      "Normalize phone numbers",
      "Standardize dates"
    ]
    output_format "JSON"
    output_schema "v2"
  }
}
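The three transformations listed in the metadata could be implemented along these lines. This is a sketch under assumptions: the record fields (`phone`, `date`), the digits-only phone rule, and the `DD/MM/YYYY` input format are all hypothetical and not specified by the model.

```python
import re
from datetime import datetime


def normalize_phone(raw):
    """Keep digits only (a deliberately simple, hypothetical rule)."""
    return re.sub(r"\D", "", raw)


def standardize_date(raw):
    """Convert DD/MM/YYYY input (an assumed source format) to ISO 8601."""
    return datetime.strptime(raw, "%d/%m/%Y").date().isoformat()


def transform(records):
    """Apply the three documented transformations to a list of dict records."""
    cleaned = []
    for rec in records:
        if not rec.get("phone") or not rec.get("date"):
            continue  # remove invalid records
        cleaned.append({
            "phone": normalize_phone(rec["phone"]),
            "date": standardize_date(rec["date"]),
        })
    return cleaned


rows = transform([
    {"phone": "(555) 010-1234", "date": "31/01/2024"},
    {"phone": "", "date": "01/02/2024"},  # dropped: missing phone
])
```

Keeping the list of transformations in the model's metadata and the implementation in code gives reviewers one place to check that the two still agree.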

Complete DFD Example

import { * } from 'sruja.ai/stdlib'

Customer = person "Customer"

Shop = system "Shop" {
  WebApp = container "Web Application"
  API = container "API Service"
  Database = database "PostgreSQL"
}

Analytics = system "Analytics Platform" {
  Ingestion = container "Data Ingestion"
  Processing = container "Data Processing"
  Warehouse = database "Data Warehouse"
  Reporting = container "Reporting Engine"
}

Dashboard = system "Analytics Dashboard" {
  UI = container "Dashboard UI"
}

// Data flow: Order processing and analytics
OrderAnalyticsFlow = flow "Order Analytics Pipeline" {
  Customer -> Shop.WebApp "Submits order"
  Shop.WebApp -> Shop.API "Order data"
  Shop.API -> Shop.Database "Persist order"

  // Real-time data capture
  Shop.API -> Analytics.Ingestion "Order event"
  Analytics.Ingestion -> Analytics.Processing "Validates and enriches"
  Analytics.Processing -> Analytics.Warehouse "Stores aggregated data"

  // Query and visualization
  Dashboard.UI -> Analytics.Reporting "Query metrics"
  Analytics.Reporting -> Analytics.Warehouse "Fetch data"
  Analytics.Reporting -> Dashboard.UI "Return results"
}

view index {
  include *
}

Data Lineage Tracing

Forward Tracing

// Where does this data go?
OrderFlow = flow "Order Data Lineage" {
  OrderAPI -> Database "Save order"
  Database -> ReplicationService "Replicate to secondary"
  ReplicationService -> AnalyticsDB "Stream to analytics"
  AnalyticsDB -> ReportGenerator "Generate reports"
}

Backward Tracing

// Where does this data come from?
ReportFlow = flow "Report Data Source" {
  UserActivityReport <- AnalyticsDB "Aggregated data"
  AnalyticsDB <- EventStream "Raw events"
  EventStream <- UserApp "User actions"
}
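Forward and backward tracing walk the same graph in opposite directions. As an illustration only (this is not how Sruja itself resolves lineage, and the `trace` helper is hypothetical), the edges of the "Order Data Lineage" flow can be traversed both ways in Python:

```python
from collections import defaultdict, deque

# Edges copied from the "Order Data Lineage" flow: (source, target).
EDGES = [
    ("OrderAPI", "Database"),
    ("Database", "ReplicationService"),
    ("ReplicationService", "AnalyticsDB"),
    ("AnalyticsDB", "ReportGenerator"),
]


def trace(edges, start, direction="forward"):
    """Return every element reachable from `start`, in breadth-first order."""
    graph = defaultdict(list)
    for src, dst in edges:
        if direction == "forward":
            graph[src].append(dst)
        else:
            graph[dst].append(src)  # reverse each edge for backward tracing

    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


downstream = trace(EDGES, "Database", "forward")
upstream = trace(EDGES, "AnalyticsDB", "backward")
```

Here `downstream` answers "where does this data go?" and `upstream` answers "where does this data come from?".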

Error Handling in Flows

Document Error Paths

OrderFlow = scenario "Order Processing with Errors" {
  Customer -> Shop.API "Submit order"
  Shop.API -> PaymentGateway "Process payment"

  // Success path
  PaymentGateway -> Shop.API "Payment success"
  Shop.API -> Shop.Database "Save order"

  // Error path
  PaymentGateway -> Shop.API "Payment failed"
  Shop.API -> Shop.WebApp "Return error"
  Shop.WebApp -> Customer "Show error message"
}

Retry Logic

Shop.API = container "API Service" {
  metadata {
    retry_policy {
      max_attempts 3
      backoff "exponential"
      initial_delay "1s"
    }
  }
}
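The `retry_policy` above only documents intent. A minimal Python sketch of a client that honors it might look like the following; the function names and the doubling factor are assumptions, and the `sleep` parameter exists so the waiting can be replaced in tests.

```python
import time


def backoff_delays(max_attempts=3, initial_delay=1.0, factor=2.0):
    """Delays in seconds before each retry; the first attempt has no delay."""
    return [initial_delay * factor ** i for i in range(max_attempts - 1)]


def call_with_retry(operation, max_attempts=3, initial_delay=1.0, sleep=time.sleep):
    """Run `operation` until it succeeds or the attempts are exhausted."""
    delays = backoff_delays(max_attempts, initial_delay)
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
```

With `max_attempts 3` and `initial_delay "1s"`, this sketch waits 1 second and then 2 seconds before giving up on the third failure.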

Performance Considerations

Document Latency Expectations

PaymentGateway = system "Payment Gateway" {
  metadata {
    expected_latency "500ms"
    timeout "5s"
  }
}

OrderFlow = scenario "Order Processing" {
  Customer -> Shop.WebApp "Submit order" [user_interaction]
  Shop.WebApp -> Shop.API "Send order" [internal_fast]
  Shop.API -> PaymentGateway "Process payment" [external_slower]
}

Identify Bottlenecks

ProcessingFlow = flow "File Processing Pipeline" {
  Upload -> Storage "Store file" [fast]
  Storage -> Processor "Process file" [bottleneck]
  Processor -> Notification "Notify user" [fast]
}
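A `[bottleneck]` tag is more convincing when it is backed by measurements. The sketch below assumes per-step latencies have been collected somewhere; the numbers are hypothetical placeholders, not measurements of any real system.

```python
# Hypothetical per-step latencies in milliseconds for the pipeline above.
STEP_LATENCY_MS = {
    ("Upload", "Storage"): 40,
    ("Storage", "Processor"): 2500,
    ("Processor", "Notification"): 30,
}


def find_bottleneck(latencies):
    """Return the slowest step and its share of the total latency."""
    total = sum(latencies.values())
    step = max(latencies, key=latencies.get)
    return step, latencies[step] / total


slowest_step, share = find_bottleneck(STEP_LATENCY_MS)
```

A step that accounts for most of the end-to-end time is the one worth tagging and optimizing first.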

Exercise

Create a DFD for:

"A fitness tracking app where users log workouts. Workout data is sent to an API, stored in a database, and also sent to a real-time analytics service. The analytics service processes events and updates user dashboards. Daily, a batch job aggregates data and generates reports stored in a data warehouse for business analysis."

Create:

  • Main data flow
  • At least one data transformation
  • Real-time and batch processing paths

Key Takeaways

  1. Use flow for DFDs: Data-oriented flows
  2. Show transformations: How data changes
  3. Document lineage: Where data comes from and goes
  4. Handle errors: Show success and failure paths
  5. Identify bottlenecks: Where processing slows down

Next Lesson

In Lesson 3, you'll learn how to model user journeys and behavioral scenarios.

Lesson 3: User Journeys

Learning Goals

  • Model user journeys and behavioral scenarios
  • Use BDD-style scenarios for requirements
  • Document test cases with flows

What Are User Journeys?

User journeys (or scenarios) show how users interact with a system to achieve a goal. They're behavioral flows that capture:

  • User actions
  • System responses
  • Success and failure paths
  • Decision points

User Journey Elements in Sruja

scenario for Behavioral Flows

CheckoutScenario = scenario "User Checkout" {
  Customer -> Shop.WebApp "Clicks checkout"
  Shop.WebApp -> Shop.API "Validate cart"
  Shop.API -> PaymentGateway "Process payment"
  Shop.API -> EmailService "Send confirmation"
  Shop.WebApp -> Customer "Show success page"
}

story (alias)

CheckoutStory = story "As a customer, I want to checkout" {
  Customer -> Shop.WebApp "Clicks checkout"
  Shop.WebApp -> Shop.API "Validate cart"
  Shop.API -> PaymentGateway "Process payment"
  Shop.WebApp -> Customer "Show success page"
}

BDD (Behavior-Driven Development) Style

Given-When-Then Pattern

// GIVEN: Customer has items in cart
// WHEN: Customer clicks checkout
// THEN: Payment is processed and confirmation shown

CheckoutScenario = scenario "Customer Checkout" {
  Customer -> Shop.WebApp "Clicks checkout"
  Shop.WebApp -> Shop.API "Validate cart"
  Shop.API -> PaymentGateway "Process payment"
  Shop.API -> Shop.WebApp "Order confirmation"
  Shop.WebApp -> Customer "Show success page"
}

User Journey Patterns

Pattern 1: Happy Path

HappyPath = scenario "Successful User Registration" {
  User -> WebApp "Opens registration form"
  WebApp -> API "Submit registration"
  API -> Database "Create user"
  API -> EmailService "Send welcome email"
  API -> WebApp "Registration success"
  WebApp -> User "Show welcome page"
}

Pattern 2: Error Path

ErrorPath = scenario "Registration with Duplicate Email" {
  User -> WebApp "Submit registration"
  WebApp -> API "Submit registration"
  API -> Database "Check email exists"
  Database -> API "Email already exists"
  API -> WebApp "Return error"
  WebApp -> User "Show error: Email already registered"
}

Pattern 3: Branching Path

BranchingPath = scenario "Order Approval" {
  Manager -> WebApp "Submit for approval"

  // Branch 1: Auto-approved for < $100
  if value < 100 {
    WebApp -> API "Auto-approve"
    API -> Database "Save approved order"
  }

  // Branch 2: Manual review for >= $100
  if value >= 100 {
    WebApp -> API "Request manual approval"
    API -> Approver "Send approval request"
    Approver -> API "Approve order"
    API -> Database "Save approved order"
  }

  API -> WebApp "Approval result"
  WebApp -> Manager "Show confirmation"
}

Pattern 4: Retry Path

RetryPath = scenario "Payment with Retry" {
  Customer -> WebApp "Submit order"
  WebApp -> API "Process order"

  // First attempt fails
  API -> PaymentGateway "Process payment"
  PaymentGateway -> API "Payment failed: timeout"

  // Retry
  API -> PaymentGateway "Retry payment"
  PaymentGateway -> API "Payment success"

  API -> Database "Save order"
  API -> WebApp "Order confirmed"
  WebApp -> Customer "Show confirmation"
}

User Journey Examples

Example 1: E-Commerce

import { * } from 'sruja.ai/stdlib'

Customer = person "Customer"

Shop = system "Shop" {
  WebApp = container "Web Application"
  API = container "API Service"
  Database = database "Database"
}

PaymentGateway = system "Payment Gateway"
EmailService = system "Email Service"

// Complete user journey
CheckoutJourney = scenario "Complete Checkout" {
  Customer -> Shop.WebApp "Add item to cart"
  Customer -> Shop.WebApp "Click checkout"
  Shop.WebApp -> Shop.API "Validate cart"
  Shop.API -> Shop.Database "Check inventory"
  Shop.Database -> Shop.API "Inventory available"
  Shop.API -> PaymentGateway "Process payment"
  PaymentGateway -> Shop.API "Payment success"
  Shop.API -> Shop.Database "Save order"
  Shop.API -> EmailService "Send confirmation"
  Shop.WebApp -> Customer "Show order confirmation"
}

view index {
  include *
}

Example 2: User Authentication

AuthJourney = scenario "User Login" {
  User -> WebApp "Enter credentials"
  WebApp -> API "Submit login"
  API -> Database "Verify credentials"
  Database -> API "Valid credentials"
  API -> AuthService "Generate JWT token"
  AuthService -> API "Token"
  API -> WebApp "Return token and user data"
  WebApp -> User "Show dashboard"
}

Example 3: File Upload

UploadJourney = scenario "File Upload" {
  User -> WebApp "Select file and click upload"
  WebApp -> API "Upload file data"
  API -> Storage "Store file"
  Storage -> API "File URL"
  API -> ProcessingService "Process file"
  ProcessingService -> API "Processing complete"
  API -> WebApp "Upload success"
  WebApp -> User "Show file preview"
}

Complex User Journey

ComplexJourney = scenario "Complete Order Workflow" {
  Customer -> Shop.WebApp "Browse products"
  Shop.WebApp -> Shop.API "Get products"
  Shop.API -> Shop.Database "Query products"
  Shop.Database -> Shop.API "Product data"
  Shop.API -> Shop.WebApp "Product list"
  Shop.WebApp -> Customer "Display products"

  Customer -> Shop.WebApp "Add to cart"
  Shop.WebApp -> Shop.API "Add to cart"
  Shop.API -> Shop.Database "Save cart item"

  Customer -> Shop.WebApp "Checkout"
  Shop.WebApp -> Shop.API "Create order"
  Shop.API -> Shop.Database "Save order"
  Shop.API -> PaymentGateway "Process payment"
  PaymentGateway -> Shop.API "Payment result"

  // Success path
  if payment_success {
    Shop.API -> EmailService "Send confirmation"
    Shop.API -> InventoryService "Reserve items"
    Shop.API -> NotificationService "Notify warehouse"
    Shop.WebApp -> Customer "Order confirmation"
  }

  // Failure path
  if payment_failed {
    Shop.API -> Shop.WebApp "Payment error"
    Shop.WebApp -> Customer "Show error message"
  }
}

Testing with Scenarios

Acceptance Criteria

// As a customer, I want to checkout so that I can purchase products

AcceptanceScenario = scenario "Checkout Acceptance Criteria" {
  // AC1: Customer can checkout with valid payment
  Customer -> Shop.WebApp "Checkout with valid payment"
  Shop.WebApp -> Shop.API "Process order"
  Shop.API -> PaymentGateway "Charge card"
  PaymentGateway -> Shop.API "Success"
  Shop.API -> Shop.WebApp "Order confirmed"
  Shop.WebApp -> Customer "Show confirmation page"

  // AC2: Customer sees error with invalid payment
  Customer -> Shop.WebApp "Checkout with invalid card"
  Shop.WebApp -> Shop.API "Process order"
  Shop.API -> PaymentGateway "Charge card"
  PaymentGateway -> Shop.API "Declined"
  Shop.API -> Shop.WebApp "Payment failed"
  Shop.WebApp -> Customer "Show error message"

  // AC3: Customer receives email confirmation
  Shop.API -> EmailService "Send confirmation"
  EmailService -> Customer "Deliver confirmation email"
}
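One way to use a scenario as a test case is to compare its steps with the calls actually observed during a test run. The Python sketch below assumes the calls are available as `(source, target)` pairs; the `follows_scenario` helper and the observed trace are hypothetical.

```python
def follows_scenario(expected_steps, observed_calls):
    """True if every expected step occurs in observed_calls, in order."""
    remaining = iter(observed_calls)
    # `step in remaining` advances the iterator, so order is enforced.
    return all(step in remaining for step in expected_steps)


# AC1 from the acceptance scenario, as (source, target) pairs.
AC1 = [
    ("Customer", "Shop.WebApp"),
    ("Shop.WebApp", "Shop.API"),
    ("Shop.API", "PaymentGateway"),
    ("PaymentGateway", "Shop.API"),
    ("Shop.API", "Shop.WebApp"),
    ("Shop.WebApp", "Customer"),
]

# Hypothetical trace captured from an integration test run.
observed = [
    ("Customer", "Shop.WebApp"),
    ("Shop.WebApp", "Shop.API"),
    ("Shop.API", "Shop.Database"),  # extra call, tolerated by the check
    ("Shop.API", "PaymentGateway"),
    ("PaymentGateway", "Shop.API"),
    ("Shop.API", "Shop.WebApp"),
    ("Shop.WebApp", "Customer"),
]

ac1_passed = follows_scenario(AC1, observed)
```

The check tolerates extra calls between steps, so the scenario can stay at the level of detail that is useful for readers.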

Documenting Edge Cases

Edge Case Scenarios

EdgeCase1 = scenario "Checkout with Expired Card" {
  Customer -> Shop.WebApp "Checkout"
  Shop.WebApp -> Shop.API "Process payment"
  Shop.API -> PaymentGateway "Charge card"
  PaymentGateway -> Shop.API "Card expired"
  Shop.API -> Shop.WebApp "Error: Card expired"
  Shop.WebApp -> Customer "Show error and prompt new card"
}

EdgeCase2 = scenario "Checkout with Insufficient Inventory" {
  Customer -> Shop.WebApp "Checkout"
  Shop.WebApp -> Shop.API "Process order"
  Shop.API -> Shop.Database "Check inventory"
  Shop.Database -> Shop.API "Insufficient stock"
  Shop.API -> Shop.WebApp "Error: Not enough stock"
  Shop.WebApp -> Customer "Show error and suggest alternatives"
}

Exercise

Create user journeys for:

  1. Happy Path: User registers successfully
  2. Error Path: User tries to register with existing email
  3. Branching: User submits order that requires approval if > $1000
  4. Retry: Payment fails twice, succeeds on third attempt

Key Takeaways

  1. Use scenario/story: For behavioral flows
  2. Model happy and error paths: Document all outcomes
  3. Include user actions: Describe steps from the user's perspective
  4. Document edge cases: Unexpected scenarios
  5. Use for testing: Scenarios make good test cases

Module 4 Complete

You've completed Flows! You now understand:

  • What flows are and when to use them
  • Data Flow Diagrams (DFDs)
  • User journeys and behavioral scenarios

Next: Learn about Module 5: Feedback Loops.

Module 5: Feedback Loops

Overview

In this module, you'll learn to model feedback loops: how actions create reactions that affect future actions. Feedback loops are natural patterns in systems, not errors.

Learning Objectives

By the end of this module, you'll be able to:

  • Understand different types of feedback loops
  • Model positive and negative feedback
  • Recognize when cycles are valid patterns
  • Design self-regulating and adaptive systems

Lessons

Prerequisites

Time Investment

Approximately 1-1.5 hours to complete all lessons and exercises.

What's Next

After completing this module, you'll learn about Module 6: Context.

Lesson 1: Understanding Feedback Loops

Learning Goals

  • Understand what feedback loops are
  • Recognize feedback loops in everyday systems
  • Learn why cycles are not errors

What Are Feedback Loops?

A feedback loop occurs when an action creates a reaction that affects future actions. It's a cycle where the output becomes input for the next iteration.

Action → Response → Adjustment → Action (repeated)

Everyday Examples

Example 1: Thermostat

Temperature drops
    ↓
Thermostat detects low temp
    ↓
Turns on heater
    ↓
Temperature rises
    ↓
Thermostat turns off heater
    ↓
Temperature drops
    ↓
[Loop repeats]

Example 2: Feedback at Work

Submit code for review
    ↓
Manager provides feedback
    ↓
Developer fixes issues
    ↓
Submit revised code
    ↓
[Loop repeats until approved]

Example 3: Social Media Algorithm

User watches video
    ↓
Algorithm recommends similar videos
    ↓
User watches more
    ↓
Algorithm learns preferences
    ↓
[Loop reinforces the behavior]

Feedback Loops in Software Architecture

Example 1: Auto-Scaling

// Negative (balancing) feedback loop: Scale up when load increases
MonitoringSystem -> App.API "Detects high load"
App.API -> AutoScaler "Request scale up"
AutoScaler -> App.API "Adds more instances"
App.API -> MonitoringSystem "Reports lower load"

Example 2: User Experience

// User feedback loop
User -> App.WebApp "Submits form"
App.WebApp -> App.API "Validates"
App.API -> App.WebApp "Returns errors"
App.WebApp -> User "Shows errors"
// User corrects and resubmits (loop)

Example 3: Inventory Management

// Inventory feedback loop
Shop.API -> Inventory "Updates stock"
Inventory -> Shop.API "Notifies low stock"
Shop.API -> Admin "Sends alert"
Admin -> Shop.API "Restocks inventory"
Shop.API -> Inventory "Updates stock"
// Loop continues as inventory changes

Why Feedback Loops Matter

1. Self-Regulation

Systems can adjust automatically:

AutoScaling = scenario "Auto-Scaling Feedback" {
  Monitor -> App "Detects high CPU"
  App -> ScalingService "Request scale up"
  ScalingService -> App "Add instances"
  App -> Monitor "CPU decreases"
  // If CPU still high, loop repeats
}

2. Learning and Adaptation

Systems improve over time:

MLFeedback = scenario "Machine Learning Feedback" {
  User -> App "Rates recommendation"
  App -> MLModel "Update preferences"
  MLModel -> App "Improved recommendations"
  App -> User "Better suggestions"
  // User rates again, model improves
}

3. Error Recovery

Systems recover from failures:

RetryLoop = scenario "Retry with Backoff" {
  App -> ExternalAPI "Make request"
  ExternalAPI -> App "Request failed (timeout)"

  App -> ExternalAPI "Retry in 1s"
  ExternalAPI -> App "Request failed"

  App -> ExternalAPI "Retry in 2s"
  ExternalAPI -> App "Success"
}

4. Resource Management

Optimize resource usage:

ResourceFeedback = scenario "Cache Feedback" {
  App -> Cache "Check cache"
  Cache -> App "Cache miss"
  App -> Database "Query data"
  Database -> App "Return data"
  App -> Cache "Store in cache"

  // Next time: Cache hit
  App -> Cache "Check cache"
  Cache -> App "Cache hit (faster)"
}
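The cache feedback scenario corresponds to the pattern usually called cache-aside. A minimal Python sketch follows, assuming a `load` function that stands in for the database query; the `CacheAside` class is hypothetical, and the sketch has no eviction or expiry.

```python
class CacheAside:
    """Check the cache first; on a miss, load the value and store it."""

    def __init__(self, load):
        self.load = load  # function that queries the backing store
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.load(key)
        self.cache[key] = value  # the "Store in cache" step
        return value


db_queries = []


def query_database(key):
    """Hypothetical slow lookup; records each call so the effect is visible."""
    db_queries.append(key)
    return f"row-for-{key}"


reader = CacheAside(query_database)
first = reader.get("order-1")   # miss: goes to the database
second = reader.get("order-1")  # hit: served from the cache
```

The second call never reaches `query_database`, which is the feedback effect the scenario describes.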

Feedback Loop Types

1. Positive Feedback (Reinforcing)

Increases the effect, leading to exponential growth or collapse:

// Example: Viral sharing
UserA -> App "Shares content"
App -> UserB "Shows content"
UserB -> App "Shares content"
App -> UserC "Shows content"
UserC -> App "Shares content"
// More users see and share

Use when: Viral growth, network effects, learning systems

2. Negative Feedback (Balancing)

Reduces the effect, maintaining stability:

// Example: Temperature control
Thermostat -> Heater "Turn on if too cold"
Heater -> Room "Heats room"
Room -> Thermostat "Temperature reading"
Thermostat -> Heater "Turn off if too warm"
// Maintains stable temperature

Use when: Stability, control, regulation

3. Delayed Feedback

Feedback occurs after a delay:

// Example: Performance monitoring
App -> Database "Slow query"
Database -> App "Returns data"
App -> Analytics "Logs slow query"
Analytics -> App "Sends alert (after threshold)"

// Alert arrives later, system adjusts

Are Cycles Bad?

In traditional software architecture, circular dependencies are considered bad. But in systems thinking, cycles are natural and valid.

Circular Dependency (Bad)

// Bad: Module A depends on B, B depends on A
ModuleA -> ModuleB "Calls"
ModuleB -> ModuleA "Calls"

// Problem: Impossible to initialize, tight coupling

Feedback Loop (Good)

// Good: System learns from output
User -> System "Submits data"
System -> User "Shows result"
User -> System "Adjusts based on feedback"

// Result: Natural, enables adaptation

Key Differences

| Circular Dependency | Feedback Loop |
|---|---|
| Static (compile-time) | Dynamic (runtime) |
| Tight coupling | Loose coupling with clear purpose |
| Avoid | Embrace |
| Makes system brittle | Makes system adaptive |
| No clear purpose | Serves a specific function |
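A tool can find a cycle mechanically; deciding whether it is a harmful dependency or an intentional feedback loop remains a design judgment. The Python sketch below shows one common way to detect a cycle in a list of relationships, using depth-first search. It is an illustration only and makes no claim about how Sruja's own validation is implemented; `find_cycle` is a hypothetical helper.

```python
def find_cycle(edges):
    """Return one cycle as a list of nodes, or None if there is no cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                # Reached a node already on the current path: cycle found.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


dependency_cycle = find_cycle([("ModuleA", "ModuleB"), ("ModuleB", "ModuleA")])
no_cycle = find_cycle([("ModuleA", "ModuleB"), ("ModuleB", "ModuleC")])
```

The same function reports a cycle for `User -> System -> User`, which is why the distinctions in the table above have to come from the modeler rather than from the graph.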

Exercise

Identify feedback loops in these scenarios:

  1. Chat application: User sends message, app shows typing indicator, receiver sees it, receiver starts typing, sender sees typing indicator...

  2. Recommendation system: User watches video, algorithm updates preferences, recommends more videos, user watches more, algorithm learns more...

  3. CI/CD pipeline: Developer commits code, tests run, if fails developer fixes, commits again, tests run...

Key Takeaways

  1. Feedback loops are cycles where output affects future input
  2. They're everywhere: In nature, software, and everyday life
  3. Not errors: Unlike circular dependencies, feedback loops are valid patterns
  4. Enable adaptation: Systems can self-regulate and improve
  5. Two main types: Positive (reinforcing) and negative (balancing)

Next Lesson

In Lesson 2, you'll learn about different types of feedback loops and when to use each.

Lesson 2: Types of Feedback Loops

Learning Goals

  • Understand positive (reinforcing), negative (balancing), and delayed feedback loops
  • Learn when to use each type
  • Recognize feedback loop behavior in systems

Feedback Loop Classification

Feedback Loops
├── Positive (Reinforcing)
│   ├── Virtuous cycle (good)
│   └── Vicious cycle (bad)
└── Negative (Balancing)
    ├── Self-regulating
    ├── Error-correcting
    └── Stabilizing

Positive Feedback Loops (Reinforcing)

Amplify change, leading to exponential growth or collapse.

Pattern 1: Virtuous Cycle (Growth)

VirtuousCycle = scenario "Network Effects" {
  UserA -> App "Joins platform"
  App -> UserB "Invites friend"
  UserB -> App "Joins platform"
  App -> UserC "Invites friend"

  // More users → More value → More users
  App -> Users "Platform value increases"
  Users -> App "More users join"
}

Use when: Viral growth, network effects, learning systems

Examples:

  • Social networks (more users = more connections)
  • Marketplaces (more sellers = more buyers)
  • Machine learning (more data = better predictions)

Pattern 2: Vicious Cycle (Collapse)

ViciousCycle = scenario "Performance Degradation" {
  User -> App "Makes request"
  App -> Database "Query"

  // Slow query → Queue builds → Slower queries
  Database -> App "Slow response"
  App -> Queue "Requests back up"
  Queue -> Database "More load"
  Database -> App "Even slower response"

  // If not interrupted, system collapses
}

Use when: Identifying failure modes, designing circuit breakers

Examples:

  • System overload
  • Cache stampede
  • Database deadlock cascade
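A circuit breaker is one way to interrupt a vicious cycle: once failures pile up, it stops sending load to the struggling dependency for a while. The Python sketch below is a simplified illustration; the class, its thresholds, and the explicit `now` timestamps are assumptions chosen to keep the example deterministic.

```python
class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures."""

    def __init__(self, failure_threshold=3, recovery_timeout=60.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None  # time the breaker opened, or None if closed

    def allow(self, now):
        """Return True if a request may be sent at time `now` (seconds)."""
        if self.opened_at is None:
            return True
        # After the recovery timeout, let a trial request through.
        return now - self.opened_at >= self.recovery_timeout

    def record(self, success, now):
        """Update the breaker with the outcome of a request."""
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now


breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=60.0)
for t in (0, 1, 2):
    breaker.record(success=False, now=t)  # three slow or failed queries

blocked = breaker.allow(now=3)     # breaker is open: shed load
retrying = breaker.allow(now=62)   # timeout elapsed: allow a trial request
```

While the breaker is open, the queue stops adding load to the database, which breaks the reinforcing loop shown in the scenario.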

Pattern 3: Learning Loop (Improvement)

LearningLoop = scenario "ML Model Improvement" {
  User -> Model "Makes prediction"
  Model -> User "Shows result"
  User -> Model "Rates accuracy"

  // Better data → Better model → Better predictions
  Model -> Training "Updates with rating"
  Training -> Model "Improved model"
  Model -> User "Better predictions"
}

Use when: Machine learning, recommendation systems, A/B testing

Examples:

  • Search engine relevance
  • Ad targeting
  • Personalized recommendations

Negative Feedback Loops (Balancing)

Counteract change, maintaining stability and equilibrium.

Pattern 1: Self-Regulating (Homeostasis)

Homeostasis = scenario "Auto-Scaling" {
  Monitoring -> App "Detects high load"
  App -> ScalingService "Request scale up"
  ScalingService -> App "Adds instances"
  App -> Monitoring "Reports lower load"

  // System self-regulates to target load
  Monitoring -> App "Detects low load"
  App -> ScalingService "Request scale down"
  ScalingService -> App "Removes instances"
}

Use when: Auto-scaling, resource management, capacity planning

Examples:

  • Server auto-scaling
  • Database connection pooling
  • Rate limiting
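The self-regulating behavior can be simulated in a few lines of Python. This is a toy model under stated assumptions: load is split evenly across instances, the target is 60 units per instance, and the scale-up and scale-down thresholds (120% and 50% of target) are arbitrary illustrative values.

```python
def autoscale(total_load, instances, target=60.0, steps=10):
    """Adjust the instance count until per-instance load is near `target`."""
    history = []
    for _ in range(steps):
        per_instance = total_load / instances
        history.append((instances, per_instance))
        if per_instance > target * 1.2:
            instances += 1  # too hot: scale up
        elif per_instance < target * 0.5 and instances > 1:
            instances -= 1  # too idle: scale down
        else:
            break  # within the acceptable band: stable
    return history


history = autoscale(total_load=300, instances=2)
```

Each adjustment shrinks the gap between measured and target load, which is what makes the loop balancing rather than reinforcing.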

Pattern 2: Error-Correcting (Resilience)

ErrorCorrection = scenario "Retry with Backoff" {
  App -> Service "Make request"
  Service -> App "Failure"

  // Exponential backoff
  App -> Service "Retry after 1s"
  Service -> App "Failure"
  App -> Service "Retry after 2s"
  Service -> App "Failure"
  App -> Service "Retry after 4s"
  Service -> App "Success"

  // System recovers from transient errors
}

Use when: Error handling, resilience, fault tolerance

Examples:

  • Retry logic
  • Circuit breakers
  • Fallback mechanisms
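As a minimal sketch of the retry scenario above, the following Python code computes the 1s, 2s, 4s delay schedule and retries a failing call. The names `backoff_delays`, `call_with_retry`, and `flaky_service` are hypothetical, and the `sleep` parameter is injectable so the demo does not actually wait.

```python
import time


def backoff_delays(base_seconds=1.0, factor=2.0, max_retries=3):
    """Delays before each retry: base, base*factor, base*factor^2, ..."""
    return [base_seconds * factor ** attempt for attempt in range(max_retries)]


def call_with_retry(operation, max_retries=3, base_seconds=1.0, sleep=time.sleep):
    """Run operation(); after each failure wait, then retry, up to max_retries retries."""
    for delay in backoff_delays(base_seconds, 2.0, max_retries):
        try:
            return operation()
        except Exception:  # a real client would catch only transient error types
            sleep(delay)
    return operation()  # last attempt: let any exception propagate


# Demo: a hypothetical service that fails three times, then succeeds.
attempts = {"count": 0}


def flaky_service():
    attempts["count"] += 1
    if attempts["count"] <= 3:
        raise ConnectionError("transient failure")
    return "success"


recorded_sleeps = []
result = call_with_retry(flaky_service, sleep=recorded_sleeps.append)
print(result, recorded_sleeps)
```

Production retry logic usually also adds random jitter to the delays so that many clients do not retry at the same moment; that is left out here to keep the schedule identical to the scenario.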

Pattern 3: Stabilizing (Control)

Stabilizing = scenario "Rate Limiting" {
  User -> API "Send request"
  API -> RateLimiter "Check rate"

  // If under limit, allow
  RateLimiter -> API "Under limit"
  API -> User "Process request"

  // If over limit, throttle
  API -> RateLimiter "Check rate"
  RateLimiter -> API "Over limit"
  API -> User "Rate limit exceeded"

  // System stabilizes request rate
}

Use when: Traffic control, resource allocation, load balancing

Examples:

  • API rate limiting
  • Load balancers
  • Queue management
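The stabilizing loop above can be illustrated with a fixed-window rate limiter. This is one simple strategy chosen for the sketch (token buckets and sliding windows are common alternatives); the class name and parameters are assumptions, and time is passed in explicitly so the behavior is easy to follow.

```python
class FixedWindowRateLimiter:
    """Allow at most `limit` requests per window of `window_seconds`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = None
        self.count = 0

    def allow(self, now):
        # Start a new window on the first request or once the old one has expired.
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True     # "Under limit": process the request
        return False        # "Over limit": reject with "Rate limit exceeded"


limiter = FixedWindowRateLimiter(limit=2, window_seconds=1.0)
decisions = [limiter.allow(t) for t in (0.0, 0.1, 0.2, 1.0)]
print(decisions)
```

The third request falls in the same window as the first two and is throttled; the fourth arrives after the window has rolled over and is allowed again.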

Comparing Feedback Loops

| Type | Effect | Stability | Growth | Example |
| --- | --- | --- | --- | --- |
| Positive (Virtuous) | Amplifies | Decreases | Increases | Viral growth |
| Positive (Vicious) | Amplifies | Decreases | Decreases | System collapse |
| Negative (Balancing) | Counteracts | Increases | Maintains | Auto-scaling |
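The difference between amplifying and counteracting can be seen in a toy numeric model. The update rules below are assumptions chosen for illustration: a reinforcing loop adds a change proportional to the current value, while a balancing loop corrects a fraction of the gap to a target.

```python
def reinforcing(value, gain, steps):
    """Positive loop: each step adds a change proportional to the current value."""
    series = [value]
    for _ in range(steps):
        series.append(series[-1] + gain * series[-1])
    return series


def balancing(value, target, gain, steps):
    """Negative loop: each step corrects a fraction of the gap to the target."""
    series = [value]
    for _ in range(steps):
        series.append(series[-1] + gain * (target - series[-1]))
    return series


growth = reinforcing(100.0, gain=0.5, steps=3)
settle = balancing(100.0, target=50.0, gain=0.5, steps=3)
print(growth)   # moves further from the start every step
print(settle)   # gap to the target halves every step
```

Whether a reinforcing loop is virtuous or vicious depends on what the value represents: the same amplification is welcome for active users and harmful for response latency.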

Delayed Feedback

Feedback that arrives after a delay can cause oscillation, because the system reacts to a state that may no longer be current.

DelayedFeedback = scenario "Delayed Monitoring" {
  App -> System "Request processed"

  // Delay before feedback
  System -> Analytics "Log event"
  Analytics -> Monitoring "Aggregate metrics"
  Monitoring -> App "Send alert (5 min delay)"

  // By the time alert arrives, system may have changed
}

Mitigation:

  • Use real-time monitoring
  • Reduce delays
  • Add hysteresis (threshold buffer)
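To make the hysteresis mitigation concrete, here is a small Python sketch comparing a single threshold with a two-threshold band. The function name, the load samples, and the 80/60 thresholds are assumptions for the example.

```python
def scale_decisions(loads, scale_up_at, scale_down_at):
    """Return 'up', 'down', or 'hold' per sample using two thresholds."""
    decisions = []
    scaled_up = False
    for load in loads:
        if not scaled_up and load > scale_up_at:
            scaled_up = True
            decisions.append("up")
        elif scaled_up and load < scale_down_at:
            scaled_up = False
            decisions.append("down")
        else:
            decisions.append("hold")
    return decisions


# Load hovering around 80%, then dropping clearly.
loads = [78, 82, 79, 81, 78, 83, 55]

single = scale_decisions(loads, scale_up_at=80, scale_down_at=80)   # no buffer
banded = scale_decisions(loads, scale_up_at=80, scale_down_at=60)   # 20-point buffer

single_changes = sum(1 for d in single if d != "hold")
banded_changes = sum(1 for d in banded if d != "hold")
print(single_changes, banded_changes)
```

With one threshold the system flips on every small fluctuation; with a buffer between the scale-up and scale-down thresholds it reacts only to the initial rise and the final clear drop.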

Feedback Loop Behaviors

Convergent

System reaches a stable state:

Convergent = scenario "Converging to Equilibrium" {
  // First request misses; once the cache is warm, behavior settles
  App -> Cache "Check cache"
  Cache -> App "Miss"
  App -> Database "Query"
  Database -> App "Data"
  App -> Cache "Store"

  // Next time: Cache hit (faster, stabilizes)
  App -> Cache "Check cache"
  Cache -> App "Hit (stable)"
}
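The convergent cache scenario corresponds to the cache-aside pattern. A minimal Python sketch, assuming an in-memory dictionary as the cache and a hypothetical `slow_query` function standing in for the database:

```python
def make_cached_lookup(query_database):
    """Wrap a database query function with a simple in-memory cache."""
    cache = {}
    stats = {"hits": 0, "misses": 0}

    def lookup(key):
        if key in cache:
            stats["hits"] += 1          # "Hit (stable)"
            return cache[key]
        stats["misses"] += 1            # "Miss": fall through to the database
        value = query_database(key)
        cache[key] = value              # "Store" for the next request
        return value

    return lookup, stats


db_calls = []


def slow_query(key):
    """Hypothetical database call; records each key it is asked for."""
    db_calls.append(key)
    return key.upper()


lookup, stats = make_cached_lookup(slow_query)
first = lookup("sku-1")
second = lookup("sku-1")
print(first, second, stats, db_calls)
```

The database is queried only once; every later request for the same key is served from the cache. A real cache would also need expiry and invalidation, which this sketch omits.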

Divergent

System grows without bound or collapses:

Divergent = scenario "Exponential Growth" {
  User -> App "Invite friend"
  App -> Friend "Invitation"
  Friend -> App "Accept and invite"
  App -> Friend2 "Invitation"
  Friend2 -> App "Accept and invite"

  // Without limits, grows exponentially
}

Oscillating

System cycles between states:

Oscillating = scenario "Scaling Oscillation" {
  Monitor -> App "Load high"
  App -> Scaling "Scale up"
  App -> Monitor "Load normal"

  Monitor -> App "Load low"
  App -> Scaling "Scale down"
  App -> Monitor "Load high"

  // Oscillates between scale up/down
}

Designing Feedback Loops

Step 1: Identify the Feedback

What creates the feedback?

// What output becomes input?
User -> App "Action"
App -> User "Response"
// User's response becomes next action's input

Step 2: Determine the Type

Is it reinforcing or balancing?

// Reinforcing: Increases effect
App -> ML "More data"
ML -> App "Better model"

// Balancing: Maintains equilibrium
App -> Monitor "Check load"
Monitor -> App "Adjust capacity"

Step 3: Add Controls

Prevent runaway behavior:

AutoScaling = container "Auto-Scaling Service" {
  scale {
    min 2
    max 10
    metric "cpu > 80%"
  }
}

Step 4: Monitor

Observe behavior:

Monitor = system "Monitoring System" {
  metrics {
    cpu_usage
    request_rate
    error_rate
    feedback_loop_iterations
  }
}

Exercise

Classify these feedback loops:

  1. "More users join platform → More content → More users join..."
  2. "System detects high load → Scales up → Load decreases → Scales down..."
  3. "User likes video → Algorithm shows similar videos → User likes more → Algorithm learns..."
  4. "Request fails → Retry with backoff → Eventually succeeds..."

Key Takeaways

  1. Positive loops: Reinforce change (growth or collapse)
  2. Negative loops: Maintain stability (balance and regulate)
  3. Delayed feedback: Can cause oscillation, needs monitoring
  4. Design intentionally: Add controls to prevent runaway behavior
  5. Monitor behavior: Ensure loops behave as expected

Next Lesson

In Lesson 3, you'll learn how to model feedback loops as valid cycles in Sruja.

Lesson 3: Modeling Cycles

Learning Goals

  • Learn how to create valid cycles in Sruja
  • Model feedback loops explicitly
  • Differentiate cycles from circular dependencies

Cycles in Sruja

Unlike circular dependencies (bad), feedback loops (good) are valid cycles in Sruja.

Basic Cycle Syntax

// Valid feedback loop
User -> App.WebApp "Submits form"
App.WebApp -> App.API "Validates"
App.API -> App.WebApp "Returns result"
App.WebApp -> User "Shows feedback"

// User resubmits (cycle completes)

Modeling Feedback Loops

Example 1: User Feedback Loop

import { * } from 'sruja.ai/stdlib'

User = person "User"

App = system "Application" {
  WebApp = container "Web Application"
  API = container "API Service"
  Database = database "Database"
}

// User feedback cycle
UserFeedback = scenario "User Form Feedback" {
  User -> App.WebApp "Submit form"
  App.WebApp -> App.API "Validate input"
  App.API -> App.WebApp "Return validation result"

  // Error path (loop)
  if has_errors {
    App.WebApp -> User "Show errors"
    User -> App.WebApp "Correct and resubmit"
  }

  // Success path (end cycle)
  if no_errors {
    App.WebApp -> App.Database "Save data"
    App.WebApp -> User "Show success"
  }
}

Example 2: System Self-Regulation

AutoScaling = system "Auto-Scaling System" {
  Monitor = container "Monitoring Service"
  Scaling = container "Scaling Service"
}

App = system "Application" {
  API = container "API Service"
}

// Auto-scaling feedback loop
ScalingLoop = scenario "Auto-Scaling Feedback" {
  App.API -> AutoScaling.Monitor "Reports load"

  // Scale up if load is high
  if load_high {
    AutoScaling.Monitor -> AutoScaling.Scaling "Trigger scale up"
    AutoScaling.Scaling -> App.API "Add instances"
    App.API -> AutoScaling.Monitor "Reports new load"
    // Loop continues until load normalizes
  }

  // Scale down if load is low
  if load_low {
    AutoScaling.Monitor -> AutoScaling.Scaling "Trigger scale down"
    AutoScaling.Scaling -> App.API "Remove instances"
    App.API -> AutoScaling.Monitor "Reports new load"
    // Loop continues until load normalizes
  }
}

Example 3: Inventory Management

Admin = person "Administrator"

Shop = system "Shop" {
  API = container "API Service"
  Inventory = database "Inventory Database"
}

// Inventory feedback loop
InventoryLoop = scenario "Inventory Feedback" {
  Shop.API -> Shop.Inventory "Update stock"

  // Low stock alert
  if stock_low {
    Shop.Inventory -> Shop.API "Notify low stock"
    Shop.API -> Admin "Send restock alert"
    Admin -> Shop.API "Restock inventory"
    Shop.API -> Shop.Inventory "Update stock"
    // Inventory updates, loop may repeat
  }

  // Normal stock (no action needed)
  if stock_normal {
    Shop.Inventory -> Shop.API "Stock OK"
  }
}

Example 4: Learning System

User = person "User"

MLSystem = system "ML Recommendation System" {
  API = container "Recommendation API"
  Model = database "ML Model"
  Training = container "Training Pipeline"
}

// Learning feedback loop
LearningLoop = scenario "ML Learning Cycle" {
  User -> MLSystem.API "Request recommendations"
  MLSystem.API -> MLSystem.Model "Get predictions"
  MLSystem.Model -> MLSystem.API "Return recommendations"
  MLSystem.API -> User "Show recommendations"

  // User feedback
  User -> MLSystem.API "Rate recommendations"

  // Model update
  MLSystem.API -> MLSystem.Training "Add training data"
  MLSystem.Training -> MLSystem.Model "Update model"

  // Improved recommendations next time
  MLSystem.Model -> MLSystem.API "Better predictions"
  MLSystem.API -> User "Show improved recommendations"
}

Explicit vs Implicit Cycles

Explicit Cycle (Clear Feedback)

// Shows the complete feedback path
User -> App "Submit data"
App -> User "Show result"
User -> App "Adjust and resubmit"

// Clearly shows learning/adaptation

Implicit Cycle (Inferred)

// Relationships imply the cycle exists
User -> App "Uses"
App -> User "Responds"

// Cycle is there but not explicitly modeled

Recommendation: Use explicit cycles for important feedback mechanisms.

Valid Cycles vs Circular Dependencies

Circular Dependency (Bad)

// Static compile-time dependency
ModuleA -> ModuleB "Imports"
ModuleB -> ModuleA "Imports"

// Problems:
// - Hard to initialize (each module needs the other loaded first)
// - Tight coupling
// - No clear purpose

Feedback Loop (Good)

// Dynamic runtime feedback
User -> App "Submits data"
App -> User "Shows result"

// Benefits:
// - Enables adaptation
// - Clear purpose
// - Loose coupling (eventual consistency)
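A cycle is easy to find mechanically; deciding whether it is a problem is the modeling judgment described above. The following Python sketch finds one cycle in a list of directed edges with a depth-first search. It is a generic illustration and makes no claim about how Sruja's own validation is implemented; `find_cycle` and the edge lists are hypothetical.

```python
def find_cycle(edges):
    """Return one cycle as a list of nodes (start node repeated at the end), or None."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    WHITE, GRAY, BLACK = 0, 1, 2        # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for neighbour in graph[node]:
            if color[neighbour] == GRAY:            # back edge: cycle found
                return path[path.index(neighbour):] + [neighbour]
            if color[neighbour] == WHITE:
                found = visit(neighbour)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


static_imports = [("ModuleA", "ModuleB"), ("ModuleB", "ModuleA")]
layered = [("App", "Cache"), ("App", "Database")]

import_cycle = find_cycle(static_imports)
no_cycle = find_cycle(layered)
print(import_cycle, no_cycle)
```

Run over static import edges, a result like the first one points to a circular dependency worth breaking. Run over runtime interaction edges, the same kind of result may simply be an intentional feedback loop.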

Feedback Loop Patterns

Pattern 1: Immediate Feedback

InstantFeedback = scenario "Form Validation" {
  User -> WebApp "Type in field"
  WebApp -> API "Validate"
  API -> WebApp "Result (instant)"
  WebApp -> User "Show error/success"
}

Pattern 2: Delayed Feedback

DelayedFeedback = scenario "Performance Monitoring" {
  App -> Monitoring "Log metrics"

  // Time passes

  Monitoring -> App "Send alert (after threshold)"
  App -> Admin "Notify"
  Admin -> App "Adjust configuration"
  App -> Monitoring "Log new metrics"
}

Pattern 3: Aggregated Feedback

AggregatedFeedback = scenario "A/B Testing" {
  Users -> App "Use variant A or B"

  // Aggregate many interactions
  App -> Analytics "Log events"
  Analytics -> Dashboard "Show aggregated results"

  Team -> App "Make decision based on data"
  App -> Users "Roll out winner to all"
}

Feedback Loop Controls

Prevent Runaway Behavior

AutoScaling = container "Auto-Scaling Service" {
  scale {
    min 2
    max 10
    metric "cpu > 80%"
    cooldown "5 minutes"  // Prevent rapid scaling
  }
}
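The effect of the limits and cooldown can be sketched as a single decision function. This Python example is an assumption-laden illustration of the controls named above (minimum 2, maximum 10, scale up above 80% CPU, 5-minute cooldown); it is not how any particular autoscaler is implemented, and it only models scaling up because that is the only metric the configuration states.

```python
def next_instance_count(current, cpu_percent, now, last_scaled_at,
                        min_instances=2, max_instances=10,
                        scale_up_above=80.0, cooldown_seconds=300.0):
    """Return (new_count, new_last_scaled_at) for one scaling decision."""
    if current < min_instances:
        return min_instances, now               # enforce the floor immediately
    if now - last_scaled_at < cooldown_seconds:
        return current, last_scaled_at          # cooling down: ignore the metric
    if cpu_percent > scale_up_above and current < max_instances:
        return current + 1, now                 # one bounded step up
    return current, last_scaled_at              # at the ceiling or load is fine


scaled = next_instance_count(2, 90.0, now=1000.0, last_scaled_at=0.0)
in_cooldown = next_instance_count(3, 90.0, now=1100.0, last_scaled_at=1000.0)
at_ceiling = next_instance_count(10, 95.0, now=2000.0, last_scaled_at=0.0)
print(scaled, in_cooldown, at_ceiling)
```

The cooldown stops the loop from reacting again before the previous change has taken effect, and the ceiling caps how far a sustained high reading can push the system.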

Circuit Breaker Pattern

CircuitBreaker = scenario "Circuit Breaker Feedback" {
  App -> Service "Make request"

  // If failures exceed threshold, open circuit
  if failures > threshold {
    App -> Fallback "Use fallback"
    Fallback -> App "Return cached data"

    // After cooldown, try again
    if cooldown_elapsed {
      App -> Service "Make request"
      Service -> App "Success (close circuit)"
    }
  }
}
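A minimal Python sketch of the circuit breaker scenario, assuming a failure counter, a cooldown timer, and a caller-supplied fallback. The class and the demo functions are hypothetical; production libraries usually add a distinct half-open state and per-error policies.

```python
class CircuitBreaker:
    """Open after too many failures; try the service again after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, service, fallback, now):
        if self.opened_at is not None and now - self.opened_at < self.cooldown_seconds:
            return fallback()       # open: do not touch the failing service
        try:
            result = service()      # closed, or a trial call after the cooldown
        except Exception:
            self.failures += 1
            if self.failures > self.failure_threshold:
                self.opened_at = now    # open (or re-open) the circuit
            return fallback()
        self.failures = 0
        self.opened_at = None           # success closes the circuit
        return result


calls = {"service": 0}


def failing():
    calls["service"] += 1
    raise TimeoutError("service unavailable")


def healthy():
    calls["service"] += 1
    return "live data"


def cached():
    return "cached data"


breaker = CircuitBreaker(failure_threshold=2, cooldown_seconds=30.0)
responses = [breaker.call(failing, cached, now=t) for t in (0.0, 1.0, 2.0)]
while_open = breaker.call(failing, cached, now=5.0)       # service is skipped
after_cooldown = breaker.call(healthy, cached, now=40.0)  # trial call succeeds
print(responses, while_open, after_cooldown, calls)
```

While the circuit is open the failing service receives no traffic at all, which is what interrupts the vicious cycle of retries adding load to an already struggling dependency.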

Rate Limiting

RateLimited = scenario "Rate Limited Requests" {
  User -> API "Send request"

  // Check rate limit
  API -> RateLimiter "Check limit"

  if under_limit {
    RateLimiter -> API "Allow"
    API -> User "Process request"
  }

  if over_limit {
    RateLimiter -> API "Throttle"
    API -> User "Rate limit exceeded"

    // User waits and retries (feedback loop)
    User -> API "Retry after delay"
  }
}

Documenting Feedback Loops

Add Metadata

AutoScaling = system "Auto-Scaling" {
  metadata {
    feedback_loop {
      type "negative_balancing"
      purpose "Maintain target CPU usage"
      target "70% CPU"
      controls "min/max instance limits"
      monitoring "CPU, latency, error rate"
    }
  }
}

Complete Example: E-Commerce Feedback Loops

import { * } from 'sruja.ai/stdlib'

Customer = person "Customer"
Admin = person "Administrator"

Shop = system "Shop" {
  WebApp = container "Web Application"
  API = container "API Service"
  Database = database "Database"
  Cache = database "Redis Cache"
}

// External dependency used by the checkout scenario
PaymentGateway = system "Payment Gateway"

// User feedback loop (interactive)
UserFeedback = scenario "Checkout Feedback" {
  Customer -> Shop.WebApp "Submit order"
  Shop.WebApp -> Shop.API "Process order"

  // Payment feedback
  Shop.API -> PaymentGateway "Process payment"
  PaymentGateway -> Shop.API "Payment result"

  if payment_failed {
    Shop.API -> Shop.WebApp "Return error"
    Shop.WebApp -> Customer "Show error, retry"
    Customer -> Shop.WebApp "Try again"  // Loop
  }

  if payment_success {
    Shop.API -> Shop.Database "Save order"
    Shop.API -> Shop.WebApp "Success"
    Shop.WebApp -> Customer "Show confirmation"
  }
}

// Inventory feedback loop (self-regulating)
InventoryFeedback = scenario "Inventory Feedback" {
  Shop.API -> Shop.Database "Update inventory"

  if stock_low {
    Shop.Database -> Shop.API "Notify low stock"
    Shop.API -> Admin "Send alert"
    Admin -> Shop.API "Restock"
    Shop.API -> Shop.Database "Update inventory"
  }
}

// Cache feedback loop (learning)
CacheFeedback = scenario "Cache Learning" {
  Shop.API -> Shop.Cache "Query cache"

  if cache_hit {
    Shop.Cache -> Shop.API "Return data (fast)"
  }

  if cache_miss {
    Shop.API -> Shop.Database "Query database"
    Shop.Database -> Shop.API "Return data"
    Shop.API -> Shop.Cache "Store in cache"
    // Next request will be a cache hit
  }
}

view index {
  include *
}

Exercise

Model feedback loops for:

  1. Chat application: User types, app shows "typing indicator", receiver sees it, receiver types, sender sees "typing indicator"...

  2. Review system: User rates product, system updates average rating, displays to next users...

  3. Load balancing: Server gets overloaded, balancer sends traffic to other servers, load redistributes...

Key Takeaways

  1. Cycles are valid in Sruja: Feedback loops are not errors
  2. Explicit cycles: Model important feedback mechanisms
  3. Differentiate: Circular dependencies (bad) vs feedback loops (good)
  4. Add controls: Prevent runaway behavior with limits
  5. Document clearly: Use metadata to explain feedback loops

Module 5 Complete

You've completed Feedback Loops! You now understand:

  • What feedback loops are and why they matter
  • Types of feedback loops (positive, negative, delayed)
  • How to model cycles in Sruja

Next: Learn about Module 6: Context.

Module 6: Context

Overview

In this module, you'll learn to capture the environment your system operates in: stakeholders, dependencies, constraints, and success criteria.

Learning Objectives

By the end of this module, you'll be able to:

  • Identify and document stakeholders
  • Model external dependencies and integrations
  • Define constraints and non-functional requirements
  • Capture success criteria and SLOs

Lessons

Prerequisites

Time Investment

Approximately 1-1.5 hours to complete all lessons and exercises.

Course Completion

After completing this module, you'll have finished Systems Thinking 101!

Next Steps

After completing this course:

Lesson 1: Understanding Context

Learning Goals

  • Understand what context is in systems thinking
  • Recognize the different layers of context
  • Learn why context matters for architecture

What Is Context?

Context is the environment your system operates in. It includes everything that affects or is affected by your system, even if it's not part of the system itself.

┌─────────────────────────────────────────────┐
│           ORGANIZATIONAL CONTEXT            │
│   Company culture, processes, constraints   │
│                                             │
│   ┌─────────────────────────────────────┐   │
│   │          TECHNICAL CONTEXT          │   │
│   │ Dependencies, infrastructure, APIs  │   │
│   │                                     │   │
│   │   ┌─────────────────────────────┐   │   │
│   │   │     STAKEHOLDER CONTEXT     │   │   │
│   │   │   Users, teams, customers   │   │   │
│   │   │                             │   │   │
│   │   │   ┌─────────────────────┐   │   │   │
│   │   │   │     YOUR SYSTEM     │   │   │   │
│   │   │   └─────────────────────┘   │   │   │
│   │   └─────────────────────────────┘   │   │
│   └─────────────────────────────────────┘   │
└─────────────────────────────────────────────┘

Layers of Context

1. Stakeholder Context

Who cares about your system?

// People who use or influence the system
Customer = person "Customer"
Administrator = person "Administrator"
SupportTeam = person "Support Team"
BusinessOwner = person "Business Owner"

2. Technical Context

What technical dependencies exist?

// External systems and services
PaymentGateway = system "Payment Gateway"
EmailService = system "Email Service"
Analytics = system "Analytics Platform"
CDN = system "Content Delivery Network"

3. Organizational Context

What organizational factors affect the system?

// Documented in metadata
Shop = system "Shop" {
  metadata {
    team ["platform-team"]
    business_unit ["e-commerce"]
    compliance ["PCI-DSS"]
    cost_center ["engineering"]
  }
}

Why Context Matters

1. Stakeholder Alignment

Understand who your system serves:

// Different stakeholders have different needs
Customer -> Shop "Wants fast checkout"
Administrator -> Shop "Wants easy management"
BusinessOwner -> Shop "Wants high conversion"
ComplianceOfficer -> Shop "Wants data security"

2. Dependency Management

Know what you depend on:

Shop = system "Shop"

// Explicit dependencies
Shop -> PaymentGateway "Depends on for payments"
Shop -> EmailService "Depends on for notifications"
Shop -> Analytics "Depends on for tracking"

// If any dependency fails, what's the impact?

3. Constraint Awareness

Understand limitations:

Shop = system "Shop" {
  metadata {
    constraints [
      "PCI-DSS compliance required",
      "Maximum response time: 2s",
      "Budget: $500/month infrastructure",
      "Team size: 3 engineers"
    ]
  }
}

4. Success Criteria

Define what success looks like:

Shop = system "Shop" {
  slo {
    availability {
      target "99.9%"
    }
    latency {
      p95 "200ms"
    }
  }

  metadata {
    success_criteria [
      "Support 10k concurrent users",
      "Less than 1% abandoned carts",
      "Checkout completion rate > 80%"
    ]
  }
}

Context in Sruja

Using overview Block

overview {
  summary "E-commerce platform for online retail"
  audience "Customers, administrators, business owners"
  scope "Shopping cart, checkout, order management"
  goals [
    "Fast and reliable checkout",
    "Easy order management",
    "Real-time inventory tracking"
  ]
  non_goals [
    "Social features",
    "Mobile app (web-only)"
  ]
  risks [
    "Payment gateway downtime",
    "Database scaling limits"
  ]
}

Using person for Stakeholders

// Different types of stakeholders
Customer = person "Customer"
Administrator = person "Administrator"
SupportAgent = person "Support Agent"
ProductManager = person "Product Manager"
ComplianceOfficer = person "Compliance Officer"

Using system for External Dependencies

// Document external services
PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external", "critical"]
    sla "99.9% uptime"
  }
}

EmailService = system "Email Service" {
  metadata {
    tags ["external"]
    priority "low"  // Email can be delayed
  }
}

Using requirements

// Capture constraints and requirements
R1 = requirement functional "Must support multiple payment methods"
R2 = requirement constraint "Must be PCI-DSS compliant"
R3 = requirement performance "Page load time < 2 seconds"
R4 = requirement security "All data encrypted at rest"

Context Examples

Example 1: E-Commerce Platform

import { * } from 'sruja.ai/stdlib'

// Stakeholder context
Customer = person "Customer"
Administrator = person "Administrator"
BusinessOwner = person "Business Owner"
SupportTeam = person "Support Team"

// System
Shop = system "Shop"

// Technical context (dependencies)
PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external", "critical", "pci-compliant"]
  }
}

EmailService = system "Email Service" {
  metadata {
    tags ["external", "low-priority"]
  }
}

AnalyticsService = system "Analytics Service" {
  metadata {
    tags ["external"]
  }
}

// Relationships show dependencies
Shop -> PaymentGateway "Depends on for payments"
Shop -> EmailService "Depends on for notifications"
Shop -> AnalyticsService "Depends on for tracking"

Customer -> Shop "Wants fast, reliable shopping"
Administrator -> Shop "Wants easy management"
BusinessOwner -> Shop "Wants high revenue"
SupportTeam -> Shop "Wants clear error messages"

view index {
  include *
}

Example 2: Internal Tool

// Stakeholder context (internal only)
Developer = person "Developer"
QAEngineer = person "QA Engineer"
DevOpsEngineer = person "DevOps Engineer"

// System
DeploymentTool = system "Deployment Tool"

// Technical context
GitHub = system "GitHub" {
  metadata {
    tags ["external"]
  }
}

AWS = system "Amazon Web Services" {
  metadata {
    tags ["external", "infrastructure"]
  }
}

Slack = system "Slack" {
  metadata {
    tags ["external", "communication"]
  }
}

// Dependencies
DeploymentTool -> GitHub "Fetches code"
DeploymentTool -> AWS "Deploys to AWS"
DeploymentTool -> Slack "Sends notifications"

// Context shows different stakeholders have different needs
Developer -> DeploymentTool "Wants simple interface"
QAEngineer -> DeploymentTool "Wants detailed logs"
DevOpsEngineer -> DeploymentTool "Wants monitoring & alerts"

Context Anti-Patterns

Anti-Pattern 1: No Context

// Bad: System in isolation
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API"
}

Solution: Add stakeholders and dependencies.

Anti-Pattern 2: Too Much Context

// Bad: Everything is context, system is lost
Customer = person "Customer"
Manager = person "Manager"
Support = person "Support"
Finance = person "Finance"
Legal = person "Legal"
HR = person "HR"
Marketing = person "Marketing"
// ... 20 more stakeholders

Solution: Focus on key stakeholders directly affected.

Anti-Pattern 3: Wrong Level of Context

// Bad: Too detailed (implementation, not context)
Shop = system "Shop"
PostgreSQL = system "PostgreSQL"
Redis = system "Redis"
Nginx = system "Nginx"

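Solution: Keep context at the level of external services. Model implementation details such as databases, caches, and web servers as containers inside your system, or group them into higher-level services.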

Exercise

Identify the context layers for a hotel booking system:

Requirements: "A hotel booking system allows guests to search rooms, make reservations, and manage bookings. Hotel staff can manage room inventory and view reports. The system integrates with a payment gateway for payments and sends SMS confirmations. Hotel management needs reporting on occupancy and revenue."

Identify:

  1. Stakeholders: _
  2. External dependencies: _
  3. Organizational constraints: _

Key Takeaways

  1. Context is the environment: Everything affecting or affected by your system
  2. Multiple layers: Stakeholder, technical, organizational
  3. Use Sruja features: overview, person, system, requirements
  4. Balance detail: Don't ignore context, don't overwhelm
  5. Context matters: Affects design, decisions, and success

Next Lesson

In Lesson 2, you'll learn how to model stakeholders and their relationships to your system.

Lesson 2: Stakeholders

Learning Goals

  • Identify key stakeholders for your system
  • Model different stakeholder types
  • Document stakeholder needs and concerns

Who Are Stakeholders?

Stakeholders are people or groups who are affected by or can affect your system. They include users, customers, team members, and anyone else with an interest in the system.

Stakeholder Categories

1. Primary Users

Direct users of the system:

Customer = person "Customer" {
  description "End users who purchase products"
  metadata {
    needs ["Fast checkout", "Easy product search", "Order tracking"]
    pain_points ["Complex checkout", "Slow search results"]
  }
}

Administrator = person "Administrator" {
  description "Manages products, orders, and users"
  metadata {
    needs ["Dashboard", "Bulk operations", "Reporting"]
  }
}

2. Secondary Users

People who use the system in support of primary users rather than for their own goals:

SupportAgent = person "Support Agent" {
  description "Helps customers with orders"
  metadata {
    needs ["Order history", "Customer lookup", "Quick access"]
  }
}

3. Business Stakeholders

Decision-makers and business owners:

ProductManager = person "Product Manager" {
  description "Owns product strategy"
  metadata {
    needs ["Analytics", "User feedback", "Feature usage"]
  }
}

BusinessOwner = person "Business Owner" {
  description "Responsible for revenue and profit"
  metadata {
    needs ["Sales reports", "Conversion metrics", "ROI data"]
  }
}

4. Technical Stakeholders

People who build and maintain the system:

Developer = person "Developer" {
  description "Builds and maintains the system"
  metadata {
    needs ["API documentation", "Clear architecture", "Debugging tools"]
  }
}

DevOpsEngineer = person "DevOps Engineer" {
  description "Deploys and operates the system"
  metadata {
    needs ["Monitoring", "Logs", "Health checks"]
  }
}

5. Compliance & Governance

People ensuring rules are followed:

ComplianceOfficer = person "Compliance Officer" {
  description "Ensures regulatory compliance"
  metadata {
    needs ["Audit logs", "Data privacy controls", "Access logs"]
  }
}

SecurityAuditor = person "Security Auditor" {
  description "Reviews security posture"
  metadata {
    needs ["Security reports", "Vulnerability assessments", "Penetration test results"]
  }
}

Modeling Stakeholders in Sruja

Basic Stakeholder

Customer = person "Customer"

With Details

Customer = person "Customer" {
  description "End users who purchase products"
  metadata {
    tags ["primary-user", "external"]
    priority "high"
    needs [
      "Fast and easy checkout",
      "Product search and filtering",
      "Order tracking"
    ]
    pain_points [
      "Complex forms",
      "Slow page loads",
      "Lack of mobile support"
    ]
  }
}

With Relationships

Customer = person "Customer"
Shop = system "Shop"

// Customer relationship shows interaction
Customer -> Shop "Purchases products"
Shop -> Customer "Sends order updates"

Stakeholder Interactions

Example: E-Commerce Platform

import { * } from 'sruja.ai/stdlib'

// Stakeholders
Customer = person "Customer"
Administrator = person "Administrator"
SupportAgent = person "Support Agent"
ProductManager = person "Product Manager"
BusinessOwner = person "Business Owner"

// System
Shop = system "Shop" {
  WebApp = container "Web Application"
  API = container "API Service"
  Database = database "Database"
}

// Stakeholder interactions
Customer -> Shop.WebApp "Browses products"
Customer -> Shop.WebApp "Purchases products"
Customer -> Shop.WebApp "Tracks orders"

Administrator -> Shop.WebApp "Manages products"
Administrator -> Shop.WebApp "Views reports"

SupportAgent -> Shop.WebApp "Assists customers"
SupportAgent -> Shop.WebApp "Views order history"

ProductManager -> Shop.WebApp "Reviews analytics"
ProductManager -> Shop.WebApp "Monitors user behavior"

BusinessOwner -> Shop.WebApp "Views revenue reports"
BusinessOwner -> Shop.WebApp "Monitors KPIs"

view index {
  include *
}

Stakeholder Matrix

| Stakeholder | Role | Needs | Concerns |
| --- | --- | --- | --- |
| Customer | End user | Fast checkout, easy search | Privacy, reliability |
| Administrator | System admin | Management tools, reports | Ease of use |
| Support Agent | Customer support | Customer data, order history | Quick access |
| Product Manager | Product owner | Analytics, user feedback | Feature adoption |
| Business Owner | Decision maker | Revenue, conversion, ROI | Profitability, costs |

Prioritizing Stakeholders

MoSCoW Method

// Must Have (Critical)
Customer = person "Customer"

// Should Have (Important)
Administrator = person "Administrator"

// Could Have (Nice to have)
ProductManager = person "Product Manager"

// Won't Have (Out of scope)
MarketingAnalyst = person "Marketing Analyst"

RICE Scoring (for Features)

// High impact, low effort
Customer = person "Customer" {
  metadata {
    rice_score {
      reach "10000 users"
      impact "High"
      confidence "80%"
      effort "2 weeks"
    }
  }
}
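RICE is commonly computed as reach × impact × confidence ÷ effort. The Python sketch below applies that formula to the values in the block above. Mapping "High" to an impact multiplier of 2.0 follows a widely used convention but is an assumption here, as are the function name and the choice of weeks as the effort unit.

```python
# Assumed mapping from impact labels to multipliers (a common RICE convention).
IMPACT_MULTIPLIERS = {"Massive": 3.0, "High": 2.0, "Medium": 1.0, "Low": 0.5, "Minimal": 0.25}


def rice_score(reach, impact_label, confidence, effort_weeks):
    """Reach x impact x confidence / effort; higher scores mean higher priority."""
    if effort_weeks <= 0:
        raise ValueError("effort must be positive")
    return reach * IMPACT_MULTIPLIERS[impact_label] * confidence / effort_weeks


# Values from the metadata above: 10000 users, High impact, 80% confidence, 2 weeks.
customer_score = rice_score(reach=10000, impact_label="High", confidence=0.8, effort_weeks=2)
print(round(customer_score))
```

Scores are only meaningful relative to each other, so use the same units for reach and effort across everything you compare.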

Stakeholder Personas

Creating Personas

// Persona 1: Primary user
Sarah = person "Sarah (Customer)" {
  description "Busy professional, 35, shops on mobile"
  metadata {
    goals ["Fast checkout", "Mobile-friendly", "Easy returns"]
    frustrations ["Complex forms", "Slow loading"]
    context ["Shops during commute", "Uses iPhone"]
  }
}

// Persona 2: Secondary user
John = person "John (Administrator)" {
  description "Operations manager, 45, manages inventory"
  metadata {
    goals ["Quick updates", "Bulk operations", "Real-time data"]
    frustrations ["Slow interface", "Lack of filters"]
    context ["Desktop user", "Excel background"]
  }
}

Stakeholder-System Relationships

Direct Users

// Interact directly with the system
Customer -> Shop.WebApp "Uses"
Administrator -> Shop.WebApp "Manages"

Indirect Users

// System serves their needs without direct interaction
BusinessOwner -> Shop "Reviews revenue"
// Business owner doesn't use the app directly

External Stakeholders

// Affected by system but don't use it
ComplianceOfficer = person "Compliance Officer"
ComplianceOfficer -> Shop "Reviews audit logs"

Documenting Stakeholder Needs

Using Requirements

// Customer needs
R1 = requirement functional "Guest checkout available"
R2 = requirement performance "Page load < 2s"

// Administrator needs
R3 = requirement functional "Bulk product upload"
R4 = requirement usability "Intuitive dashboard"

// Business owner needs
R5 = requirement reporting "Daily revenue reports"
R6 = requirement analytics "Conversion funnel tracking"

Using SLOs

Shop = system "Shop" {
  slo {
    availability {
      target "99.9%"
    }
    latency {
      p95 "200ms"
      p99 "500ms"
    }
    errorRate {
      target "0.1%"
    }
  }
}

Exercise

Identify stakeholders for a hospital appointment scheduling system:

"The system allows patients to book appointments, doctors to view their schedules, and receptionists to manage appointments. Hospital administrators need reporting on utilization. The system integrates with insurance APIs and sends SMS reminders."

Identify:

  1. Primary stakeholders: _
  2. Secondary stakeholders: _
  3. Business stakeholders: _
  4. Technical stakeholders: _

For each, document:

  • Their role
  • What they need from the system
  • Any concerns or pain points

Key Takeaways

  1. Identify all stakeholders: Users, business, technical, compliance
  2. Categorize them: Primary, secondary, business, technical
  3. Document needs: Goals, frustrations, context
  4. Prioritize: Focus on most important stakeholders
  5. Model relationships: Show how stakeholders interact with the system
  6. Use personas: Create detailed user profiles

Next Lesson

In Lesson 3, you'll learn how to document dependencies, constraints, and success criteria.

Lesson 3: Dependencies and Constraints

Learning Goals

  • Identify and document external dependencies
  • Define constraints and limitations
  • Capture success criteria and SLOs
  • Use Sruja features to model context

External Dependencies

What Are Dependencies?

Dependencies are external systems, services, or resources your system relies on to function.

// Your system
Shop = system "Shop"

// External dependencies
PaymentGateway = system "Payment Gateway"
EmailService = system "Email Service"
AnalyticsService = system "Analytics Service"
CDN = system "Content Delivery Network"

Categorizing Dependencies

Critical Dependencies

System cannot function without:

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external", "critical", "pci-compliant"]
    sla "99.9% uptime"
    impact "If down, checkout fails"
  }
}

Important Dependencies

System can function but degraded:

EmailService = system "Email Service" {
  metadata {
    tags ["external", "important"]
    sla "99.0% uptime"
    impact "If down, notifications delayed but system works"
  }
}

Optional Dependencies

System works fine without:

AnalyticsService = system "Analytics Service" {
  metadata {
    tags ["external", "optional"]
    sla "98.0% uptime"
    impact "If down, analytics lost but core functionality works"
  }
}
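The SLA figures above put a ceiling on what your own system can promise. As a rough Python sketch, assuming dependency failures are independent and that a request fails if any dependency it needs is down, the combined availability is the product of the individual figures. The function name is hypothetical.

```python
def combined_availability(slas_percent):
    """Availability of a request that needs every listed dependency to be up."""
    result = 1.0
    for sla in slas_percent:
        result *= sla / 100.0
    return result * 100.0


# Checkout needs only the critical dependency (payment gateway).
checkout_ceiling = combined_availability([99.9])

# A hypothetical request that hard-depends on all three services.
all_three = combined_availability([99.9, 99.0, 98.0])

print(round(checkout_ceiling, 2), round(all_three, 2))
```

This is why the categorization matters: treating the email and analytics services as hard dependencies would pull the achievable availability down to roughly 96.9%, while degrading gracefully keeps the ceiling at the payment gateway's 99.9%.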

Documenting Dependencies

Using Metadata

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external", "vendor"]
    owner "Stripe Inc."
    sla "99.9% uptime"
    mttr "4 hours"
    contact "support@stripe.com"
    fallback "Manual payment processing"
    cost "$0.30 per transaction"
    compliance ["PCI-DSS Level 1"]
  }
}

Using Relationships

// Shows dependency
Shop.API -> PaymentGateway "Process payment" [critical]
Shop.API -> EmailService "Send notifications" [important]
Shop.API -> AnalyticsService "Track events" [optional]

Using Fallbacks

// Primary and backup providers
PrimaryPayment = system "Primary Payment Gateway"
BackupPayment = system "Backup Payment Gateway"

Shop.API -> PrimaryPayment "Process payment" [primary]
Shop.API -> BackupPayment "Process payment" [fallback]
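At runtime, the primary and fallback relationships above usually translate into "try providers in order". A minimal Python sketch, with hypothetical provider functions standing in for real gateway clients:

```python
def process_payment(amount, providers):
    """Try each (name, charge) pair in order; return (name, receipt) from the first success."""
    errors = []
    for name, charge in providers:
        try:
            return name, charge(amount)
        except Exception as exc:    # a real client would catch provider-specific errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all payment providers failed: " + "; ".join(errors))


def primary_down(amount):
    """Hypothetical primary gateway that is currently unavailable."""
    raise ConnectionError("gateway timeout")


def backup_ok(amount):
    """Hypothetical backup gateway that accepts the charge."""
    return {"charged": amount}


used, receipt = process_payment(25.0, [("primary", primary_down), ("backup", backup_ok)])
print(used, receipt)
```

With real payment providers, a retry on a second gateway must be safe against double charging (for example via idempotency keys); the sketch does not address that.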

Constraints

What Are Constraints?

Constraints are limitations that affect your design and implementation choices.

Types of Constraints

Technical Constraints

Shop = system "Shop" {
  metadata {
    technical_constraints {
      "Must use PostgreSQL for transactions",
      "Must support 10k concurrent users",
      "Maximum API response time: 2s",
      "Must be deployable to AWS"
    }
  }
}

Business Constraints

Shop = system "Shop" {
  metadata {
    business_constraints {
      "Launch date: Q4 2024",
      "Budget: $500k/year infrastructure",
      "Team size: 3 engineers",
      "Must support multi-currency"
    }
  }
}

Compliance Constraints

Shop = system "Shop" {
  metadata {
    compliance_constraints {
      "PCI-DSS Level 1 for payments",
      "GDPR for EU customer data",
      "CCPA for California customers",
      "SOX compliance for financial reporting"
    }
  }
}

Security Constraints

Shop = system "Shop" {
  metadata {
    security_constraints {
      "All data encrypted at rest",
      "All API calls authenticated",
      "No PII in logs",
      "Minimum TLS 1.3"
    }
  }
}

Success Criteria

What Is Success?

Define what "good" looks like for your system.

Using overview Block

overview {
  summary "E-commerce platform for online retail"
  audience "Customers, administrators, business owners"

  goals [
    "Fast and reliable checkout",
    "Easy product discovery",
    "Real-time inventory tracking",
    "Scalable to 10k concurrent users"
  ]

  non_goals [
    "Social features",
    "Mobile app (web-only)",
    "Marketplace (first-party only)"
  ]

  success_criteria [
    "Checkout completion rate > 80%",
    "Average checkout time < 2 minutes",
    "Page load time < 2s",
    "Customer satisfaction > 4.5/5"
  ]
}

Using slo Block

Shop = system "Shop" {
  slo {
    availability {
      target "99.9%"
      window "30 days"
    }

    latency {
      p95 "200ms"
      p99 "500ms"
      window "7 days"
    }

    errorRate {
      target "0.1%"
      window "7 days"
    }

    throughput {
      target "10000 req/s"
      window "peak hour"
    }
  }
}
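An availability target and a window together imply an error budget: the amount of downtime you can spend before missing the SLO. A short Python sketch of the arithmetic follows; the function name is an assumption for illustration.

```python
def error_budget_minutes(target_percent, window_days):
    """Minutes of allowed unavailability for a target over a window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - target_percent / 100)

budget = error_budget_minutes(99.9, 30)
print(f"{budget:.1f} minutes")  # prints: 43.2 minutes
```

For the 99.9% / 30-day target above, that is roughly 43 minutes of downtime per window.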

Complete Context Example

import { * } from 'sruja.ai/stdlib'

// OVERVIEW
overview {
  summary "E-commerce platform for online retail"
  audience "Customers, administrators, business owners"
  scope "Shopping, checkout, order management, inventory"
  goals [
    "Fast and reliable checkout",
    "Real-time inventory",
    "Scalable architecture"
  ]
  non_goals [
    "Social features",
    "Marketplace"
  ]
  risks [
    "Payment gateway downtime",
    "Database scaling limits",
    "High traffic surges"
  ]
}

// STAKEHOLDERS
Customer = person "Customer"
Administrator = person "Administrator"
BusinessOwner = person "Business Owner"
SupportAgent = person "Support Agent"

// SYSTEM
Shop = system "Shop" {
  metadata {
    team ["platform-team"]
    budget "$500k/year"
    launch_date "2024-12-01"
  }

  WebApp = container "Web Application"
  API = container "API Service"
  Database = database "PostgreSQL"
  Cache = database "Redis"

  // CONSTRAINTS
  constraints {
    "Must support 10k concurrent users",
    "Maximum response time: 2s",
    "PCI-DSS Level 1 compliance",
    "All data encrypted at rest"
  }

  // SUCCESS CRITERIA (SLOs)
  slo {
    availability {
      target "99.9%"
      window "30 days"
    }

    latency {
      p95 "200ms"
      p99 "500ms"
    }
  }
}

// DEPENDENCIES
PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external", "critical", "vendor"]
    owner "Stripe Inc."
    sla "99.9% uptime"
    mttr "4 hours"
    cost "$0.30/transaction"
    compliance ["PCI-DSS Level 1"]
  }
}

EmailService = system "Email Service" {
  metadata {
    tags ["external", "important", "vendor"]
    owner "SendGrid"
    sla "99.0% uptime"
    fallback "Queue for later"
  }
}

AnalyticsService = system "Analytics Service" {
  metadata {
    tags ["external", "optional", "vendor"]
    owner "Google"
    sla "98.0% uptime"
  }
}

// RELATIONSHIPS
Customer -> Shop.WebApp "Purchases products"
Administrator -> Shop.WebApp "Manages system"

Shop.API -> PaymentGateway "Process payment" [critical]
Shop.API -> EmailService "Send notifications" [important]
Shop.API -> AnalyticsService "Track events" [optional]

view index {
  include *
}

Requirements and Constraints

Using requirements

// Functional requirements
R1 = requirement functional "Must support multiple payment methods"
R2 = requirement functional "Guest checkout available"
R3 = requirement functional "Order tracking"

// Non-functional requirements
R4 = requirement performance "Page load time < 2s"
R5 = requirement availability "99.9% uptime"
R6 = requirement scalability "Support 10k concurrent users"

// Security requirements
R7 = requirement security "All data encrypted at rest"
R8 = requirement security "TLS 1.3 for all connections"

// Compliance requirements
R9 = requirement constraint "PCI-DSS Level 1"
R10 = requirement constraint "GDPR compliance for EU users"

Using constraints Block

constraints {
  "All APIs must use HTTPS",
  "Database must be encrypted at rest",
  "No PII in logs",
  "Maximum API response time: 2s",
  "Must support 99.9% uptime",
  "PCI-DSS compliance required"
}

Documenting Trade-offs

Decision Records (ADRs)

ADR001 = adr "Use PostgreSQL for primary database" {
  status "accepted"
  context "Need ACID transactions, strong consistency for orders"
  decision "Use PostgreSQL over MongoDB"
  consequences {
    benefits "Strong consistency, ACID transactions",
    tradeoffs "Scaling requires more effort than NoSQL"
  }
}

ADR002 = adr "Use Stripe for payments" {
  status "accepted"
  context "Need PCI-compliant payment processing"
  decision "Use Stripe over building in-house"
  consequences {
    benefits "PCI compliance, focus on core product",
    tradeoffs "Per-transaction fee, vendor lock-in"
  }
}

Exercise

Document context for a fitness tracking app:

"A fitness tracking app allows users to log workouts and view progress. Users can sync data from wearables. The app integrates with HealthKit and Google Fit. The app must work offline and sync when online. HIPAA compliance required for health data. Target 1 million users by year-end."

Document:

  1. Stakeholders
  2. External dependencies (with criticality)
  3. Constraints (technical, business, compliance)
  4. Success criteria (SLOs)

Key Takeaways

  1. Document dependencies: External systems you rely on
  2. Categorize by importance: Critical, important, optional
  3. Define constraints: Limitations affecting design
  4. Set success criteria: What "good" looks like
  5. Capture trade-offs: ADRs for important decisions

Module 6 Complete

You've completed Context! You now understand:

  • What context is and why it matters
  • How to model stakeholders
  • How to document dependencies and constraints

🎉 Course Complete!

You've finished Systems Thinking 101! You now understand:

  • ✅ Fundamentals of systems thinking
  • ✅ Parts and relationships
  • ✅ Boundaries
  • ✅ Flows
  • ✅ Feedback loops
  • ✅ Context

What's Next?

Congratulations on completing the course! 🚀

System Design 101



System Design 101: Fundamentals

Note

Who is this for? Developers moving into senior roles, students preparing for interviews, or anyone curious about how massive systems like Netflix or Uber work.

Why Learn System Design?

Writing code is only half the battle. As you grow in your career, the challenges shift from "how do I write this function?" to "how do I ensure this system handles 10 million users?"

System design is the skill of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements. It's about making the right trade-offs.

What You Will Learn

By the end of this course, you will be able to:

  1. Speak the Language: Confidently use terms like latency, throughput, consistency, and availability.
  2. Use the Toolbox: Know when to use a relational database vs. NoSQL, or when to introduce a cache or message queue.
  3. Draw the Blueprint: Visualise your ideas using industry-standard diagrams (C4 model).
  4. Scale for Success: Understand how to take a system from 1 user to 1,000,000 users.

Course Structure

This course is broken down into digestible modules:

Module 1: Core Concepts

The foundational pillars of distributed systems. We cover Scalability (Vertical vs Horizontal), Reliability, and Maintainability.

Module 2: The Building Blocks

A deep dive into the components that make up a system:

  • Load Balancers: The traffic cops of the internet.
  • Databases: SQL vs NoSQL, replication, and sharding.
  • Caches: Speeding up access with Redis/Memcached.
  • Message Queues: Decoupling services with Kafka/RabbitMQ.

Module 3: Architectural Patterns

How to organize code and services:

  • Monolith vs Microservices
  • Event-Driven Architecture
  • API Gateway Pattern

Module 4: The Interview Guide

Practical tips for acing the system design interview, including a framework for tackling open-ended problems.

Prerequisites

  • Basic understanding of how the web works (HTTP, DNS, Client-Server).
  • Familiarity with at least one programming language.
  • No prior distributed systems knowledge required.

Let's Begin

Start your journey with Module 1: Fundamentals.

Fundamentals



Module 1: Fundamentals

Tip

The Interview Secret: Most candidates fail not because they don't know the tech, but because they dive into solutions too early. This module fixes that.

What's Inside?

This isn't just theory. It's the playbook for how senior engineers approach broad, ambiguous problems.

  1. What is System Design?: Defining the game we're playing.
  2. The Art of Requirements: How to extract the real problem from a vague prompt.
  3. The C4 Model: A standardized way to draw your ideas so others actually understand them.
  4. Trade-offs: Why "it depends" is the only correct answer (and how to explain what it depends on).
  5. Sruja Basics: Your first architecture-as-code model.

Learning Goals

By the end of this module, you will be able to:

  • Distinguish between Functional and Non-Functional Requirements.
  • Calculate rough capacity estimates (back-of-the-envelope math).
  • Draw a high-level System Context diagram.
  • Explain the CAP theorem in plain English.

Ready?

Let's start your journey with Lesson 1: The Mindset.


Lesson 1: The Mindset

Stop coding. Start designing. Why system design is different from application development.

Learning objectives:

  • Understand the difference between coding and system design
  • Learn the 'House vs Skyscraper' analogy
  • Master the Functional vs Non-Functional distinction

Estimated time: 15 minutes. Difficulty: beginner.

The Shift

When you write code, you are building a single room. You care about the furniture (variables), the flow (logic), and the usability (UI).

System Design is city planning.

You stop caring about the furniture in every room. Instead, you care about:

  • Traffic flow: Can the roads handle rush hour? (Throughput)
  • Utilities: Is there enough water and electricity? (Capacity)
  • Disaster recovery: What happens if the power plant explodes? (Reliability)
  • Expansion: Can we add a new suburb next year? (Scalability)

Important

The golden rule: In system design, there are no right answers, only trade-offs.

Functional vs Non-Functional

Every system has two sets of requirements. In an interview (and in real life), much of the first impression you make comes from clarifying these before you draw a single box.

1. Functional Requirements (The "What")

These are the features. If the system doesn't do this, it's useless.

  • User can post a tweet.
  • User can follow others.
  • User sees a news feed.

2. Non-Functional Requirements (The "How")

These are the constraints. If the system doesn't meet these, it will crash, fail, or be too slow.

  • Scalability: Must handle 100M daily active users.
  • Latency: Feed must load in < 200ms.
  • Consistency: A tweet must appear on followers' feeds within 5 seconds.

graph TD
    A[Requirements] --> B[Functional]
    A --> C[Non-Functional]
    B --> B1[Features]
    B --> B2[APIs]
    B --> B3[User Flows]
    C --> C1[Scalability]
    C --> C2[Reliability]
    C --> C3[Latency]
    C --> C4[Cost]

    style A fill:#f9f,stroke:#333
    style B fill:#bbf,stroke:#333
    style C fill:#bfb,stroke:#333

The "It Depends" Game

Junior engineers search for the "best" database. Senior engineers ask "what are we optimizing for?"

| You Optimize For | You Might Sacrifice | Example |
| --- | --- | --- |
| Consistency | Availability | Banking (balances must be correct, even if the system goes down briefly) |
| Availability | Consistency | Social media (better to show old likes than an error page) |
| Write Speed | Read Speed | Logging (write fast, read rarely) |
| Development Speed | Performance | Startups (ship a Python/Ruby MVP fast, rewrite later) |

Sruja Integration

In Sruja, we treat requirements as code. This keeps your constraints right next to your architecture.

Why Kinds and Types Matter

In Sruja, you declare kinds to establish the vocabulary of your architecture. This isn't just syntax—it provides real benefits:

  1. Early Validation: If you typo an element type (e.g., sytem instead of system), Sruja catches it immediately.
  2. Better Tooling: IDEs can provide autocomplete and validation based on your declared kinds.
  3. Self-Documentation: Anyone reading your model knows exactly which element types are available.
  4. Custom Vocabulary: You can define your own kinds (e.g., microservice = kind "Microservice") to match your domain.
  5. Flat and Clean: With Sruja's flat syntax, these declarations live at the top of your file—no specification wrapper block required.

Example: Requirements-Driven Architecture

import { * } from 'sruja.ai/stdlib'

// 1. Defining the "What" (Functional)
R1 = requirement functional "Users can post short text messages (tweets)"

// 2. Defining the "How" (Non-Functional)
R2 = requirement performance "500ms p95 latency for reading timeline"
R3 = requirement scalability "Store 5 years of tweets (approx 1PB)"
R4 = requirement availability "99.9% uptime SLA"

// 3. The Architecture follows the requirements
Twitter = system "The Platform" {
    description "Satisfies R1, R2, R3, R4"

    TimelineAPI = container "Timeline API" {
        technology "Rust"
        description "Satisfies R2 - optimized for low latency"

        slo {
            latency {
                p95 "500ms"
                window "7 days"
            }
            availability {
                target "99.9%"
                window "30 days"
            }
        }
    }

    TweetDB = database "Tweet Storage" {
        technology "Cassandra"
        description "Satisfies R3 - distributed storage for 1PB scale"
    }

    TimelineAPI -> TweetDB "Reads/Writes"
}

// 4. Document the decision
ADR001 = adr "Use Cassandra for tweet storage" {
    status "Accepted"
    context "Need to store 1PB of tweets with high write throughput"
    decision "Use Cassandra for distributed, scalable storage"
    consequences "Excellent scalability, eventual consistency trade-off"
}

view index {
    title "Twitter Platform Overview"
    include *
}

// Performance-focused view
view performance {
    title "Performance View"
    include Twitter.TimelineAPI Twitter.TweetDB
}

Knowledge Check

Q: My boss says "We need to handle infinite users". How do you respond?

Bad Answer: "Okay, I'll use Kubernetes and sharding."

Senior Answer: "Infinite is expensive. Do we expect 1k users or 100M users? The design for 1k costs $50/mo. The design for 100M costs $50k/mo. Let's define a realistic target for the next 12 months."

Q: Why not just use the fastest database for everything?

Because "fastest" depends on the workload. A database that is fast at writes (Cassandra) can be complex to operate. A database that is fast at traversing relationships (Neo4j) might scale poorly for heavy writes. Trade-offs.

Next Steps

Now that we have the mindset, let's learn the language. 👉 Lesson 2: The Vocabulary of Scale


Lesson 2: The Vocabulary of Scale

Vertical vs. Horizontal Scaling, Latency vs. Throughput. The words you need to know.

Learning objectives:

  • Explain Vertical vs Horizontal scaling
  • Understand why distributed systems are hard
  • Master the difference between Latency and Throughput

Estimated time: 15 minutes. Difficulty: beginner.

To design big systems, you need to speak the language.

1. Scaling: Up vs Out

When your website crashes because too many people are using it, you have two choices.

Vertical Scaling (Scaling Up)

"Get a bigger machine." You upgrade from a 4GB RAM server to a 64GB RAM server.

  • Pros: Easy. No code changes.
  • Cons: Expensive. Finite limit (you can't buy a 100TB RAM server... easily). Single point of failure.

Horizontal Scaling (Scaling Out)

"Get more machines." You buy 10 cheap servers and split the traffic between them.

  • Pros: Near-unlimited scale (Google runs millions of servers). Resilient (if one dies, others take over).
  • Cons: Complex. You need load balancers and data consistency strategies.

graph TD
    subgraph Vertical [Vertical Scaling]
        Small[Server] -- Upgrade --> Big[SERVER]
    end

    subgraph Horizontal [Horizontal Scaling]
        One[Server] -- Add More --> Many1[Server]
        One -- Add More --> Many2[Server]
        One -- Add More --> Many3[Server]
    end

2. Speed: Latency vs Throughput

In interviews, never just say "it needs to be fast". Be specific.

  • Latency: The time it takes for one person to get a result.
    • Metaphor: The time it takes to drive from A to B.
    • Unit: Milliseconds (ms).
  • Throughput: The amount of work the system completes per unit of time.
    • Metaphor: How many cars pass through the highway per hour (more lanes, more cars).
    • Unit: Requests per Second (RPS).

Tip

Use the right word: A system can have low latency (fast response) but low throughput (crashes if 5 people use it). A highway can have high throughput (10 lanes) but high latency (traffic jam).
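The two metrics are linked by Little's Law: the average number of requests in flight equals throughput multiplied by latency. The Python sketch below shows the arithmetic. The function names and the numbers are assumptions for illustration, and the second helper assumes each worker handles one request at a time.

```python
def required_concurrency(throughput_rps, latency_seconds):
    """Little's Law: requests in flight = arrival rate x time in system."""
    return throughput_rps * latency_seconds

def max_throughput(workers, latency_seconds):
    """Upper bound on RPS when each worker serves one request at a time."""
    return workers / latency_seconds

# 400 requests/s at 250 ms each keeps about 100 requests in flight.
print(required_concurrency(throughput_rps=400, latency_seconds=0.25))  # prints: 100.0

# 5 workers at 250 ms each cannot exceed 20 requests/s.
print(max_throughput(workers=5, latency_seconds=0.25))  # prints: 20.0
```

This is why a system with low latency can still have low throughput: with only a handful of workers, the ceiling on requests per second is reached quickly.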

3. Sruja in Action

Sruja allows you to define horizontal scaling requirements explicitly using the scale block.

import { * } from 'sruja.ai/stdlib'


ECommerce = system "E-Commerce System" {
    WebServer = container "Web App" {
        technology "Rust, Axum"

        // Explicitly defining Horizontal Scaling
        scale {
            min 3            // Start with 3 servers
            max 100          // Scale up to 100
            metric "cpu > 80%"
        }
    }

    Database = database "Primary DB" {
        technology "PostgreSQL"
        // Describing Vertical Scaling via comments/description
        description "Running on a massive AWS r5.24xlarge instance (Vertical Scaling)"
    }

    WebServer -> Database "Reads/Writes"
}

view index {
    include *
}

Knowledge Check

Q: Why don't we just vertically scale forever?

Because physics. There is a limit to how fast a single CPU can be. Also, if that one super-computer catches fire, your entire business is dead.

Next Steps

We have the mindset, and we have the words. Now let's keep the system running. 👉 Lesson 3: Availability & Reliability

Lesson 3



Lesson 3: Availability & Reliability

Reliability vs. Availability

  • Reliability: The probability that a system will function correctly without failure for a specified period. It's about correctness.
  • Availability: The percentage of time a system is operational and accessible. It's about uptime.

A system can be available but not reliable (e.g., it returns 500 errors but is "up").

Measuring Availability

Availability is often measured in "nines":

| Availability | Downtime per Year |
| --- | --- |
| 99% (Two nines) | 3.65 days |
| 99.9% (Three nines) | 8.76 hours |
| 99.99% (Four nines) | 52.6 minutes |
| 99.999% (Five nines) | 5.26 minutes |
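The downtime figures follow directly from the percentage. A small Python sketch of the calculation, assuming a 365-day year (the helper name is illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def downtime_hours_per_year(availability_percent):
    """Allowed downtime in hours per year for an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours:.2f} hours ({hours * 60:.1f} minutes)")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why every additional nine tends to cost far more than the last.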

Achieving High Availability

Redundancy

The key to availability is eliminating Single Points of Failure (SPOF). This is done via redundancy.

  • Active-Passive: One server handles traffic; the other is on standby.
  • Active-Active: Both servers handle traffic. If one fails, the other takes over the full load.

Failover

The process of switching to a redundant system upon failure. This can be manual or automatic.
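To see why redundancy helps, consider the idealized case where replicas fail independently and failover is instant: the system is down only when every replica is down at once. The Python sketch below works under exactly those assumptions; the function name is illustrative.

```python
def redundant_availability(single, replicas):
    """Availability of N independent replicas when one healthy replica suffices.

    Assumes independent failures and instant, perfect failover.
    """
    return 1 - (1 - single) ** replicas

one_server = 0.99
two_servers = redundant_availability(one_server, 2)
print(f"{one_server:.2%} -> {two_servers:.2%}")
```

Two 99% servers reach roughly four nines on paper. Real systems land lower because failures are correlated (shared power, network, or bad deploys) and failover takes time.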


🛠️ Sruja Perspective: Modeling Redundancy

You can explicitly model redundant components in Sruja to visualize your high-availability strategy.

import { * } from 'sruja.ai/stdlib'


Payments = system "Payment System" {
    PaymentService = container "Payment Service" {
        technology "Java"
    }

    // Modeling a primary and standby database
    PrimaryDB = database "Primary Database" {
        technology "MySQL"
        tags ["primary"]
    }

    StandbyDB = database "Standby Database" {
        technology "MySQL"
        tags ["standby"]
        description "Replicates from PrimaryDB. Promoted to primary if PrimaryDB fails."
    }

    PaymentService -> PrimaryDB "Reads/Writes"
    PrimaryDB -> StandbyDB "Replicates data"
}

view index {
    include *
}

Lesson 4



Lesson 4: CAP Theorem & Consistency

The CAP Theorem

Proposed by Eric Brewer, the CAP theorem states that a distributed data store can provide at most two of the following three guarantees at the same time:

  1. Consistency (C): Every read receives the most recent write or an error. All nodes see the same data at the same time.
  2. Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
  3. Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes.

The Reality: P is Mandatory

In a distributed system, network partitions (P) are inevitable. Therefore, you must choose between Consistency (CP) and Availability (AP) when a partition occurs.

  • CP (Consistency + Partition Tolerance): Wait for data to sync. If a node is unreachable, return an error. (e.g., Banking systems).
  • AP (Availability + Partition Tolerance): Return the most recent version of data available, even if it might be stale. (e.g., Social media feeds).
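The CP/AP choice can be sketched with a toy model in Python. This is a deliberate simplification, not how any real database behaves: real CP systems usually keep serving on the majority side of a partition, for example. The `Replica` class and all names here are hypothetical.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer during a partition."""

class Replica:
    """Toy replica; `partitioned` means its peer is unreachable."""

    def __init__(self, mode, value):
        self.mode = mode              # "CP" or "AP"
        self.value = value
        self.partitioned = False

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("write rejected: peer unreachable")
        self.value = value            # AP: accept locally, reconcile later

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("read rejected: cannot confirm latest write")
        return self.value             # AP: may be stale

def during_partition(mode):
    """Write to replica a, then read from replica b, while they cannot talk."""
    a, b = Replica(mode, "balance=100"), Replica(mode, "balance=100")
    a.partitioned = b.partitioned = True
    try:
        a.write("balance=50")
        return b.read()
    except Unavailable as exc:
        return f"error: {exc}"

print(during_partition("AP"))  # prints: balance=100
print(during_partition("CP"))  # prints: error: write rejected: peer unreachable
```

The AP pair stays responsive but returns a stale balance; the CP pair stays correct by refusing to answer.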

Consistency Models

  • Strong Consistency: Once a write is confirmed, all subsequent reads see that value.
  • Eventual Consistency: If no new updates are made, eventually all accesses will return the last updated value. (Common in AP systems).

🛠️ Sruja Perspective: Documenting Guarantees

When defining data stores in Sruja, it is helpful to document their consistency guarantees, especially for distributed databases.

import { * } from 'sruja.ai/stdlib'


DataLayer = system "Data Layer" {
    UserDB = database "User Database" {
        technology "Cassandra"
        // Explicitly stating the consistency model
        description "Configured with replication factor 3. Uses eventual consistency for high availability."

        // You could also use custom tags
        tags ["AP-System", "Eventual-Consistency"]
    }

    BillingDB = database "Billing Database" {
        technology "PostgreSQL"
        description "Single primary with synchronous replication to ensure strong consistency."
        tags ["CP-System", "Strong-Consistency"]
    }
}

view index {
    include *
}

Lesson 5



Lesson 5: User Scenarios

Understanding User Journeys

A User Scenario describes the series of steps a user takes to achieve a specific goal within your system. While static architecture diagrams show structure, user scenarios show behavior.

Why Model Scenarios?

  1. Validation: Ensures that all components required for a feature actually exist and are connected.
  2. Clarity: Helps stakeholders understand how the system works from a user's perspective.
  3. Testing: Serves as a blueprint for integration and end-to-end tests.

Example Scenario: Buying a Ticket

  1. User searches for events.
  2. User selects a ticket.
  3. User enters payment details.
  4. System processes payment.
  5. System sends confirmation email.

🛠️ Sruja Perspective: Modeling Scenarios

Sruja provides a dedicated scenario keyword to model these interactions explicitly. This allows you to visualize the flow of data across your defined architecture.

import { * } from 'sruja.ai/stdlib'


R1 = requirement functional "User can buy a ticket"
R2 = requirement performance "Process payment in < 2s"

// Define the actors and systems first
User = person "Ticket Buyer"

TicketingApp = system "Ticketing Platform" {
    WebApp = container "Web Frontend"
    PaymentService = container "Payment Processor"
    EmailService = container "Notification Service"

    WebApp -> PaymentService "Process payment"
    PaymentService -> EmailService "Trigger confirmation"
}

// Define the scenario
BuyTicket = scenario "User purchases a concert ticket" {
    User -> TicketingApp.WebApp "Selects ticket"
    TicketingApp.WebApp -> TicketingApp.PaymentService "Process payment"
    TicketingApp.PaymentService -> TicketingApp.EmailService "Trigger confirmation"
    TicketingApp.EmailService -> User "Send email"
}

view index {
    include *
}

By defining scenarios, you can automatically generate sequence diagrams or flowcharts that map directly to your code.

Building Blocks

Lesson 1



Lesson 1: Load Balancers

What is a Load Balancer?

A load balancer sits between clients and servers, distributing incoming network traffic across a group of backend servers. This ensures that no single server bears too much load.

Types of Load Balancing

Layer 4 (Transport Layer)

  • Decisions based on IP address and TCP/UDP ports.
  • Faster, less CPU intensive.
  • Does not inspect the content of the request.

Layer 7 (Application Layer)

  • Decisions based on the content of the message (URL, HTTP headers, cookies).
  • Can route traffic to different services based on URL (e.g., /images to image servers).
  • More CPU intensive but smarter.
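Because a Layer 7 balancer can read the request, its routing decision can be as simple as a prefix match on the URL path. A Python sketch of that decision follows; the pool names and the `route` function are hypothetical.

```python
# (path prefix, backend pool); all names are hypothetical.
ROUTES = [
    ("/images", "image-servers"),
    ("/api", "api-servers"),
]
DEFAULT_POOL = "web-servers"

def route(path):
    """Pick a backend pool from the request path, as a Layer 7 balancer can."""
    for prefix, pool in ROUTES:
        if path == prefix or path.startswith(prefix + "/"):
            return pool
    return DEFAULT_POOL

print(route("/images/cat.png"))  # prints: image-servers
print(route("/checkout"))        # prints: web-servers
```

A Layer 4 balancer cannot make this choice, since it sees only addresses and ports, not the path.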

Algorithms

  • Round Robin: Requests are distributed sequentially.
  • Least Connections: Sends request to the server with the fewest active connections.
  • IP Hash: The client's IP address is used to determine which server receives the request (useful for session stickiness).
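The three algorithms above fit in a few lines each. The Python sketch below is illustrative only: the server names, connection counts, and function names are assumptions, and production balancers add health checks and weighting.

```python
import hashlib
from itertools import cycle

def make_round_robin(servers):
    """Round Robin: hand out servers in a fixed rotation."""
    rotation = cycle(servers)
    return lambda: next(rotation)

def least_connections(active):
    """Least Connections: pick the server with the fewest active connections."""
    return min(active, key=active.get)

def ip_hash(client_ip, servers):
    """IP Hash: the same client IP always maps to the same server."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

servers = ["app-1", "app-2", "app-3"]  # hypothetical backends

pick = make_round_robin(servers)
print([pick() for _ in range(4)])  # prints: ['app-1', 'app-2', 'app-3', 'app-1']

print(least_connections({"app-1": 12, "app-2": 3, "app-3": 7}))  # prints: app-2

# Stickiness: repeated lookups for one client agree.
print(ip_hash("203.0.113.7", servers) == ip_hash("203.0.113.7", servers))  # prints: True
```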

🛠️ Sruja Perspective: Modeling Load Balancers

In Sruja, a load balancer is typically modeled as a container or component that sits in front of your application servers.

import { * } from 'sruja.ai/stdlib'


LB = container "Nginx Load Balancer" {
    technology "Nginx"
    tags ["load-balancer"]
    description "Layer 7 load balancer routing traffic based on URL paths."
}

AppServer = container "App Server" {
    technology "Python, Django"
    tags ["scaled"]
}

// Traffic flow
LB -> AppServer "Distributes requests (Round Robin)"

view index {
    include *
}

Lesson 2



Lesson 2: Databases

SQL vs. NoSQL

SQL (Relational Databases)

  • Structure: Structured data with predefined schemas (Tables, Rows, Columns).
  • Query Language: SQL (Structured Query Language).
  • ACID Compliance: Strong guarantees for Atomicity, Consistency, Isolation, Durability.
  • Examples: MySQL, PostgreSQL, Oracle.
  • Best for: Complex queries, financial transactions.

NoSQL (Non-Relational Databases)

  • Structure: Flexible schemas (Key-Value, Document, Graph, Column-Family).
  • Scalability: Designed for horizontal scaling.
  • Examples: MongoDB (Document), Redis (Key-Value), Cassandra (Column-Family).
  • Best for: Rapidly changing data, massive scale, unstructured data.

Scaling Databases

Replication

Copying data to multiple servers.

  • Master-Slave: Writes go to Master, Reads go to Slaves. Good for read-heavy systems.
  • Master-Master: Writes can go to any node. Complex conflict resolution needed.

Sharding

Partitioning data across multiple servers (e.g., Users A-M on Server 1, N-Z on Server 2).

  • Pros: Handles massive data volumes.
  • Cons: Complex joins, rebalancing data is hard.
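Two common ways to choose a shard are by key range (as in the A-M / N-Z example) and by hash. The Python sketch below shows both and also measures why rebalancing is hard with naive hash-modulo sharding. The function names and key format are assumptions for illustration.

```python
import hashlib

def range_shard(username):
    """Range-based: names starting A-M go to server-1, everything else to server-2."""
    first = username[:1].upper()
    return "server-1" if "A" <= first <= "M" else "server-2"

def hash_shard(key, shard_count):
    """Hash-based: spreads keys evenly but loses range locality."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

print(range_shard("alice"), range_shard("Zed"))  # prints: server-1 server-2

# Rebalancing cost: count keys whose shard changes when a fifth shard is added.
keys = [f"user-{i}" for i in range(1000)]
moved = sum(hash_shard(k, 4) != hash_shard(k, 5) for k in keys)
print(f"{moved} of {len(keys)} keys move when going from 4 to 5 shards")
```

With plain modulo hashing most keys change shards when the shard count changes, which is the problem that schemes such as consistent hashing aim to reduce.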

🛠️ Sruja Perspective: Modeling Databases

Sruja allows you to define the type of database and its role in the system.

import { * } from 'sruja.ai/stdlib'


UserDB = container "User Database" {
    technology "PostgreSQL"
    tags ["relational", "primary"]
    description "Stores user profiles and authentication data."
}

SessionStore = container "Session Cache" {
    technology "Redis"
    tags ["key-value", "cache"]
    description "Stores active user sessions for fast access."
}

view index {
    include *
}

Lesson 3



Lesson 3: Caching

Why Cache?

Caching is the process of storing copies of data in a temporary storage location (cache) so that future requests for that data can be served faster.

  • Reduce Latency: Memory is faster than disk.
  • Reduce Load: Fewer queries to the database.

Caching Strategies

Cache-Aside (Lazy Loading)

  1. App checks cache.
  2. If miss, App reads from DB.
  3. App writes to cache.
  • Pros: Only requested data is cached.
  • Cons: Initial request is slow (cache miss).
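The three cache-aside steps map directly to code. In this Python sketch, plain dicts stand in for the cache and the database, and the `ProductStore` class is a hypothetical name used only for illustration.

```python
class ProductStore:
    """Cache-aside reads; dicts stand in for the cache and the database."""

    def __init__(self, database):
        self.database = database
        self.cache = {}
        self.db_reads = 0

    def get(self, key):
        if key in self.cache:               # 1. App checks cache
            return self.cache[key]
        self.db_reads += 1
        value = self.database.get(key)      # 2. On a miss, App reads from DB
        if value is not None:
            self.cache[key] = value         # 3. App writes to cache
        return value

store = ProductStore({"sku-1": "Keyboard"})
store.get("sku-1")      # miss: goes to the database
store.get("sku-1")      # hit: served from the cache
print(store.db_reads)   # prints: 1
```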

Write-Through

  1. App writes to cache and DB simultaneously.
  • Pros: Data in cache is always fresh.
  • Cons: Slower writes.

Write-Back (Write-Behind)

  1. App writes to cache only.
  2. Cache writes to DB asynchronously.
  • Pros: Fast writes.
  • Cons: Data loss risk if cache fails before syncing.
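The difference between write-through and write-back is when the database sees the write. A minimal Python sketch follows, with dicts standing in for the cache and the database; the class names are hypothetical, and a real write-back cache would flush in the background rather than on an explicit call.

```python
class WriteThroughCache:
    def __init__(self, db):
        self.db, self.cache = db, {}

    def write(self, key, value):
        self.cache[key] = value
        self.db[key] = value          # DB is updated before the write returns

class WriteBackCache:
    def __init__(self, db):
        self.db, self.cache, self.dirty = db, {}, set()

    def write(self, key, value):
        self.cache[key] = value       # fast path: cache only
        self.dirty.add(key)

    def flush(self):
        """Persist pending writes; anything still dirty is lost if the cache dies first."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()

db = {}
wb = WriteBackCache(db)
wb.write("cart:42", ["sku-1"])
print("cart:42" in db)   # prints: False
wb.flush()
print("cart:42" in db)   # prints: True
```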

Eviction Policies

When the cache is full, what do you remove?

  • LRU (Least Recently Used): Remove the item that hasn't been used for the longest time.
  • LFU (Least Frequently Used): Remove the item used least often.
  • FIFO (First In, First Out): Remove the oldest item.
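LRU is the policy most often implemented by hand. A compact Python sketch using the standard library's `OrderedDict` follows; the `LRUCache` class is illustrative, not a production cache.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # oldest use first, most recent use last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")             # "a" is now the most recently used
cache.put("c", 3)          # cache is full, so "b" is evicted
print(list(cache.items))   # prints: ['a', 'c']
```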

🛠️ Sruja Perspective: Modeling Caches

In Sruja, caches are often modeled as separate containers or components.

import { * } from 'sruja.ai/stdlib'


Catalog = system "Product Catalog System" {
    WebApp = container "Storefront" {
        technology "Node.js"
    }

    ProductCache = container "Product Cache" {
        technology "Memcached"
        description "Caches product details using LRU eviction."
    }

    ProductDB = container "Product Database" {
        technology "MongoDB"
    }

    WebApp -> ProductCache "Read (Cache-Aside)"
    WebApp -> ProductDB "Read on Miss"
}

view index {
    include *
}

Advanced Modeling

Lesson 1



Lesson 1: Microservices Architecture

Monolith vs. Microservices

Monolithic Architecture

A single application where all functionality is packaged together.

  • Pros: Simple to develop, deploy, and test initially.
  • Cons: Hard to scale specific parts, tight coupling, single point of failure.

Microservices Architecture

A collection of small, independent services that communicate over a network.

  • Pros: Independent scaling, technology diversity, fault isolation.
  • Cons: Distributed system complexity, network latency, data consistency challenges.

Defining Service Boundaries

The hardest part of microservices is deciding where to draw the lines. Common strategies include:

  • Business Capability: Group by what the business does (e.g., Billing, Shipping).
  • Subdomain: Group by Domain-Driven Design (DDD) subdomains.

🛠️ Sruja Perspective: Modeling Microservices

In Sruja, microservices are typically modeled as separate container items within a system, or even as separate system items if they are large enough.

Basic Example

import { * } from 'sruja.ai/stdlib'

Customer = person "Customer"

OrderSystem = system "Order Management" {
    OrderService = container "Order Service" {
        technology "Rust"
        description "Handles order placement and tracking."
    }
    OrderDB = database "Order Database" {
        technology "PostgreSQL"
    }
    OrderService -> OrderDB "Reads/Writes"
}

InventorySystem = system "Inventory Management" {
    InventoryService = container "Inventory Service" {
        technology "Java"
        description "Tracks stock levels."
    }
    InventoryDB = database "Inventory Database" {
        technology "PostgreSQL"
    }
    InventoryService -> InventoryDB "Reads/Writes"
}

// Inter-service communication
Customer -> OrderSystem.OrderService "Places order"
OrderSystem.OrderService -> InventorySystem.InventoryService "Reserves stock"

// Requirements drive architecture
R1 = requirement "Must handle 10k orders/day" { tags ["functional"] }
R2 = requirement "Order placement < 500ms" { tags ["performance"] }
R3 = requirement "Scale order processing independently" { tags ["scalability"] }

// Document decisions
ADR001 = adr "Split into microservices" {
    status "Accepted"
    context "Need independent scaling for order vs inventory"
    decision "Separate OrderSystem and InventorySystem"
    consequences "Better scalability, network latency overhead"
}

view index {
title "System Overview"
include *
}

// Developer perspective: Focus on services and APIs
view developer {
title "Developer View - Service Architecture"
include OrderSystem OrderSystem.OrderService OrderSystem.OrderDB
include InventorySystem InventorySystem.InventoryService InventorySystem.InventoryDB
exclude Customer
}

// Product perspective: Focus on user experience
view product {
title "Product View - User Journey"
include Customer
include OrderSystem
exclude InventorySystem InventorySystem.InventoryDB
}

// Data flow perspective: Show data dependencies
view dataflow {
title "Data Flow View"
include OrderSystem.OrderService OrderSystem.OrderDB
include InventorySystem.InventoryService InventorySystem.InventoryDB
exclude Customer
}

Key Benefits of Multiple Views

  1. Different Audiences: Developers see technical details, product managers see user flows
  2. Reduced Complexity: Each view focuses on what matters for that perspective
  3. Better Communication: Stakeholders get diagrams tailored to their needs
  4. Documentation: Multiple views serve as different types of documentation

Lesson 2



Lesson 2: Event-Driven Architecture

Synchronous vs. Asynchronous

  • Synchronous (Request/Response): Client waits for the server to respond (e.g., HTTP REST).
  • Asynchronous (Event-Driven): Client sends a message and continues work. The receiver processes it later.

Core Concepts

Message Queues (Point-to-Point)

A message is sent to a queue and processed by exactly one consumer.

  • Use Case: Background jobs (e.g., image resizing).

Pub/Sub (Publish/Subscribe)

A message (event) is published to a topic. Multiple subscribers can receive a copy.

  • Use Case: Notifying multiple services (e.g., "UserSignedUp" -> EmailService, AnalyticsService).
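
The difference between the two delivery models can be sketched in a few lines. The classes `WorkQueue` and `Topic` below are hypothetical and run in-process and synchronously; a real broker such as Kafka delivers asynchronously over the network and adds persistence, ordering, and retry guarantees.

```python
from collections import deque
from typing import Callable, List, Optional


class WorkQueue:
    """Point-to-point: each message is handed to exactly one consumer."""

    def __init__(self) -> None:
        self._messages = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        # Removing the message means no other consumer can see it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Pub/sub: every subscriber receives its own copy of each event."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: str) -> None:
        for handler in self._subscribers:
            handler(event)


jobs = WorkQueue()
jobs.send("resize:cat.png")
first = jobs.receive()    # the single consumer gets the job
second = jobs.receive()   # nothing left: the job was consumed once

email_log: List[str] = []
analytics_log: List[str] = []
user_events = Topic()
user_events.subscribe(email_log.append)
user_events.subscribe(analytics_log.append)
user_events.publish("UserSignedUp")  # both subscribers receive the event
```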

🛠️ Sruja Perspective: Modeling Events

Sruja supports queue as a first-class citizen to model asynchronous communication.

import { * } from 'sruja.ai/stdlib'


User = person "End User"

Notifications = system "Notification System" {
    AuthService = container "Auth Service" {
        technology "Node.js"
        description "Handles user authentication and publishes events"
    }

    // Define a queue or topic
    UserEvents = queue "User Events Topic" {
        technology "Kafka"
        description "Events related to user lifecycle (signup, login, profile updates)."
    }

    EmailService = container "Email Service" {
        technology "Python"
        description "Sends transactional emails"
    }

    AnalyticsService = container "Analytics Service" {
        technology "Spark"
        description "Processes user events for analytics"
    }

    NotificationDB = database "Notification Database" {
        technology "PostgreSQL"
        description "Stores notification preferences and history"
    }

    // Pub/Sub flow
    User -> AuthService "Signs up"
    AuthService -> UserEvents "Publishes 'UserSignedUp' event"
    UserEvents -> EmailService "Consumes - sends welcome email"
    UserEvents -> AnalyticsService "Consumes - tracks signup"
    EmailService -> NotificationDB "Logs email sent"
}

// Model the event flow as a scenario
UserSignupFlow = scenario "User Signup Event Flow" {
    User -> Notifications.AuthService "Submits registration"
    Notifications.AuthService -> Notifications.UserEvents "Publishes UserSignedUp"
    Notifications.UserEvents -> Notifications.EmailService "Triggers welcome email"
    Notifications.UserEvents -> Notifications.AnalyticsService "Tracks signup event"
    Notifications.EmailService -> User "Sends welcome email"
}

// Data flow for analytics processing
AnalyticsPipeline = flow "Analytics Data Pipeline" {
    Notifications.UserEvents -> Notifications.AnalyticsService "Streams events"
    Notifications.AnalyticsService -> Notifications.AnalyticsService "Processes batch"
    Notifications.AnalyticsService -> Notifications.AnalyticsService "Generates reports"
}

view index {
title "Notification System Overview"
include *
}

// Event flow view: Focus on async communication
view eventflow {
title "Event Flow View"
include Notifications.AuthService
include Notifications.UserEvents
include Notifications.EmailService
include Notifications.AnalyticsService
exclude User Notifications.NotificationDB
}

// Data view: Focus on data storage
view data {
title "Data Storage View"
include Notifications.EmailService
include Notifications.NotificationDB
include Notifications.AnalyticsService
exclude Notifications.AuthService Notifications.UserEvents
}

Key Concepts

  1. Scenarios model behavioral flows (user journeys, use cases)
  2. Flows model data pipelines (ETL, streaming, batch processing)
  3. Views let you focus on different aspects (events vs data vs user experience)

Lesson 3



Lesson 3: Advanced Scenarios

In Sruja, use scenario (and its alias story) to model runtime interactions: user journeys and technical sequences.

When to Use Scenarios

  • Modeling user interactions
  • Modeling technical sequences across elements

1. User Flows (Stories)

When modeling a user flow, you focus on the value delivered to the user. Sruja provides the story keyword (an alias for scenario) to make these definitions semantic and clear.

import { * } from 'sruja.ai/stdlib'


User = person "Customer"

Ticketing = system "Ticketing System" {
    WebApp = container "Web Application" {
        technology "React"
    }
    PaymentService = container "Payment Service" {
        technology "Rust"
    }
    TicketDB = database "Ticket Database" {
        technology "PostgreSQL"
    }

    WebApp -> PaymentService "Processes payment"
    PaymentService -> TicketDB "Stores transaction"
}

// High-level user flow
BuyTicket = story "User purchases a ticket" {
    User -> Ticketing.WebApp "Selects ticket"
    Ticketing.WebApp -> Ticketing.PaymentService "Process payment" {
        latency "500ms"
        protocol "HTTPS"
    }
    Ticketing.PaymentService -> User "Sends receipt"
}

view index {
include *
}

Notice how we can add properties like latency and protocol to steps using the { key "value" } syntax. This adds richness to your model without cluttering the diagram.

2. Technical Sequences

When modeling technical sequences, you dive deeper into the architecture, showing how containers and components interact to fulfill a request. You can stick with the scenario keyword here.

import { * } from 'sruja.ai/stdlib'


User = person "End User"

AuthSystem = system "Authentication System" {
    WebApp = container "Web Application" {
        technology "React"
    }
    AuthServer = container "Auth Server" {
        technology "Node.js, OAuth2"
    }
    Database = database "User Database" {
        technology "PostgreSQL"
    }

    WebApp -> AuthServer "Validates tokens"
    AuthServer -> Database "Queries user data"
}

// Detailed technical flow
AuthFlow = scenario "Authentication" {
    User -> AuthSystem.WebApp "Provides credentials"
    AuthSystem.WebApp -> AuthSystem.AuthServer "Validates token"
    AuthSystem.AuthServer -> AuthSystem.Database "Looks up user"
    AuthSystem.Database -> AuthSystem.AuthServer "Returns user data"
    AuthSystem.AuthServer -> AuthSystem.WebApp "Confirms token valid"
    AuthSystem.WebApp -> User "Shows login success"
}

view index {
include *
}

🛠️ Syntax Flexibility

Sruja offers flexible syntax to suit your needs:

Simple Syntax

Great for quick sketches or simple flows.

import { * } from 'sruja.ai/stdlib'


User = person "User"

AuthSystem = system "Auth System" {
    WebApp = container "Web App"
}

LoginFailure = scenario "Login Failure" {
    User -> AuthSystem.WebApp "Enters wrong password"
    AuthSystem.WebApp -> User "Shows error message"
}

view index {
include *
}

Formal Syntax

Better for documentation and referencing. Alongside the ID and title, the block carries an optional description.

import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"

ECommerce = system "E-Commerce System" {
    Cart = container "Shopping Cart" {
        technology "React"
    }
    Payment = container "Payment Service" {
        technology "Rust"
    }
}

Inventory = system "Inventory System" {
    InventoryService = container "Inventory Service" {
        technology "Java"
    }
}

Customer -> ECommerce.Cart "Adds items"
ECommerce.Cart -> Inventory.InventoryService "Checks availability"
ECommerce.Cart -> ECommerce.Payment "Processes payment"

Checkout = scenario "Checkout Process" {
    description "The complete checkout flow including payment and inventory check."

    Customer -> ECommerce.Cart "Initiates checkout"
    ECommerce.Cart -> Inventory.InventoryService "Reserves items"
    Inventory.InventoryService -> ECommerce.Cart "Confirms reserved"
    ECommerce.Cart -> ECommerce.Payment "Charges payment"
    ECommerce.Payment -> Customer "Sends confirmation"
}

view index {
include *
}

Visualizing Scenarios

Studio/Viewer can highlight scenario paths over your static architecture so readers can follow behavior step‑by‑step.


Data-Oriented Scenarios

Use scenario for data‑oriented flows too—keep steps focused on interactions and outcomes between elements.

Lesson 4



Lesson 4: Architectural Perspectives

As your system grows, a single diagram becomes too cluttered. You need different "maps" for different audiences:

  • Executives: Need a high-level overview (Context).
  • Architects: Need to see service boundaries (Containers).
  • Developers: Need to see internal details (Components).

Sruja models naturally support multiple perspectives without special keywords. Use the built‑in elements, and tooling presents the right level of detail.

One Model, Multiple Perspectives

Sruja's view blocks let you create custom perspectives from a single model, so each audience gets a diagram scoped to its questions.

import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"
Admin = person "Administrator"

Shop = system "E-Commerce Shop" {
WebApp = container "Web Application" {
  technology "React"
  CartComponent = component "Shopping Cart"
  ProductComponent = component "Product Catalog"
}
API = container "API Service" {
  technology "Rust"
  OrderController = component "Order Controller"
  PaymentController = component "Payment Controller"
}
DB = database "Database" {
  technology "PostgreSQL"
}
Cache = database "Cache" {
  technology "Redis"
}
}

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

// Relations
Customer -> Shop.WebApp "Browses products"
Admin -> Shop.WebApp "Manages inventory"
Shop.WebApp -> Shop.API "Fetches data"
Shop.API -> Shop.DB "Reads/Writes"
Shop.API -> Shop.Cache "Caches queries"
Shop.API -> PaymentGateway "Processes payments"

// Executive view: High-level context
view executive {
title "Executive Overview"
include Customer
include Admin
include Shop
include PaymentGateway
exclude Shop.WebApp
exclude Shop.API
exclude Shop.DB
exclude Shop.Cache
}

// Architect view: Container-level architecture
view architect {
title "Architectural View"
include Shop Shop.WebApp Shop.API Shop.DB Shop.Cache
include PaymentGateway
exclude Customer Admin
exclude Shop.WebApp.CartComponent Shop.WebApp.ProductComponent Shop.API.OrderController Shop.API.PaymentController
}

// Developer view: Component-level details
view developer {
title "Developer View"
include Shop.WebApp Shop.WebApp.CartComponent Shop.WebApp.ProductComponent
include Shop.API Shop.API.OrderController Shop.API.PaymentController
include Shop.DB Shop.Cache
exclude Customer Admin PaymentGateway
}

// Data flow view: Focus on data dependencies
view dataflow {
title "Data Flow View"
include Shop.API Shop.DB Shop.Cache
exclude Customer Admin Shop.WebApp PaymentGateway
}

// Default view: Everything
view index {
title "Complete System View"
include *
}

Key Benefits

  1. Context View (Executive): Shows systems and actors - perfect for stakeholders
  2. Container View (Architect): Shows deployable units and their relationships
  3. Component View (Developer): Shows internal structure and implementation details
  4. Data Flow View: Focuses on data dependencies and storage
  5. Complete View: Shows everything for comprehensive documentation

When to Use Views

  • Different Audiences: Tailor diagrams to what each audience needs to see
  • Reduce Complexity: Hide irrelevant details for specific discussions
  • Documentation: Create multiple diagrams from one source of truth
  • Presentations: Switch views during presentations to zoom in/out
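
One way to picture how `include` and `exclude` narrow a view is as a set filter over the model's element IDs. The sketch below is a mental model only, not Sruja's actual resolver: the function `resolve_view`, the exact-match rule, and the treatment of `*` as "everything" are all assumptions made for illustration.

```python
from typing import Iterable, List


def resolve_view(
    all_elements: List[str],
    include: Iterable[str],
    exclude: Iterable[str],
) -> List[str]:
    """Keep elements that are included (or matched by '*') and not excluded."""
    include_set = set(include)
    exclude_set = set(exclude)
    selected = []
    for element in all_elements:
        wanted = "*" in include_set or element in include_set
        if wanted and element not in exclude_set:
            selected.append(element)
    return selected


# Element IDs taken from the Shop example above (components omitted for brevity).
MODEL = [
    "Customer", "Admin", "Shop", "Shop.WebApp", "Shop.API",
    "Shop.DB", "Shop.Cache", "PaymentGateway",
]

dataflow = resolve_view(
    MODEL,
    include=["Shop.API", "Shop.DB", "Shop.Cache"],
    exclude=["Customer", "Admin", "Shop.WebApp", "PaymentGateway"],
)
index = resolve_view(MODEL, include=["*"], exclude=[])
```

Under these assumptions `dataflow` holds only the API and its two data stores, while `index` holds the whole model, which mirrors the intent of the `dataflow` and `index` views.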

Lesson 5



Lesson 5: Views & Styling

Why Views?

Views let you spotlight specific paths (API, data, auth) without redrawing the whole system.

Sruja: Views and Styles

Views let you create focused diagrams from a single model. Styles make them visually clear.

Basic Views Example

import { * } from 'sruja.ai/stdlib'


User = person "User"

Shop = system "E-Commerce Shop" {
WebApp = container "Web Application"
API = container "API Service"
DB = database "Database"
}

User -> Shop.WebApp "Uses"
Shop.WebApp -> Shop.API "Calls"
Shop.API -> Shop.DB "Reads/Writes"

// Default view: Everything
view index {
title "Complete System View"
include *
}

// API-focused view
view api {
title "API Focus View"
include Shop.API
include Shop.DB
exclude Shop.WebApp
exclude User
}

// User experience view
view user {
title "User Experience View"
include User
include Shop.WebApp
include Shop.API
exclude Shop.DB
}

Views with Custom Styling

import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"

ECommerce = system "E-Commerce System" {
WebApp = container "Web Application"
API = container "API Service"
OrderDB = database "Order Database"
ProductDB = database "Product Database"
}

Customer -> ECommerce.WebApp "Browses"
ECommerce.WebApp -> ECommerce.API "Fetches data"
ECommerce.API -> ECommerce.OrderDB "Stores orders"
ECommerce.API -> ECommerce.ProductDB "Queries products"

// Global styles
style {
element "Database" {
  shape cylinder
  color "#22c55e"
}
relation "Fetches data" {
  color "#3b82f6"
}
relation "Stores" {
  color "#ef4444"
}
}

view index {
title "Complete View"
include *
}

// Data flow view with custom styling
view dataflow {
title "Data Flow View"
include ECommerce.API
include ECommerce.OrderDB
include ECommerce.ProductDB
exclude Customer
exclude ECommerce.WebApp

// View-specific styles override global styles
style {
  element "API" { color "#0ea5e9" }
  relation "Stores" { color "#10b981" }
}
}

Practice

  • Create a "Data Flow" view focusing on DB reads/writes.
  • Use view styles to highlight critical edges.

Lesson 6



Lesson 6: Advanced DSL Features

The Sruja DSL provides a flat syntax where all declarations—from element kinds to views—are top-level. This lesson covers the advanced capabilities that make your models more maintainable, understandable, and useful.

Kinds and Types: Your Foundation

Before creating instances of your architecture (like a "Database"), you must establish what kinds of elements exist. This isn't just documentation—it provides real benefits:

Benefits

  1. Early Validation: Catches typos in element types before runtime
  2. Better Tooling: Enables autocomplete, validation, and refactoring
  3. Documentation: Makes available element types explicit
  4. Organization: Separates structure definition from instantiation

import { * } from 'sruja.ai/stdlib'


// The stdlib import above makes the element kinds used below available
Customer = person "Customer"
App = system "Application" {
    API = container "API"
    DB = datastore "Database"
}

Best Practice

Import or declare every element kind you use upfront. This makes your model self-documenting and gives tooling what it needs for autocomplete and validation.

Multiple Views: One Model, Many Perspectives

Use view blocks to create custom perspectives from your architecture. This is essential for communicating with different audiences. Unlike some other tools, Sruja allows defining views anywhere in your file, though keeping them at the bottom is a common convention.

Real-World Example: E-Commerce Platform

import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"
Admin = person "Administrator"

ECommerce = system "E-Commerce Platform" {
WebApp = container "Web Application" {
  CartComponent = component "Shopping Cart"
  ProductComponent = component "Product Catalog"
}
API = container "API Service" {
  OrderController = component "Order Controller"
  PaymentController = component "Payment Controller"
}
OrderDB = datastore "Order Database"
ProductDB = datastore "Product Database"
Cache = datastore "Redis Cache"
EventQueue = queue "Event Queue"
}

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

Customer -> ECommerce.WebApp "Browses"
ECommerce.WebApp -> ECommerce.API "Fetches data"
ECommerce.API -> ECommerce.OrderDB "Stores orders"
ECommerce.API -> ECommerce.Cache "Caches queries"
ECommerce.API -> PaymentGateway "Processes payments"

// Executive view: High-level business context
view executive {
title "Executive Overview"
include Customer
include Admin
include ECommerce
include PaymentGateway
exclude ECommerce.WebApp
exclude ECommerce.API
exclude ECommerce.OrderDB
exclude ECommerce.ProductDB
exclude ECommerce.Cache
exclude ECommerce.EventQueue
}

// Architect view: Container-level architecture
view architect {
title "Architectural View"
include ECommerce
include ECommerce.WebApp
include ECommerce.API
include ECommerce.OrderDB
include ECommerce.ProductDB
include ECommerce.Cache
include ECommerce.EventQueue
include PaymentGateway
exclude Customer
exclude Admin
}

// Developer view: Component-level implementation
view developer {
title "Developer View"
include ECommerce.WebApp
include ECommerce.API
include ECommerce.OrderDB
include ECommerce.ProductDB
include ECommerce.Cache
exclude Customer
exclude Admin
exclude PaymentGateway
}

// Data flow view: Focus on data dependencies
view dataflow {
title "Data Flow View"
include ECommerce.API
include ECommerce.OrderDB
include ECommerce.ProductDB
include ECommerce.Cache
include ECommerce.EventQueue
exclude Customer
exclude Admin
exclude ECommerce.WebApp
exclude PaymentGateway
}

// User journey view: Customer experience
view userjourney {
title "User Journey View"
include Customer
include ECommerce.WebApp
include ECommerce.API
include PaymentGateway
exclude Admin
exclude ECommerce.OrderDB
exclude ECommerce.ProductDB
exclude ECommerce.Cache
exclude ECommerce.EventQueue
}

// Default view: Complete system
view index {
title "Complete System View"
include *
}

View Strategies

  1. By Audience: Executive, Architect, Developer, Product Manager
  2. By Concern: Data flow, Security, Performance, User experience
  3. By Layer: Context, Container, Component
  4. By Feature: Checkout flow, User management, Analytics

Scenarios: Modeling User Journeys

Scenarios model behavioral flows—what happens when users interact with your system. They're perfect for documenting user stories and use cases.

Example: Checkout Flow

import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"

ECommerce = system "E-Commerce System" {
WebApp = container "Web Application"
API = container "API Service"
OrderDB = datastore "Order Database"
}

Inventory = system "Inventory System" {
InventoryService = container "Inventory Service"
}

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

// Model the checkout journey
// EXPECTED_FAILURE: Layer violation (scenarios model bidirectional user journeys)
CheckoutFlow = scenario "User Checkout Journey" {
Customer -> ECommerce.WebApp "Adds items to cart"
ECommerce.WebApp -> ECommerce.API "Submits checkout"
ECommerce.API -> Inventory.InventoryService "Reserves stock"
Inventory.InventoryService -> ECommerce.API "Confirms availability"
ECommerce.API -> PaymentGateway "Processes payment"
PaymentGateway -> ECommerce.API "Confirms payment"
ECommerce.API -> ECommerce.OrderDB "Saves order"
ECommerce.API -> ECommerce.WebApp "Returns confirmation"
ECommerce.WebApp -> Customer "Shows order confirmation"
}

// Alternative happy path
CheckoutSuccess = scenario "Successful Checkout" {
Customer -> ECommerce.WebApp "Completes checkout"
ECommerce.WebApp -> ECommerce.API "Processes order"
ECommerce.API -> Customer "Confirms order"
}

// Error scenario
CheckoutFailure = scenario "Checkout Failure" {
Customer -> ECommerce.WebApp "Attempts checkout"
ECommerce.WebApp -> ECommerce.API "Validates order"
ECommerce.API -> Inventory.InventoryService "Checks stock"
Inventory.InventoryService -> ECommerce.API "Out of stock"
ECommerce.API -> ECommerce.WebApp "Returns error"
ECommerce.WebApp -> Customer "Shows out of stock message"
}

view index {
include *
}

When to Use Scenarios

  • User Stories: Document how users interact with your system
  • Use Cases: Model specific business processes
  • Error Handling: Document failure paths and recovery
  • Integration Testing: Define test scenarios

Flows: Modeling Data Pipelines

Flows model data-oriented processes—how data moves through your system. Use them for ETL, streaming, and batch processing.

Example: Analytics Pipeline

// EXPECTED_FAILURE: Layer violation (flows model bidirectional data movement)
import { * } from 'sruja.ai/stdlib'


Analytics = system "Analytics Platform" {
IngestionService = container "Data Ingestion"
ProcessingService = container "Data Processing"
QueryService = container "Query Service"
EventStream = queue "Event Stream"
RawDataDB = datastore "Raw Data Store"
ProcessedDataDB = datastore "Processed Data Warehouse"
}

// Data flow: Event ingestion pipeline
EventIngestion = flow "Event Ingestion Pipeline" {
Analytics.IngestionService -> Analytics.EventStream "Publishes events"
Analytics.EventStream -> Analytics.ProcessingService "Streams events"
Analytics.ProcessingService -> Analytics.RawDataDB "Stores raw data"
Analytics.ProcessingService -> Analytics.ProcessedDataDB "Stores processed data"
Analytics.QueryService -> Analytics.ProcessedDataDB "Queries analytics"
}

// Batch processing flow
BatchProcessing = flow "Daily Batch Processing" {
Analytics.RawDataDB -> Analytics.ProcessingService "Extracts daily data"
Analytics.ProcessingService -> Analytics.ProcessingService "Transforms data"
Analytics.ProcessingService -> Analytics.ProcessedDataDB "Loads aggregated data"
}

view index {
include *
}

Scenario vs Flow

  • Scenario: Behavioral flows (user actions, business processes)
  • Flow: Data flows (ETL, streaming, batch processing)

Integrating Requirements, ADRs, and Policies

Sruja's flat syntax makes it easy to integrate requirements, ADRs, and policies directly into your architecture model as top-level declarations.

Complete Example

import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"

// Requirements drive architecture
R1 = requirement "Must handle 10k concurrent users" { tags ["functional"] }
R2 = requirement "API response < 200ms p95" { tags ["performance"] }
R3 = requirement "Scale to 1M users" { tags ["scalability"] }
R4 = requirement "All PII encrypted at rest" { tags ["security"] }

// Architecture decisions documented as ADRs
ADR001 = adr "Use microservices for independent scaling" {
status "Accepted"
context "Need to scale order processing independently from inventory"
decision "Split into OrderService and InventoryService"
consequences "Better scalability, increased network complexity"
}

ADR002 = adr "Use PostgreSQL for strong consistency" {
status "Accepted"
context "Need ACID transactions for financial data"
decision "Use PostgreSQL instead of NoSQL"
consequences "Strong consistency, SQL complexity"
}

// Architecture that satisfies requirements
ECommerce = system "E-Commerce Platform" {
API = container "API Service" {
  technology "Rust"
  description "Satisfies R1, R2, R3"
  // adr ADR001 ADR002

  slo {
    availability {
      target "99.99%"
      window "30 days"
    }
    latency {
      p95 "200ms"
      p99 "500ms"
    }
  }
}

OrderDB = datastore "Order Database" {
  technology "PostgreSQL"
  description "Satisfies R4 - encrypted at rest"
  // adr ADR002
}
}

// Policy enforcement
SecurityPolicy = policy "All databases must be encrypted" {
category "security"
enforcement "required"
description "Compliance requirement for PII data"
}

view index {
include *
}

Best Practices

  1. Explicit Kinds: Import or declare all element kinds upfront.
  2. Use Multiple Views: Create views for different audiences and concerns
  3. Document with Scenarios: Model user journeys and business processes
  4. Model Data Flows: Use flows for ETL and data pipelines
  5. Link Requirements: Connect requirements to architecture decisions
  6. Document Decisions: Use ADRs to explain why, not just what
  7. Define SLOs: Model service level objectives for production systems
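
The `slo` block in the complete example sets an availability target of "99.99%" over a "30 days" window. To see what that target permits in practice, the arithmetic can be sketched as below. The helper `error_budget_minutes` is hypothetical and not part of Sruja; it only converts the two values into allowed downtime.

```python
def error_budget_minutes(availability_target: str, window_days: int) -> float:
    """Return the downtime, in minutes, allowed by an availability target."""
    target = float(availability_target.rstrip("%")) / 100.0
    window_minutes = window_days * 24 * 60
    return window_minutes * (1.0 - target)


budget = error_budget_minutes("99.99%", 30)
print(f"{budget:.2f} minutes")  # 4.32 minutes
```

A 30 day window has 43,200 minutes, so a 99.99% target leaves about 4.32 minutes of downtime. Seeing the number makes it easier to judge whether a target is realistic for the service it is attached to.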

Next Steps

Now that you understand the advanced features, you can create production-ready models that:

  • Communicate effectively with different audiences
  • Document user journeys and data flows
  • Link requirements to architecture decisions
  • Enable automated validation and governance

👉 Module 4: Production Readiness - Learn how to make your architecture production-ready.

Lesson 7



Lesson 7: Views Best Practices

Views are one of the most powerful features in Sruja DSL. They let you create multiple perspectives from a single model, making your architecture documentation accessible to different audiences. This lesson covers best practices for creating effective views.

The Power of Views

A single architecture model can serve:

  • Executives: High-level business context
  • Product Managers: Feature and user journey focus
  • Architects: Technical design and patterns
  • Developers: Implementation details
  • Operations: Deployment and monitoring concerns
  • Security: Compliance and threat modeling

View Creation Strategy

1. Start with Your Audience

Before creating a view, ask:

  • Who will use this view? (Executive, Developer, Ops)
  • What questions do they need answered? (Cost, Performance, Security)
  • What level of detail do they need? (Context, Container, Component)

2. Use Include/Exclude Strategically

import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"
Admin = person "Administrator"

ECommerce = system "E-Commerce Platform" {
WebApp = container "Web Application" {
  CartComponent = component "Shopping Cart"
  ProductComponent = component "Product Catalog"
}
API = container "API Service" {
  OrderController = component "Order Controller"
  PaymentController = component "Payment Controller"
}
OrderDB = database "Order Database"
ProductDB = database "Product Database"
}

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

Customer -> ECommerce.WebApp "Browses"
ECommerce.WebApp -> ECommerce.API "Fetches data"
ECommerce.API -> ECommerce.OrderDB "Stores orders"
ECommerce.API -> PaymentGateway "Processes payments"

// Executive view: Business context only
view executive {
title "Executive Overview"
include Customer
include Admin
include ECommerce
include PaymentGateway
exclude ECommerce.WebApp
exclude ECommerce.API
exclude ECommerce.OrderDB
exclude ECommerce.ProductDB
}

// Architect view: Container-level architecture
view architect {
title "Architectural View"
include ECommerce
include ECommerce.WebApp
include ECommerce.API
include ECommerce.OrderDB
include ECommerce.ProductDB
include PaymentGateway
exclude Customer
exclude Admin
}

// Developer view: Component-level implementation
view developer {
title "Developer View"
include ECommerce.WebApp
include ECommerce.API
include ECommerce.OrderDB
include ECommerce.ProductDB
exclude Customer
exclude Admin
exclude PaymentGateway
}

3. Create Concern-Specific Views

Focus on specific concerns: performance, security, data flow, deployment.

// Performance view: Components with performance characteristics
view performance {
title "Performance View"
include ECommerce.API
include ECommerce.OrderDB
exclude Customer
exclude Admin
exclude ECommerce.WebApp
}

// Security view: External interactions and data stores
view security {
title "Security View"
include ECommerce.API
include PaymentGateway
include ECommerce.OrderDB
exclude Customer
exclude Admin
exclude ECommerce.WebApp
}

// Data flow view: Data dependencies
view dataflow {
title "Data Flow View"
include ECommerce.API
include ECommerce.OrderDB
include ECommerce.ProductDB
exclude Customer
exclude Admin
exclude ECommerce.WebApp
exclude PaymentGateway
}

View Naming Conventions

Use clear, descriptive names that indicate the view's purpose:

Good Names

  • executive - Clear audience
  • dataflow - Clear concern
  • deployment - Clear purpose
  • security-audit - Specific use case

Avoid

  • view1, view2 - Not descriptive
  • temp - Temporary views should be removed
  • test - Test views shouldn't be in production models
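
A naming rule like this is easy to check automatically, for example in a pre-commit hook. The sketch below assumes you already have the list of view names from your model; the function `discouraged_view_names` and the pattern are hypothetical and cover only the three cases in the "Avoid" list.

```python
import re

# Matches numbered placeholders (view1, view2, ...) and the names temp and test.
DISCOURAGED = re.compile(r"^(view\d+|temp|test)$")


def discouraged_view_names(names):
    """Return the view names that match the 'Avoid' list."""
    return [name for name in names if DISCOURAGED.match(name)]


flagged = discouraged_view_names(
    ["executive", "view1", "dataflow", "temp", "test", "security-audit"]
)
print(flagged)  # ['view1', 'temp', 'test']
```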

View Organization Patterns

Pattern 1: By Audience

Create views for each stakeholder group.

view executive { /* ... */ }
view product { /* ... */ }
view architect { /* ... */ }
view developer { /* ... */ }
view operations { /* ... */ }

Pattern 2: By Concern

Create views for different technical concerns.

view performance { /* ... */ }
view security { /* ... */ }
view dataflow { /* ... */ }
view deployment { /* ... */ }

Pattern 3: By Layer

Create views for different C4 model layers.

view context { /* System context */ }
view container { /* Container diagram */ }
view component { /* Component diagram */ }

Pattern 4: By Feature

Create views for specific features or domains.

view checkout { /* Checkout flow */ }
view search { /* Search functionality */ }
view analytics { /* Analytics pipeline */ }

Best Practices

1. Always Include an Index View

view index {
title "Complete System View"
include *
}

2. Use Descriptive Titles

view executive {
  title "Executive Overview - Business Context"
  // ...
}

3. Add Descriptions

view architect {
  title "Architectural View"
  description "Container-level architecture showing system boundaries and interactions"
  // ...
}

4. Keep Views Focused

Each view should answer a specific set of questions. If a view tries to answer too many questions, split it into multiple views.

5. Document View Purpose

Use comments or descriptions to explain why a view exists and when to use it.

// Use this view for:
// - Executive presentations
// - Business stakeholder discussions
// - High-level architecture reviews
view executive {
  title "Executive Overview"
  // ...
}

Common View Patterns

Executive Dashboard View

view executive {
  title "Executive Dashboard"
  include Customer Admin
  include ECommerce PaymentGateway
  exclude ECommerce.WebApp ECommerce.API
  exclude ECommerce.OrderDB ECommerce.ProductDB
}

Technical Architecture View

view technical {
  title "Technical Architecture"
  include ECommerce ECommerce.WebApp ECommerce.API
  include ECommerce.OrderDB ECommerce.ProductDB
  exclude Customer Admin
}

User Journey View

view userjourney {
  title "User Journey View"
  include Customer
  include ECommerce.WebApp ECommerce.API
  include PaymentGateway
  exclude Admin ECommerce.OrderDB ECommerce.ProductDB
}

Deployment View

view deployment {
  title "Deployment View"
  include ECommerce.WebApp ECommerce.API
  include ECommerce.OrderDB
  exclude Customer Admin PaymentGateway
}

View Maintenance

When to Create New Views

  • New stakeholder group needs different perspective
  • New concern emerges (e.g., compliance)
  • Feature-specific view needed for documentation

When to Remove Views

  • View is no longer used
  • View duplicates another view
  • View is outdated and not maintained

When to Update Views

  • Architecture changes affect view scope
  • New components need to be included/excluded
  • View purpose changes

Advanced: View Composition

You can create views that build on other views by carefully selecting elements:

// Base view: All containers
view containers {
  title "Container View"
  include ECommerce.WebApp ECommerce.API
  include ECommerce.OrderDB ECommerce.ProductDB
}

// Extended view: Containers + external systems
view containers-extended {
  title "Container View with External Systems"
  include ECommerce.WebApp ECommerce.API
  include ECommerce.OrderDB ECommerce.ProductDB
  include PaymentGateway
}

Real-World Example: E-Commerce Platform

import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"
Admin = person "Administrator"

ECommerce = system "E-Commerce Platform" {
  WebApp = container "Web Application" {
    CartComponent = component "Shopping Cart"
    ProductComponent = component "Product Catalog"
  }
  API = container "API Service" {
    OrderController = component "Order Controller"
    PaymentController = component "Payment Controller"
  }
  OrderDB = database "Order Database"
  ProductDB = database "Product Database"
  Cache = database "Redis Cache"
  EventQueue = queue "Event Queue"
}

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

Customer -> ECommerce.WebApp "Browses"
ECommerce.WebApp -> ECommerce.API "Fetches data"
ECommerce.API -> ECommerce.OrderDB "Stores orders"
ECommerce.API -> ECommerce.Cache "Caches queries"
ECommerce.API -> PaymentGateway "Processes payments"

// Complete system
view index {
title "Complete System View"
include *
}

// Executive: Business context
view executive {
title "Executive Overview"
include Customer Admin
include ECommerce PaymentGateway
exclude ECommerce.WebApp ECommerce.API ECommerce.OrderDB ECommerce.ProductDB ECommerce.Cache ECommerce.EventQueue
}

// Product: User journeys
view product {
title "Product View - User Journeys"
include Customer
include ECommerce.WebApp
include ECommerce.API
include PaymentGateway
exclude Admin
exclude ECommerce.OrderDB
exclude ECommerce.ProductDB
exclude ECommerce.Cache
exclude ECommerce.EventQueue
}

// Architect: Container architecture
view architect {
title "Architectural View"
include ECommerce ECommerce.WebApp ECommerce.API
include ECommerce.OrderDB ECommerce.ProductDB ECommerce.Cache ECommerce.EventQueue
include PaymentGateway
exclude Customer Admin
}

// Developer: Component details
view developer {
title "Developer View"
include ECommerce.WebApp
include ECommerce.API
include ECommerce.OrderDB ECommerce.ProductDB ECommerce.Cache
exclude Customer Admin PaymentGateway
}

// Operations: Deployment and monitoring
view operations {
title "Operations View"
include ECommerce.WebApp ECommerce.API
include ECommerce.OrderDB ECommerce.ProductDB ECommerce.Cache ECommerce.EventQueue
exclude Customer Admin PaymentGateway
}

// Data flow: Data dependencies
view dataflow {
title "Data Flow View"
include ECommerce.API ECommerce.OrderDB ECommerce.ProductDB ECommerce.Cache ECommerce.EventQueue
exclude Customer Admin ECommerce.WebApp PaymentGateway
}

// Performance: Performance-critical components
view performance {
title "Performance View"
include ECommerce.API ECommerce.Cache ECommerce.OrderDB
exclude Customer Admin ECommerce.WebApp ECommerce.ProductDB ECommerce.EventQueue PaymentGateway
}

Summary

Views are a powerful tool for making architecture documentation accessible to different audiences. Follow these best practices:

  1. Start with your audience - Know who will use the view
  2. Use include/exclude strategically - Focus on what matters
  3. Create concern-specific views - Performance, security, data flow
  4. Use clear naming - Descriptive names that indicate purpose
  5. Document view purpose - Explain why the view exists
  6. Keep views focused - Each view should answer specific questions
  7. Maintain views - Update or remove views as architecture evolves

👉 Module 4: Production Readiness - Learn how to make your architecture production-ready.

Production Readiness

Lesson 1



Lesson 1: Documenting Decisions (ADRs)

What is an ADR?

An Architecture Decision Record (ADR) is a document that captures an important architectural decision, along with its context and consequences.

Why use ADRs?

  • Context: Explains why a decision was made (e.g., "Why did we choose Postgres over Mongo?").
  • Onboarding: Helps new team members understand the history of the system.
  • Alignment: Ensures everyone agrees on the path forward.

Structure of an ADR

  1. Title: Short summary.
  2. Status: Proposed, Accepted, Rejected, Deprecated, or Superseded.
  3. Context: The problem we are solving.
  4. Decision: What we are doing.
  5. Consequences: The pros and cons of this decision.

🛠️ Sruja Perspective: Native ADR Support

Sruja treats ADRs as first-class citizens. You can define them directly in your architecture file.

import { * } from 'sruja.ai/stdlib'


// Define an ADR
adr ADR001 "Use Stripe for Payments" {
    status "Accepted"
    context "We need a reliable payment processor that supports global currencies."

    // Document alternatives considered
    option "PayPal" {
        pros "Easy setup"
        cons "Higher fees"
    }

    option "Stripe" {
        pros "Developer friendly, good API"
        cons "Complex verification"
    }

    decision "Stripe"
    reason "Better developer experience and lower fees at scale."
    consequences "Vendor lock-in, but faster time to market."
}

PaymentService = system "Payment Service" {
    // Link the ADR to the component it affects
    adr ADR001
    description "Handles credit card processing."
}

view index {
include *
}

This ensures that your documentation lives right next to the code it describes, making it harder to ignore or lose.

Lesson 2



Lesson 2: Deployment Architecture

Real-World Scenario: Startup to Scale

Context: A fintech startup begins with a single server, grows to 10M users, and needs to plan deployment architecture.

Challenge: How do you model and evolve deployment architecture as you scale?

Logical vs. Physical Architecture

  • Logical Architecture: The software components and how they interact (Containers, Components).
  • Physical Architecture: Where those components actually run (Servers, VMs, Kubernetes Pods).

Why it matters: Understanding this separation helps you:

  • Plan migrations (e.g., moving from EC2 to EKS)
  • Model multi-cloud strategies
  • Document disaster recovery plans
  • Communicate with DevOps teams

Deployment Strategies: Real-World Trade-offs

On-Premises

Running on your own hardware in a data center.

Real-world use cases:

  • Financial institutions (regulatory compliance)
  • Healthcare systems (HIPAA data residency)
  • Government systems (sovereignty requirements)

Trade-offs:

  • Pros: Total control, security, data sovereignty.
  • Cons: High maintenance, capital expense, slower scaling.
  • Cost example: $500K+ initial investment, 3-5 year hardware refresh cycles

Cloud (AWS, GCP, Azure)

Renting infrastructure from a provider.

Real-world use cases:

  • SaaS platforms (the default choice for most modern startups)
  • E-commerce (seasonal scaling)
  • Media streaming (global distribution)

Trade-offs:

  • Pros: Pay-as-you-go pricing, elastic scaling, managed services.
  • Cons: Vendor lock-in, variable costs, compliance complexity.
  • Cost example: $5K/month for small SaaS, $50K+ for mid-size, $500K+ for enterprise

Containers & Orchestration

Packaging code with dependencies (Docker) and managing them at scale (Kubernetes).

Real-world adoption:

  • 2024: 70% of enterprises use Kubernetes
  • Common pattern: Docker → Kubernetes → Service Mesh (Istio/Linkerd)

Production considerations:

  • Resource limits (CPU/memory)
  • Health checks and liveness probes
  • Rolling updates and rollback strategies
  • Multi-region deployments

🛠️ Sruja Perspective: Deployment Nodes

Sruja allows you to map your logical containers to physical deployment nodes, making it clear where code runs in production.

Example: Multi-Region E-Commerce Platform

import { * } from 'sruja.ai/stdlib'


ECommerce = system "E-Commerce Platform" {
    API = container "REST API" {
        technology "Rust"
        scale {
            min 3
            max 100
            metric "cpu > 70%"
        }
    }
    WebApp = container "React Frontend" {
        technology "React"
    }
    PrimaryDB = database "PostgreSQL" {
        technology "PostgreSQL"
    }
    Cache = database "Redis" {
        technology "Redis"
    }
}

// Production deployment across regions
deployment Production "Production Environment" {
    node AWS "AWS Cloud" {
        // Primary region: US-East-1
        node USEast1 "US-East-1 (Primary)" {
            node EKS "EKS Cluster" {
                containerInstance API {
                    replicas 10
                }
                containerInstance WebApp {
                    replicas 5
                }
            }
            node RDS "RDS Multi-AZ" {
                containerInstance PrimaryDB {
                    role "primary"
                }
            }
            node ElastiCache "ElastiCache" {
                containerInstance Cache
            }
        }

        // Secondary region: EU-West-1 (for DR and latency)
        node EUWest1 "EU-West-1 (Secondary)" {
            node EKS "EKS Cluster" {
                containerInstance API {
                    replicas 5
                }
                containerInstance WebApp {
                    replicas 3
                }
            }
            node RDS "RDS Read Replica" {
                containerInstance PrimaryDB {
                    role "read-replica"
                }
            }
        }
    }
}

view index {
include *
}

DevOps Integration: CI/CD Pipeline

Model your deployment pipeline alongside architecture:

import { * } from 'sruja.ai/stdlib'


CICD = system "CI/CD System" {
    GitHubActions = container "GitHub Actions" {
        description "Triggers on push to main branch"
    }
    BuildService = container "Build Service" {
        technology "Docker"
        description "Builds container images"
    }
    TestRunner = container "Test Runner" {
        description "Runs unit, integration, and E2E tests"
    }
    DeployService = container "Deploy Service" {
        technology "ArgoCD"
        description "Deploys to Kubernetes clusters"
    }

    GitHubActions -> BuildService "Builds image"
    BuildService -> TestRunner "Runs tests"
    TestRunner -> DeployService "Deploys if tests pass"
}

// Link deployment to CI/CD
// Note: Deployment metadata (CI/CD pipeline, strategy) is modeled in the deployment node
deployment Production "Production Deployment" {
    node Infrastructure "Production Infrastructure" {
    }
}

view index {
include *
}

Real-World Patterns

Pattern 1: Blue/Green Deployment

Use case: Zero-downtime deployments for critical services

deployment Production {
    node Blue "Active Environment" {
        containerInstance API {
            traffic 100
        }
    }
    node Green "Staging Environment" {
        containerInstance API {
            traffic 0
            status "ready-for-switch"
        }
    }
}

DevOps workflow:

  1. Deploy new version to Green
  2. Run smoke tests
  3. Switch 10% traffic to Green
  4. Monitor for 15 minutes
  5. Gradually increase to 100%
  6. Keep Blue ready for rollback

Pattern 2: Canary Deployment

Use case: Gradual rollout with automatic rollback

deployment Production {
    node Canary "Canary Cluster" {
        containerInstance API {
            traffic 5
            description "5% of traffic, auto-rollback on error rate > 1%"
        }
    }
    node Stable "Stable Cluster" {
        containerInstance API {
            traffic 95
        }
    }
}
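To make the canary numbers concrete, here is a minimal Python sketch of the two decisions implied by the model above: weighted traffic splitting (5/95) and rollback when the canary's error rate exceeds 1%. The function names `pick_cluster` and `should_rollback` are hypothetical, and a real rollout would normally delegate both to a load balancer or service mesh rather than application code.

```python
import random


def pick_cluster(weights: dict[str, int], rng: random.Random) -> str:
    """Pick a cluster name with probability proportional to its traffic weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def should_rollback(errors: int, requests: int, max_error_rate: float = 0.01) -> bool:
    """Return True when the observed canary error rate exceeds the threshold."""
    if requests == 0:
        return False  # no data yet, so no basis for a rollback
    return errors / requests > max_error_rate


if __name__ == "__main__":
    # Weights mirror the `traffic 5` / `traffic 95` values in the model above.
    weights = {"Canary": 5, "Stable": 95}
    rng = random.Random(42)
    sample = [pick_cluster(weights, rng) for _ in range(10_000)]
    print("canary share:", sample.count("Canary") / len(sample))
    print("rollback?", should_rollback(errors=3, requests=200))
```

The rollback check compares a ratio against the threshold, so it needs enough canary requests to be meaningful; production systems usually add a minimum sample size before acting.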

Pattern 3: Multi-Cloud Strategy

Use case: Avoiding vendor lock-in, disaster recovery

deployment Production {
    node AWS "AWS Primary" {
        containerInstance API {
            region "us-east-1"
            traffic 80
        }
    }
    node GCP "GCP Secondary" {
        containerInstance API {
            region "us-central1"
            traffic 20
            description "Failover target"
        }
    }
}

Service Level Objectives (SLOs)

Define reliability targets directly in your architecture model:

import { * } from 'sruja.ai/stdlib'


ECommerce = system "E-Commerce Platform" {
API = container "REST API" {
  technology "Rust"

  // Define SLOs for production monitoring
  slo {
    availability {
      target "99.99%"
      window "30 days"
      current "99.95%"
    }
    latency {
      p95 "200ms"
      p99 "500ms"
      window "7 days"
      current {
        p95 "180ms"
        p99 "450ms"
      }
    }
    errorRate {
      target "< 0.1%"
      window "30 days"
      current "0.05%"
    }
    throughput {
      target "1000 req/s"
      window "1 hour"
      current "950 req/s"
    }
  }
}

Database = database "PostgreSQL" {
  technology "PostgreSQL"
  slo {
    availability {
      target "99.9%"
      window "30 days"
    }
    latency {
      p95 "50ms"
      p99 "100ms"
    }
  }
}
}

view index {
include *
}

Benefits of Modeling SLOs

  1. Clear Targets: Everyone knows what "good" looks like
  2. Monitoring Guidance: SLOs define what metrics to track
  3. Stakeholder Communication: Clear reliability commitments
  4. Living Documentation: SLOs live with architecture, not separate docs
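An availability target translates directly into an error budget, which is often the easiest way to communicate it. The Python sketch below shows the arithmetic for the targets used in this lesson; the function names are illustrative and not part of Sruja.

```python
def error_budget_minutes(target_percent: float, window_days: int) -> float:
    """Minutes of allowed unavailability for an availability target over a window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - target_percent / 100)


def budget_consumed(target_percent: float, current_percent: float) -> float:
    """Fraction of the error budget already used (values above 1.0 mean it is exhausted)."""
    allowed = 100 - target_percent
    if allowed <= 0:
        raise ValueError("a 100% target leaves no error budget")
    return (100 - current_percent) / allowed


if __name__ == "__main__":
    # 99.99% over 30 days allows roughly 4.3 minutes of downtime.
    print(round(error_budget_minutes(99.99, 30), 2))
    # 99.9% over 30 days allows roughly 43 minutes.
    print(round(error_budget_minutes(99.9, 30), 2))
    # The API above targets 99.99% but currently measures 99.95%.
    print(round(budget_consumed(99.99, 99.95), 2))
```

For the API modeled above, a current availability of 99.95% against a 99.99% target means the budget is spent about five times over, which is the kind of signal the target and current fields are meant to surface.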

Monitoring & Observability

Model your observability stack:

import { * } from 'sruja.ai/stdlib'


Observability = system "Observability Stack" {
Prometheus = container "Metrics" {
  technology "Prometheus"
  description "Collects metrics from all services"
}
Grafana = container "Dashboards" {
  technology "Grafana"
  description "Visualizes metrics and alerts"
}
ELK = container "Logging" {
  technology "Elasticsearch, Logstash, Kibana"
  description "Centralized logging"
}
Jaeger = container "Tracing" {
  technology "Jaeger"
  description "Distributed tracing"
}
}

ECommerce = system "E-Commerce" {
  API = container "API"
}

// Link to your services
ECommerce.API -> Observability.Prometheus "Exposes metrics"
ECommerce.API -> Observability.ELK "Sends logs"
ECommerce.API -> Observability.Jaeger "Sends traces"

Key Takeaways

  1. Separate logical from physical: Model what your system does (logical) separately from where it runs (physical)
  2. Document deployment strategies: Use deployment nodes to show Blue/Green, Canary, or multi-region setups
  3. Link to CI/CD: Show how code flows from commit to production
  4. Model observability: Include monitoring, logging, and tracing in your architecture
  5. Plan for scale: Document scaling strategies (min/max replicas, regions)

Exercise: Model Your Deployment

  1. Choose a real system you work on (or a hypothetical one)
  2. Model the logical architecture (containers, datastores)
  3. Map to physical deployment (cloud regions, clusters)
  4. Add deployment strategy (Blue/Green, Canary, or Rolling)
  5. Include observability components

Time: 20 minutes

Lesson 3



Lesson 3: Governance as Code

As your organization scales, manually reviewing every architectural change becomes impossible. You need automated guardrails to ensure consistency and security.

What is Governance as Code?

Governance as Code treats architectural policies (e.g., "All databases must be encrypted", "No circular dependencies") as executable code that can be validated automatically in your CI/CD pipeline.
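To illustrate what "policies as executable code" means, here is a minimal Python sketch of the two example policies. It does not use Sruja's engine or export format: the `Element` class and the relation tuples are a hypothetical in-memory model, assumed only for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    name: str
    kind: str  # e.g. "container" or "database"
    tags: set[str] = field(default_factory=set)


def unencrypted_databases(elements: list[Element]) -> list[str]:
    """Policy: every database must carry the 'encrypted' tag."""
    return [e.name for e in elements if e.kind == "database" and "encrypted" not in e.tags]


def find_cycle(relations: list[tuple[str, str]]) -> Optional[list[str]]:
    """Policy: no circular dependencies. Returns one cycle, or None if acyclic."""
    graph: dict[str, list[str]] = {}
    for src, dst in relations:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    elements = [Element("DB", "database", {"encrypted"}), Element("Cache", "database")]
    print(unencrypted_databases(elements))
    print(find_cycle([("API", "DB"), ("DB", "API")]))
```

In a pipeline, a non-empty result from either check would be turned into a failing exit code so the change cannot merge.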

Built-in Validation Rules

Sruja validates common architectural concerns automatically:

import { * } from 'sruja.ai/stdlib'


PaymentService = system "Payment Service" {
    API = container "Payment API" {
        technology "Rust"
        tags ["encrypted", "pci-compliant"]

        // SLOs for payment processing
        slo {
            availability {
                target "99.99%"
                window "30 days"
                current "99.98%"
                description "Payment processing must be highly available"
            }
            latency {
                p95 "100ms"
                p99 "200ms"
                window "7 days"
                current {
                    p95 "95ms"
                    p99 "190ms"
                }
                description "Fast payment processing critical for UX"
            }
            errorRate {
                target "< 0.01%"
                window "30 days"
                current "0.008%"
                description "Payment errors must be extremely rare"
            }
            throughput {
                target "500 req/s"
                window "1 hour"
                current "480 req/s"
                description "Handle peak payment volumes"
            }
        }
    }

    DB = database "Payment Database" {
        technology "PostgreSQL"
        tags ["encrypted", "backed-up"]

        // Database SLOs
        slo {
            availability {
                target "99.95%"
                window "30 days"
                current "99.92%"
            }
            latency {
                p95 "20ms"
                p99 "50ms"
                window "7 days"
            }
        }
    }
}

Auditor = person "Security Auditor"
Auditor -> PaymentService.API "Reviews"
PaymentService.API -> PaymentService.DB "Reads/Writes"

view index {
title "Payment Service with Governance"
include *
}

// SLO monitoring view
view slos {
title "SLO Monitoring View"
include PaymentService.API PaymentService.DB
exclude Auditor
description "Focuses on components with SLOs defined"
}

// Compliance view
view compliance {
title "Compliance View"
include PaymentService.API PaymentService.DB
exclude Auditor
description "Shows components with compliance tags"
}

Automated Validation

The real power comes when you run the Sruja CLI. It can check your architecture against these policies and fail the build if violations are found.

sruja validate architecture.sruja

This ensures that your architecture isn't just a diagram—it's a specification that is continuously verified.

Lesson 4



Lesson 4: SLOs & Scale Integration

Why SLOs?

SLOs set measurable targets (availability, latency, error rate, throughput). They guide capacity and design.

Sruja: SLO + Scale

import { * } from 'sruja.ai/stdlib'


ECommerce = system "E-Commerce Platform" {
API = container "API Service" {
  technology "Rust"

  // Scale configuration aligned with SLOs
  scale {
    metric "req/s"
    min 200
    max 2000
  }

  // SLOs define what "good" looks like
  slo {
    availability {
      target "99.9%"
      window "30 days"
      current "99.95%"
    }
    latency {
      p95 "200ms"
      p99 "500ms"
      window "7 days"
      current {
        p95 "180ms"
        p99 "450ms"
      }
    }
    errorRate {
      target "< 0.1%"
      window "30 days"
      current "0.05%"
    }
    throughput {
      target "1000 req/s"
      window "1 hour"
      current "950 req/s"
    }
  }
}

Database = database "PostgreSQL" {
  technology "PostgreSQL"
  slo {
    availability {
      target "99.9%"
      window "30 days"
    }
    latency {
      p95 "50ms"
      p99 "100ms"
    }
  }
}

API -> Database "Reads/Writes"
}

view index {
title "Production System with SLOs"
include *
}

// SLO monitoring view
view slos {
title "SLO Monitoring View"
include ECommerce.API ECommerce.Database
}

Key Integration Points

  1. Scale aligns with SLOs: The scale block's min/max bounds are sized to support the throughput target
  2. SLOs guide monitoring: Define what metrics to track and alert on
  3. Current vs Target: Track progress toward SLO targets
  4. Multiple SLO types: Availability, latency, error rate, throughput
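Comparing current against target can be automated once the values are available as strings like the ones in the model above. This Python sketch is an assumption-laden illustration: `parse_number` and `meets_target` are hypothetical helpers, and the string formats are simply those shown in this lesson, not a documented Sruja export schema.

```python
import re


def parse_number(value: str) -> float:
    """Extract the numeric part of strings such as '99.9%', '< 0.1%', '200ms' or '950 req/s'."""
    match = re.search(r"\d+(?:\.\d+)?", value)
    if match is None:
        raise ValueError(f"no number found in {value!r}")
    return float(match.group())


def meets_target(kind: str, target: str, current: str) -> bool:
    """Higher is better for availability and throughput; lower is better for latency and error rate."""
    t, c = parse_number(target), parse_number(current)
    if kind in ("availability", "throughput"):
        return c >= t
    if kind in ("latency", "errorRate"):
        # Simplification: a '< 0.1%' target is treated as 'at most 0.1%'.
        return c <= t
    raise ValueError(f"unknown SLO kind: {kind}")


if __name__ == "__main__":
    # Values taken from the API container in the example above.
    slos = [
        ("availability", "99.9%", "99.95%"),
        ("latency", "200ms", "180ms"),
        ("errorRate", "< 0.1%", "0.05%"),
        ("throughput", "1000 req/s", "950 req/s"),
    ]
    for kind, target, current in slos:
        status = "ok" if meets_target(kind, target, current) else "MISSED"
        print(f"{kind}: target {target}, current {current} -> {status}")
```

Run against the example values, only throughput falls short of its target (950 req/s against 1000 req/s), which is the gap the scale bounds would need to close.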

Practice

  • Set p95 and availability targets for the API.
  • Adjust scale bounds to keep throughput above target.

Lesson 5



Lesson 5: Tracking Architecture Evolution

Why Track Changes?

Keeping a history of changes improves communication, auditability, and onboarding. Sruja integrates with Git to provide automatic change tracking, while ADRs document decisions and SLOs track evolution over time.

Git: Automatic Change Tracking

Git automatically tracks all changes to your architecture files. No special syntax needed!

Viewing Change History

# View all changes to architecture file
git log --oneline --follow architecture.sruja

# See what changed in a specific commit
git show <commit> -- architecture.sruja

# Compare two versions
git diff v1.0..v2.0 -- architecture.sruja

# View changes by author
git log --author="alice" -- architecture.sruja

Version Tagging

# Tag major versions
git tag -a v2025.01 -m "Post-Black Friday stabilization"
git tag -a v2025.02 -m "After caching improvements"

# View architecture at specific version
git show v2025.01:architecture.sruja | sruja export

# Compare versions
git diff v2025.01..v2025.02 -- architecture.sruja

Advantages:

  • Automatic - Every change is tracked
  • Powerful queries - Git log, blame, diff
  • Attribution - Know who made changes
  • Context - PR reviews, discussions
  • Standard - Every developer knows Git

SLOs: Track Evolution Over Time

SLOs naturally track evolution through target vs current values:

import { * } from 'sruja.ai/stdlib'

Shop = system "E-Commerce Shop" {
  API = container "API Service" {
    technology "Rust"

    // SLOs show evolution: target vs current
    slo {
      availability {
        target "99.9%"
        window "30 days"
        current "99.85%"
      }
      latency {
        p95 "200ms"
        p99 "500ms"
        window "7 days"
        current {
          p95 "250ms"  // Improved from 300ms after adding Redis
          p99 "600ms"  // Improved from 700ms
        }
      }
      errorRate {
        target "< 0.1%"
        window "30 days"
        current "0.12%"
      }
    }
  }

  Cache = database "Redis Cache" {
    technology "Redis"
    description "Added to improve latency SLO (see ADR-005)"
  }

  Database = database "PostgreSQL" {
    technology "PostgreSQL"
  }

  API -> Cache "Reads"
  API -> Database "Reads/Writes"
}

view index {
  title "Shop System with SLO Tracking"
  include *
}

Advantages:

  • Quantitative tracking - Actual metrics over time
  • In context - SLOs live with the components they measure
  • Self-documenting - Current vs target shows progress
  • Link to ADRs - Reference decisions that affected SLOs

ADRs: Document Decisions and Rationale

ADRs link architectural changes to their rationale:

ADR005 = adr "Add Redis cache for latency SLO" {
  status "accepted"
  context "Latency SLO not met - p95 was 300ms, target is 200ms. Database queries are bottleneck."
  decision "Introduce Redis cache for hot paths (product catalog, user sessions)"
  consequences "Latency improved to 250ms p95 (still above target but progress). Reduced database load by 40%. Trade-off: Added operational complexity for cache invalidation."
}

ADR006 = adr "Optimize database queries" {
  status "accepted"
  context "Latency still above target (250ms vs 200ms). Cache helped but not enough."
  decision "Add database indexes, optimize N+1 queries, implement query result caching"
  consequences "Latency improved to 200ms p95 (target met!). Database CPU usage reduced. Trade-off: More complex queries, slower schema migrations."
}

Advantages:

  • Rich context - Why decisions were made
  • Status tracking - accepted, rejected, superseded
  • Link to SLOs - Connect decisions to measurable outcomes
  • Historical record - Understand evolution of architecture

Complete Example: Tracking Evolution

import { * } from 'sruja.ai/stdlib'

Shop = system "E-Commerce Shop" {
  API = container "API Service" {
    technology "Rust"
    slo {
      latency {
        p95 "200ms"
        window "7 days"
        current {
          p95 "200ms"  // Improved from 300ms (see ADR-005, ADR-006)
        }
      }
    }
  }

  Cache = database "Redis Cache" {
    technology "Redis"
    description "Added per ADR-005 to improve latency"
  }

  Database = database "PostgreSQL" {
    technology "PostgreSQL"
    description "Optimized per ADR-006"
  }

  API -> Cache "Reads"
  API -> Database "Reads/Writes"
}

// Link decisions to changes
ADR005 = adr "Add Redis cache for latency SLO" {
  status "accepted"
  context "Latency SLO not met - p95 was 300ms, target is 200ms"
  decision "Introduce Redis cache for hot paths"
  consequences "Latency improved to 250ms p95, reduced DB load by 40%"
}

ADR006 = adr "Optimize database queries" {
  status "accepted"
  context "Latency still above target (250ms vs 200ms)"
  decision "Add indexes, optimize N+1 queries"
  consequences "Latency improved to 200ms p95 - target met!"
}

view index {
  title "Shop System with Evolution Tracking"
  include *
}

Best Practices

1. Use Git for Change Tracking

  • ✅ Commit architecture changes with descriptive messages
  • ✅ Use a consistent version tag scheme (e.g., v1.0 or v2025.01) for major releases
  • ✅ Link PR descriptions to ADRs when making architectural changes
  • ✅ Use git log and git diff to view evolution

2. Document Decisions with ADRs

  • ✅ Create ADR for each significant architectural decision
  • ✅ Link ADRs to SLOs in descriptions ("see ADR-005")
  • ✅ Include context, decision, and consequences
  • ✅ Update ADR status when decisions are superseded

3. Track Evolution with SLOs

  • ✅ Include both target and current values
  • ✅ Add descriptions linking to ADRs
  • ✅ Update current values as metrics improve
  • ✅ Document improvements in descriptions

4. Connect the Pieces

  • Git commits → Track who changed what, when
  • ADRs → Document why changes were made
  • SLOs → Measure the impact of changes

Practice

  1. View your architecture history:

    git log --oneline architecture.sruja
    
  2. Create an ADR documenting a recent architectural decision

  3. Update SLOs with current values and link to ADRs

  4. Tag a version:

    git tag -a v1.0 -m "Initial architecture baseline"
    
Features used in this lesson:

  • adr for documenting decisions
  • slo for tracking service level objectives
  • Git for automatic change tracking
  • deployment for tracking runtime topology changes

System Design 201



System Design 201: Advanced Systems

Overview

  • Focuses on scaling strategies and production realities beyond fundamentals
  • Covers throughput, real-time processing, data-intensive architectures, and consistency models

Learning Goals

  • Design services for high throughput and predictable performance
  • Apply real-time processing patterns for streaming data
  • Architect data-intensive systems with storage and compute separation
  • Choose appropriate consistency and isolation models (and know the trade-offs)

Prerequisites

  • Completed or familiar with concepts from System Design 101
  • Comfortable with distributed systems basics, caching, queues, and storage types

Course Structure

  • Module 1: High Throughput
  • Module 2: Real-Time
  • Module 3: Data-Intensive
  • Module 4: Consistency

Where to Start

  • Begin with Module 1 to build scaling foundations, then proceed in order

High Throughput



Module Overview: High Throughput Systems

"Design a system that handles 1 million requests per second."

This module covers advanced scaling patterns needed for high-throughput systems - a common interview topic at top tech companies.

Learning Goals

  • Identify throughput bottlenecks in systems
  • Apply scaling patterns (queues, sharding, caching)
  • Model trade-offs and document decisions with ADRs
  • Design systems that handle massive scale

Interview Preparation

  • ✅ Answer "design for high throughput" questions
  • ✅ Explain queuing and async processing
  • ✅ Discuss database sharding strategies
  • ✅ Model scaling patterns with Sruja

Real-World Application

  • Design systems that handle millions of requests
  • Apply patterns to actual high-scale systems
  • Understand trade-offs in scaling decisions

Estimated Time

60-75 minutes (includes practice)

Checklist

  • Can identify throughput bottlenecks
  • Understand queuing and async patterns
  • Can design sharding strategies
  • Can explain trade-offs clearly

Lesson 1



Lesson 1: Design a URL Shortener

Goal: Design a service like TinyURL that takes a long URL and converts it into a short alias (e.g., http://tiny.url/xyz).

Requirements

Functional

  • shorten(long_url) -> short_url
  • redirect(short_url) -> long_url
  • Custom aliases (optional).

Non-Functional

  • Highly Available: If the service is down, URL redirection stops working.
  • Low Latency: Redirection must happen in milliseconds.
  • Read-Heavy: 100:1 read-to-write ratio.
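The 100:1 read-to-write ratio drives the capacity numbers. The Python sketch below shows a back-of-envelope estimate; only the ratio comes from the requirements above, while the daily volume, record size, and retention period are assumptions chosen purely for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate(new_urls_per_day: int,
             read_write_ratio: int = 100,   # from the requirements above
             bytes_per_record: int = 500,   # assumption: alias + long URL + metadata
             retention_years: int = 5) -> dict[str, float]:  # assumption
    """Rough write/read rates and storage for a URL shortener."""
    writes_per_sec = new_urls_per_day / SECONDS_PER_DAY
    reads_per_sec = writes_per_sec * read_write_ratio
    total_records = new_urls_per_day * 365 * retention_years
    storage_gb = total_records * bytes_per_record / 1e9
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "total_records": total_records,
        "storage_gb": storage_gb,
    }


if __name__ == "__main__":
    # Assumed volume: one million new short URLs per day.
    for key, value in estimate(1_000_000).items():
        print(f"{key}: {value:,.1f}")
```

Under these assumptions reads dominate by two orders of magnitude, which is why the design below puts a cache in front of the store.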

Core Design

1. Database Choice

Since we need fast lookups and the data model is simple (Key-Value), a NoSQL Key-Value Store (like DynamoDB or Redis) is ideal.

  • Key: short_alias
  • Value: long_url

2. Hashing Algorithm

How do we generate the alias?

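  • MD5/SHA-256: Digests are too long for a short alias (32 and 64 hex characters respectively), and truncating them invites collisions.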
  • Base62 Encoding: Converts a unique ID (from a counter or database ID) into a string of characters [a-z, A-Z, 0-9].
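Base62 encoding of a numeric ID can be sketched in a few lines of Python. The alphabet order (digits, then lowercase, then uppercase) is an assumption; any fixed ordering of the 62 characters works as long as encoding and decoding agree.

```python
import string

# Alphabet order is a convention; this sketch uses digits, lowercase, then uppercase.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode(n: int) -> str:
    """Convert a non-negative integer ID into a Base62 alias."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first


def decode(alias: str) -> int:
    """Convert a Base62 alias back into its integer ID."""
    n = 0
    for ch in alias:
        n = n * BASE + ALPHABET.index(ch)  # raises ValueError on invalid characters
    return n


if __name__ == "__main__":
    print(encode(123456789))
    print(decode(encode(123456789)))
    print(62 ** 7)  # number of distinct 7-character aliases
```

Seven characters already cover about 3.5 trillion IDs (62^7), so aliases stay short even at large scale. Sequential IDs do make aliases guessable, which is a trade-off to weigh if links must be private.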

🛠️ Sruja Perspective: Modeling the Flow

We can use Sruja to model the system components and the user scenario for redirection.

import { * } from 'sruja.ai/stdlib'


R1 = requirement functional "Shorten long URL"
R2 = requirement functional "Redirect short URL"
R3 = requirement availability "High availability for redirects"
R4 = requirement performance "Low latency (< 200ms)"

// Define the system boundary
TinyURL = system "TinyURL Service" {
  WebServer = container "API Server" {
    technology "Rust"
    scale {
      min 3
      max 20
      metric "cpu > 70%"
    }
  }

  DB = database "UrlStore" {
    technology "DynamoDB"
    description "Stores mapping: short_alias -> long_url"
  }

  Cache = container "Cache" {
    technology "Redis"
    description "Caches popular redirects"
  }

  WebServer -> Cache "Reads"
  WebServer -> DB "Reads/Writes"
}

User = person "User"

// Define the redirection scenario (most common - cache hit)
RedirectFlowCacheHit = scenario "User clicks a short link (cache hit)" {
  User -> TinyURL.WebServer "GET /xyz"
  TinyURL.WebServer -> TinyURL.Cache "Check cache for 'xyz'"
  TinyURL.Cache -> TinyURL.WebServer "Hit: 'http://example.com'"
  TinyURL.WebServer -> User "301 Redirect (from cache)"
}

// Cache miss scenario
RedirectFlowCacheMiss = scenario "User clicks a short link (cache miss)" {
  User -> TinyURL.WebServer "GET /xyz"
  TinyURL.WebServer -> TinyURL.Cache "Check cache for 'xyz'"
  TinyURL.Cache -> TinyURL.WebServer "Miss"
  TinyURL.WebServer -> TinyURL.DB "Get long_url for 'xyz'"
  TinyURL.DB -> TinyURL.WebServer "Return 'http://example.com'"
  TinyURL.WebServer -> TinyURL.Cache "Cache the mapping"
  TinyURL.WebServer -> User "301 Redirect to 'http://example.com'"
}

// URL shortening scenario
ShortenURL = scenario "User creates a short URL" {
  User -> TinyURL.WebServer "POST /shorten with long_url"
  TinyURL.WebServer -> TinyURL.WebServer "Generate base62 alias"
  TinyURL.WebServer -> TinyURL.DB "Store mapping: alias -> long_url"
  TinyURL.DB -> TinyURL.WebServer "Confirm stored"
  TinyURL.WebServer -> User "Return short URL"
}

view index {
include *
}

Lesson 2



Lesson 2: Design a Rate Limiter

Goal: Design a system to limit the number of requests a client can send to an API within a time window (e.g., 10 requests per second).

Why Rate Limit?

  • Prevent Abuse: Mitigate DDoS attacks and throttle malicious bots.
  • Fairness: Ensure one user doesn't hog all resources.
  • Cost Control: Prevent auto-scaling bills from exploding.

Algorithms

Token Bucket

  • A "bucket" holds tokens, up to a fixed maximum capacity (which bounds the burst size).
  • Tokens are added at a fixed rate (e.g., 10 tokens/sec).
  • Each request consumes a token.
  • If the bucket is empty, the request is dropped (429 Too Many Requests).
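The token bucket rules above can be sketched as a small in-process class. This is a minimal single-node sketch, assuming a lazily refilled bucket (tokens are topped up when a request arrives rather than by a background job); the class name TokenBucket and the injectable clock parameter are illustrative choices, not part of any library.

```python
import time


class TokenBucket:
    """Single-process token bucket; refills lazily on each request."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.clock = clock          # injectable so the logic can be tested
        self.tokens = capacity      # start with a full bucket
        self.last_refill = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it should get a 429."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway would typically keep one bucket per user or IP. In a distributed deployment the same arithmetic would need to run atomically against a shared store, which this sketch does not attempt.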

Leaky Bucket

  • Requests enter a queue (bucket) and are processed at a constant rate.
  • If the queue is full, new requests are dropped.
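For comparison, a leaky bucket can be sketched as a bounded FIFO queue. The sketch assumes an external worker calls tick() at the fixed processing rate; the names LeakyBucket, offer, and tick are hypothetical.

```python
from collections import deque


class LeakyBucket:
    """Bounded FIFO queue drained at a constant rate by an external worker."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()

    def offer(self, request) -> bool:
        """Enqueue a request; return False (drop it) if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def tick(self):
        """Release the oldest queued request, or None if the bucket is empty.

        Assumption: a scheduler calls this once per processing interval,
        which is what makes the output rate constant.
        """
        return self.queue.popleft() if self.queue else None
```

Unlike the token bucket, this variant smooths bursts instead of allowing them: a burst fills the queue, and the downstream service still sees one request per tick.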

Architecture Location

Where does the rate limiter live?

  1. Client-side: Unreliable (clients can bypass or tamper with it).
  2. Server-side: Inside the application code.
  3. Middleware: In a centralized API Gateway (Best practice).

🛠️ Sruja Perspective: Middleware Modeling

In Sruja, we can model the Rate Limiter as a component within the API Gateway, backed by a fast datastore like Redis.

import { * } from 'sruja.ai/stdlib'


APIGateway = system "API Gateway" {
    GatewayService = container "Gateway" {
        technology "Nginx / Kong"

        RateLimiter = component "Rate Limiter Middleware" {
            description "Implements Token Bucket algorithm"
        }
    }

    Redis = database "Rate Limit Store" {
        technology "Redis"
        description "Stores token counts per user/IP"
    }

    APIGateway.GatewayService -> APIGateway.Redis "Stores tokens"
}

Backend = system "Backend Service"

APIGateway.GatewayService -> Backend "Forward Requests"
Client = person "Client"

// Scenario: Request allowed (has tokens)
RateLimitAllowed = scenario "Rate Limit Check - Allowed" {
    Client -> APIGateway.GatewayService "API Request"
    APIGateway.GatewayService -> APIGateway.Redis "DECR user_123_tokens"
    APIGateway.Redis -> APIGateway.GatewayService "Result: 5 (tokens remaining)"
    APIGateway.GatewayService -> Backend "Forward request"
    Backend -> APIGateway.GatewayService "Response"
    APIGateway.GatewayService -> Client "200 OK"
}

// Scenario: Request rate limited (no tokens)
RateLimitBlocked = scenario "Rate Limit Check - Blocked" {
    Client -> APIGateway.GatewayService "API Request"
    APIGateway.GatewayService -> APIGateway.Redis "DECR user_123_tokens"
    APIGateway.Redis -> APIGateway.GatewayService "Result: -1 (Empty bucket)"
    APIGateway.GatewayService -> Client "429 Too Many Requests"
}

// Scenario: Token refill (background process)
TokenRefill = scenario "Token Bucket Refill" {
    APIGateway.Redis -> APIGateway.Redis "Add 10 tokens/sec (background)"
    APIGateway.Redis -> APIGateway.Redis "Cap at max bucket size"
}

view index {
include *
}

Lesson 3



Lesson 3: Views for Critical Throughput Paths

Why Views for Throughput?

Focus on hot paths to reason about scaling, backpressure, and caching. High-throughput systems have critical paths that need isolation for analysis.

Sruja: High‑Throughput View

import { * } from 'sruja.ai/stdlib'


Pipeline = system "Data Pipeline" {
  Ingest = container "Ingestion Service" {
    technology "Kafka Consumer"
    scale {
      min 5
      max 50
      metric "lag > 1000"
    }
  }

  Processor = container "Processing Service" {
    technology "Rust workers"
    scale {
      min 10
      max 200
      metric "queue_depth > 5000"
    }
  }

  Events = database "Event Store" {
    technology "Kafka"
    description "Buffers events for processing"
  }

  OutputDB = database "Output Database" {
    technology "ClickHouse"
    description "Stores processed events"
  }

  Ingest -> Events "Consumes"
  Events -> Processor "Streams"
  Processor -> OutputDB "Writes"
}

// Complete system view
view index {
title "Complete Pipeline"
include *
}

// Hot path view: Focus on critical throughput path
view hotpath {
title "Hot Path - Throughput Analysis"
include Pipeline.Ingest
include Pipeline.Events
include Pipeline.Processor
exclude Pipeline.OutputDB
}

// Backpressure view: Components that can cause bottlenecks
view backpressure {
title "Backpressure Points"
include Pipeline.Events
include Pipeline.Processor
exclude Pipeline.Ingest
exclude Pipeline.OutputDB
}

// Scale view: Components with scaling configuration
view scale {
title "Scaling Configuration"
include Pipeline.Ingest
include Pipeline.Processor
exclude Pipeline.Events
exclude Pipeline.OutputDB
}

Practice

  • Create a view highlighting backpressure points.
  • Annotate scale bounds for hot components.
  • Use scenarios to model high-volume flows.

Real Time



Lesson 1: Design a Chat Application

Goal: Design a real-time chat service like WhatsApp or Slack that supports 1-on-1 and Group messaging.

Requirements

Functional

  • Send/Receive messages in real-time.
  • See user status (Online/Offline).
  • Message history (persistent storage).

Non-Functional

  • Low Latency: Messages must appear instantly.
  • Consistency: Messages must be delivered in order.
  • Availability: High uptime.

Core Design

1. Communication Protocol

HTTP is request/response (pull). For chat, we need push.

  • WebSockets: Keeps a persistent connection open between client and server.

2. Message Flow

  • User A sends message to Chat Server.
  • Chat Server finds which server User B is connected to (using a Session Store like Redis).
  • Chat Server pushes message to User B.
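The three steps above can be sketched with plain dictionaries standing in for Redis and for the WebSocket connections. Everything here is hypothetical scaffolding for illustration: the ChatServer class, the shared sessions and cluster dicts, and the list that stands in for a socket.

```python
class ChatServer:
    """Toy chat server; dicts stand in for Redis and live WebSocket connections."""

    def __init__(self, server_id, session_store, cluster):
        self.server_id = server_id
        self.session_store = session_store  # user_id -> server_id (Session Store)
        self.cluster = cluster              # server_id -> ChatServer
        self.connections = {}               # user_id -> delivered messages (fake socket)
        cluster[server_id] = self

    def connect(self, user_id):
        """Register a user's connection and record which server holds it."""
        self.connections[user_id] = []
        self.session_store[user_id] = self.server_id

    def send(self, sender, recipient, text) -> bool:
        """Route a message to the recipient's server; False means offline."""
        target_id = self.session_store.get(recipient)
        if target_id is None:
            return False  # recipient has no active connection
        self.cluster[target_id].push(recipient, f"{sender}: {text}")
        return True

    def push(self, user_id, message):
        """Deliver a message over the user's (simulated) connection."""
        self.connections[user_id].append(message)
```

In this sketch the lookup result decides the path: a hit forwards the message to whichever server owns the recipient's connection, while a miss is where a real system would mark the message as pending delivery.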

3. Storage

  • Chat History: Write-heavy. Wide-column stores such as Cassandra or HBase handle high write volumes and time-ordered data well.
  • User Status: Key-Value store (Redis) with TTL.
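The TTL-based status idea can be illustrated without Redis. In this sketch, assumed for illustration only, each client sends periodic heartbeats and a user counts as online until the most recent heartbeat's TTL expires; PresenceStore and its method names are hypothetical.

```python
import time


class PresenceStore:
    """In-memory stand-in for a key-value store with per-key TTL."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock       # injectable so expiry can be tested
        self._expires_at = {}    # user_id -> time at which the entry expires

    def heartbeat(self, user_id):
        """Mark the user online and restart their TTL."""
        self._expires_at[user_id] = self.clock() + self.ttl

    def is_online(self, user_id) -> bool:
        """A user is online only while their latest heartbeat is unexpired."""
        expires = self._expires_at.get(user_id)
        return expires is not None and self.clock() < expires
```

The appeal of the TTL approach is that no explicit "went offline" message is required: if a client crashes or loses connectivity, its status flips to offline once the heartbeats stop.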

🛠️ Sruja Perspective: Modeling Real-Time Flows

We can use Sruja to model the WebSocket connections and the async message processing.

import { * } from 'sruja.ai/stdlib'


requirement R1 functional "Real-time messaging"
requirement R2 functional "Message history"
requirement R3 latency "Instant delivery"
requirement R4 consistency "Ordered delivery"

ChatApp = system "WhatsApp Clone" {
    ChatServer = container "Chat Server" {
        technology "Node.js (Socket.io)"
        description "Handles WebSocket connections"
        scale {
            min 10
            max 100
            metric "connections > 10k"
        }
    }

    SessionStore = database "Session Store" {
        technology "Redis"
        description "Maps UserID -> WebSocketServerID"
    }

    MessageDB = database "Message History" {
        technology "Cassandra"
        description "Stores chat logs"
    }

    MessageQueue = queue "Message Queue" {
        technology "Kafka"
        description "Buffers messages for group chat fan-out"
    }

    ChatServer -> SessionStore "Reads/Writes"
    ChatServer -> MessageDB "Persists messages"
    ChatServer -> MessageQueue "Async processing"
}

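UserA = person "Alice"
UserB = person "Bob"
// Additional group members referenced by the group chat scenario
UserC = person "Carol"
UserD = person "Dave"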

// Scenario: 1-on-1 chat (user online)
scenario SendMessageOnline "Send Message - Recipient Online" {
    UserA -> ChatApp.ChatServer "Send 'Hello'"
    ChatApp.ChatServer -> ChatApp.MessageDB "Persist message"
    ChatApp.ChatServer -> ChatApp.SessionStore "Lookup Bob's connection"
    ChatApp.SessionStore -> ChatApp.ChatServer "Bob is on Server-2"
    ChatApp.ChatServer -> UserB "Push 'Hello' via WebSocket"
    UserB -> ChatApp.ChatServer "ACK received"
}

// Scenario: 1-on-1 chat (user offline)
scenario SendMessageOffline "Send Message - Recipient Offline" {
    UserA -> ChatApp.ChatServer "Send 'Hello'"
    ChatApp.ChatServer -> ChatApp.MessageDB "Persist message"
    ChatApp.ChatServer -> ChatApp.SessionStore "Lookup Bob's connection"
    ChatApp.SessionStore -> ChatApp.ChatServer "Bob is offline"
    ChatApp.ChatServer -> ChatApp.MessageDB "Mark as pending delivery"
}

// Scenario: Group chat (fan-out)
scenario SendGroupMessage "Send Group Message" {
    UserA -> ChatApp.ChatServer "Send 'Hello' to Group"
    ChatApp.ChatServer -> ChatApp.MessageDB "Persist message"
    ChatApp.ChatServer -> ChatApp.MessageQueue "Enqueue for fan-out"
    ChatApp.MessageQueue -> ChatApp.ChatServer "Process for each member"
    ChatApp.ChatServer -> ChatApp.SessionStore "Lookup each member's server"
    ChatApp.ChatServer -> UserB "Push to member 1"
    ChatApp.ChatServer -> UserC "Push to member 2"
    ChatApp.ChatServer -> UserD "Push to member 3"
}

// Scenario: Message history retrieval
scenario GetMessageHistory "Retrieve Message History" {
    UserA -> ChatApp.ChatServer "Request chat history"
    ChatApp.ChatServer -> ChatApp.MessageDB "Query messages"
    ChatApp.MessageDB -> ChatApp.ChatServer "Return messages"
    ChatApp.ChatServer -> UserA "Send history"
}

view index {
include *
}

Data Intensive



Lesson 1: Design a Video Streaming Service

Goal: Design a video sharing platform like YouTube or Netflix where users can upload and watch videos.

Requirements

Functional

  • Upload videos.
  • Watch videos (streaming).
  • Support multiple resolutions (360p, 720p, 1080p).

Non-Functional

  • Reliability: Playback should not stall (minimal buffering).
  • Availability: Videos are always accessible.
  • Scalability: Handle millions of concurrent viewers.

Core Design

1. Storage (Blob Store)

Videos are large binary files (BLOBs). Relational databases are a poor fit for storing them.

  • Object Storage: AWS S3, Google Cloud Storage.
  • Metadata: Store title, description, and S3 URL in a SQL/NoSQL DB.

2. Processing (Transcoding)

Raw uploads are huge. We need to convert them into different formats and resolutions.

  • Transcoding Service: Breaks video into chunks and encodes them (H.264, VP9).

3. Delivery (CDN)

Serving video from a single server is too slow for global users.

  • Content Delivery Network (CDN): Caches video chunks in edge servers close to the user.

4. Adaptive Bitrate Streaming (HLS/DASH)

The player automatically switches quality based on the user's internet speed.
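A player's quality switch can be sketched as picking the highest rendition that fits the measured bandwidth. The bitrate ladder and the 0.8 safety factor below are hypothetical numbers chosen for illustration; real HLS/DASH players use more elaborate heuristics (buffer level, throughput history).

```python
# Hypothetical bitrate ladder in kbps, ordered from lowest to highest quality.
RENDITIONS = [("360p", 800), ("720p", 2500), ("1080p", 5000)]


def pick_rendition(measured_kbps: float, safety_factor: float = 0.8) -> str:
    """Choose the highest rendition whose bitrate fits within the bandwidth budget."""
    # Leave headroom so small bandwidth dips do not immediately stall playback.
    budget = measured_kbps * safety_factor
    best = RENDITIONS[0][0]  # always fall back to the lowest quality
    for name, bitrate in RENDITIONS:
        if bitrate <= budget:
            best = name
    return best
```

The player would re-run this choice before requesting each chunk, which is why the video must be stored as short, independently decodable chunks at every resolution.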


🛠️ Sruja Perspective: Modeling Infrastructure

We can use Sruja's deployment nodes to visualize the global distribution of content.

import { * } from 'sruja.ai/stdlib'


YouTube = system "Video Platform" {
    WebApp = container "Web App"
    API = container "API Server"

    Transcoder = container "Transcoding Service" {
        description "Converts raw video to HLS format"
        scale { min 50 }
    }

    S3 = database "Blob Storage" {
        description "Stores raw and processed video files"
    }

    MetadataDB = database "Metadata DB"

    WebApp -> API "HTTPS"
    API -> MetadataDB "Reads/Writes"
    API -> S3 "Uploads"
    API -> Transcoder "Triggers"
    Transcoder -> S3 "Reads/Writes"
    Transcoder -> MetadataDB "Updates status"
}

// Deployment View
deployment GlobalInfra "Global Infrastructure" {
    node OriginDC "Origin Data Center" {
        containerInstance WebApp
        containerInstance API
        containerInstance Transcoder
        containerInstance S3
    }

    node CDN "CDN (Edge Locations)" "Cloudflare / Akamai" {
        // Represents cached content
        node USEast "US-East Edge"
        node Europe "Europe Edge"
        node Asia "Asia Edge"
    }
}

User = person "Viewer"

// Streaming Flow
scenario WatchVideo "User watches a video" {
    User -> YouTube.WebApp "Get Video Page"
    YouTube.WebApp -> YouTube.API "Get Metadata (Title, URL)"
    YouTube.API -> YouTube.MetadataDB "Query"
    YouTube.API -> User "Return Video Manifest URL"
    User -> CDN "Request Video Chunk (1080p)"
    CDN -> User "Stream Chunk"
}

// Upload Flow
scenario UploadVideo "Creator uploads a video" {
    User -> YouTube.WebApp "Upload Raw Video"
    YouTube.WebApp -> YouTube.API "POST /upload"
    YouTube.API -> YouTube.S3 "Store Raw Video"
    YouTube.API -> YouTube.Transcoder "Trigger Transcoding Job"
    YouTube.Transcoder -> YouTube.S3 "Read Raw / Write HLS"
    YouTube.Transcoder -> YouTube.MetadataDB "Update Video Status"
}

// Data flow: Video transcoding pipeline
flow TranscodingPipeline "Video Transcoding Data Flow" {
    YouTube.S3 -> YouTube.Transcoder "Streams raw video chunks"
    YouTube.Transcoder -> YouTube.Transcoder "Encodes to HLS (360p, 720p, 1080p)"
    YouTube.Transcoder -> YouTube.S3 "Writes encoded chunks"
    YouTube.Transcoder -> YouTube.MetadataDB "Updates manifest URLs"
}

// Data flow: Video delivery pipeline
flow DeliveryPipeline "Video Delivery Data Flow" {
    YouTube.S3 -> CDN "Replicates video chunks to edge"
    CDN -> User "Streams chunks on demand"
    User -> CDN "Requests next chunk based on bandwidth"
    CDN -> YouTube.S3 "Cache miss: fetch from origin"
}

// Data flow: Analytics pipeline
flow AnalyticsPipeline "Video Analytics Data Flow" {
    YouTube.WebApp -> YouTube.API "Sends view events"
    YouTube.API -> YouTube.MetadataDB "Updates view count"
    YouTube.API -> YouTube.MetadataDB "Stores watch time"
    YouTube.MetadataDB -> YouTube.API "Aggregates analytics"
}

view index {
title "Complete Video Platform"
include *
}

// Data flow view: Focus on data pipelines
view dataflow {
title "Data Flow View"
include YouTube.Transcoder YouTube.S3 YouTube.MetadataDB
exclude YouTube.WebApp YouTube.API
description "Shows data transformation and storage flows"
}

// Processing view: Transcoding pipeline
view processing {
title "Processing Pipeline"
include YouTube.Transcoder YouTube.S3
exclude YouTube.WebApp YouTube.API YouTube.MetadataDB
description "Focuses on video processing components"
}

Consistency



Lesson 1: Design a Distributed Counter

Goal: Design a system to count events (e.g., YouTube views, Facebook likes) at a massive scale (e.g., 1 million writes/sec).

The Problem with a Single Database

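A single SQL database (like PostgreSQL) typically sustains only a few thousand writes/sec (roughly 2k-5k, depending on hardware and configuration). If we update a single row (UPDATE videos SET views = views + 1 WHERE id = 123) for every view, each update must wait for the row lock held by the previous one, so writes to a popular video are serialized and that row becomes the bottleneck.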

Solutions

1. Sharding (Write Splitting)

Instead of one counter, have $N$ counters for the same video.

  • Randomly pick a counter from $1$ to $N$ and increment it.
  • Total Views = Sum of all $N$ counters.
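The sharding idea can be sketched in memory. This is a minimal illustration in which a Python list stands in for $N$ separate counter rows; the class name ShardedCounter is hypothetical.

```python
import random


class ShardedCounter:
    """Spread increments for one logical counter across N independent shards."""

    def __init__(self, num_shards: int):
        # In a real system each slot would be its own row or key.
        self.shards = [0] * num_shards

    def increment(self, amount: int = 1):
        """Pick a random shard so concurrent writers rarely hit the same one."""
        idx = random.randrange(len(self.shards))
        self.shards[idx] += amount

    def total(self) -> int:
        """Total Views = sum of all N shard values."""
        return sum(self.shards)
```

The trade-off moves cost from writes to reads: each write touches one of $N$ rows, so lock contention drops by roughly a factor of $N$, while each read must now sum $N$ values (which is why totals are often cached).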

2. Write-Behind (Batching)

Don't write to the DB immediately.

  • Store counts in memory (Redis) or a log (Kafka).
  • A background worker aggregates them and updates the DB every few seconds.
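  • Trade-off: Reads can lag behind reality by up to the flush interval (Eventual Consistency). If counts are buffered only in memory and the server crashes before flushing, you also lose those few seconds of data; a durable log such as Kafka avoids that loss.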
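The batching behavior can be sketched with an in-memory buffer and a dict standing in for the database. The names WriteBehindCounter, record_view, and flush are hypothetical, and the sketch leaves out the timer that would call flush() every few seconds.

```python
from collections import Counter


class WriteBehindCounter:
    """Buffer view events in memory and apply them to the DB in batches."""

    def __init__(self, db: dict):
        self.db = db              # stand-in for the real database
        self.pending = Counter()  # video_id -> views not yet flushed

    def record_view(self, video_id):
        """Cheap in-memory increment; no database write happens here."""
        self.pending[video_id] += 1

    def flush(self) -> int:
        """Apply buffered counts to the DB; return the number of DB writes issued."""
        writes = 0
        for video_id, count in self.pending.items():
            self.db[video_id] = self.db.get(video_id, 0) + count
            writes += 1
        self.pending.clear()
        return writes
```

With this shape, a thousand views of the same video between two flushes become a single database update, and anything still in pending at crash time is exactly the data at risk.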

🛠️ Sruja Perspective: Modeling Write Flows

We can use Sruja to model the "Write-Behind" architecture.

import { * } from 'sruja.ai/stdlib'


CounterService = system "View Counter" {
    API = container "Ingestion API" {
        technology "Rust"
        description "Receives 'view' events"
    }

    EventLog = queue "Kafka" {
        description "Buffers raw view events"
    }

    Worker = container "Aggregator" {
        technology "Python"
        description "Reads batch of events, sums them, updates DB"
        scale { min 5 }
    }

    DB = database "Counter DB" {
        technology "Cassandra"
        description "Stores final counts (Counter Columns)"
    }

    Cache = container "Read Cache" {
        technology "Redis"
        description "Caches total counts for fast reads"
    }

    API -> EventLog "Produces events"
    Worker -> EventLog "Consumes events"
    Worker -> DB "Updates counts"
    Worker -> Cache "Updates cache"
}

User = person "Viewer"

// Write Path (Eventual Consistency)
TrackView = scenario "User watches a video" {
    User -> CounterService.API "POST /view"
    CounterService.API -> CounterService.EventLog "Produce Event"
    CounterService.API -> User "202 Accepted"

    // Async processing
    CounterService.EventLog -> CounterService.Worker "Consume Batch"
    CounterService.Worker -> CounterService.DB "UPDATE views += batch_size"
    CounterService.Worker -> CounterService.Cache "Invalidate/Update"
}

view index {
include *
}

Lesson 2



Lesson 2: Consistency via Constraints & Conventions

Why Constraints?

They document trade‑offs and prevent accidental coupling across services.

Sruja: Guardrails for Consistency

import { * } from 'sruja.ai/stdlib'


constraints {
rule "No cross‑service transactions"
rule "Idempotent event handlers"
}
conventions {
naming "kebab-case"
retries "Exponential backoff (max 3)"
}

view index {
include *
}
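To make two of these guardrails concrete, here is a hedged sketch of what "Idempotent event handlers" and "Exponential backoff (max 3)" could mean in application code. The base delay of 0.5 seconds, the helper backoff_delays, and the IdempotentHandler class are assumptions for illustration; Sruja itself only records the rule text.

```python
def backoff_delays(base_seconds: float = 0.5, max_retries: int = 3):
    """Delays before each retry: base, 2x base, 4x base, ... (max_retries entries)."""
    return [base_seconds * (2 ** attempt) for attempt in range(max_retries)]


class IdempotentHandler:
    """Applies each event at most once, keyed by a unique event ID."""

    def __init__(self):
        self.processed_ids = set()  # IDs of events already applied
        self.balance = 0            # example state mutated by events

    def handle(self, event_id: str, amount: int) -> bool:
        """Apply the event; return False if it was a duplicate and was skipped."""
        if event_id in self.processed_ids:
            return False
        self.processed_ids.add(event_id)
        self.balance += amount
        return True
```

The two conventions reinforce each other: retries mean the same event may arrive more than once, and idempotent handling is what makes those redeliveries safe.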

Practice

  • Add constraints that support your chosen consistency model.
  • Capture conventions for retries, idempotency, and naming.

Ecommerce Platform



Ecommerce Platform: Architecture and Operations

Overview

  • End-to-end view of building and operating a modern ecommerce platform
  • Balances product vision with technical architecture and operational excellence

Learning Goals

  • Model domain entities, flows, and services for ecommerce
  • Design modular architecture and platform capabilities
  • Plan SDLC, release management, and operational readiness
  • Govern changes with policies, SLOs, and compliance

Prerequisites

  • Familiarity with web services, data stores, messaging, and CI/CD

Course Structure

  • Module 1: Vision & Basics
  • Module 3: Architecture & Tech
  • Module 4: SDLC
  • Module 5: Ops
  • Module 6: Evolution
  • Module 7: Governance

Where to Start

  • Begin with Module 1 to align vision and fundamentals, then proceed in order

Vision Basics



Module Overview: Vision & Basics

Goals:

  • Define project vision and scope
  • Identify key systems and stakeholders
  • Draft initial architecture map

Estimated time: 45–60 minutes

Checklist:

  • Vision statement
  • Stakeholders listed
  • Initial system map in Sruja

Lesson 1



Lesson 1: Introduction to the Project

In this course, we are building Shopify-lite. Let's define what that means.

The Concept

We are building a multi-tenant e-commerce platform. This means a single instance of our software will serve multiple different online stores (tenants), each with their own products, orders, and customers.

Core Capabilities

Our system must support:

  1. Storefronts: Fast, SEO-friendly pages for browsing products.
  2. Admin Dashboard: Where merchants manage their inventory.
  3. Checkout: A secure, reliable way to take money.
  4. Inventory Management: Real-time stock tracking to prevent overselling.

The "Why"

Why build this? Because it touches on every hard problem in distributed systems:

  • Consistency: Inventory must be accurate.
  • Availability: Checkout must never go down.
  • Scalability: We need to handle flash sales.
  • Security: We are handling credit card data (PCI DSS compliance).

The Role of Sruja

Most tutorials start by running npx create-next-app. We will not do that yet.

We will start by creating a .sruja file. Why? Because we need to agree on the structure before we get lost in the details. Sruja will be our shared whiteboard, our documentation, and our validator.

Lesson 2



Lesson 2: Setting up the Workspace

Let's get our hands dirty. We will set up a professional project structure that separates our architectural definitions from our implementation code, and aligns with product requirements.

Real-World Scenario: Starting a New Product

Context: You're building Shopify-lite, a multi-tenant e-commerce platform. Before writing code, you need to:

  • Align engineering, product, and DevOps on the architecture
  • Document requirements alongside the design
  • Set up a structure that scales as the team grows

Product team needs: Clear documentation of what we're building and why.

Engineering team needs: Technical architecture that supports product goals.

DevOps team needs: Deployment and operational considerations from day one.

1. Directory Structure

Create a new directory for your project:

mkdir shopify-lite
cd shopify-lite

We will use the following structure (based on real-world best practices):

shopify-lite/
├── architecture/          # Sruja files live here
│   ├── main.sruja        # Main architecture
│   ├── requirements.sruja # Product requirements
│   └── deployment.sruja   # Deployment architecture
├── src/                   # Source code (Rust, Node, etc.)
├── docs/                  # Generated documentation
│   └── architecture.md    # Auto-generated from Sruja
├── .github/
│   └── workflows/
│       └── validate-architecture.yml  # CI/CD validation
└── README.md

Why this structure?

  • Separation of concerns: Architecture separate from code
  • Version control: Track architecture changes over time
  • CI/CD ready: Easy to integrate validation
  • Team collaboration: Product, engineering, and DevOps can all contribute

2. Installing Sruja

If you haven't already, install the Sruja CLI:

# Quick install
curl -fsSL https://raw.githubusercontent.com/sruja-ai/sruja/main/scripts/install.sh | bash

# Or from source: cargo install --path crates/sruja-cli (from repo root)

# Verify installation
sruja --version

For DevOps: Add to your CI/CD pipeline (we'll cover this in Module 5).

3. Hello World: The Context View

Create your first file at architecture/main.sruja. We'll start with a high-level Context View to define the boundaries of our system.

Product Requirements First

Before modeling architecture, let's capture product requirements:

import { * } from 'sruja.ai/stdlib'


// Product Requirements (from product team)
requirement R1 functional "Merchants can create and manage online stores"
requirement R2 functional "Shoppers can browse products and make purchases"
requirement R3 functional "Platform processes payments securely"
requirement R4 nonfunctional "Platform must support 10,000+ stores"
requirement R5 nonfunctional "Checkout must complete in < 3 seconds"
requirement R6 nonfunctional "99.9% uptime SLA"

// Business Goals (for product/executive alignment)
metadata {
    businessGoal "Enable small businesses to sell online"
    targetMarket "Small to medium businesses (SMBs)"
    successMetrics "Number of active stores, GMV (Gross Merchandise Value)"
}

view index {
include *
}

The Architecture Context

Now let's model the system context:

import { * } from 'sruja.ai/stdlib'


// Product Requirements
requirement R1 functional "Merchants can create and manage online stores"
requirement R2 functional "Shoppers can browse products and make purchases"
requirement R3 functional "Platform processes payments securely"
requirement R4 nonfunctional "Platform must support 10,000+ stores"
requirement R5 nonfunctional "Checkout must complete in < 3 seconds"
requirement R6 nonfunctional "99.9% uptime SLA"

// 1. The System
Platform = system "E-Commerce Platform" {
    description "The core multi-tenant e-commerce engine that enables merchants to create stores and shoppers to make purchases."

    // Link to requirements
    requirement R1
    requirement R2
    requirement R3
    requirement R4
    requirement R5
    requirement R6
}

// 2. The Users (from product personas)
Merchant = person "Store Owner" {
    description "Small business owner who creates and manages their online store"
}
Shopper = person "Customer" {
    description "End customer who browses products and makes purchases"
}

// 3. External Systems (from product integrations)
Stripe = system "Payment Gateway" {
    external
    description "Third-party payment processor (PCI-compliant)"
}

EmailService = system "Email Service" {
    tags ["external"]
    description "Sends transactional emails (order confirmations, etc.)"
}

// 4. High-Level Interactions (user journeys)
Merchant -> Platform "Manages Store" {
    description "Creates products, manages inventory, views analytics"
}
Shopper -> Platform "Browses & Buys" {
    description "Browses products, adds to cart, completes checkout"
}
Platform -> Stripe "Processes Payments" {
    description "Secure payment processing for customer orders"
}
Platform -> EmailService "Sends Notifications" {
    description "Order confirmations, shipping updates"
}

// 5. Model user journeys as scenarios
ShopperCheckout = scenario "Shopper Checkout Journey" {
    Shopper -> Platform "Browses products"
    Shopper -> Platform "Adds items to cart"
    Shopper -> Platform "Initiates checkout"
    Platform -> Stripe "Processes payment"
    Stripe -> Platform "Confirms payment"
    Platform -> EmailService "Sends order confirmation"
    EmailService -> Shopper "Delivers confirmation email"
}

MerchantManagement = scenario "Merchant Store Management" {
    Merchant -> Platform "Logs into admin dashboard"
    Merchant -> Platform "Creates new product"
    Merchant -> Platform "Updates inventory"
    Merchant -> Platform "Views sales analytics"
}

// Executive view: Business context
view executive {
title "Executive Overview"
include Merchant
include Shopper
include Platform
include Stripe
include EmailService
}

// Product view: User journeys
view product {
title "Product View - User Experience"
include Merchant
include Shopper
include Platform
exclude Stripe
exclude EmailService
}

// Technical view: System integrations
view technical {
title "Technical View - System Integration"
include Platform Stripe EmailService
exclude Merchant Shopper
}

// Default view: Complete system
view index {
title "Complete System View"
include *
}

Why This Approach?

For Product Teams:

  • Requirements are visible and linked to architecture
  • Business goals are documented
  • Success metrics are clear

For Engineering:

  • Architecture shows what to build
  • Requirements guide implementation priorities
  • External dependencies are identified early

For DevOps:

  • Uptime SLA (R6) informs infrastructure planning
  • Performance requirements (R5) guide monitoring setup
  • Scale requirements (R4) inform capacity planning

4. Visualize It

Run the Sruja CLI to visualize your architecture:

# View the architecture diagram
sruja view architecture/main.sruja

# Or export to different formats
sruja export markdown architecture/main.sruja > docs/architecture.md
sruja export json architecture/main.sruja > docs/architecture.json

You should see a clean diagram showing:

  • Your platform in the center
  • Users (Merchant, Shopper) on the left
  • External systems (Stripe, EmailService) on the right
  • Interactions between them

5. Validate Your Architecture

Before moving forward, validate your architecture:

# Lint for errors
sruja lint architecture/main.sruja

# Print the element tree to help spot orphan elements
sruja tree architecture/main.sruja

Common issues to watch for:

  • Missing relations (orphan elements)
  • Invalid references
  • Unclear descriptions

6. Set Up CI/CD (DevOps Best Practice)

Create .github/workflows/validate-architecture.yml:

name: Validate Architecture

on:
  push:
    paths:
      - "architecture/**"
  pull_request:
    paths:
      - "architecture/**"

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Sruja
        run: |
          curl -fsSL https://raw.githubusercontent.com/sruja-ai/sruja/main/scripts/install.sh | bash
          echo "$HOME/go/bin" >> $GITHUB_PATH
      - name: Validate Architecture
        run: sruja lint architecture/main.sruja
      - name: Generate Docs
        run: |
          # Ensure the output directory exists before redirecting into it
          mkdir -p docs
          sruja export markdown architecture/main.sruja > docs/architecture.md

Why this matters: Catches architecture errors before they reach production.

Key Takeaways

  1. Start with requirements: Document what you're building and why
  2. Model context first: Understand system boundaries before diving into details
  3. Link requirements to architecture: Show how architecture supports product goals
  4. Set up CI/CD early: Automate validation from day one
  5. Think about all stakeholders: Product, engineering, and DevOps all need different views

Exercise: Create Your Context View

Tasks:

  1. Create a new project directory
  2. Install Sruja CLI
  3. Create architecture/main.sruja with:
    • At least 3 product requirements
    • System context (your system, users, external systems)
    • High-level interactions
  4. Validate and visualize your architecture
  5. (Optional) Set up CI/CD validation

Time: 15 minutes


Architecture Tech



Module Overview: Architecture & Tech

Goals:

  • Model systems, containers, and key relations
  • Select technology stack per container
  • Capture ADRs for major choices

Estimated time: 60–90 minutes

Checklist:

  • C4 L1/L2 map
  • Container tech set
  • ADR drafts

Lesson 1



Lesson 1: Monolith vs. Microservices

This is one of the most debated topics in software engineering. Should we start with a monolith or microservices?

The Trade-off

  • Monolith: Easier to develop, deploy, and debug. Harder to scale teams and components independently.
  • Microservices: Independent scaling and deployment. High complexity (network, consistency, observability).

Our Decision: Modular Monolith

For Shopify-lite, we will start with a Modular Monolith. We will have clear boundaries (modules) but deploy as a single unit initially. This gives us speed now and flexibility later.

Documenting with ADRs

We don't just make this decision; we document it so future engineers know why.

import { * } from 'sruja.ai/stdlib'


// Requirements that drive the architecture decision
requirement R1 functional "Must support 10,000+ stores"
requirement R2 performance "API response < 200ms p95"
requirement R3 scalability "Scale components independently"
requirement R4 development "Small team, need fast iteration"

// Architecture Decision Record
adr ArchitectureStyle "Modular Monolith Strategy" {
    status "Accepted"
    context "We are a small team building a new product. Speed is critical, but we need to scale to 10k+ stores."

    option "Microservices" {
        pros "Independent scaling, technology diversity"
        cons "High operational complexity, network latency, data consistency challenges"
    }

    option "Monolith" {
        pros "Simplest deployment, no network calls"
        cons "Cannot scale components independently, single point of failure"
    }

    option "Modular Monolith" {
        pros "Simple deployment, code sharing, clear boundaries"
        cons "Risk of tight coupling if not disciplined, harder to scale independently later"
    }

    decision "Modular Monolith"
    reason "We prioritize iteration speed for MVP. We will enforce boundaries using Sruja domains and can split to microservices later if needed."
    consequences "Faster initial development, may need refactoring to microservices at scale"
}

// Architecture that implements the decision
Platform = system "E-Commerce Platform" {
    description "Modular monolith - single deployment with clear module boundaries"
    adr ArchitectureStyle

    // Modules as containers (can be split to microservices later)
    StorefrontModule = container "Storefront Module" {
        technology "Next.js"
        description "Handles product browsing and storefronts"
    }

    AdminModule = container "Admin Module" {
        technology "Next.js"
        description "Merchant admin dashboard"
    }

    APIModule = container "API Module" {
        technology "Rust"
        description "Core business logic - can scale independently"
        scale {
            min 3
            max 50
            metric "cpu > 70%"
        }
    }

    OrderDB = database "Order Database" {
        technology "PostgreSQL"
        description "Stores orders and transactions"
    }

    StorefrontModule -> APIModule "Fetches product data"
    AdminModule -> APIModule "Manages inventory"
    APIModule -> OrderDB "Reads/Writes"
}

// Index view: Show everything
view index {
    title "Platform Architecture Overview"
    include *
}

// Module view: Show module boundaries
view modules {
    title "Module View"
    include Platform.StorefrontModule Platform.AdminModule Platform.APIModule Platform.OrderDB
}

// Scalability view: Focus on scalable components
view scalability {
    title "Scalability View"
    include Platform.APIModule Platform.OrderDB
}

Lesson 2



Lesson 2: Selecting the Stack

We have our domains. Now we need to pick the tools to build them.

The Stack

  1. Frontend: Next.js (React) - Great for SEO and performance.
  2. Backend: Rust - Performance, safety, and great tooling for services.
  3. Database: PostgreSQL - Reliable, ACID compliant (critical for money).

Modeling in Sruja

We define these choices in our container definitions.

import { * } from 'sruja.ai/stdlib'


Platform = system "E-Commerce Platform" {
  WebApp = container "Storefront & Admin" {
    technology "Next.js, TypeScript"
    description "The user-facing application."
  }

  API = container "Core API" {
    technology "Rust, Axum"
    description "REST API handling business logic."
  }

  Database = container "Primary DB" {
    technology "PostgreSQL 15"
    description "Stores orders, products, and users."
  }

  WebApp -> API "JSON/HTTPS"
  API -> Database "SQL/TCP"
}

By documenting technology, we make it clear to new developers what skills they need.

Lesson 3



Lesson 3: API-First Design

Before frontend and backend teams start working, they need to agree on the API. This is where API-First Design comes in.

API-First Design

Instead of writing code and then documenting it, we define the API schema first using OpenAPI. This allows frontend devs to mock the API while backend devs build it.

Sruja's Role: Architecture Modeling

Sruja models which services exist and how they connect. For detailed API schemas (endpoints, request/response structures), use OpenAPI/Swagger.

import { * } from 'sruja.ai/stdlib'


customer = person "Customer"

ecommerce = system "E-Commerce Platform" {
  api = container "Core API" {
    technology "Rust, Axum"
    // API schemas defined in openapi.yaml
  }

  orderDB = database "Order Database" {
    technology "PostgreSQL"
  }

  api -> orderDB "reads and writes to"
}

customer -> ecommerce.api "uses"

view index {
  title "E-Commerce Architecture"
  include *
}

Best Practice: Separation of Concerns

  1. Sruja: Models architecture (services, containers, relationships)
  2. OpenAPI: Defines API schemas (endpoints, request/response structures)
  3. Together: Architecture shows the big picture, OpenAPI shows the details

Why this matters

  1. Right tool for the job: Architecture modeling vs. API specification
  2. Industry standard: OpenAPI is widely supported by tools and frameworks
  3. Code Generation: Generate Rust types, TypeScript interfaces, and client SDKs from OpenAPI

Lesson 4



Lesson 4: API Design & Integration

Why API Design Matters

Well-designed APIs define stable interfaces between services; they reduce coupling and surprises. However, API schemas belong in OpenAPI, not in Sruja.

Sruja's Role: Architecture Modeling

Sruja focuses on architectural concerns: which services exist, how they relate, and what they do. For detailed API schemas, use OpenAPI/Swagger.

import { * } from 'sruja.ai/stdlib'


customer = person "Customer"

ecommerce = system "E-Commerce Platform" {
  api = container "Checkout API" {
    technology "Rust, Axum"
    // API details defined in openapi.yaml
  }

  events = queue "Order Events" {
    technology "Kafka"
    // Event schemas defined in AsyncAPI or JSON Schema
  }
}

customer -> ecommerce.api "uses"

Best Practice

  1. Model architecture in Sruja: Services, containers, relationships
  2. Define API schemas in OpenAPI: Request/response structures, endpoints
  3. Link them: Reference OpenAPI files in your architecture documentation

Practice

  • Model the AddToCart service in Sruja
  • Create an OpenAPI spec for the AddToCart endpoint
  • Show how they work together

SDLC



Lesson 1: The Local Loop

How do you use Sruja while you code?

1. The Blueprint

Keep architecture/main.sruja open in a split pane. It is your map. Before you create a new file or function, verify where it fits in the architecture.

2. Generating Boilerplate (Future)

Imagine running sruja gen to scaffold your Rust microservices (or services in other languages) based on your container definitions. While this feature is in development, you can manually align your folder structure to your architecture.

src/
  orders/      # Matches 'container OrderService'
  inventory/   # Matches 'container InventoryService'

3. Local Validation

Before you commit, run:

sruja validate .

This checks for:

  • Orphans: Components defined but never used.
  • Broken Links: Relations pointing to non-existent elements.
  • Policy Violations: Did you accidentally introduce a circular dependency?
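The three checks above are ordinary graph checks. Here is a minimal Python sketch of the idea, assuming the model has already been reduced to a set of element names and a list of (source, target) relations. It is not how the Sruja engine is implemented, and the function names are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str]  # (source, target)


def find_orphans(elements: Set[str], relations: List[Relation]) -> Set[str]:
    """Elements that appear in no relation at all."""
    used = {name for relation in relations for name in relation}
    return elements - used


def find_broken_links(elements: Set[str], relations: List[Relation]) -> List[Relation]:
    """Relations whose source or target is not a defined element."""
    return [(s, t) for s, t in relations if s not in elements or t not in elements]


def has_cycle(relations: List[Relation]) -> bool:
    """Depth-first search with three colours; a grey-to-grey edge means a cycle."""
    graph: Dict[str, List[str]] = {}
    for source, target in relations:
        graph.setdefault(source, []).append(target)

    WHITE, GREY, BLACK = 0, 1, 2
    state: Dict[str, int] = {}

    def visit(node: str) -> bool:
        state[node] = GREY
        for nxt in graph.get(node, []):
            colour = state.get(nxt, WHITE)
            if colour == GREY:
                return True
            if colour == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in list(graph))
```

Running sruja validate remains the source of truth; the sketch only shows what kind of question each check asks.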

Lesson 2



Lesson 2: Environments

Your software runs differently in Production than it does on your laptop. Sruja models this using Deployment Nodes.

Modeling Production

deployment Production "AWS Production" {
    node Region "US-East-1" {
        node K8s "EKS Cluster" {
            containerInstance WebApp
            containerInstance API
        }
        
        node DB "RDS Postgres" {
            containerInstance Database
        }
    }
}

Modeling Local Dev

deployment Local "Docker Compose" {
    node Laptop "My MacBook" {
        containerInstance WebApp
        containerInstance API
        containerInstance Database
    }
}

Why model this?

It helps you visualize the physical differences. Maybe in Prod you have a Load Balancer that doesn't exist locally. Sruja makes these differences explicit.

Lesson 3



Lesson 3: CI/CD Pipeline

Architecture compliance shouldn't be a manual review process. It should be a build step.

The Pipeline

In your GitHub Actions or Jenkins pipeline, add a step to install and run Sruja.

steps:
  - name: Checkout
    uses: actions/checkout@v4

  - name: Install Sruja
    run: curl -fsSL https://raw.githubusercontent.com/sruja-ai/sruja/main/scripts/install.sh | bash

  - name: Validate Architecture
    run: sruja validate architecture/

Breaking the Build

If a developer introduces a violation (e.g., "Frontend talks directly to Database"), sruja validate will exit with a non-zero code, failing the build.

This is Governance as Code. You stop architectural drift before it merges.

Ops



Lesson 1: Deployment Strategies

Real-World Problem: Black Friday Deployment

Scenario: You need to deploy a critical payment fix on Black Friday morning. The system handles $10M/hour in transactions. How do you deploy without risking downtime?

Wrong approach: Deploy directly to production and hope nothing breaks.

Right approach: Use a proven deployment strategy that minimizes risk.

Why Deployment Strategies Matter

Industry statistics:

  • 60% of outages are caused by bad deployments (Gartner, 2023)
  • Average cost of downtime: $5,600/minute for large enterprises
  • 99.9% uptime = 8.76 hours downtime/year (still too much for critical systems)
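The 8.76-hour figure follows directly from the availability percentage. A quick Python check, assuming a 365-day year (the function name is illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)
```

Each extra "nine" cuts the allowance by a factor of ten: 99.9% allows about 8.76 hours a year, 99.99% about 53 minutes.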

Product team perspective: Every minute of downtime means lost revenue, frustrated customers, and damaged reputation.

DevOps perspective: Need automated, repeatable, safe deployment processes.

Blue/Green Deployment

Concept

You have two identical environments (Blue and Green). One is live, the other is idle. You deploy to the idle one, test it, and then switch traffic.

Real-World Example: E-Commerce Platform

import { * } from 'sruja.ai/stdlib'


ECommerce = system "E-Commerce Platform" {
    API = container "REST API" {
        technology "Rust"
        scale {
            min 10
            max 200
        }
    }
    PaymentService = container "Payment Service" {
        technology "Rust"
        description "Critical: Processes all payments"
    }
    OrderDB = database "Order Database" {
        technology "PostgreSQL"
    }
}

deployment Production "Production Environment" {
    node Blue "Active Cluster (Blue)" {
        containerInstance API {
            replicas 50
            traffic 100
            status "active"
        }
        containerInstance PaymentService {
            replicas 20
            traffic 100
        }
        containerInstance OrderDB {
            role "primary"
        }
    }

    node Green "Idle Cluster (Green)" {
        containerInstance API {
            replicas 50
            traffic 0
            status "ready"
        }
        containerInstance PaymentService {
            replicas 20
            traffic 0
            status "ready"
        }
        containerInstance OrderDB {
            role "standby"
            description "Synced from Blue, ready for switch"
        }
    }
}

view index {
include *
}

DevOps Workflow

  1. Deploy to Green: Deploy new version to idle Green environment
  2. Smoke Tests: Run automated health checks and integration tests
  3. Load Testing: Verify Green can handle production load
  4. Switch Traffic: Use load balancer to route 100% traffic to Green
  5. Monitor: Watch metrics for 30 minutes
  6. Rollback Plan: Keep Blue ready for instant rollback if issues occur
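The workflow above boils down to: deploy to the idle side, gate on checks, flip the router, and flip back if the new side misbehaves. A minimal Python sketch under those assumptions follows; Router and the three callbacks are hypothetical stand-ins for your load balancer and test tooling, not a Sruja or cloud-provider API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Router:
    """Tracks which environment currently receives 100% of traffic."""
    live: str = "blue"

    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"


def blue_green_release(
    router: Router,
    deploy: Callable[[str], None],
    checks_pass: Callable[[str], bool],
    healthy_after_switch: Callable[[str], bool],
) -> str:
    target = router.idle()
    deploy(target)                      # step 1: deploy to the idle side
    if not checks_pass(target):         # steps 2-3: smoke and load tests
        return f"aborted: {router.live} still live"
    previous = router.live
    router.live = target                # step 4: switch traffic
    if not healthy_after_switch(target):  # step 5: monitor
        router.live = previous          # step 6: instant rollback
        return f"rolled back: {previous} live"
    return f"released: {target} live"
```

Because the old environment is never touched during the release, rollback is just another router flip.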

When to Use Blue/Green

Good for:

  • Critical services (payment, authentication)
  • Stateful applications with database replication
  • Zero-downtime requirements
  • Large, infrequent deployments

Not ideal for:

  • Frequent small deployments (wasteful)
  • Stateless services (Canary is better)
  • Limited infrastructure budget

Cost Consideration

Example: Running duplicate production environment

  • Cost: 2x infrastructure during deployment window
  • Typical window: 1-2 hours
  • Trade-off: Higher cost for lower risk

Canary Deployment

Concept

You roll out the new version to a small percentage of users (e.g., 5%) and monitor for errors. Gradually increase if metrics look good.

Real-World Example: API Service

import { * } from 'sruja.ai/stdlib'


API = system "REST API" {
    APIv1 = container "API v1.2.3" {
        technology "Rust"
        description "Current stable version"
    }
    APIv2 = container "API v1.2.4" {
        technology "Rust"
        description "New version with performance improvements"
    }
}

deployment Production "Production Environment" {
    node Canary "Canary Cluster" {
        containerInstance APIv2 {
            replicas 2
            traffic 5
            description "5% of traffic, monitoring error rate"
            metadata {
                maxErrorRate "1%"
                rollbackTrigger "error_rate > 1% or latency_p95 > 500ms"
            }
        }
    }

    node Stable "Stable Cluster" {
        containerInstance APIv1 {
            replicas 38
            traffic 95
        }
    }
}

view index {
include *
}

Gradual Rollout Strategy

Document the rollout plan in metadata:

import { * } from 'sruja.ai/stdlib'


ECommerce = system "E-Commerce Platform" {
  API = container "API Service" {
    metadata {
      deploymentStrategy "Canary"
      rolloutSteps "5% → 25% → 50% → 100%"
      stepDuration "15 minutes per step"
      monitoringWindow "15 minutes between steps"
      rollbackCriteria "error_rate > 1% OR latency_p95 > 500ms OR cpu > 90%"
    }
  }
}

view index {
  include *
}

Real-World Rollout Timeline

Example: Deploying new API version

10:00 AM - Deploy to Canary (5% traffic)
10:15 AM - Monitor: Error rate 0.2%, Latency p95: 180ms ✅
10:15 AM - Increase to 25% traffic
10:30 AM - Monitor: Error rate 0.3%, Latency p95: 195ms ✅
10:30 AM - Increase to 50% traffic
10:45 AM - Monitor: Error rate 0.4%, Latency p95: 210ms ✅
10:45 AM - Increase to 100% traffic
11:00 AM - Deployment complete
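The rollout plan and rollback criteria can be read as a simple loop: shift traffic, observe, and either continue or roll back. A hedged Python sketch, using the thresholds from the rollbackCriteria metadata above; the Metrics type and the observe callback are assumptions standing in for your monitoring system.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Metrics:
    error_rate: float       # percent of requests failing
    latency_p95_ms: float   # 95th percentile latency in milliseconds
    cpu: float              # percent CPU utilisation


def should_roll_back(m: Metrics) -> bool:
    """error_rate > 1% OR latency_p95 > 500ms OR cpu > 90%."""
    return m.error_rate > 1.0 or m.latency_p95_ms > 500 or m.cpu > 90


def run_canary(steps: List[int], observe: Callable[[int], Metrics]) -> Tuple[str, int]:
    """Walk the traffic steps; return the outcome and the last healthy percentage."""
    last_healthy = 0
    for percent in steps:
        metrics = observe(percent)  # metrics gathered during the monitoring window
        if should_roll_back(metrics):
            return ("rolled_back", last_healthy)
        last_healthy = percent
    return ("complete", last_healthy)
```

With steps [5, 25, 50, 100] and readings like those in the timeline, the loop runs to completion; a single bad window stops the rollout at the previous step.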

When to Use Canary

Good for:

  • Stateless services
  • Frequent deployments (multiple per day)
  • A/B testing new features
  • Performance-sensitive changes
  • Limited infrastructure budget

Not ideal for:

  • Database schema changes (requires coordination)
  • Breaking API changes (incompatible versions)
  • Services with complex state

Rolling Deployment

Concept

Gradually replace old instances with new ones in small batches, so the service keeps serving traffic throughout the rollout.

deployment Production "Production Environment" {
    node Cluster "Kubernetes Cluster" {
        containerInstance API {
            replicas 20
            strategy "rolling"
            maxUnavailable 1
            maxSurge 2
            description "Up to 2 extra pods during rollout, at most 1 pod unavailable"
        }
    }
}
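Assuming maxUnavailable and maxSurge carry the same meaning as the Kubernetes rolling-update settings of the same names, they bound how many pods exist during the rollout. A small Python sketch of that arithmetic (the function name is illustrative):

```python
from typing import Tuple


def rolling_bounds(replicas: int, max_unavailable: int, max_surge: int) -> Tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rolling update."""
    min_available = replicas - max_unavailable
    max_total = replicas + max_surge
    return min_available, max_total
```

For the example above, rolling_bounds(20, 1, 2) gives at least 19 pods serving traffic and at most 22 pods running at any moment.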

When to Use Rolling

Good for:

  • Kubernetes-native deployments
  • Stateless microservices
  • Cost-effective (no duplicate infrastructure)
  • Automated rollback via health checks

Feature Flags: Deployment Strategy Alternative

Sometimes you don't need a special deployment strategy at all. Ship the code with the new feature switched off and control exposure with feature flags instead:

import { * } from 'sruja.ai/stdlib'


Platform = system "Platform" {
  FeatureFlags = container "Feature Flag Service" {
    technology "LaunchDarkly, Split.io"
    description "Controls feature rollout without deployment"
  }

  API = container "API Service" {
    // Feature flags: newPaymentFlow (10% rollout), experimentalSearch (5% rollout)
  }
}

view index {
  include *
}

Use case: Deploy code with new feature disabled, then gradually enable via feature flags.
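A percentage rollout such as "newPaymentFlow (10% rollout)" needs each user to land on the same side of the flag every time. One common way to get that is to hash the flag name and user ID into a bucket from 0 to 99. The Python sketch below shows the idea only; it is an assumption for illustration and not how LaunchDarkly or Split.io work internally.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Deterministically place a user in or out of a percentage rollout."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in the range 0..99
    return bucket < percent
```

Including the flag name in the hash keeps rollouts independent, so the same users are not always the first to see every new feature.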

Monitoring During Deployment

Model your observability during deployments:

import { * } from 'sruja.ai/stdlib'


Observability = system "Observability Stack" {
  Prometheus = container "Metrics" {
    description "Tracks error rate, latency, throughput during deployment"
  }
  AlertManager = container "Alerting" {
    description "Alerts on deployment issues"
  }
}

// Link monitoring to deployment
deployment Production "Production Environment" {
    // Monitoring: error_rate, latency_p95, cpu_usage, request_rate
    // Alert thresholds: errorRate > 1%, latencyP95 > 500ms, cpuUsage > 90%
    // Rollback automation enabled
}

Real-World Case Study: Netflix Canary Deployment

Challenge: Deploy to 100M+ users without downtime

Solution:

  • Canary deployment to 1% of users
  • Automated analysis of 50+ metrics
  • Automatic rollback if any metric degrades
  • Gradual rollout over 6 hours

Result: 99.99% deployment success rate

Key Takeaways

  1. Choose the right strategy: Blue/Green for critical services, Canary for frequent releases, Rolling when cost matters most
  2. Automate everything: Use CI/CD pipelines to automate deployment and rollback
  3. Monitor aggressively: Track error rates, latency, and resource usage during deployment
  4. Have a rollback plan: Always be ready to roll back within minutes
  5. Document in Sruja: Model your deployment strategy so teams understand the process

Exercise: Design a Deployment Strategy

Scenario: You're deploying a new checkout flow for an e-commerce platform. The system processes $1M/hour.

Tasks:

  1. Choose a deployment strategy (Blue/Green, Canary, or Rolling)
  2. Model it in Sruja with deployment nodes
  3. Add monitoring and rollback criteria
  4. Document the rollout timeline

Time: 15 minutes


Lesson 2



Lesson 2: Debugging Performance

Scenario: It's Black Friday. The "Checkout" page is loading in 5 seconds. Why?

The Wrong Way

Start reading random logs or guessing which database query is slow.

The Structural Way

Look at your Sruja User Journey for "Purchase".

import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"

Platform = system "E-Commerce Platform" {
  // Before the fix there is no queue or worker yet
  Checkout = container "Checkout Service"
}

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

// Original synchronous flow (problematic)
Purchase = story "User Purchase Flow" {
  Customer -> Platform.Checkout "Initiates checkout"
  Platform.Checkout -> PaymentGateway "Process Payment" {
    latency "2s"
  }
  PaymentGateway -> Customer "Returns confirmation"
}

Wait, the PaymentGateway call is synchronous and takes 2 seconds? And it's in the critical path of the user request?

Root Cause: We are blocking the user while waiting for the bank.

The Fix: Asynchronous Processing

We need to decouple the user request from the payment processing.

  1. Introduce a Queue: The Checkout service puts a message on a queue.
  2. Worker: A background worker processes the payment.
  3. Update: The frontend polls for status or uses WebSockets.

Let's update the architecture:

import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"

Platform = system "E-Commerce Platform" {
  Checkout = container "Checkout Service"
  PaymentWorker = container "Payment Worker"
  PaymentQueue = queue "Payment Jobs"
}

PaymentGateway = system "Payment Gateway" {
  metadata {
    tags ["external"]
  }
}

// Updated asynchronous flow
Customer -> Platform.Checkout "Initiates checkout"
Platform.Checkout -> Platform.PaymentQueue "Enqueues payment job" {
  latency "10ms"
}
Platform.PaymentQueue -> Platform.PaymentWorker "Processes async"
Platform.PaymentWorker -> PaymentGateway "Processes payment"
PaymentGateway -> Customer "Sends confirmation email"

// Updated scenario
PurchaseAsync = story "Asynchronous Purchase Flow" {
  Customer -> Platform.Checkout "Initiates checkout"
  Platform.Checkout -> Platform.PaymentQueue "Enqueues job" {
    latency "10ms"
  }
  Platform.PaymentQueue -> Platform.PaymentWorker "Processes async"
  Platform.PaymentWorker -> PaymentGateway "Processes payment"
  PaymentGateway -> Customer "Sends confirmation"
}

view index {
  title "Payment Processing Architecture"
  include *
}

// Performance view: Focus on async flow
view performance {
  title "Performance View - Async Processing"
  include Platform.Checkout Platform.PaymentQueue Platform.PaymentWorker PaymentGateway
  exclude Customer
}

By visualizing the flow, the bottleneck (and the fix) becomes obvious.
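The queue-and-worker fix can also be sketched in code. The Python example below uses an in-process queue and a thread purely for illustration; in production the queue would be a broker and the worker a separate service. The function names and the charge callback are assumptions, not part of Sruja.

```python
import queue
import threading
from typing import Callable, Dict


def start_payment_worker(
    jobs: queue.Queue,
    charge: Callable[[str], str],
    status: Dict[str, str],
) -> threading.Thread:
    """Background worker: takes order IDs off the queue and records the result."""

    def loop() -> None:
        while True:
            order_id = jobs.get()
            if order_id is None:      # sentinel value: stop the worker
                jobs.task_done()
                break
            status[order_id] = charge(order_id)  # the slow gateway call happens here
            jobs.task_done()

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


def checkout(order_id: str, jobs: queue.Queue, status: Dict[str, str]) -> str:
    """Request path: enqueue the job and return immediately."""
    status[order_id] = "pending"
    jobs.put(order_id)
    return "pending"
```

The checkout call no longer waits for the gateway. The frontend reads the status entry later, by polling or over a WebSocket, to learn whether the payment went through.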

Lesson 3



Lesson 3: Observability

You can't fix what you can't see. Observability is about understanding the internal state of your system from the outside.

The Three Pillars

  1. Logs: "What happened?" (Error: Payment Failed)
  2. Metrics: "How often?" (Error Rate: 5%)
  3. Traces: "Where?" (Checkout -> API -> DB)

Mapping to Sruja

Your Sruja components should map 1:1 to your observability dashboards.

  • System OrderService -> Dashboard Order Service Overview
  • Container Database -> Metric postgres_cpu_usage

Standardizing with Policies

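Once policy rules are supported, Sruja Policies can enforce observability standards. The rule keyword is still a deferred feature, so the block below shows the intended shape rather than a check that runs today.

policy Observability "Must expose health checks" {
    rule "HealthCheck" {
        check "all containers must have health check endpoint"
    }
}

The goal is that every new service you build comes with the necessary hooks for monitoring.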

Lesson 4



Lesson 4: Ops SLOs & Monitoring

SLOs in Ops

Translate business expectations into measurable targets; build dashboards around them.

Sruja: Define SLOs

import { * } from 'sruja.ai/stdlib'


slo {
  availability { target "99.9%" window "30 days" }
  latency { p95 "200ms" window "7 days" }
  errorRate { target "< 0.1%" window "30 days" }
  throughput { target "1000 req/s" window "1 hour" }
}

view index {
include *
}
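An availability target is easier to act on when it is turned into an error budget: the amount of downtime the window allows. A short Python sketch of that arithmetic (the function name is illustrative):

```python
def error_budget_minutes(target_percent: float, window_days: int) -> float:
    """Minutes of allowed downtime for an availability target over a window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - target_percent / 100)
```

For the availability SLO above, 99.9% over 30 days leaves about 43.2 minutes of budget. Alerts and runbooks can then be tied to how fast that budget is being spent.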

Practice

  • Set p95 latency targets for checkout and search.
  • Map alerts to SLO windows; define runbooks for breaches.

Evolution



Lesson 1: The "Good" Problem

You have too many users. Your single database is melting. It's time to scale.

The Bottleneck

Our monitoring (Module 5) shows that Inventory checks are 80% of the database load.

The Refactor: Splitting the Monolith

We decide to extract Inventory into its own microservice with its own database.

Step 1: Update the Architecture

We change Inventory from a logical domain inside the monolith to a physical system.

// Before
domain Inventory { ... }

// After
system = kind "System"
container = kind "Container"

InventoryService = system "Inventory Microservice" {
  API = container "Inventory API"
  Database = container "Inventory DB"
}

Step 2: Update the Relationships

The OrderService can no longer call Inventory functions directly. It must make a gRPC call. Define the new interface in a Protocol Buffers (.proto) file; OpenAPI describes HTTP APIs, not gRPC services.

import { * } from 'sruja.ai/stdlib'


OrderService = system "Order Service" {
  // ...
}

// InventoryService is the system extracted in Step 1
OrderService -> InventoryService "gRPC CheckStock"

Why Sruja helps

Refactoring is dangerous. Sruja helps you visualize the impact of the change before you write code. You can see exactly which other systems depend on Inventory and ensure you don't break them.
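"Which other systems depend on Inventory" is a reverse walk over the relation graph. A minimal Python sketch, assuming relations have been exported as (source, target) pairs; the element names in the usage note are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def dependents_of(target: str, relations: Iterable[Tuple[str, str]]) -> Set[str]:
    """Everything that reaches `target` directly or through other elements."""
    callers: Dict[str, Set[str]] = {}
    for source, dest in relations:
        callers.setdefault(dest, set()).add(source)

    seen: Set[str] = set()
    pending = deque([target])
    while pending:
        node = pending.popleft()
        for caller in callers.get(node, ()):
            if caller != target and caller not in seen:
                seen.add(caller)
                pending.append(caller)
    return seen
```

Given relations OrderService -> Inventory and Storefront -> OrderService, the result for Inventory contains both OrderService and Storefront, which is the list of things to retest after the split.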

Lesson 2



Lesson 2: Managing Technical Debt

Every codebase has skeletons. The key is to label them.

Deprecating Components

We decided to move from Stripe to Adyen for lower fees. But we can't switch overnight.

import { * } from 'sruja.ai/stdlib'


Stripe = system "Legacy Payment Gateway" {
  metadata {
    tags ["deprecated"]
  }
  description "Do not use for new features. Migration in progress."
}

Adyen = system "New Payment Gateway" {
  metadata {
    tags ["preferred"]
  }
}

Governance Policies

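Once policy rules are supported, a policy can enforce this. The rule keyword is still a deferred feature, so treat the block below as a sketch of the intended syntax: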

// EXPECTED_FAILURE: Policy rules not yet implemented - rule keyword is deferred feature
policy Migration "No New Stripe Integrations" {
    rule "BanStripe" {
        // Pseudo-code: Fail if any NEW relation points to Stripe
        check "relation.to != 'Stripe'"
    }
}

This prevents developers from accidentally adding dependencies to the system you are trying to kill.
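Until policy rules are available, the same intent can be approximated with a small script in CI that compares relations on the main branch with those in the pull request. The Python sketch below assumes you can export both sets as (source, target) pairs; the function name and the example service names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Relation = Tuple[str, str]  # (source, target)


def new_relations_to_deprecated(
    baseline: Iterable[Relation],
    current: Iterable[Relation],
    deprecated: Set[str],
) -> List[Relation]:
    """Relations that point at a deprecated element and did not exist before."""
    known = set(baseline)
    return [(s, t) for s, t in current if t in deprecated and (s, t) not in known]
```

Existing Stripe integrations pass, because they are in the baseline. Only a newly added relation to Stripe is reported, which is what the migration policy is meant to stop.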

Governance



Lesson 1: Security by Design

Security isn't something you "add on" at the end. It must be baked into the architecture.

The Requirement

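GDPR Article 32: Personal data must be protected by appropriate technical and organisational measures, and the article names encryption as one of them.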

Modeling Security Signals

Use tags and metadata to make security posture explicit.

import { * } from 'sruja.ai/stdlib'


Shop = system "Shop" {
  UserDB = datastore "User DB" {
    tags ["pii", "encrypted"]
    metadata {
      retention "90d"
    }
  }
}

view index {
include *
}

Validating in CI

Run sruja validate in CI to enforce architectural rules (unique IDs, valid references, layering, external boundary checks). Combine with linters to flag missing tags for sensitive resources. This is Compliance as Code.
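A linter for missing tags can be very small. The Python sketch below assumes element tags have been exported into a dictionary and that the team's rule is "anything tagged pii must also be tagged encrypted". Both the rule and the function name are illustrative, not a built-in Sruja check.

```python
from typing import Dict, List, Set


def missing_encryption(elements: Dict[str, Set[str]]) -> List[str]:
    """Names of elements tagged 'pii' that lack the 'encrypted' tag."""
    return sorted(
        name
        for name, tags in elements.items()
        if "pii" in tags and "encrypted" not in tags
    )
```

A CI step can fail the build when the returned list is not empty, the same way sruja validate fails on structural errors.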

Lesson 2



Lesson 2: Cost Optimization

Cloud bills kill startups. Sruja helps you visualize where the money is going.

Modeling Cost

We can add cost metadata to our deployment nodes.

deployment Production {
    node DB "RDS Large" {
        metadata {
            cost "$500/month"
            type "db.r5.large"
        }
    }
}

Cost Policies

Use metadata and CI checks to prevent expensive mistakes in non‑production environments.

deployment Dev {
    node App "Small Instance" {
        metadata {
            cost "$20/month"
            type "t3.small"
        }
    }
}

Add a CI rule to flag dev nodes exceeding budget thresholds.
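Such a rule might look like the Python sketch below. It assumes cost metadata always uses the "$<amount>/month" form shown above and that node costs have been exported into a dictionary; the function names and the budget value are made up for illustration.

```python
import re
from typing import Dict, List


def monthly_cost(cost: str) -> float:
    """Parse a cost string such as '$20/month' into a number of dollars."""
    match = re.fullmatch(r"\$(\d+(?:\.\d+)?)/month", cost.strip())
    if match is None:
        raise ValueError(f"unrecognised cost format: {cost!r}")
    return float(match.group(1))


def over_budget(nodes: Dict[str, str], budget: float) -> List[str]:
    """Names of nodes whose monthly cost exceeds the budget."""
    return sorted(name for name, cost in nodes.items() if monthly_cost(cost) > budget)
```

Rejecting unknown cost formats outright is deliberate: a typo in the metadata should fail the check rather than silently count as zero.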

Lesson 3



Lesson 3: Audit Readiness (SOC 2)

Congratulations! You are now big enough to need a SOC 2 audit. The auditor asks: "Show me your system diagram and prove that all changes are reviewed."

Sruja as Evidence

Instead of scrambling to draw a whiteboard diagram, you point them to your Sruja repository.

  1. Current State: The diagram is generated from the .sruja files, so it matches what is in the repository.
  2. Change History: Git history shows every architectural change, who made it, and who approved it (via Pull Request).
  3. Controls: Your validation and CI checks prove that you have automated controls for security and compliance.

The End

You have built a scalable, secure, and compliant e-commerce platform. And you did it with a blueprint.

Happy Architecting!

Production Architecture



System Design Interview Mastery

Crack your next system design interview. This course teaches you how to approach, design, and model systems that impress interviewers at FAANG companies and top tech firms.

Why This Course?

System design interviews often carry the most weight in senior engineering interview loops. This course gives you:

  • Real interview questions from top companies
  • Step-by-step approach to tackle any design question
  • Sruja modeling to visualize your designs
  • Best practices that interviewers look for
  • Common pitfalls to avoid

What You'll Learn

  • Interview Strategy: How to approach system design questions systematically
  • Scaling & Performance: Handle "design for 1M users" questions confidently
  • Architecture Patterns: Microservices, caching, load balancing, and more
  • Trade-offs: Make informed decisions and explain them clearly
  • Modeling with Sruja: Use Sruja to visualize and communicate your designs

Who This Course Is For

  • Software engineers preparing for senior/staff level interviews
  • Candidates targeting FAANG and top tech companies
  • Engineers who want to improve their system design skills
  • Anyone preparing for architecture/design interviews

Course Structure

Module 1: Performance & Scalability Interview Questions

Master the most common interview questions about scaling and performance.

Interview Questions Covered:

  • "Design a video streaming platform like YouTube"
  • "How would you handle 10M concurrent users?"
  • "Design a system with < 200ms latency"

Module 2: Modular Architecture & Microservices

Tackle complex system design questions requiring distributed systems knowledge.

Interview Questions Covered:

  • "Design an e-commerce platform"
  • "Design a ride-sharing service like Uber"
  • "Design a social media feed"

Module 3: Governance & Policies (Senior/Staff Level)

Answer questions about compliance, governance, and architectural standards.

Interview Questions Covered:

  • "How do you ensure compliance (HIPAA, SOC 2)?"
  • "How do you enforce architectural standards?"
  • "Design a system that must comply with regulations"

Prerequisites

  • Completed System Design 101 or equivalent
  • Familiarity with basic Sruja syntax
  • Understanding of basic system design concepts

Estimated Time

4-5 hours (includes practice exercises)

Interview Success Framework

Each module follows this proven approach:

  1. Understand the Question - Clarify requirements and scope
  2. Design the System - High-level architecture first
  3. Model with Sruja - Visualize your design
  4. Deep Dive - Discuss scaling, trade-offs, and edge cases
  5. Optimize - Improve based on feedback

Learning Outcomes

By the end of this course, you'll be able to:

  • ✅ Approach any system design question with confidence
  • ✅ Design scalable systems that handle millions of users
  • ✅ Explain trade-offs and make informed decisions
  • ✅ Use Sruja to communicate your designs clearly
  • ✅ Avoid common interview mistakes
  • ✅ Impress interviewers with production-ready thinking

Real Interview Questions You'll Master

  • Design a URL shortener (bit.ly)
  • Design a video streaming service (YouTube/Netflix)
  • Design a ride-sharing service (Uber/Lyft)
  • Design a social media feed (Twitter/Instagram)
  • Design a chat application (WhatsApp/Slack)
  • Design a search engine (Google)
  • Design a payment system (Stripe)
  • Design a distributed cache (Redis)

Ready to ace your next interview? Let's get started! 🎯

Performance



Module Overview: Scaling & Performance Interview Questions

"Design a system that handles 10 million concurrent users."

This is one of the most common system design interview questions. In this module, you'll learn how to answer scaling and performance questions confidently.

Interview Questions You'll Master

  • "Design a video streaming platform like YouTube/Netflix"
  • "How would you design a system to handle 10M concurrent users?"
  • "Design a system with < 200ms latency"
  • "How do you ensure high availability (99.99%)?"

What Interviewers Look For

  • ✅ Understanding of horizontal vs vertical scaling
  • ✅ Ability to estimate capacity and scale
  • ✅ Knowledge of performance metrics (latency, throughput)
  • ✅ Trade-off analysis (cost vs performance)
  • ✅ Clear communication of your design

Goals

  • Answer scaling questions with confidence
  • Use scale blocks to model auto-scaling strategies
  • Define SLOs to show production-ready thinking
  • Explain trade-offs clearly to interviewers

Interview Framework

We'll follow this approach for each question:

  1. Clarify Requirements - Ask about scale, latency, availability
  2. Design High-Level - Start with core components
  3. Model with Sruja - Visualize your architecture
  4. Discuss Scaling - Show how it handles load
  5. Optimize - Improve based on constraints

Estimated Time

60-75 minutes (includes practice)

Checklist

  • Understand how to approach scaling questions
  • Model scaling strategies with Sruja
  • Define SLOs to show production thinking
  • Practice explaining trade-offs clearly

Lesson 1



Lesson 1: Interview Question - Design a Video Streaming Platform

The Interview Question

"Design a video streaming platform like YouTube or Netflix that can handle millions of concurrent viewers."

This is a classic system design interview question asked at Google, Netflix, and other top companies. Let's break it down step-by-step.

Step 1: Clarify Requirements (What Interviewers Want to Hear)

Before jumping into design, clarify the requirements. You should ask:

  • "What's the scale? How many concurrent viewers?"
  • "What's the latency requirement? How fast should videos start?"
  • "What types of videos? Short clips or full movies?"
  • "Do we need to support live streaming or just on-demand?"

Interviewer's typical answer:

  • "Let's say 10 million concurrent viewers"
  • "Videos should start within 2 seconds"
  • "Both short clips and full movies"
  • "Focus on on-demand for now"

Step 2: Design the High-Level Architecture

Start with the core components:

  1. Client (mobile app, web browser)
  2. CDN (Content Delivery Network) - serves videos
  3. Origin Server - stores original videos
  4. API Server - handles metadata, user requests
  5. Database - stores video metadata, user data

Step 3: Model with Sruja

Let's model this architecture:

import { * } from 'sruja.ai/stdlib'


Viewer = person "Video Viewer"

StreamingPlatform = system "Video Streaming Service" {
CDN = container "Content Delivery Network" {
  technology "Cloudflare, AWS CloudFront"
  description "Serves videos from edge locations worldwide"
}

OriginServer = container "Origin Server" {
  technology "S3, GCS"
  description "Stores original video files"
}

VideoAPI = container "Video API" {
  technology "Rust, gRPC"
  description "Handles video metadata, user requests"
}

TranscodingService = container "Video Transcoding" {
  technology "FFmpeg, Kubernetes"
  description "Converts videos to different formats/qualities"
}

VideoDB = database "Video Metadata Database" {
  technology "PostgreSQL"
}

UserDB = database "User Database" {
  technology "PostgreSQL"
}
}

Viewer -> StreamingPlatform.CDN "Streams video"
StreamingPlatform.CDN -> StreamingPlatform.OriginServer "Fetches on cache miss"
Viewer -> StreamingPlatform.VideoAPI "Requests video info"
StreamingPlatform.VideoAPI -> StreamingPlatform.VideoDB "Queries metadata"
StreamingPlatform.VideoAPI -> StreamingPlatform.UserDB "Queries user data"
StreamingPlatform.OriginServer -> StreamingPlatform.TranscodingService "Processes videos"

view index {
include *
}

Step 4: Address Scaling (The Key Part)

Interviewer will ask: "How does this handle 10 million concurrent viewers?"

This is where you show your scaling knowledge. Let's add scaling configuration:

import { * } from 'sruja.ai/stdlib'


Viewer = person "Video Viewer"

StreamingPlatform = system "Video Streaming Service" {
CDN = container "Content Delivery Network" {
  technology "Cloudflare, AWS CloudFront"
  // CDN scales automatically - no need to configure
  description "Serves videos from edge locations worldwide"
}

VideoAPI = container "Video API" {
  technology "Rust, gRPC"

  // This is what interviewers want to see!
  scale {
    min 10
    max 1000
    metric "cpu > 75% or requests_per_second > 10000"
  }

  description "Handles video metadata, user requests"
}

TranscodingService = container "Video Transcoding" {
  technology "FFmpeg, Kubernetes"

  scale {
    min 5
    max 100
    metric "queue_length > 50"
  }

  description "Converts videos to different formats/qualities"
}

VideoDB = database "Video Metadata Database" {
  technology "PostgreSQL"
  // Database scaling: read replicas
  description "Primary database with 5 read replicas for scaling reads"
}
}

view index {
include *
}

What Interviewers Look For

✅ Good Answer (What You Just Did)

  1. Clarified requirements before designing
  2. Started with high-level architecture
  3. Modeled with Sruja to visualize
  4. Addressed scaling with specific numbers
  5. Explained trade-offs (CDN vs origin server)

❌ Bad Answer (Common Mistakes)

  1. Jumping straight to code/implementation details
  2. Not asking clarifying questions
  3. Designing for small scale only
  4. Not mentioning CDN or caching
  5. Ignoring database scaling

Key Points to Mention in Interview

1. CDN for Video Delivery

Say: "We use a CDN to serve videos from edge locations close to users. This reduces latency and offloads traffic from origin servers."

2. Horizontal Scaling for API

Say: "The API server scales horizontally from 10 to 1000 instances based on CPU and request rate. This handles traffic spikes during peak hours."

3. Database Read Replicas

Say: "We use read replicas for the database to scale read operations. Writes go to primary, reads can go to any replica."

4. Caching Strategy

Say: "We cache frequently accessed video metadata in Redis to reduce database load."

Interview Practice: Add Caching

Interviewer might ask: "How do you reduce database load?"

Add caching to your design:

import { * } from 'sruja.ai/stdlib'


StreamingPlatform = system "Video Streaming Service" {
VideoAPI = container "Video API" {
  technology "Rust, gRPC"
  scale {
    min 10
    max 1000
    metric "cpu > 75%"
  }
}

VideoDB = database "Video Metadata Database" {
  technology "PostgreSQL"
}

Cache = database "Video Metadata Cache" {
  technology "Redis"
  description "Caches frequently accessed video metadata"
}
}

StreamingPlatform.VideoAPI -> StreamingPlatform.Cache "Reads metadata (cache hit)"
StreamingPlatform.VideoAPI -> StreamingPlatform.VideoDB "Reads metadata (cache miss)"
StreamingPlatform.VideoAPI -> StreamingPlatform.Cache "Writes to cache"

view index {
include *
}
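
If the interviewer asks how the three cache relations fit together, you can describe the cache-aside read path. The sketch below is a minimal Python illustration of that pattern, not part of Sruja: the `MetadataCache` class, the in-memory dict standing in for Redis, and the `VIDEO_DB` dict standing in for PostgreSQL are all hypothetical.

```python
from typing import Callable, Dict, Optional


class MetadataCache:
    """Cache-aside: read the cache first, fall back to the database on a miss,
    then populate the cache so the next read is a hit."""

    def __init__(self, load_from_db: Callable[[str], Optional[dict]]):
        self._store: Dict[str, dict] = {}  # stand-in for Redis
        self._load_from_db = load_from_db
        self.hits = 0
        self.misses = 0

    def get(self, video_id: str) -> Optional[dict]:
        if video_id in self._store:
            self.hits += 1
            return self._store[video_id]
        self.misses += 1
        row = self._load_from_db(video_id)
        if row is not None:
            self._store[video_id] = row  # write to cache after the miss
        return row


# Hypothetical in-memory "database" standing in for PostgreSQL.
VIDEO_DB = {"v1": {"title": "Intro to Sruja", "duration_s": 300}}

cache = MetadataCache(VIDEO_DB.get)
cache.get("v1")  # miss: reads the database, then writes to the cache
cache.get("v1")  # hit: served from the cache
print(cache.hits, cache.misses)  # 1 1
```

A production cache would also need expiry or invalidation so that metadata edits become visible; that is left out here to keep the read path easy to explain.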

Understanding Scale Block Fields

min - Minimum Replicas

Interview tip: "We keep at least 10 instances running to handle baseline traffic and provide fault tolerance."

max - Maximum Replicas

Interview tip: "We cap at 1000 instances to control costs. If we need more, we'd need to optimize the architecture first."

metric - Scaling Trigger

Interview tip: "We scale based on CPU usage and request rate. When CPU exceeds 75% or requests exceed 10k/sec, we add more instances."
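
To make the three fields concrete, here is a minimal Python sketch of how an autoscaler might combine them. It is not how Sruja evaluates a scale block; the function name and parameters are assumptions for illustration. The proportional rule is the same idea the Kubernetes Horizontal Pod Autoscaler uses: size the fleet so the metric lands near its target, then clamp to the bounds.

```python
import math


def desired_replicas(current: int, cpu_percent: float, cpu_target: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Scale the replica count in proportion to how far CPU is from the target,
    then clamp the result to the [min, max] bounds."""
    raw = math.ceil(current * cpu_percent / cpu_target)
    return max(min_replicas, min(max_replicas, raw))


# Bounds taken from the Video API scale block (min 10, max 1000, cpu > 75%).
print(desired_replicas(100, 90.0, 75.0, 10, 1000))   # 120: scale out
print(desired_replicas(100, 5.0, 75.0, 10, 1000))    # 10: held at min
print(desired_replicas(900, 150.0, 75.0, 10, 1000))  # 1000: capped at max
```

The last call is the case worth mentioning in an interview: once the cap is hit, adding instances is no longer an option and the bottleneck has to be fixed elsewhere.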

Real Interview Example: Capacity Estimation

Interviewer: "How many API servers do you need for 10M concurrent users?"

Your answer:

  1. "Assume each user makes 1 request per minute = 10M requests/minute = ~167k requests/second"
  2. "Each API server handles ~1000 requests/second"
  3. "We need ~167 servers at peak"
  4. "With 2x headroom for spikes: ~350 servers"
  5. "Our scale block allows 10-1000, so we're covered"
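
The same estimate can be written as a few lines of Python, which makes it easy to rerun when the interviewer changes an assumption. The function name and the 2x default headroom are illustrative; the inputs are the ones used in the answer above.

```python
import math


def servers_needed(concurrent_users: int, requests_per_user_per_minute: float,
                   rps_per_server: float, headroom: float = 2.0) -> tuple[float, int, int]:
    """Return (peak requests/second, servers at peak, servers with headroom)."""
    peak_rps = concurrent_users * requests_per_user_per_minute / 60
    base = math.ceil(peak_rps / rps_per_server)
    with_headroom = math.ceil(base * headroom)
    return peak_rps, base, with_headroom


peak_rps, base, padded = servers_needed(10_000_000, 1, 1000)
print(round(peak_rps), base, padded)  # 166667 167 334
```

The exact doubled figure is 334 servers; rounding up to ~350 in the spoken answer is fine, and both sit comfortably inside the 10-1000 scale bounds.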

Exercise: Practice This Question

Design a video streaming platform and be ready to explain:

  1. Why you chose CDN
  2. How scaling works
  3. Database scaling strategy
  4. Caching approach

Practice tip: Time yourself (30-40 minutes) and explain out loud as if in an interview.

Common Follow-Up Questions

Be prepared for:

  • "How do you handle video uploads?" (Add upload service, queue for processing)
  • "What about live streaming?" (Add live streaming infrastructure)
  • "How do you ensure availability?" (Add redundancy, health checks)
  • "What's the cost?" (Estimate based on scale)

Next Steps

In the next lesson, we'll learn about SLOs (Service Level Objectives) - another common interview topic about defining performance targets.

Lesson 2



Lesson 2: Interview Question - Design a High-Performance Payment System

The Interview Question

"Design a payment processing system that can handle 1 million transactions per second with 99.99% availability and < 100ms latency."

This question tests your understanding of:

  • Performance requirements (SLOs)
  • High availability
  • Low latency systems
  • Trade-offs between consistency and performance

Step 1: Clarify Requirements

You should ask:

  • "What's the transaction volume? Peak vs average?"
  • "What's the availability requirement? 99.9% or 99.99%?"
  • "What's the latency requirement? P95 or P99?"
  • "What about consistency? Do we need strong consistency?"

Interviewer's answer:

  • "1M transactions/second at peak"
  • "99.99% availability (four nines)"
  • "< 100ms p95 latency"
  • "Strong consistency required (it's money!)"

Step 2: Design with SLOs in Mind

This is where SLOs (Service Level Objectives) come in. Interviewers love when you think about measurable targets.

Let's model the payment system with explicit SLOs:

import { * } from 'sruja.ai/stdlib'


PaymentService = system "Payment Processing" {
PaymentAPI = container "Payment API" {
  technology "Rust, gRPC"

  // This shows production-ready thinking!
  slo {
    availability {
      target "99.99%"
      window "30 days"
      current "99.97%"
    }

    latency {
      p95 "100ms"
      p99 "250ms"
      window "7 days"
      current {
        p95 "85ms"
        p99 "200ms"
      }
    }

    errorRate {
      target "< 0.01%"
      window "30 days"
      current "0.008%"
    }

    throughput {
      target "1000000 txn/s"
      window "1 hour"
      current "950000 txn/s"
    }
  }

  scale {
    min 100
    max 10000
    metric "cpu > 70% or requests_per_second > 500000"
  }
}

FraudDetection = container "Fraud Detection" {
  technology "Python, ML"
  description "Real-time fraud detection"
}

PaymentDB = database "Payment Database" {
  technology "PostgreSQL"
  description "Primary database with 10 read replicas"
}

Cache = database "Payment Cache" {
  technology "Redis"
  description "Caches recent transactions"
}

PaymentQueue = queue "Payment Queue" {
  technology "Kafka"
  description "Async payment processing"
}
}

Stripe = system "Stripe Gateway" {
tags ["external"]
}

BankAPI = system "Bank API" {
tags ["external"]
}

PaymentService.PaymentAPI -> PaymentService.FraudDetection "Validates"
PaymentService.PaymentAPI -> PaymentService.Cache "Checks recent transactions"
PaymentService.PaymentAPI -> PaymentService.PaymentDB "Stores transaction"
PaymentService.PaymentAPI -> PaymentService.PaymentQueue "Enqueues for async processing"
PaymentService.PaymentAPI -> Stripe "Processes payment"
PaymentService.PaymentAPI -> BankAPI "Validates with bank"

view index {
include *
}

What Interviewers Look For

✅ Good Answer (What You Just Did)

  1. Defined SLOs explicitly - Shows you think about measurable targets
  2. Addressed all requirements - Availability, latency, throughput
  3. Explained trade-offs - Strong consistency vs performance
  4. Scalability - Showed how to handle 1M txn/s
  5. Redundancy - Multiple replicas, failover strategies

❌ Bad Answer (Common Mistakes)

  1. Not defining SLOs or performance targets
  2. Ignoring availability requirements
  3. Not explaining how to achieve 99.99% availability
  4. Not addressing consistency requirements
  5. No capacity estimation

Key Points to Mention in Interview

1. Availability (99.99% = Four Nines)

Say: "99.99% availability means 52.6 minutes of downtime per year. To achieve this, we need:

  • Multiple data centers (active-active)
  • Automatic failover
  • Health checks and monitoring
  • Database replication with automatic promotion"
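
The downtime figure is simple arithmetic, and it helps to be able to derive it on the spot. A minimal sketch, assuming a 365.25-day year; the function and constant names are made up for this example.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960
MINUTES_PER_30_DAYS = 30 * 24 * 60   # 43,200


def downtime_budget_minutes(availability_percent: float, window_minutes: float) -> float:
    """Minutes of downtime allowed in the window at the given availability."""
    return window_minutes * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    per_year = downtime_budget_minutes(target, MINUTES_PER_YEAR)
    per_window = downtime_budget_minutes(target, MINUTES_PER_30_DAYS)
    print(f"{target}%: {per_year:.1f} min/year, {per_window:.2f} min per 30 days")
```

For 99.99% this gives 52.6 minutes per year, or about 4.32 minutes in a 30-day window, which is the window used in the SLO block above.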

2. Latency (< 100ms p95)

Say: "To achieve < 100ms latency, we:

  • Use in-memory cache (Redis) for hot data
  • Keep database queries simple and indexed
  • Use connection pooling
  • Minimize network hops
  • Consider async processing for non-critical paths"

3. Throughput (1M txn/s)

Say: "To handle 1M transactions/second:

  • Horizontal scaling: 100-10,000 API instances
  • Database sharding by transaction ID
  • Read replicas for scaling reads
  • Caching frequently accessed data
  • Async processing for non-critical operations"

4. Strong Consistency

Say: "Since this is financial data, we need strong consistency:

  • All writes go to primary database
  • Read replicas are eventually consistent, so use them only for reads that can tolerate slightly stale data
  • Use distributed transactions for critical operations
  • Trade-off: Slightly higher latency for correctness"

Understanding SLO Types (Interview Context)

Availability SLO

Interviewer asks: "How do you ensure 99.99% availability?"

Your answer with SLO:

import { * } from 'sruja.ai/stdlib'


PaymentService = system "Payment Processing" {
PaymentAPI = container "Payment API" {
  slo {
    availability {
      target "99.99%"
      window "30 days"
      current "99.97%"
    }
  }
}
}

view index {
include *
}

Explain: "We target 99.99% (four nines), which allows 52.6 minutes of downtime per year. We're currently at 99.97%, which is roughly three times that downtime budget, so we need to improve redundancy."

Latency SLO

Interviewer asks: "How fast should payments process?"

Your answer with SLO:

import { * } from 'sruja.ai/stdlib'


PaymentService = system "Payment Processing" {
PaymentAPI = container "Payment API" {
  slo {
    latency {
      p95 "100ms"
      p99 "250ms"
      window "7 days"
    }
  }
}
}

view index {
include *
}

Explain: "95% of payments complete in under 100ms, 99% in under 250ms. We use p95/p99 instead of the average because an average hides the slow tail: a handful of very slow payments barely moves the mean, but those users still had a bad experience."
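
To see why percentiles and averages tell different stories, here is a small Python sketch with made-up latencies. The sample data and the nearest-rank `percentile` helper are assumptions for illustration; monitoring systems often use interpolated or approximate percentiles instead.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: 90 fast requests and 10 slow ones.
latencies = [40.0] * 90 + [900.0] * 10

mean = sum(latencies) / len(latencies)
print(mean)                       # 126.0
print(percentile(latencies, 50))  # 40.0
print(percentile(latencies, 95))  # 900.0
print(percentile(latencies, 99))  # 900.0
```

The mean of 126ms looks close to a 100ms target, yet one request in ten took 900ms. The p95 and p99 values surface that tail immediately.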

Error Rate SLO

Interviewer asks: "What error rate is acceptable?"

Your answer with SLO:

import { * } from 'sruja.ai/stdlib'


PaymentService = system "Payment Processing" {
PaymentAPI = container "Payment API" {
  slo {
    errorRate {
      target "< 0.01%"
      window "30 days"
      current "0.008%"
    }
  }
}
}

view index {
include *
}

Explain: "We target less than 0.01% error rate. Currently at 0.008%, which is good, but we monitor closely because payment errors are critical."

Real Interview Example: Capacity Estimation

Interviewer: "How many servers do you need for 1M txn/s?"

Your answer:

  1. "Each transaction requires ~10ms of processing, so a server handling one transaction at a time does ~100 transactions/second"
  2. "1M txn/s ÷ 100 = 10,000 servers needed"
  3. "With 2x headroom for spikes and redundancy: ~20,000 servers"
  4. "But we can optimize:
    • Caching reduces DB load → fewer DB servers
    • Async processing → can batch operations
    • Database sharding → distributes load
    • Final estimate: ~5,000-10,000 servers"
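
One way to sanity-check this estimate is Little's Law: the average number of transactions in flight equals the arrival rate times the time each one spends in the system. The Python sketch below applies it to the numbers above. The `workers_per_server` parameter is an assumption added here to show how running several workers per server changes the count; it is not part of the original answer.

```python
import math


def in_flight(throughput_per_s: float, service_time_ms: float) -> int:
    """Little's Law: average in-flight work = arrival rate x time in system."""
    return math.ceil(throughput_per_s * service_time_ms / 1000)


def servers_needed(throughput_per_s: float, service_time_ms: float,
                   workers_per_server: int, headroom: float = 2.0) -> int:
    """Servers required to keep enough workers busy, including headroom."""
    busy_workers = in_flight(throughput_per_s, service_time_ms)
    return math.ceil(busy_workers * headroom / workers_per_server)


print(in_flight(1_000_000, 10))          # 10000 transactions in flight
print(servers_needed(1_000_000, 10, 1))  # 20000: one worker per server
print(servers_needed(1_000_000, 10, 4))  # 5000: four workers per server (assumed)
```

With one worker per server this reproduces the 10,000 and 20,000 figures from the answer. Stating the concurrency assumption out loud shows the interviewer where the final number comes from.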

Interview Practice: Add High Availability

Interviewer: "How do you ensure 99.99% availability?"

Add redundancy to your design:

import { * } from 'sruja.ai/stdlib'


PaymentService = system "Payment Processing" {
PaymentAPI = container "Payment API" {
  technology "Rust, gRPC"
  scale {
    min 100
    max 10000
    metric "cpu > 70%"
  }
  description "Deployed across 3 data centers (active-active)"
}

PaymentDB = database "Payment Database" {
  technology "PostgreSQL"
  description "Primary in US-East, replicas in US-West and EU"
}
}

// Show redundancy
PaymentService.PaymentAPI -> PaymentService.PaymentDB "Writes to primary"

view index {
include *
}

Explain: "We deploy across 3 data centers in active-active mode. If one fails, traffic automatically routes to others. Database has primary + replicas with automatic failover."

Common Follow-Up Questions

Be prepared for:

  1. "What if the database fails?"

    • Answer: "Automatic failover to replica, data replication with < 1s lag"
  2. "How do you handle network partitions?"

    • Answer: "CAP theorem - we choose consistency over availability for payments. If a partition occurs, we reject transactions rather than risk inconsistency."
  3. "What about data consistency across regions?"

    • Answer: "Synchronous replication for critical data, eventual consistency for non-critical. Use distributed transactions for cross-region operations."
  4. "How do you monitor SLOs?"

    • Answer: "Real-time dashboards showing current vs target SLOs. Alerts when we're at risk of violating SLOs. Weekly reviews of SLO performance."
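
For the monitoring question, it helps to show how "current vs target" turns into an alert. A common framing is the error budget: the gap between 100% and the target. The sketch below is a minimal Python illustration using the availability numbers from the Payment API SLO block; the function names, status labels, and the 80% warning threshold are assumptions.

```python
def budget_consumed(target_percent: float, current_percent: float) -> float:
    """Fraction of the error budget used; 1.0 means the budget is exactly spent."""
    allowed = 100 - target_percent
    actual = 100 - current_percent
    return actual / allowed


def status(target_percent: float, current_percent: float, warn_at: float = 0.8) -> str:
    """Classify an SLO as ok, at risk, or breached based on budget use."""
    used = budget_consumed(target_percent, current_percent)
    if used > 1.0:
        return "breached"
    if used >= warn_at:
        return "at risk"
    return "ok"


print(round(budget_consumed(99.99, 99.97), 2), status(99.99, 99.97))  # 3.0 breached
print(status(99.99, 99.991))  # at risk
print(status(99.99, 99.999))  # ok
```

The first line is worth saying out loud: 99.97% against a 99.99% target means the service has used about three times its allowed unavailability.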

Exercise: Practice This Question

Design a payment system and be ready to explain:

  1. How you achieve 99.99% availability
  2. How you keep latency < 100ms
  3. How you handle 1M txn/s
  4. Your SLO targets and how you measure them

Practice tip: Time yourself (40-45 minutes) and explain out loud. Focus on SLOs - interviewers love this!

Key Takeaways for Interviews

  1. Always define SLOs - Shows production-ready thinking
  2. Explain trade-offs - Availability vs consistency, latency vs throughput
  3. Show capacity estimation - Back up your numbers
  4. Mention monitoring - How you track SLOs
  5. Discuss failure scenarios - What happens when things break

Next Steps

You've learned how to handle performance and SLO questions. In the next module, we'll tackle modular architecture questions - another common interview topic!

Lesson 3



Lesson 3: SLO Enforcement in Practice

SLO‑Driven Operations

Use SLOs to set thresholds for alerts and capacity changes.

Sruja: Model SLOs & Validate

import { * } from 'sruja.ai/stdlib'


API = system "API Server" {
  Gateway = container "Gateway" {
    scale { metric "req/s" min 500 max 5000 }
  }
}

slo {
availability { target "99.95%" window "30 days" }
latency { p95 "150ms" window "7 days" }
errorRate { target "< 0.05%" window "30 days" }
throughput { target "3000 req/s" window "1 hour" }
}

view index {
include *
}

sruja lint payments.sruja

Practice

  • Define SLOs for your critical path; ensure scale bounds meet throughput.
  • Set alert thresholds aligned to SLO windows.
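
Checking that scale bounds meet a throughput target is a one-line multiplication, shown here as a Python sketch. This is a manual check under stated assumptions, not something the Sruja CLI is known to do: it treats min and max as replica counts and needs a per-replica capacity figure, which you have to measure or estimate. The example reuses the payment system's numbers (1M txn/s target, 100-10,000 replicas, ~100 txn/s per server).

```python
def scale_bounds_cover(target_rps: float, per_replica_rps: float,
                       min_replicas: int, max_replicas: int) -> dict:
    """Compare the capacity at min and max replicas with the throughput target."""
    floor_capacity = min_replicas * per_replica_rps
    ceiling_capacity = max_replicas * per_replica_rps
    return {
        "floor_capacity_rps": floor_capacity,
        "ceiling_capacity_rps": ceiling_capacity,
        "meets_target": ceiling_capacity >= target_rps,
        "headroom": ceiling_capacity / target_rps,
    }


report = scale_bounds_cover(1_000_000, 100, 100, 10_000)
print(report["meets_target"], report["headroom"])  # True 1.0
```

A headroom of 1.0 means the maximum just reaches the target with nothing spare, which is a useful trade-off to raise when discussing the max bound.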

Modular



Module Overview: Microservices & Distributed Systems Interview Questions

"Design an e-commerce platform using microservices."

This is a very common interview question that tests your understanding of:

  • Microservices architecture
  • Service decomposition
  • Inter-service communication
  • Distributed system challenges

Interview Questions You'll Master

  • "Design an e-commerce platform (Amazon-style)"
  • "Design a ride-sharing service like Uber"
  • "Design a social media platform"
  • "How do you split a monolith into microservices?"

What Interviewers Look For

  • ✅ Understanding of microservices vs monolith trade-offs
  • ✅ Ability to decompose a system into services
  • ✅ Knowledge of service communication patterns
  • ✅ Understanding of distributed system challenges
  • ✅ Clear communication of design decisions

Goals

  • Answer microservices questions confidently
  • Model service boundaries using separate systems in Sruja
  • Explain service decomposition strategy
  • Discuss trade-offs and challenges

Interview Framework

We'll follow this approach:

  1. Clarify Requirements - Scale, features, constraints
  2. Identify Services - How to decompose the system
  3. Model with Sruja - Use separate systems to show service boundaries
  4. Discuss Communication - APIs, events, data flow
  5. Address Challenges - Consistency, failures, monitoring

Estimated Time

60-75 minutes (includes practice)

Checklist

  • Understand microservices decomposition
  • Model services as separate systems in Sruja
  • Explain service communication patterns
  • Discuss distributed system challenges

Lesson 1



Lesson 1: Interview Question - Design an E-Commerce Platform (Microservices)

The Interview Question

"Design an e-commerce platform like Amazon that can handle millions of users and products. Use a microservices architecture."

This is one of the most common system design interview questions. It tests:

  • System decomposition into microservices
  • Service boundaries and responsibilities
  • Inter-service communication
  • Data consistency across services

Step 1: Clarify Requirements

You should ask:

  • "What are the core features? Shopping cart, checkout, recommendations?"
  • "What's the scale? Users, products, orders per day?"
  • "What about inventory? Real-time stock management?"
  • "Payment processing? Do we integrate with payment gateways?"

Interviewer's typical answer:

  • "Core features: Product catalog, shopping cart, checkout, order management, user accounts"
  • "Scale: 100M users, 1B products, 10M orders/day"
  • "Real-time inventory tracking required"
  • "Integrate with payment gateways like Stripe"

Step 2: Identify Microservices

Key insight: Break down by business domain, not technical layers.

You should identify:

  1. User Service - Authentication, profiles
  2. Product Service - Catalog, search, recommendations
  3. Cart Service - Shopping cart management
  4. Order Service - Order processing, tracking
  5. Payment Service - Payment processing
  6. Inventory Service - Stock management
  7. Notification Service - Emails, SMS

Step 3: Model with Sruja (Separate Systems)

Model each microservice as a separate system within the architecture. This clearly shows service boundaries.

import { * } from 'sruja.ai/stdlib'


Customer = person "Online Customer"

// Each microservice is a separate system
UserService = system "User Management" {
AuthAPI = container "Authentication API" {
  technology "Rust, gRPC"
}

ProfileAPI = container "Profile API" {
  technology "Rust, gRPC"
}

UserDB = database "User Database" {
  technology "PostgreSQL"
}
}

ProductService = system "Product Catalog" {
ProductAPI = container "Product API" {
  technology "Java, Spring Boot"
}

SearchAPI = container "Search API" {
  technology "Elasticsearch"
}

RecommendationAPI = container "Recommendation API" {
  technology "Python, ML"
}

ProductDB = database "Product Database" {
  technology "PostgreSQL"
}

SearchIndex = database "Search Index" {
  technology "Elasticsearch"
}
}

CartService = system "Shopping Cart" {
CartAPI = container "Cart API" {
  technology "Node.js, Express"
}

CartDB = database "Cart Database" {
  technology "Redis"
  description "In-memory cache for fast cart operations"
}
}

OrderService = system "Order Management" {
OrderAPI = container "Order API" {
  technology "Node.js, Express"
}

OrderProcessor = container "Order Processor" {
  technology "Node.js"
}

OrderDB = database "Order Database" {
  technology "PostgreSQL"
}

OrderQueue = queue "Order Queue" {
  technology "Kafka"
}
}

PaymentService = system "Payment Processing" {
PaymentAPI = container "Payment API" {
  technology "Rust, gRPC"
}

PaymentDB = database "Payment Database" {
  technology "PostgreSQL"
}
}

InventoryService = system "Inventory Management" {
InventoryAPI = container "Inventory API" {
  technology "Java, Spring Boot"
}

InventoryDB = database "Inventory Database" {
  technology "PostgreSQL"
}
}

NotificationService = system "Notifications" {
NotificationAPI = container "Notification API" {
  technology "Python, FastAPI"
}

EmailQueue = queue "Email Queue" {
  technology "RabbitMQ"
}

SMSQueue = queue "SMS Queue" {
  technology "RabbitMQ"
}
}

// API Gateway - single entry point
ECommerceApp = system "E-Commerce Application" {
WebApp = container "Web Application" {
  technology "React, Next.js"
}

APIGateway = container "API Gateway" {
  technology "Kong, Nginx"
  description "Routes requests to appropriate microservices"
}
}

Stripe = system "Stripe Gateway" {
tags ["external"]
}

PayPal = system "PayPal Gateway" {
tags ["external"]
}

// User flow
Customer -> ECommerceApp.WebApp "Browses products"
ECommerceApp.WebApp -> ECommerceApp.APIGateway "Makes requests"
ECommerceApp.APIGateway -> UserService.AuthAPI "Authenticates"
ECommerceApp.APIGateway -> ProductService.ProductAPI "Fetches products"
ECommerceApp.APIGateway -> ProductService.SearchAPI "Searches products"
ECommerceApp.APIGateway -> ProductService.RecommendationAPI "Gets recommendations"

// Cart flow
ECommerceApp.APIGateway -> CartService.CartAPI "Manages cart"
CartService.CartAPI -> CartService.CartDB "Stores cart"

// Order flow
ECommerceApp.APIGateway -> OrderService.OrderAPI "Creates order"
OrderService.OrderAPI -> InventoryService.InventoryAPI "Checks stock"
OrderService.OrderAPI -> PaymentService.PaymentAPI "Processes payment"
OrderService.OrderAPI -> UserService.ProfileAPI "Gets user info"
OrderService.OrderAPI -> OrderService.OrderQueue "Enqueues for processing"
OrderService.OrderProcessor -> OrderService.OrderQueue "Processes orders"
OrderService.OrderProcessor -> NotificationService.NotificationAPI "Sends confirmation"

// Payment flow
PaymentService.PaymentAPI -> PaymentService.PaymentDB "Stores transaction"
PaymentService.PaymentAPI -> Stripe "Processes cards"
PaymentService.PaymentAPI -> PayPal "Processes PayPal"

// Notification flow
NotificationService.NotificationAPI -> NotificationService.EmailQueue "Sends emails"
NotificationService.NotificationAPI -> NotificationService.SMSQueue "Sends SMS"

view index {
include *
}

What Interviewers Look For

✅ Good Answer (What You Just Did)

  1. Clear service boundaries - Each service is a separate system
  2. Single responsibility - Each service has one clear purpose
  3. Identified communication patterns - API calls, queues, events
  4. Addressed data ownership - Each service owns its database
  5. Explained trade-offs - Why microservices vs monolith

❌ Bad Answer (Common Mistakes)

  1. Services too granular (one service per function)
  2. Services too coarse (monolith split incorrectly)
  3. Not showing service boundaries clearly
  4. Ignoring data consistency challenges
  5. No API gateway or service mesh

Key Points to Mention in Interview

1. Service Decomposition Strategy

Say: "I decompose by business domain, not technical layers. Each service owns its data and has clear boundaries. For example:

  • User Service owns user data and authentication
  • Product Service owns product catalog and search
  • Order Service owns order lifecycle
  • Each service is a separate system in the architecture"

2. Inter-Service Communication

Say: "Services communicate via:

  • Synchronous: REST/gRPC for real-time operations (checkout, cart)
  • Asynchronous: Message queues for eventual consistency (order processing, notifications)
  • API Gateway: Single entry point, handles routing, auth, rate limiting"

3. Data Consistency

Say: "Each service owns its database (database per service pattern). For cross-service operations:

  • Saga pattern: For distributed transactions (order → payment → inventory)
  • Eventual consistency: Acceptable for non-critical paths (notifications)
  • Strong consistency: Only within a service (cart operations)"
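
If asked to explain the saga pattern in more detail, a short sketch helps. The Python below is a minimal orchestration-style saga, assuming each step comes with a compensating action; the step names and the `run_saga` helper are hypothetical, and a real implementation would also persist progress so it can recover after a crash.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated {done_name}")
            return log
        log.append(f"{name} ok")
        completed.append(step)
    return log


def charge_card() -> None:
    raise RuntimeError("card declined")  # simulated payment failure


log = run_saga([
    ("create order", lambda: None, lambda: None),
    ("reserve inventory", lambda: None, lambda: None),
    ("charge payment", charge_card, lambda: None),
])
print(log)
```

Here the payment step fails, so the saga releases the inventory reservation and then cancels the order, leaving no service holding partial state.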

4. API Gateway Pattern

Say: "API Gateway provides:

  • Single entry point for all client requests
  • Request routing to appropriate microservices
  • Authentication/authorization - validate tokens once
  • Rate limiting and throttling
  • Load balancing across service instances"
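
Request routing is the easiest gateway responsibility to sketch. The Python below shows longest-prefix routing over a small table; the URL paths are invented for this example, and the upstream names simply reuse the container names from the model. Real gateways such as Kong or Nginx are configured declaratively rather than coded like this.

```python
from typing import Dict, Optional

# Path prefix -> upstream service. The paths are hypothetical.
ROUTES: Dict[str, str] = {
    "/auth": "UserService.AuthAPI",
    "/products/search": "ProductService.SearchAPI",
    "/products": "ProductService.ProductAPI",
    "/cart": "CartService.CartAPI",
    "/orders": "OrderService.OrderAPI",
}


def resolve(path: str) -> Optional[str]:
    """Return the upstream for the longest matching path prefix, or None."""
    matches = [p for p in ROUTES if path == p or path.startswith(p + "/")]
    if not matches:
        return None
    return ROUTES[max(matches, key=len)]


print(resolve("/products/42"))            # ProductService.ProductAPI
print(resolve("/products/search/shoes"))  # ProductService.SearchAPI
print(resolve("/unknown"))                # None
```

Matching on the longest prefix lets a more specific route such as search sit alongside the general product route without ordering tricks.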

Interview Practice: Add More Services

Interviewer might ask: "What about recommendations and analytics?"

Add them to your design by extending the main architecture. In this version, recommendations move out of the Product Service into their own service:

import { * } from 'sruja.ai/stdlib'


Customer = person "Online Customer"

// Existing services (UserService, ProductService, OrderService, etc. from main design)
ProductService = system "Product Catalog" {
ProductAPI = container "Product API" {
  technology "Java, Spring Boot"
}
}

OrderService = system "Order Management" {
OrderAPI = container "Order API" {
  technology "Node.js, Express"
}
}

ECommerceApp = system "E-Commerce Application" {
APIGateway = container "API Gateway" {
  technology "Kong, Nginx"
}
}

// Additional services
RecommendationService = system "Recommendations" {
RecommendationAPI = container "Recommendation API" {
  technology "Python, ML"
}

UserBehaviorDB = database "User Behavior Database" {
  technology "MongoDB"
  description "Stores user clicks, views, purchases for ML"
}
}

AnalyticsService = system "Analytics" {
AnalyticsAPI = container "Analytics API" {
  technology "Rust"
}

AnalyticsDB = database "Analytics Database" {
  technology "ClickHouse"
  description "Time-series data for analytics"
}
}

// Show how services interact
ECommerceApp.APIGateway -> ProductService.ProductAPI "Gets products"
ECommerceApp.APIGateway -> RecommendationService.RecommendationAPI "Gets recommendations"
OrderService.OrderAPI -> AnalyticsService.AnalyticsAPI "Tracks order events"

view index {
include *
}

Common Follow-Up Questions

Be prepared for:

  1. "How do you handle failures?"

    • Answer: "Circuit breakers prevent cascading failures. Retries with exponential backoff. Fallbacks (show cached data if a service is down). If the payment service is down, queue the order for later processing."
  2. "How do you ensure data consistency?"

    • Answer: "Saga pattern for distributed transactions. Each step can be compensated if later steps fail. For example, if payment fails after inventory is reserved, we release the inventory (compensating transaction)."
  3. "How do you handle service versioning?"

    • Answer: "API versioning in URLs (/v1/, /v2/). Deploy new versions alongside old ones. Gradually migrate traffic. Deprecate old versions after migration."
  4. "How do you monitor microservices?"

    • Answer: "Distributed tracing (Jaeger, Zipkin) to track requests across services. Centralized logging (ELK stack). Metrics (Prometheus) per service. Health checks for each service."
  5. "How do you handle service discovery?"

    • Answer: "Service registry (Consul, Eureka) or DNS-based discovery. API Gateway can handle routing. Service mesh (Istio) for advanced features like load balancing, retries."
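
Circuit breakers come up often enough that it is worth being able to sketch one. The Python below is a minimal version, assuming a simple consecutive-failure threshold and a fixed cool-down; the class and parameter names are illustrative, and production libraries add richer half-open handling and metrics.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result


def payment_down() -> str:
    raise ConnectionError("payment service down")  # simulated outage


now = [0.0]  # fake clock so the example runs instantly
breaker = CircuitBreaker(failure_threshold=2, reset_after_s=10.0, clock=lambda: now[0])

for _ in range(2):
    try:
        breaker.call(payment_down)
    except ConnectionError:
        pass

try:
    breaker.call(lambda: "ok")
except CircuitOpenError as exc:
    print(exc)  # circuit open; failing fast

now[0] = 11.0  # cool-down elapsed
print(breaker.call(lambda: "ok"))  # ok
```

While the circuit is open, callers fail immediately instead of waiting on timeouts, which is what stops one slow service from tying up every service that depends on it.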

Exercise: Practice This Question

Design an e-commerce platform and be ready to explain:

  1. How you decomposed into services (why these services?)
  2. How services communicate (sync vs async)
  3. How you handle data consistency
  4. How you handle failures
  5. Your scaling strategy for each service

Practice tip: Time yourself (45-50 minutes). Draw the architecture, then model it with Sruja. Explain your decisions out loud as if in an interview.

Key Takeaways for Interviews

  1. Decompose by business domain - Not technical layers
  2. Each service is a separate system - Clear boundaries in Sruja
  3. Each service owns its data - Database per service
  4. Use API Gateway - Single entry point
  5. Mix sync and async - REST/gRPC for real-time requests, queues for background work
  6. Address failures - Circuit breakers, retries, fallbacks
  7. Show with separate systems - Clear service boundaries in architecture

Next Steps

You've learned how to design microservices architectures. In the next module, we'll cover governance and policies - important for senior/staff level interviews!

Governance



Module Overview: Senior/Staff Level Interview Questions

"How do you ensure architectural standards across a large organization?"

This module covers questions typically asked in senior/staff engineer interviews. These test your ability to:

  • Lead architecture decisions
  • Enforce standards and best practices
  • Handle compliance and regulatory requirements
  • Design for large-scale organizations

Interview Questions You'll Master

  • "How do you enforce architectural standards?"
  • "Design a system that must comply with HIPAA/SOC 2"
  • "How do you ensure security across microservices?"
  • "How do you handle compliance in a distributed system?"

What Interviewers Look For

  • ✅ Understanding of governance and policies
  • ✅ Ability to enforce standards at scale
  • ✅ Knowledge of compliance requirements
  • ✅ Leadership and architectural thinking
  • ✅ Trade-offs between flexibility and standards

Goals

  • Answer governance questions confidently
  • Define policies with Sruja
  • Explain compliance requirements
  • Discuss enforcement strategies

Interview Framework

We'll follow this approach:

  1. Understand Requirements - Compliance, standards, scale
  2. Define Policies - Security, compliance, best practices
  3. Model with Sruja - Show policies in architecture
  4. Discuss Enforcement - How to ensure compliance
  5. Address Trade-offs - Flexibility vs standards

Estimated Time

45-60 minutes

Checklist

  • Understand policy syntax and usage
  • Define security and compliance policies
  • Model policies with Sruja
  • Explain enforcement strategies

Lesson 1


title: "Lesson 1: Interview Question - Design a HIPAA-Compliant Healthcare System" weight: 1 summary: "Answer compliance and governance questions for senior-level interviews."

Lesson 1: Interview Question - Design a HIPAA-Compliant Healthcare System

The Interview Question

"Design a healthcare platform that stores patient data and must comply with HIPAA regulations. How do you ensure compliance across all services?"

This is a senior/staff level interview question that tests:

  • Understanding of compliance requirements
  • Ability to enforce standards at scale
  • Security and privacy considerations
  • Governance and policy enforcement

Step 1: Clarify Requirements

You should ask:

  • "What are the core features? Patient records, appointments, prescriptions?"
  • "What's the scale? How many patients, healthcare providers?"
  • "What compliance requirements? HIPAA, SOC 2, others?"
  • "What about data retention? How long must we keep records?"

Interviewer's answer:

  • "Core: Patient records, appointments, prescriptions, billing"
  • "Scale: 10M patients, 100K healthcare providers"
  • "Must comply with HIPAA (health data privacy)"
  • "Retain records for 10 years (legal requirement)"

Step 2: Understand HIPAA Requirements

Key HIPAA requirements (you should mention these):

  1. Encryption: Data at rest and in transit
  2. Access Control: Role-based access, audit logs
  3. Audit Logging: Track all access to patient data
  4. Minimum Necessary: Use and disclose only the patient data needed for the task
  5. Breach Notification: Notify affected individuals without unreasonable delay, and no later than 60 days after discovery

Step 3: Design with Policies

This is where Sruja's policy feature fits well. Show how you enforce compliance:

import { * } from 'sruja.ai/stdlib'


// HIPAA Compliance Policy
HIPAACompliance = policy "All patient data must be encrypted and access logged" {
  category "compliance"
  enforcement "required"
  description "HIPAA requires encryption at rest and in transit, plus audit logging for all patient data access"
}

// Security Policy
TLSEnforcement = policy "All external communications must use TLS 1.3" {
  category "security"
  enforcement "required"
  description "Required for HIPAA compliance - all data in transit must be encrypted"
}

EncryptionAtRest = policy "All patient data must be encrypted at rest using AES-256" {
  category "security"
  enforcement "required"
  description "HIPAA requirement - database encryption, file encryption"
}

// Access Control Policy
AccessControl = policy "Role-based access control required for all patient data" {
  category "security"
  enforcement "required"
  description "Only authorized healthcare providers can access patient data"
}

// Audit Logging Policy
AuditLogging = policy "All access to patient data must be logged" {
  category "compliance"
  enforcement "required"
  description "HIPAA requires audit trails - who accessed what, when, why"
}

// Observability Policy
Observability = policy "All services must expose health check and metrics endpoints" {
  category "observability"
  enforcement "required"
  metadata {
    healthEndpoint "/health"
    metricsEndpoint "/metrics"
  }
}

HealthcareApp = system "Healthcare Application" {
PatientAPI = container "Patient API" {
  technology "Rust, gRPC"
  tags ["encrypted", "audit-logged"]
  description "Handles patient data - must comply with HIPAACompliance policy"
}

AppointmentAPI = container "Appointment API" {
  technology "Java, Spring Boot"
  tags ["encrypted"]
  description "Manages appointments - must comply with all policies"
}

BillingAPI = container "Billing API" {
  technology "Node.js, Express"
  tags ["encrypted", "audit-logged"]
  description "Handles billing - contains PHI (Protected Health Information)"
}

PatientDB = database "Patient Database" {
  technology "PostgreSQL"
  tags ["encrypted", "audit-logged"]
  description "Encrypted at rest, all access logged for HIPAA compliance"
}

AuditLogDB = database "Audit Log Database" {
  technology "PostgreSQL"
  description "Stores audit logs - immutable, append-only"
}

AuditQueue = queue "Audit Log Queue" {
  technology "Kafka"
  description "Async audit logging to avoid blocking operations"
}
}

IdentityProvider = system "Identity Provider" {
tags ["external"]
description "OAuth2/OIDC for authentication and authorization"
}

// Show compliance in action
HealthcareApp.PatientAPI -> HealthcareApp.PatientDB "Reads/Writes (encrypted, logged)"
HealthcareApp.PatientAPI -> HealthcareApp.AuditQueue "Publishes audit events"
HealthcareApp.AuditQueue -> HealthcareApp.AuditLogDB "Persists audit entries (append-only)"
HealthcareApp.PatientAPI -> IdentityProvider "Validates access tokens"

view index {
include *
}

What Interviewers Look For

✅ Good Answer (What You Just Did)

  1. Understood compliance requirements - Mentioned specific HIPAA rules
  2. Defined policies explicitly - Showed governance thinking
  3. Applied policies to architecture - Tags, descriptions show compliance
  4. Addressed security - Encryption, access control, audit logging
  5. Explained enforcement - How policies are enforced

❌ Bad Answer (Common Mistakes)

  1. Not understanding compliance requirements
  2. No mention of policies or governance
  3. Ignoring security (encryption, access control)
  4. No audit logging strategy
  5. Can't explain how to enforce standards

Key Points to Mention in Interview

1. Policy-Driven Architecture

Say: "I define policies at the architecture level to enforce standards. For example:

  • HIPAACompliance policy requires encryption and audit logging
  • All services that handle patient data must comply
  • Policies are checked in CI/CD - non-compliant services can't deploy"

2. Encryption Strategy

Say: "We encrypt data at multiple levels:

  • In transit: TLS 1.3 for all communications
  • At rest: AES-256 encryption for databases
  • Application level: Encrypt sensitive fields before storing"

3. Access Control

Say: "We use:

  • OAuth2/OIDC: OIDC for authentication, OAuth2 for delegated authorization
  • Role-based access control (RBAC): Doctors can access their patients, admins have broader access
  • Principle of least privilege: Users only get minimum required access"
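The access rules described above (doctors see only their own patients, admins have broader access) can be expressed as a small check. This is a sketch under assumed names: the roles, permission strings, and `can_access` function are hypothetical and not part of Sruja or any identity provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    role: str
    patient_ids: frozenset = frozenset()  # patients assigned to this user


# Least privilege: each role lists only the actions it needs.
ROLE_PERMISSIONS = {
    "doctor": {"read_record", "write_record"},
    "billing": {"read_billing"},
    "admin": {"read_record", "read_billing", "manage_users"},
}


def can_access(user: User, action: str, patient_id: str) -> bool:
    """Allow an action only if the role grants it and, for doctors, the patient is theirs."""
    if action not in ROLE_PERMISSIONS.get(user.role, set()):
        return False
    if user.role == "doctor":
        return patient_id in user.patient_ids
    return True
```

Unknown roles get an empty permission set, so the default is deny.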

4. Audit Logging

Say: "We log all access to patient data:

  • What: Which patient record was accessed
  • Who: Which user/role accessed it
  • When: Timestamp
  • Why: Purpose of access (treatment, billing, etc.)
  • Immutable logs: Can't be modified or deleted"
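One way to make "immutable" concrete is a hash chain: each entry stores the hash of the previous one, so any later edit breaks verification. The sketch below is an assumption about how such a log could work, with hypothetical function names; real systems usually add write-once storage on top.

```python
import hashlib
import json
from typing import Dict, List


def _digest(body: Dict[str, str]) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: List[Dict[str, str]], who: str, what: str, why: str, when: str) -> Dict[str, str]:
    """Append an audit entry linked to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"who": who, "what": what, "when": when, "why": why, "prev": prev_hash}
    entry = dict(body, hash=_digest(body))
    log.append(entry)
    return entry


def verify_chain(log: List[Dict[str, str]]) -> bool:
    """Return False if any entry was modified or removed from the middle."""
    prev_hash = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

The four fields map directly to the What/Who/When/Why points in the answer.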

5. Enforcement Strategy

Say: "We enforce policies through:

  • CI/CD checks: Validate architecture before deployment
  • Service mesh policies: Enforce TLS, rate limiting
  • Database policies: Encryption at rest, access controls
  • Monitoring: Alert on policy violations"
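To show what a CI/CD check might look like, here is a minimal sketch that scans an exported model for containers that handle PHI but lack the required tags. The dict layout and the `handles_phi` flag are hypothetical stand-ins for whatever your export contains; this is not Sruja's validator.

```python
REQUIRED_TAGS = {"encrypted", "audit-logged"}

# Hypothetical export of the model above: container name -> tags and a PHI flag.
containers = {
    "PatientAPI": {"tags": {"encrypted", "audit-logged"}, "handles_phi": True},
    "AppointmentAPI": {"tags": {"encrypted"}, "handles_phi": True},
    "BillingAPI": {"tags": {"encrypted", "audit-logged"}, "handles_phi": True},
}


def find_violations(model):
    """List containers that handle PHI but are missing a required tag."""
    violations = []
    for name in sorted(model):
        spec = model[name]
        if not spec["handles_phi"]:
            continue
        missing = REQUIRED_TAGS - spec["tags"]
        if missing:
            violations.append(f"{name}: missing tags {sorted(missing)}")
    return violations


violations = find_violations(containers)
exit_code = 1 if violations else 0  # a CI job would exit with this code
print(violations, exit_code)
```

With the tags used in the model above, the Appointment API is flagged because it is tagged "encrypted" but not "audit-logged".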

Interview Practice: Add More Compliance

Interviewer might ask: "What about data retention and deletion?"

Add data retention policy:

import { * } from 'sruja.ai/stdlib'


HIPAACompliance = policy "All patient data must be encrypted and access logged" {
  category "compliance"
  enforcement "required"
}

DataRetention = policy "Patient records retained for 10 years, then archived" {
  category "compliance"
  enforcement "required"
  description "Legal requirement - records must be retained for 10 years, then moved to cold storage"
}

RightToDeletion = policy "Support patient right to data deletion (with exceptions)" {
  category "compliance"
  enforcement "required"
  description "GDPR grants a right to erasure; HIPAA does not, and retention rules may require keeping some records"
}

HealthcareApp = system "Healthcare Application" {
  PatientAPI = container "Patient API" {
    technology "Rust, gRPC"
    tags ["encrypted", "audit-logged"]
  }

  PatientDB = database "Patient Database" {
    technology "PostgreSQL"
    description "Active patient records - 10 year retention"
  }

  ArchiveDB = database "Archive Database" {
    technology "S3 Glacier"
    description "Cold storage for records older than 10 years"
  }
}

view index {
include *
}
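If asked how the retention policy is applied, you can describe a scheduled job that decides which tier a record belongs in. A minimal sketch, assuming retention is measured from the record's last activity date (the function name and that assumption are hypothetical):

```python
from datetime import date

RETENTION_YEARS = 10  # from the DataRetention policy above


def storage_tier(last_activity: date, today: date) -> str:
    """Return 'archive' once a record is older than the retention window."""
    try:
        cutoff = today.replace(year=today.year - RETENTION_YEARS)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        cutoff = today.replace(year=today.year - RETENTION_YEARS, day=28)
    return "archive" if last_activity < cutoff else "active"
```

Records tagged "archive" would move from the Patient Database to the Archive Database.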

Common Follow-Up Questions

Be prepared for:

  1. "How do you ensure all services comply?"

    • Answer: "Policy validation in CI/CD. Architecture review process. Service mesh enforces some policies automatically. Regular audits."
  2. "What if a service violates a policy?"

    • Answer: "CI/CD blocks deployment. Alert security team. Architecture review required. Service owner must fix before deploying."
  3. "How do you handle breaches?"

    • Answer: "Automated breach detection via monitoring. Incident response plan. HIPAA requires notifying affected individuals without unreasonable delay and no later than 60 days after discovery. Audit logs help identify scope."
  4. "How do you balance compliance with developer productivity?"

    • Answer: "Automate compliance checks. Provide templates and libraries. Make compliance easy, not burdensome. Clear documentation and examples."

Exercise: Practice This Question

Design a HIPAA-compliant healthcare system and be ready to explain:

  1. How you enforce HIPAA requirements
  2. Your encryption strategy
  3. Your access control approach
  4. Your audit logging implementation
  5. How you ensure compliance across services

Practice tip: This is a senior-level question. Focus on:

  • Governance and policies
  • Security and compliance
  • Enforcement strategies
  • Trade-offs and practical considerations

Key Takeaways for Senior Interviews

  1. Understand compliance requirements - Know HIPAA, SOC 2, GDPR basics
  2. Define policies explicitly - Show governance thinking
  3. Enforce at multiple levels - CI/CD, service mesh, monitoring
  4. Balance compliance and productivity - Make it easy for developers
  5. Think about scale - How to enforce across 100+ services

Next Steps

You've learned how to handle compliance and governance questions. This completes the Production Architecture course! You're now ready to tackle:

  • ✅ Scaling and performance questions
  • ✅ Microservices architecture questions
  • ✅ Senior-level governance questions

Keep practicing with real interview questions! 🎯

Lesson 2


title: "Lesson 2: Policies, Constraints, Conventions" weight: 2 summary: "Codify guardrails and agreements; enforce consistency."

Lesson 2: Policies, Constraints, Conventions

Why Governance?

Governance ensures systems remain secure, maintainable, and consistent as they evolve.

Sruja: Codify Guardrails

import { * } from 'sruja.ai/stdlib'


SecurityPolicy = policy "Security Policy" {
description "Security posture for services"
}

constraints {
rule "No PII in logs"
rule "Only managed Postgres for relational data"
}

conventions {
naming "kebab-case for services"
tracing "W3C trace context propagated"
}

view index {
include *
}

Practice

  • Add a policy describing your security posture.
  • Capture 2–3 constraints and conventions used by your team.

Agentic AI


title: "Agentic AI with Sruja" summary: "Model agent systems, RAG pipelines, and governance with Sruja DSL." difficulty: "advanced" topic: "ai" description: "A practical course on architecting agentic AI systems using Sruja: foundations, RAG, orchestration, and production governance." estimatedTime: "3–4 hours"

Agentic AI with Sruja

Learn to design agent-based AI systems with clear boundaries, interfaces, and governance using Sruja DSL.

Fundamentals


title: "Fundamentals of Agentic AI" weight: 10 summary: "Understand the core concepts of AI agents, tools, and cognitive architectures." description: "This module covers the building blocks of Agentic AI systems, distinguishing them from traditional LLM chains." difficulty: "intermediate" topic: "agentic-ai" estimatedTime: "30 mins"

Fundamentals of Agentic AI

Welcome to the first module of the Agentic AI Architecture course. In this module, we will explore what makes an AI system "agentic" and how to model its components using Sruja.

Learning Objectives

By the end of this module, you will be able to:

  1. Define Agentic AI: Understand the difference between passive LLM calls and autonomous agents.
  2. Identify Core Components: Recognize Agents, Tools, Memory, and Planning modules.
  3. Model Basic Agents: Use Sruja to represent a simple agent with tools.

Lessons

  1. What is Agentic AI?
  2. Core Components

Lesson 1


title: "What is Agentic AI?" weight: 10 summary: "Defining Agentic AI and its shift from static chains to dynamic loops." difficulty: "intermediate" topic: "agentic-ai" estimatedTime: "10 mins"

What is Agentic AI?

Traditional LLM applications often follow a linear chain: Prompt -> LLM -> Output. Agentic AI breaks this linearity by introducing a control loop where the model decides what to do next.

The Control Loop

An agent typically operates in a loop:

  1. Observe: Read input or environment state.
  2. Reason: Decide on an action (using an LLM).
  3. Act: Execute the action (call a tool).
  4. Reflect: Observe the result of the action.
  5. Repeat: Continue until the goal is met.
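The loop above can be sketched in plain Python. Everything here is a stand-in: `fake_llm` replaces a real model call, and `TOOLS` holds a single hypothetical search tool.

```python
from typing import Callable, Dict, List, Tuple


def fake_llm(goal: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM call: choose the next action from what has been observed."""
    if not observations:
        return ("search", goal)
    return ("finish", observations[-1])


TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"top result for '{query}'",
}


def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: List[str] = []                    # Observe: state gathered so far
    for _ in range(max_steps):                      # Repeat, with a hard step limit
        action, arg = fake_llm(goal, observations)  # Reason
        if action == "finish":
            return arg
        result = TOOLS[action](arg)                 # Act
        observations.append(result)                 # Reflect: record the result
    return "stopped: step limit reached"


print(run_agent("weather in SF"))
```

The step limit is the simplest guardrail: the model controls the flow, but it cannot loop forever.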

Agent vs. Chain

| Feature | Chain (e.g., LangChain Runnable) | Agent |
| --- | --- | --- |
| Control Flow | Hardcoded by developer | Determined dynamically by LLM |
| Flexibility | Rigid, predictable | Adaptive, handles ambiguity |
| Failure Recovery | Often brittle (fails if one step fails) | Can self-correct and retry |
| Complexity | Lower | Higher (requires guardrails) |

Why Sruja for Agents?

Modeling agents is complex because relationships are often dynamic. Sruja helps by:

  • Visualizing Dependencies: Showing which agents use which tools.
  • Defining Boundaries: Separating the cognitive engine (LLM) from the execution layer (Tools).
  • Documenting Flows: Tracing the decision loop.

import { * } from 'sruja.ai/stdlib'


Agent = component "Research Agent"
LLM = component "Model Provider"
Tool = component "Search Tool"

Agent -> LLM "Reasons next step"
Agent -> Tool "Executes action"
Tool -> Agent "Returns observation"

view index {
include *
}

Lesson 2


title: "Core Components: Agents, Tools, Memory" weight: 20 summary: "Deep dive into the anatomy of an AI agent." difficulty: "intermediate" topic: "agentic-ai" estimatedTime: "15 mins"

Core Components

Every agentic system consists of a few fundamental building blocks.

1. The Agent (The Brain)

The core logic that orchestrates the workflow. It holds the "system prompt" or persona and manages the context window.

2. Tools (The Hands)

Capabilities exposed to the agent. These can be:

  • APIs: Weather, Stock Prices, Internal Databases.
  • Functions: Calculator, Code Interpreter.
  • Retrievers: RAG search against vector databases.

3. Memory (The Context)

  • Short-term Memory: The current conversation history and scratchpad of thoughts.
  • Long-term Memory: Vector databases or persistent storage for recalling past interactions.

Modeling in Sruja

We can map these components to Sruja elements:

  • Agent -> container or component
  • Tool -> component or external system
  • Memory -> database (or a container that groups several stores, as in the example below)

import { * } from 'sruja.ai/stdlib'


AgentSystem = system "Customer Support Bot" {
Brain = container "Orchestrator" {
  description "Main control loop"
}

Memory = container "Context Store" {
  ShortTerm = component "Conversation History"
  LongTerm = component "Vector DB"
}

Tools = container "Toolbelt" {
  CRM = component "CRM Connector"
  KB = component "Knowledge Base"
}

Brain -> Tools.CRM "Look up user"
Brain -> Tools.KB "Search policy"
Brain -> Memory.ShortTerm "Read/Write context"
}

view index {
include *
}

Patterns


title: "Agentic Patterns" weight: 20 summary: "Common architectural patterns for building reliable agents." description: "Learn about ReAct, Plan-and-Solve, and Multi-Agent Orchestration." difficulty: "advanced" topic: "agentic-ai" estimatedTime: "45 mins"

Agentic Patterns

Single-loop agents are powerful, but complex tasks often require structured patterns or multiple agents working together.

Learning Objectives

  1. Understand ReAct: The foundational pattern of Reason + Act.
  2. Explore Multi-Agent Systems: How agents collaborate.
  3. Model Orchestration: Supervisor vs. Hierarchical flows.

Lessons

  1. The ReAct Pattern
  2. Multi-Agent Orchestration

Lesson 1


title: "The ReAct Pattern" weight: 10 summary: "Modeling the Reason + Act loop." difficulty: "advanced" topic: "agentic-ai" estimatedTime: "15 mins"

The ReAct Pattern

ReAct (Reasoning + Acting) is a prompting strategy where the model explicitly generates:

  1. Thought: Reasoning about the current state.
  2. Action: The tool call to make.
  3. Observation: The result of the tool call.

This loop continues until the agent decides it has enough information to answer.

Sruja Model

We can model this flow using a scenario or story in Sruja to visualize the sequence.

import { * } from 'sruja.ai/stdlib'


Agent = component "Agent"
Tool = component "Tool"
User = component "User"

story ReActLoop "Answering a Question" {
User -> Agent "Ask: What is the weather in SF?"

// Step 1
Agent -> Agent "Thought: I need to check weather"
Agent -> Tool "Action: WeatherAPI(SF)"
Tool -> Agent "Observation: 15°C, Cloudy"

// Step 2
Agent -> Agent "Thought: I have the answer"
Agent -> User "Answer: It's 15°C and cloudy."
}

view index {
include *
}

This visualization helps stakeholders understand the latency and cost implications of the multiple steps involved in a single user request.
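To put rough numbers on that, you can count the steps in a trace. The sketch below mirrors the story above; the latency figures are illustrative assumptions, not measurements.

```python
# The ReAct trace from the story above, as (kind, text) pairs.
TRACE = [
    ("thought", "I need to check weather"),
    ("action", "WeatherAPI(SF)"),
    ("observation", "15°C, Cloudy"),
    ("thought", "I have the answer"),
    ("answer", "It's 15°C and cloudy."),
]

LLM_LATENCY_MS = 1500   # assumed latency per model call
TOOL_LATENCY_MS = 300   # assumed latency per tool call


def estimate(trace):
    """Count model and tool calls; each thought is treated as one model call."""
    llm_calls = sum(1 for kind, _ in trace if kind == "thought")
    tool_calls = sum(1 for kind, _ in trace if kind == "action")
    return {
        "llm_calls": llm_calls,
        "tool_calls": tool_calls,
        "latency_ms": llm_calls * LLM_LATENCY_MS + tool_calls * TOOL_LATENCY_MS,
    }


summary = estimate(TRACE)
print(summary)
```

Even this one-tool question needs two model calls, which is why a step limit and per-request budgets matter for agents.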

Lesson 2


title: "Multi-Agent Orchestration" weight: 20 summary: "Supervisor, Hierarchical, and Mesh architectures." difficulty: "advanced" topic: "agentic-ai" estimatedTime: "20 mins"

Multi-Agent Orchestration

For complex domains, a single agent can get confused: its prompt, tool list, and context grow until it picks the wrong tool or loses track of the goal. Multi-Agent Systems (MAS) split responsibilities among specialized agents.

Supervisor Pattern

A central "Supervisor" agent routes tasks to worker agents and aggregates results.

import { * } from 'sruja.ai/stdlib'


Supervisor = container "Orchestrator"

Coder = container "Coding Agent" {
description "Writes and executes code"
}

Writer = container "Documentation Agent" {
description "Writes summaries"
}

Supervisor -> Coder "Delegates coding tasks"
Supervisor -> Writer "Delegates writing tasks"
Coder -> Supervisor "Returns result"
Writer -> Supervisor "Returns result"

view index {
include *
}
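At runtime, the supervisor relationships above amount to a routing table. A minimal sketch, with plain functions standing in for the worker agents (all names are hypothetical):

```python
from typing import Callable, Dict, List, Tuple


def coder(task: str) -> str:
    return f"code for: {task}"


def writer(task: str) -> str:
    return f"summary of: {task}"


# Supervisor's routing table: task kind -> worker agent.
WORKERS: Dict[str, Callable[[str], str]] = {"coding": coder, "writing": writer}


def supervisor(tasks: List[Tuple[str, str]]) -> List[str]:
    """Delegate each task to the matching worker and aggregate the results."""
    results = []
    for kind, task in tasks:
        worker = WORKERS.get(kind)
        if worker is None:
            results.append(f"unroutable: {task}")
            continue
        results.append(worker(task))
    return results
```

In a real system the routing decision is usually made by an LLM, but the aggregation step keeps the same shape.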

Hierarchical Teams

Agents can manage other agents, forming a tree structure. This is useful for large-scale operations like software development (Manager -> Tech Lead -> Developer).

Network/Mesh

Agents communicate directly with each other without a central supervisor. This is more decentralized but harder to debug. Sruja's relationship visualization shines here by mapping the allowable communication paths.

Modeling


title: "Modeling Agents in Sruja" weight: 30 summary: "Best practices for documenting Agentic systems." description: "Learn how to use Sruja's features to effectively model AI architectures." difficulty: "advanced" topic: "agentic-ai" estimatedTime: "30 mins"

Modeling Agents in Sruja

In this final module, we will bring everything together and learn the best practices for modeling agentic systems in Sruja.

Learning Objectives

  1. Granularity: Deciding when to model an agent as a System, Container, or Component.
  2. Metadata: Using metadata blocks to track models, costs, and latency.
  3. Governance: Defining policies for AI safety.

Lessons

  1. Modeling Strategies
  2. Governance and Safety

Lesson 1


title: "Modeling Strategies" weight: 10 summary: "Choosing the right abstraction level." difficulty: "advanced" topic: "agentic-ai" estimatedTime: "15 mins"

Modeling Strategies

How should you represent an agent in Sruja? It depends on the scope of your diagram.

Level 1: System Context

If your AI is a product that users interact with, model it as a System.

import { * } from 'sruja.ai/stdlib'


User = person "User"
AI_Assistant = system "Support Bot"

User -> AI_Assistant "Chats with"

view index {
include *
}

Level 2: Container View

If you are designing the internals, agents are often Containers (deployable units).

import { * } from 'sruja.ai/stdlib'


AI_Assistant = system "AI Assistant" {
Router = container "Router Agent"
Search = container "Search Agent"
VectorDB = database "Memory"
}

Level 3: Component View

If you are designing a single agent's logic, the specific tools and chains are Components.

import { * } from 'sruja.ai/stdlib'


AI_Assistant = system "AI Assistant" {
SearchAgent = container "Search Agent" {
  Planner = component "ReAct Logic"
  GoogleTool = component "Search API"
  Scraper = component "Web Scraper"
}
}

Using Metadata

Use metadata to capture AI-specific details:

GPT4Agent = container "GPT-4 Agent" {
  metadata {
    model "gpt-4-turbo"
    temperature "0.7"
    max_tokens "4096"
    cost_per_1k_input "$0.01"
  }
}
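Once cost metadata is in the model, simple estimates become possible. The sketch below assumes the metadata has been exported as a dict of strings, as written above; the `input_cost` helper is hypothetical.

```python
from decimal import Decimal

# Metadata values as they appear in the model above (all strings).
metadata = {
    "model": "gpt-4-turbo",
    "temperature": "0.7",
    "max_tokens": "4096",
    "cost_per_1k_input": "$0.01",
}


def input_cost(meta: dict, input_tokens: int) -> Decimal:
    """Estimate input cost in dollars from the per-1k-token price."""
    price_per_1k = Decimal(meta["cost_per_1k_input"].lstrip("$"))
    return price_per_1k * Decimal(input_tokens) / Decimal(1000)


cost = input_cost(metadata, 2500)
print(cost)
```

`Decimal` is used instead of floats so small per-token prices do not pick up rounding noise when summed across many requests.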

Lesson 2


title: "Governance and Safety" weight: 20 summary: "Defining constraints and policies for autonomous agents." difficulty: "advanced" topic: "agentic-ai" estimatedTime: "15 mins"

Governance and Safety

Autonomous agents can be unpredictable. Architecture-as-Code allows us to define constraints to ensure safety.

Defining Requirements

Use requirement blocks to specify safety properties.

import { * } from 'sruja.ai/stdlib'


Agent = container "Agent"
BankAPI = container "Bank API"

Agent -> BankAPI "Transfers funds"

requirement HumanLoop functional "Transfers > $1000 must require human approval"
requirement PII constraint "No PII should be sent to external LLM providers"

view index {
include *
}
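The HumanLoop requirement above translates into a gate in front of the tool call. A minimal sketch, with a hypothetical `transfer` function standing in for the real BankAPI client:

```python
from typing import Optional

APPROVAL_THRESHOLD = 1000  # dollars, from the HumanLoop requirement


def transfer(amount: float, approved_by: Optional[str] = None) -> str:
    """Execute small transfers; hold large ones until a human approves."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > APPROVAL_THRESHOLD and approved_by is None:
        return "pending_human_approval"
    return "executed"
```

The agent never sees a way around the gate: approval is a parameter supplied by the surrounding system, not something the model can set through the tool's arguments.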

Policy as Code

You can enforce rules about which agents can access which tools.

// Example of a prohibited relationship
// Agent -> ProductionDB "Direct Write"
// ^ This could be flagged by a linter rule
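A linter rule of that kind can be as small as a deny-list over relationships. The sketch below assumes relationships have been exported as (source, target, label) tuples; the format and names are hypothetical, not Sruja's linter API.

```python
# Pairs that must never appear as a relationship.
PROHIBITED = {("Agent", "ProductionDB")}

relations = [
    ("Agent", "BankAPI", "Transfers funds"),
    ("Agent", "ProductionDB", "Direct Write"),
]


def lint(rels):
    """Return one message per relationship that matches a prohibited pair."""
    return [
        f"{source} -> {target} is prohibited ({label})"
        for source, target, label in rels
        if (source, target) in PROHIBITED
    ]


findings = lint(relations)
print(findings)
```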

Guardrails

Model your guardrails explicitly as components that intercept messages.

AgentSystem = container "Agent System" {
  UserProxy = component "Input Guardrail"
  LLM = component "LLM"
  OutputGuard = component "Output Validator"

  UserProxy -> LLM "Sanitized Input"
  LLM -> OutputGuard "Raw Output"
  OutputGuard -> UserProxy "Safe Response"
}
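The three components above form a pipeline around the model call. The sketch below uses simple regular expressions as stand-in guardrails; real deployments would use stronger detectors, and `fake_llm` is a placeholder for the model.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def input_guardrail(text: str) -> str:
    """Redact email addresses before the text reaches the model."""
    return EMAIL.sub("[email]", text)


def fake_llm(prompt: str) -> str:
    return f"echo: {prompt}"  # stand-in for a real model call


def output_guard(text: str) -> str:
    """Block responses that appear to contain a US Social Security number."""
    return "[blocked: possible SSN]" if SSN.search(text) else text


def handle(user_text: str) -> str:
    return output_guard(fake_llm(input_guardrail(user_text)))
```

Because each guardrail is a separate function, it maps cleanly onto a separate component in the model and can be tested on its own.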

Advanced Architects


title: "Quick Start for Seasoned Software Architects" weight: 0 summary: "5-minute masterclass: Enforce architectural standards, prevent drift, and scale governance across teams."

Quick Start for Seasoned Software Architects

For senior architects who need to enforce standards across large organizations.

This 5-minute course teaches you how to use Sruja to codify architectural policies, prevent drift, and scale governance across multiple teams—without slowing down development.

Why This Course?

As organizations grow, architectural standards become critical but hard to enforce. This course shows you how to:

  • Codify policies as executable rules
  • Prevent architectural drift automatically
  • Scale governance across 100+ engineers
  • Enforce standards in CI/CD without manual reviews
  • Track compliance across services and teams

What You'll Learn

  • Policy as Code: Write architectural rules that run in CI/CD
  • Constraint Enforcement: Prevent violations before they reach production
  • Governance Patterns: Real-world patterns for large organizations
  • Compliance Automation: Track and report on architectural compliance
  • Team Scaling: How to roll out governance without friction

Who This Course Is For

  • Senior/staff architects leading multiple teams
  • Engineering managers responsible for architectural standards
  • Platform teams building developer tooling
  • Architects at companies with 50+ engineers
  • Anyone implementing architecture governance

Prerequisites

  • Experience with software architecture at scale
  • Familiarity with CI/CD pipelines
  • Basic understanding of Sruja syntax (or complete Getting Started first)

Estimated Time

5 minutes — Quick, actionable lessons you can apply immediately.

Course Structure

Module 1: Policy as Code (5 minutes)

Learn to codify architectural standards as executable policies that run in CI/CD.

You'll learn:

  • How to write constraints and conventions
  • How to enforce layer boundaries
  • How to prevent common violations
  • How to integrate with CI/CD pipelines

Learning Outcomes

By the end of this course, you'll be able to:

  • ✅ Write architectural policies as code
  • ✅ Enforce standards automatically in CI/CD
  • ✅ Prevent architectural drift before it happens
  • ✅ Scale governance across large teams
  • ✅ Track compliance across services

Real-World Application

This course uses patterns from:

  • Microservices governance at scale
  • Multi-team architecture standards
  • Compliance requirements (HIPAA, SOC 2)
  • Service boundary enforcement
  • Dependency management policies

Ready to scale your architecture governance? Let's go! 🚀

Policy As Code


title: "Module Overview: Policy as Code" weight: 0 summary: "Codify architectural standards as executable policies that prevent violations automatically."

Module Overview: Policy as Code

Turn architectural standards into executable code that runs in CI/CD.

This module teaches you how to write architectural policies as code, enforce them automatically, and scale governance across large organizations.

Learning Goals

  • Write constraints and conventions in Sruja
  • Enforce layer boundaries and service dependencies
  • Prevent architectural violations in CI/CD
  • Track compliance across services and teams

Why Policy as Code?

Traditional approach:

  • Manual code reviews
  • Architecture decision records (ADRs) that get outdated
  • Inconsistent enforcement across teams
  • Compliance audits are manual and risky

Policy as Code approach:

  • Automated validation in CI/CD
  • Policies version-controlled with code
  • Consistent enforcement across all teams
  • Compliance reports generated automatically

What You'll Build

By the end of this module, you'll have:

  • ✅ A policy file that enforces architectural standards
  • ✅ CI/CD integration that blocks violations
  • ✅ Compliance tracking across services
  • ✅ Patterns you can apply to your organization

Estimated Time

5 minutes — Quick, focused lessons.

Prerequisites

  • Basic Sruja syntax (see Getting Started)
  • Familiarity with CI/CD (GitHub Actions, GitLab CI, etc.)
  • Understanding of architectural governance challenges

Checklist

  • Understand how to write constraints
  • Know how to enforce conventions
  • Can integrate policies into CI/CD
  • Can track compliance across services

Lesson 1


title: "Lesson 1: Writing Constraints and Conventions" weight: 1 summary: "Codify architectural rules as executable constraints that prevent violations."

Lesson 1: Writing Constraints and Conventions

The Problem: Architectural Drift

As teams grow, architectural standards drift. Services violate boundaries, dependencies become circular, and compliance requirements are missed. Manual reviews don't scale.

Example violations:

  • Frontend directly accessing database (violates layer boundaries)
  • Services in wrong layers (business logic in presentation layer)
  • Circular dependencies between services
  • Missing compliance controls (HIPAA, SOC 2)

Solution: Policy as Code

Sruja lets you codify architectural standards as constraints and conventions that are:

  • ✅ Version-controlled with your code
  • ✅ Validated automatically in CI/CD
  • ✅ Enforced consistently across teams
  • ✅ Tracked and reported on

Writing Constraints

Constraints define hard rules that must be followed. Violations block CI/CD.

import { * } from 'sruja.ai/stdlib'


// Constraint: Presentation layer cannot access datastores directly
constraint C1 {
description "Presentation layer must not access datastores"
rule "containers in layer 'Presentation' must not have relations to datastores"
}

// Constraint: No circular dependencies
constraint C2 {
description "No circular dependencies between services"
rule "no cycles in service dependencies"
}

// Constraint: Compliance requirement
constraint C3 {
description "Payment services must have encryption"
rule "containers with tag 'payment' must have property 'encryption' = 'AES-256'"
}

layering {
layer Presentation "Presentation Layer" {
  description "User-facing interfaces"
}
layer Business "Business Logic Layer" {
  description "Core business logic"
}
layer Data "Data Access Layer" {
  description "Data persistence"
}
}

Shop = system "E-Commerce System" {
WebApp = container "Web Application" {
  layer Presentation
  // This would violate C1 if it accessed DB directly
}

PaymentService = container "Payment Service" {
  layer Business
  tags ["payment"]
  properties {
    encryption "AES-256"  // Required by C3
  }
}

DB = database "Database" {
  layer Data
}

// Correct: WebApp -> PaymentService -> DB (respects layers)
WebApp -> PaymentService "Processes payments"
PaymentService -> DB "Stores transactions"
}

view index {
include *
}
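To make constraint C2 concrete, here is how a cycle check over service dependencies might work. This is a generic depth-first search over a hypothetical dependency map, shown for intuition; it is not Sruja's implementation.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of names, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in deps}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in deps.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        color[node] = BLACK
        path.pop()
        return None

    for node in list(deps):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

For the layered example above, `{"WebApp": ["PaymentService"], "PaymentService": ["DB"]}` has no cycle, so the check passes.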

Writing Conventions

Conventions define best practices and naming standards. They're warnings, not blockers.

import { * } from 'sruja.ai/stdlib'


// Convention: Naming standards
convention N1 {
description "Service names should follow pattern: <domain>-<function>"
rule "container names should match pattern /^[a-z]+-[a-z]+$/"
}

// Convention: Technology standards
convention T1 {
description "API services should use REST or gRPC"
rule "containers with tag 'api' should have technology matching /REST|gRPC/"
}

Platform = system "Microservices Platform" {
container user-service "User Service" {  // ✅ Follows N1
  tags ["api"]
  technology "REST"  // ✅ Follows T1
}

authService = container "Auth Service" {  // ⚠️ Violates N1 (should be auth-service)
  tags ["api"]
  technology "GraphQL"  // ⚠️ Violates T1 (should be REST or gRPC)
}
}

view index {
include *
}
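The pattern in convention N1 is an ordinary regular expression, so you can try it outside Sruja. A minimal sketch (the `naming_warnings` helper is hypothetical):

```python
import re

# Same pattern as convention N1 above.
NAME_PATTERN = re.compile(r"^[a-z]+-[a-z]+$")


def naming_warnings(names):
    """Return a warning for each name that does not match <domain>-<function>."""
    return [
        f"{name}: should match <domain>-<function>"
        for name in names
        if not NAME_PATTERN.match(name)
    ]


warnings = naming_warnings(["user-service", "authService"])
print(warnings)
```

Note that the pattern allows exactly one hyphen, so a name like `user-profile-service` would also be flagged; widen the pattern if your team uses longer names.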

Real-World Example: Multi-Team Governance

Here's how a large organization enforces standards across teams:

import { * } from 'sruja.ai/stdlib'


// Global constraint: All services must have SLOs
constraint Global1 {
description "All production services must define SLOs"
rule "containers with tag 'production' must have slo block"
}

// Team-specific constraint: Payment team standards
constraint Payment1 {
description "Payment services must be in payment layer"
rule "containers with tag 'payment' must have layer 'payment'"
}

// Compliance constraint: HIPAA requirements
constraint Compliance1 {
description "Healthcare data must be encrypted"
rule "datastores with tag 'healthcare' must have property 'encryption' = 'AES-256'"
}

layering {
layer payment "Payment Layer"
layer healthcare "Healthcare Layer"
}

PaymentSystem = system "Payment System" {
PaymentAPI = container "Payment API" {
  layer payment
  tags ["payment", "production"]
  slo {
    availability { target "99.9%" window "30 days" }
    latency { p95 "200ms" window "7 days" }
  }
}
}

HealthcareSystem = system "Healthcare System" {
PatientDB = database "Patient Database" {
  layer healthcare
  tags ["healthcare"]
  properties {
    encryption "AES-256"
  }
}
}

view index {
include *
}
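
The three rules read naturally as predicates over tagged elements. The Python sketch below restates them against a hypothetical flattened model (the `Element` class and `check` function are illustrative assumptions, not Sruja's internal representation or rule engine).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """Hypothetical flattened view of a container or datastore."""
    id: str
    tags: set = field(default_factory=set)
    layer: Optional[str] = None
    has_slo: bool = False
    properties: dict = field(default_factory=dict)


def check(elements):
    """Return (constraint id, element id) pairs for every violated rule."""
    violations = []
    for e in elements:
        # Global1: production services must define SLOs.
        if "production" in e.tags and not e.has_slo:
            violations.append(("Global1", e.id))
        # Payment1: payment services must sit in the payment layer.
        if "payment" in e.tags and e.layer != "payment":
            violations.append(("Payment1", e.id))
        # Compliance1: healthcare data must be encrypted with AES-256.
        if "healthcare" in e.tags and e.properties.get("encryption") != "AES-256":
            violations.append(("Compliance1", e.id))
    return violations


model = [
    Element("PaymentSystem.PaymentAPI", {"payment", "production"}, "payment", True),
    Element("HealthcareSystem.PatientDB", {"healthcare"}, "healthcare",
            properties={"encryption": "AES-256"}),
]
print(check(model))  # prints [] because the example model satisfies all three rules
```

Dropping the `slo` block or the `layer` from `PaymentAPI` in this sketch would produce a `Global1` or `Payment1` entry respectively.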

Enforcing in CI/CD

Add validation to your CI/CD pipeline:

# .github/workflows/architecture.yml
name: Architecture Validation
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Sruja
        run: curl -fsSL https://raw.githubusercontent.com/sruja-ai/sruja/main/scripts/install.sh | bash
      - name: Validate Architecture
        run: sruja lint architecture.sruja
      - name: Check Constraints
        run: sruja validate --constraints architecture.sruja

Result: Violations block merges automatically.

Key Takeaways

  1. Constraints = Hard rules that block CI/CD
  2. Conventions = Best practices that warn
  3. Version-control policies alongside your code
  4. Automate enforcement in CI/CD
  5. Scale governance across teams

Next Steps

  • Try writing constraints for your organization
  • Integrate validation into your CI/CD pipeline
  • Track compliance across services
  • Iterate based on team feedback

You now know how to codify architectural policies. Let's enforce them automatically!

Lesson 2


Lesson 2: Enforcing Policies in CI/CD

The Goal: Automatic Enforcement

Policies are useless if they're not enforced. This lesson shows you how to integrate Sruja validation into CI/CD so violations are caught before they reach production.

Basic CI/CD Integration

GitHub Actions

# .github/workflows/architecture.yml
name: Architecture Validation

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Install Sruja
        run: |
          curl -fsSL https://raw.githubusercontent.com/sruja-ai/sruja/main/scripts/install.sh | bash
          echo "$HOME/.local/bin" >> $GITHUB_PATH
      
      - name: Validate Architecture
        run: |
          sruja fmt architecture.sruja
          sruja lint architecture.sruja
      
      - name: Check Constraints
        run: sruja validate --constraints architecture.sruja
      
      - name: Export Documentation
        run: sruja export markdown architecture.sruja > architecture.md

GitLab CI

# .gitlab-ci.yml
architecture-validation:
  image: alpine:latest
  before_script:
    - apk add --no-cache curl bash
    - curl -fsSL https://raw.githubusercontent.com/sruja-ai/sruja/main/scripts/install.sh | bash
    - export PATH="$HOME/.local/bin:$PATH"
  script:
    - sruja fmt architecture.sruja
    - sruja lint architecture.sruja
    - sruja validate --constraints architecture.sruja
  only:
    - merge_requests
    - main

Advanced: Policy Violation Reporting

Generate compliance reports in CI/CD:

- name: Generate Compliance Report
  run: |
    sruja validate --constraints architecture.sruja --format json > violations.json
    sruja score architecture.sruja > score.json
  
- name: Upload Reports
  uses: actions/upload-artifact@v4
  with:
    name: architecture-reports
    path: |
      violations.json
      score.json
      architecture.md

Multi-Repository Governance

For organizations with multiple repositories, create a shared policy file:

# .github/workflows/architecture.yml
- name: Validate Against Shared Policies
  run: |
    # Fetch shared policies from central repo
    git clone https://github.com/your-org/architecture-policies.git /tmp/policies
    
    # Validate against shared constraints
    sruja validate \
      --constraints /tmp/policies/global-constraints.sruja \
      --constraints architecture.sruja

Pre-commit Hooks

Catch violations before they're committed:

#!/bin/sh
# .git/hooks/pre-commit

# Install Sruja if not available
if ! command -v sruja >/dev/null 2>&1; then
  curl -fsSL https://raw.githubusercontent.com/sruja-ai/sruja/main/scripts/install.sh | bash
  export PATH="$HOME/.local/bin:$PATH"
fi

# Validate architecture
sruja lint architecture.sruja
if [ $? -ne 0 ]; then
  echo "❌ Architecture validation failed. Fix errors before committing."
  exit 1
fi

sruja validate --constraints architecture.sruja
if [ $? -ne 0 ]; then
  echo "❌ Constraint violations found. Fix before committing."
  exit 1
fi

echo "✅ Architecture validation passed"
exit 0

Integration with PR Reviews

Add architecture validation as a required check:

- name: Architecture Gate
  run: |
    sruja validate --constraints architecture.sruja --fail-on-violations

Result: PRs can't be merged until architecture is valid.
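
A gate like this rests on exit codes: the first command that exits non-zero fails the job. The Python sketch below shows the same fail-fast logic, assuming `sruja` is on PATH and exits non-zero on violations (the pre-commit hook above makes the same assumption). `run_gate` and `GATE` are hypothetical names, not part of Sruja.

```python
import subprocess
import sys


def run_gate(commands):
    """Run each command in order and stop at the first non-zero exit code."""
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"gate failed: {' '.join(cmd)} (exit {result.returncode})",
                  file=sys.stderr)
            return result.returncode
    return 0


# Commands as they appear in this lesson.
GATE = [
    ["sruja", "lint", "architecture.sruja"],
    ["sruja", "validate", "--constraints", "architecture.sruja"],
]
```

A CI step or hook would then call `sys.exit(run_gate(GATE))` so the job inherits the failing exit code.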

Monitoring Compliance

Track compliance over time:

- name: Track Compliance Metrics
  run: |
    sruja score architecture.sruja --format json > compliance-metrics.json
    
    # Send to monitoring system
    curl -X POST https://your-monitoring-system/api/metrics \
      -H "Content-Type: application/json" \
      -d @compliance-metrics.json

Key Takeaways

  1. Integrate early — Validate in CI/CD, not manually
  2. Fail fast — Block merges on violations
  3. Report compliance — Track metrics over time
  4. Share policies — Use central policy files for multi-repo orgs
  5. Pre-commit hooks — Catch issues before they're committed

Real-World Pattern

Large organization pattern:

# Central policy repository
architecture-policies/
  ├── global-constraints.sruja    # Organization-wide rules
  ├── team-payment.sruja          # Team-specific rules
  └── compliance-hipaa.sruja      # Compliance requirements

# Each service repository
service-repo/
  ├── architecture.sruja          # Service architecture
  └── .github/workflows/
      └── architecture.yml        # Validates against shared policies

Next Steps

  • Set up CI/CD validation for your architecture
  • Create shared policy files for your organization
  • Add pre-commit hooks for faster feedback
  • Track compliance metrics over time

You now know how to enforce policies automatically. Governance at scale! 🚀

Tutorials

Step-by-step guides to get things done with Sruja.

Basic

Advanced

Combine with the Beginner path or Courses for a full learning path.

CLI basics


CLI Basics

This tutorial teaches the essential Sruja CLI commands for day‑to‑day work.

Install and Verify

curl -fsSL https://raw.githubusercontent.com/sruja-ai/sruja/main/scripts/install.sh | bash
sruja --version

If sruja is not found, ensure the install directory is in your PATH:

# If installed via install script, ensure ~/.local/bin (or script’s target) is in PATH
export PATH="$HOME/.local/bin:$PATH"

Create a Model

import { * } from 'sruja.ai/stdlib'


App = system "My App" {
Web = container "Web Server"
DB = database "Database"
}
User = person "User"

User -> App.Web "Visits"
App.Web -> App.DB "Reads/Writes"

view index {
include *
}

Lint and Compile

sruja lint example.sruja
sruja compile example.sruja

Format

sruja fmt example.sruja > example.formatted.sruja

Tree View

sruja tree --file example.sruja

Export to Mermaid

sruja export mermaid example.sruja > example.md

DSL basics


DSL Basics

Sruja is an architecture DSL. This tutorial introduces its core elements.

Elements

import { * } from 'sruja.ai/stdlib'


shop = system "Shop API" {
    webApp = container "Web" {
        description "Gateway layer"
    }
    catalogSvc = container "Catalog"
    mainDB = database "Database"
}

user = person "User"

user -> shop.webApp "Uses"
shop.webApp -> shop.catalogSvc "Routes"
shop.catalogSvc -> shop.mainDB "Reads/Writes"

view index {
include *
}

Descriptions and Metadata

import { * } from 'sruja.ai/stdlib'


Payments = system "Payments" {
description "Handles payments and refunds"
// metadata
metadata {
  team "FinTech"
  tier "critical"
}
}

Component‑level Modeling

import { * } from 'sruja.ai/stdlib'


App = system "App" {
Web = container "Web" {
  Cart = component "Cart"
}
}

Next Steps

Validation & linting


Validation & Linting

Sruja ships with a validation engine that helps keep architectures healthy. This tutorial covers how to use it effectively and troubleshoot common issues.

Quick Start

# Lint a single file
sruja lint architecture.sruja

# Lint all .sruja files in a directory
sruja lint ./architectures/

# Get detailed output
sruja lint --verbose architecture.sruja

# Export validation report as JSON (for CI/CD)
sruja lint --json architecture.sruja > lint-report.json

Common Validation Checks

Sruja validates:

  1. Unique IDs: No duplicate element IDs
  2. Valid references: Relations must connect existing elements
  3. Cycle detection: Informational (cycles are valid for many patterns)
  4. Orphan detection: Elements not used by any relation
  5. Simplicity guidance: Suggests simpler syntax when appropriate
  6. Constraint violations: Policy and constraint rule violations
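
To make the first, second, and fourth checks concrete, here is a Python sketch that runs them over a plain list of element IDs and relation pairs. It illustrates what each check means, not Sruja's validator; the `lint` function and the message strings are assumptions.

```python
from collections import Counter


def lint(element_ids, relations):
    """Return (errors, warnings) for element IDs and (source, target) pairs."""
    errors, warnings = [], []

    # Unique IDs: any ID declared more than once is an error.
    for elem, count in Counter(element_ids).items():
        if count > 1:
            errors.append(f"Duplicate ID: {elem}")

    # Valid references: both ends of a relation must be declared.
    known = set(element_ids)
    for source, target in relations:
        for end in (source, target):
            if end not in known:
                errors.append(f"Invalid reference: {end}")

    # Orphan detection: declared elements that no relation touches.
    used = {end for rel in relations for end in rel}
    for elem in element_ids:
        if elem not in used:
            warnings.append(f"Orphan element: {elem}")

    return errors, warnings


elements = ["Customer", "ECommerce.WebApp", "ECommerce.API", "ECommerce.Cache"]
relations = [
    ("Customer", "ECommerce.WebApp"),
    ("ECommerce.WebApp", "ECommerce.API"),
    ("ECommerce.API", "ECommerce.NonExistent"),
]
errors, warnings = lint(elements, relations)
print(errors)    # ['Invalid reference: ECommerce.NonExistent']
print(warnings)  # ['Orphan element: ECommerce.Cache']
```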

Real-World Example: E-Commerce Platform

Let's validate a real architecture:

import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"

ECommerce = system "E-Commerce Platform" {
    WebApp = container "Web Application" {
        technology "React"
    }
    API = container "REST API" {
        technology "Rust"
    }
    ProductDB = database "Product Database" {
        technology "PostgreSQL"
    }
    OrderDB = database "Order Database" {
        technology "PostgreSQL"
    }
}

Customer -> ECommerce.WebApp "Browses products"
ECommerce.WebApp -> ECommerce.API "Calls API"
ECommerce.API -> ECommerce.ProductDB "Reads products"
ECommerce.API -> ECommerce.OrderDB "Writes orders"

view index {
include *
}

Validation output:

✅ Valid architecture
✅ All references valid
✅ No orphan elements

Troubleshooting Common Errors

Error 1: Invalid Reference

Error message:

❌ Invalid reference: ECommerce.API -> ECommerce.NonExistent "Calls"
   Element 'NonExistent' not found in system 'ECommerce'

Problem: You're referencing an element that doesn't exist.

Fix:

// ❌ Wrong
ECommerce.API -> ECommerce.NonExistent "Calls"

// ✅ Correct - element exists
ECommerce.API -> ECommerce.ProductDB "Reads"

Real-world scenario: You renamed a service but forgot to update all references.

Error 2: Duplicate ID

Error message:

❌ Duplicate ID: 'API' found in system 'ECommerce'
   First occurrence: line 5
   Second occurrence: line 12

Problem: Two elements have the same ID in the same scope.

Fix:

import { * } from 'sruja.ai/stdlib'


// EXPECTED_FAILURE: unexpected token
// ❌ Wrong
ECommerce = system "E-Commerce" {
API = container "REST API"
API = container "GraphQL API"  // Duplicate ID!
}

// ✅ Correct - use unique IDs
ECommerce = system "E-Commerce" {
RESTAPI = container "REST API"
GraphQLAPI = container "GraphQL API"
}

Real-world scenario: You added a new API type but used the same ID.

Error 3: Orphan Element

Warning message:

⚠️  Orphan element: ECommerce.Cache
   This element is not referenced by any relation

Problem: An element exists but nothing connects to it.

Fix options:

  1. Add a relation (if the element should be used):
// Add relation to use the cache
ECommerce.API -> ECommerce.Cache "Reads cache"
  2. Remove the element (if it's not needed):
// Remove if not part of current architecture
// Cache = datastore "Cache" { ... }
  3. Document why it's isolated (if intentional):
Cache = datastore "Cache" {
    description "Future: Will be used for product catalog caching"
    metadata {
        status "planned"
    }
}

Real-world scenario: You added a component for future use but haven't integrated it yet.

Error 4: Constraint Violation

Error message:

❌ Constraint violation: 'NoDirectDB' violated
   ECommerce.WebApp -> ECommerce.ProductDB "Direct database access"
   Constraint: Frontend containers cannot access databases directly

Problem: A constraint rule is being violated.

Fix:

// EXPECTED_FAILURE: Invalid reference
// ❌ Wrong - violates constraint
ECommerce.WebApp -> ECommerce.ProductDB "Direct access"

// ✅ Correct - go through API
ECommerce.WebApp -> ECommerce.API "Calls API"
ECommerce.API -> ECommerce.ProductDB "Reads products"

Real-world scenario: Enforcing architectural standards (e.g., "no direct database access from frontend").

Understanding Validation Messages

Cycles Are Valid

Sruja detects cycles but doesn't block them - cycles are valid architectural patterns:

  • Feedback loops: User ↔ System interactions
  • Event-driven: Service A ↔ Service B via events
  • Mutual dependencies: Microservices that call each other
  • Bidirectional flows: API ↔ Database (read/write)

import { * } from 'sruja.ai/stdlib'


// ✅ Valid - feedback loop
User = person "User"
Platform = system "Platform"
User -> Platform "Makes request"
Platform -> User "Sends response"

// ✅ Valid - event-driven pattern
ServiceA = system "Service A"
ServiceB = system "Service B"
ServiceA -> ServiceB "Publishes event"
ServiceB -> ServiceA "Publishes response event"

// ✅ Valid - mutual dependencies
PaymentService = system "Payment Service"
OrderService = system "Order Service"
PaymentService -> OrderService "Updates order status"
OrderService -> PaymentService "Requests payment"

view index {
include *
}

The validator will inform you about cycles but won't prevent compilation, as they're often intentional.
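
For intuition about what "detecting a cycle" involves, the Python sketch below runs a depth-first search over relation pairs and reports the first directed cycle it finds. This is a standard textbook technique shown under the assumption that relations form a directed graph; it is not taken from Sruja's source, and `find_cycle` is a hypothetical name.

```python
def find_cycle(relations):
    """Return one directed cycle as a list of IDs, or None if there is none."""
    graph = {}
    for source, target in relations:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:  # edge back into the current path closes a cycle
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


feedback_loop = [("User", "Platform"), ("Platform", "User")]
one_way = [("Customer", "WebApp"), ("WebApp", "API"), ("API", "ProductDB")]
print(find_cycle(feedback_loop))  # ['User', 'Platform', 'User']
print(find_cycle(one_way))        # None
```

A chain of one-way relations, such as the e-commerce example earlier in this tutorial, contains no cycle; a cycle needs a path that leads back to its starting element.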

Simplicity Guidance

Sruja suggests simpler syntax when appropriate:

Example:

ℹ️  Simplicity suggestion: Consider using 'system' instead of nested 'container'
   Current: system App { container Web { ... } }
   Simpler: system Web { ... }

This is informational only - use the level of detail that matches your modeling goal.

CI/CD Integration

GitHub Actions Example

Add validation to your CI pipeline:

name: Validate Architecture

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Sruja
        run: |
          curl -fsSL https://raw.githubusercontent.com/sruja-ai/sruja/main/scripts/install.sh | bash
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"

      - name: Lint Architecture
        run: |
          sruja lint architecture.sruja

      - name: Export Validation Report
        if: always()
        run: |
          sruja lint --json architecture.sruja > lint-report.json

      - name: Upload Report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: lint-report
          path: lint-report.json

GitLab CI Example

validate-architecture:
  image: rust:1.70
  script:
    - cargo install sruja-cli --git https://github.com/sruja-ai/sruja
    - sruja lint architecture.sruja
  only:
    - merge_requests
    - main

Pre-commit Hook

Validate before every commit:

#!/bin/sh
# .git/hooks/pre-commit

sruja lint architecture.sruja
if [ $? -ne 0 ]; then
    echo "❌ Architecture validation failed. Fix errors before committing."
    exit 1
fi

Advanced: Custom Validation Rules

Use constraints and conventions for custom validation:

import { * } from 'sruja.ai/stdlib'


// Define constraints
constraints {
    "Frontend cannot access databases directly"
}

// Apply conventions
conventions {
    "Layered Architecture: Frontend → API → Database"
}

Platform = system "Platform" {
    Frontend = container "React App"
    API = container "REST API"
    DB = database "PostgreSQL"

    // ✅ Valid
    Frontend -> API "Calls API"
    API -> DB "Reads/Writes"

    // ❌ Will be caught by validator
    // Frontend -> DB "Direct access"  // Violates constraint
}

view index {
include *
}

Real-World Workflow

Step 1: Write Architecture

import { * } from 'sruja.ai/stdlib'


App = system "App" {
    Web = container "Web"
    DB = datastore "Database"
}

view index {
include *
}

Step 2: Validate

sruja lint architecture.sruja

Step 3: Fix Errors

Address any validation errors or warnings.

Step 4: Commit and Push

Once validation passes locally, commit and push. CI/CD validates the model again.

Step 5: Keep Validating in CI/CD

Keep validation in the pipeline so issues are caught before they reach production.

Key Takeaways

  1. Validate early and often: Run sruja lint frequently during development
  2. Fix errors immediately: Don't accumulate validation debt
  3. Integrate with CI/CD: Catch issues before they reach production
  4. Understand cycles: They're often valid patterns, not errors
  5. Use constraints: Enforce architectural standards automatically

Exercise: Fix Validation Errors

Scenario: You have an architecture file with several validation errors.

Tasks:

  1. Run sruja lint on a file
  2. Identify all errors and warnings
  3. Fix each error
  4. Re-validate to confirm fixes

Time: 10 minutes

Further Reading

Export diagrams


Export Diagrams

Sruja currently supports export to Mermaid (for Markdown) and interactive visualization in Studio.

Export Formats

1. Mermaid (Markdown)

Export to Mermaid code fences for use in Markdown pages:

sruja export mermaid architecture.sruja > architecture.md

The output includes ```mermaid blocks that render in most Markdown engines with Mermaid enabled.

Use cases:

  • Documentation sites using Markdown
  • Lightweight diagrams without external tooling
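
If you have not worked with Mermaid before, the Python sketch below shows roughly what a flowchart for a handful of relations looks like. It is hand-written for illustration and is not the Sruja exporter or its exact output; `to_mermaid` and the underscore-based node IDs are assumptions.

```python
def to_mermaid(relations):
    """Render (source, target, label) triples as a Mermaid flowchart."""
    def node_id(name):
        # Keep node IDs to letters, digits, and underscores; show the dotted name as the label.
        return name.replace(".", "_")

    lines = ["flowchart LR"]
    declared = set()
    for source, target, label in relations:
        for name in (source, target):
            if name not in declared:
                declared.add(name)
                lines.append(f'    {node_id(name)}["{name}"]')
        lines.append(f'    {node_id(source)} -->|"{label}"| {node_id(target)}')
    return "\n".join(lines)


relations = [
    ("User", "App.Web", "Visits"),
    ("App.Web", "App.DB", "Reads/Writes"),
]
print(to_mermaid(relations))
```

Wrapping the printed text in a mermaid code fence gives a block that Mermaid-enabled Markdown renderers can draw.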

2. Studio (Interactive)

Open and preview diagrams interactively in Studio:

Open in Studio from the Learn examples or visit /studio/

Features:

  • Interactive preview and navigation
  • C4 model views (context, containers, components)
  • Embedded documentation and metadata

Use cases:

  • Architecture reviews
  • Presentations
  • Iterative modeling and validation

Mermaid Styling

You can customize Mermaid via frontmatter or exporter configuration. See the Mermaid exporter in crates/sruja-export/src/mermaid/ for options.

Choosing the Right Path

  • Mermaid: For Markdown-first workflows and lightweight sharing
  • Studio: For interactive exploration and richer documentation

Note: Sruja Designer provides interactive diagrams and editing capabilities.

Systems thinking


Systems Thinking

Systems thinking helps you understand how components interact as part of a whole. Sruja supports five core systems thinking concepts.

1. Parts and Relationships

Systems thinking starts with understanding what the system contains (parts) and how they connect (relationships).

import { * } from 'sruja.ai/stdlib'


Customer = person "End User"

Shop = system "E-Commerce System" {
WebApp = container "Web Application" {
  technology "React"
}

API = container "API Service" {
  technology "Rust"
}

DB = database "PostgreSQL Database" {
  technology "PostgreSQL 14"
}
}

// Relationships show how parts interact
Customer -> Shop.WebApp "Uses"
Shop.WebApp -> Shop.API "Calls"
Shop.API -> Shop.DB "Reads/Writes"

view index {
include *
}

Key insight: Identify the parts first, then define how they relate.

2. Boundaries

Boundaries define what's inside the system vs. what's outside (the environment).

import { * } from 'sruja.ai/stdlib'


// Inside boundary: System contains these components
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API"
  DB = datastore "Database"
}

// Outside boundary: External entities
Customer = person "End User"
Admin = person "System Administrator"

PaymentGateway = system "Third-party Payment Service" {
metadata {
  tags ["external"]
}
}

// Relationships cross boundaries
Customer -> Shop.WebApp "Uses"
Shop.API -> PaymentGateway "Processes"

view index {
include *
}

Key insight: Use system to define internal boundaries; model what lies outside them with person elements and systems tagged "external".

3. Flows

Flows show how information and data move through the system. Sruja supports two flow styles:

Data Flow Diagram (DFD) Style

Use scenario for data-oriented flows:

// EXPECTED_FAILURE: Layer violation
// SKIP_ORPHAN_CHECK
import { * } from 'sruja.ai/stdlib'


Customer = person "Customer"
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API"
  DB = datastore "Database"
}
PaymentGateway = system "PaymentGateway" {
  tags ["external"]
}

OrderProcess = scenario "Order Processing" {
Customer -> Shop.WebApp "Submits Order"
Shop.WebApp -> Shop.API "Sends Order Data"
Shop.API -> Shop.DB "Saves Order"
Shop.API -> PaymentGateway "Charges Payment"
Shop.API -> Shop.WebApp "Returns Result"
Shop.WebApp -> Customer "Shows Confirmation"
}

view index {
include *
}

User Story/Scenario Style

Use story for behavioral flows:

// EXPECTED_FAILURE: Layer violation
import { * } from 'sruja.ai/stdlib'


Customer = person "End User"
ECommerce = system "E-Commerce System" {
CartPage = container "Shopping Cart Page"
WebApp = container "Web Application"
API = container "API Service"
DB = database "Database"
}
PaymentGateway = system "Payment Service" {
metadata {
  tags ["external"]
}
}

Checkout = story "User Checkout Flow" {
Customer -> ECommerce.CartPage "adds items to cart"
ECommerce.CartPage -> ECommerce.WebApp "clicks checkout"
ECommerce.WebApp -> ECommerce.API "validates cart"
ECommerce.API -> ECommerce.DB "checks inventory"
ECommerce.DB -> ECommerce.API "returns stock status"
ECommerce.API -> PaymentGateway "processes payment"
PaymentGateway -> ECommerce.API "confirms payment"
ECommerce.API -> ECommerce.DB "creates order"
ECommerce.API -> ECommerce.WebApp "returns order confirmation"
ECommerce.WebApp -> Customer "displays success message"
}

view index {
include *
}

Key insight: The examples above use scenario for data-oriented flows (DFD style) and story for behavioral flows (BDD style).

4. Feedback Loops

Feedback loops show how actions create reactions that affect future actions. Cycles are valid patterns in Sruja.

Simple Feedback Loop

// EXPECTED_FAILURE: Layer violation
person = kind "Person"
system = kind "System"
container = kind "Container"
component = kind "Component"
database = kind "Database"
queue = kind "Queue"

User = person "End User"
App = system "Application" {
WebApp = container "Web Application"
API = container "API Service"
}

// Feedback loop: User action → System response → User reaction
User -> App.WebApp "Submits Form"
App.WebApp -> App.API "Validates"
App.API -> App.WebApp "Returns Validation Result"
App.WebApp -> User "Shows Feedback"
// The feedback affects user's next action (completing the loop)

System Feedback Loop

import { * } from 'sruja.ai/stdlib'


Admin = person "Administrator"
Shop = system "Shop" {
  API = container "API"
  Inventory = datastore "Inventory"
}

// Event-driven feedback loop
Shop.API -> Shop.Inventory "Updates Stock"
Shop.Inventory -> Shop.API "Notifies Low Stock"
Shop.API -> Admin "Sends Alert"
Admin -> Shop.API "Adjusts Inventory"
// Creates feedback: API ↔ Inventory ↔ Admin

view index {
include *
}

Key insight: Cycles model natural feedback loops, event-driven patterns, and mutual dependencies. They're valid architectural patterns.

5. Context

Context defines the environment the system operates in - external dependencies, stakeholders, and surrounding systems.

import { * } from 'sruja.ai/stdlib'


// Internal system
Shop = system "Shop" {
  WebApp = container "Web Application"
  API = container "API Service"
  DB = database "Database"
}

// Context: Stakeholders
Customer = person "End User"
Admin = person "System Administrator"
Support = person "Customer Support"

// Context: External dependencies
PaymentGateway = system "Third-party Payment" {
metadata {
  tags ["external"]
}
}

EmailService = system "Email Notifications" {
metadata {
  tags ["external"]
}
}

AnalyticsService = system "Usage Analytics" {
metadata {
  tags ["external"]
}
}

// Context relationships
Customer -> Shop "Uses"
Admin -> Shop "Manages"
Support -> Shop "Monitors"
Shop -> PaymentGateway "Depends on"
Shop -> EmailService "Sends notifications"
Shop -> AnalyticsService "Tracks usage"

view index {
include *
}

Key insight: Context includes all external entities and dependencies that affect or are affected by your system.

Putting It All Together

Here's a complete example combining all five concepts:

// EXPECTED_FAILURE: Layer violation
person = kind "Person"
system = kind "System"
container = kind "Container"
component = kind "Component"
database = kind "Database"
queue = kind "Queue"

// 1. PARTS AND RELATIONSHIPS
Customer = person "End User"
Admin = person "System Administrator"

ECommerce = system "E-Commerce System" {
WebApp = container "Web Application" {
  technology "React"
}
API = container "API Service" {
  technology "Rust"
}
DB = database "PostgreSQL Database" {
  technology "PostgreSQL 14"
}
}

// 2. BOUNDARIES
PaymentGateway = system "Third-party Payment Service" {
metadata {
  tags ["external"]
}
}

// 3. FLOWS
OrderProcess = scenario "Order Processing" {
Customer -> ECommerce.WebApp "Submits Order"
ECommerce.WebApp -> ECommerce.API "Sends Order Data"
ECommerce.API -> ECommerce.DB "Saves Order"
ECommerce.API -> PaymentGateway "Charges Payment"
ECommerce.API -> ECommerce.WebApp "Returns Result"
ECommerce.WebApp -> Customer "Shows Confirmation"
}

// 4. FEEDBACK LOOPS
Customer -> ECommerce.WebApp "Submits Form"
ECommerce.WebApp -> ECommerce.API "Validates"
ECommerce.API -> ECommerce.WebApp "Returns Validation Result"
ECommerce.WebApp -> Customer "Shows Feedback"

ECommerce.API -> ECommerce.DB "Updates Inventory"
ECommerce.DB -> ECommerce.API "Notifies Low Stock"
ECommerce.API -> Admin "Sends Alert"
Admin -> ECommerce.API "Adjusts Inventory"

// 5. CONTEXT
Support = person "Customer Support"
EmailService = system "Email Notifications" {
metadata {
  tags ["external"]
}
}

Customer -> ECommerce "Uses"
Admin -> ECommerce "Manages"
Support -> ECommerce "Monitors"
ECommerce -> PaymentGateway "Depends on"
ECommerce -> EmailService "Sends notifications"

Why Systems Thinking Matters

  • Holistic understanding: See the whole system, not just parts
  • Natural patterns: Model real-world interactions and feedback
  • Clear boundaries: Understand what's in scope vs. context
  • Flow visualization: See how data and information move
  • Valid cycles: Feedback loops are natural, not errors

Next Steps

  • Try the complete example: examples/systems_thinking.sruja
  • Learn Deployment Modeling for infrastructure perspective

Design mode


Design Mode Workflow

Design Mode helps you build architecture assets step by step, starting with high‑level context and progressively adding detail. It also lets you focus on a specific system or container and share audience‑specific views.

Workflow Steps

Step 1: Context — define person and system

Start with the high-level context:

import { * } from 'sruja.ai/stdlib'


User = person "User"
Shop = system "Shop"

view index {
include *
}

Step 2: Containers — add container, datastore, queue to a chosen system

Add containers and datastores:

import { * } from 'sruja.ai/stdlib'


User = person "User"
App = system "App" {
WebApp = container "Web Application"
API = container "API Service"
DB = database "Database"
}

User -> App.WebApp "Uses"
App.WebApp -> App.API "Calls"
App.API -> App.DB "Reads/Writes"

view index {
include *
}

Step 3: Components — add component inside a chosen container

Drill down into components:

import { * } from 'sruja.ai/stdlib'


App = system "App" {
WebApp = container "Web Application" {
  UI = component "User Interface"
}
API = container "API Service" {
  Auth = component "Auth Service"
}
}

// Component‑level interaction
App.WebApp.UI -> App.API.Auth "Calls"

view index {
include *
}

Step 4: Stitch — add relations and optional scenarios; share focused views

Add relations and scenarios to complete the model.

Layers and Focus

  • Levels: L1 Context, L2 Containers, L3 Components, All
  • Focus:
    • L2 focus by systemId
    • L3 focus by systemId.containerId

When focused, non‑relevant nodes/edges are dimmed so you can work deeper without distractions.

Viewer opens focused views via URL params:

  • ?level=1 → Context
  • ?level=2&focus=Shop → Containers of system Shop
  • ?level=3&focus=Shop.API → Components in container API of system Shop
  • DSL payload is passed with #code=<lz-base64> or ?code=<urlencoded>.
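
The parameters above can be read with any URL library. Here is a Python sketch that splits a viewer URL into level, focused system, and focused container. The host and path are placeholders, `parse_view` is a hypothetical helper, and treating a missing `level` as "All" is an assumption. The `#code=<lz-base64>` fragment form is not handled here.

```python
from urllib.parse import parse_qs, urlsplit

LEVEL_NAMES = {1: "Context", 2: "Containers", 3: "Components"}


def parse_view(url):
    """Split a viewer URL into level, focused system, focused container, and code."""
    query = parse_qs(urlsplit(url).query)
    raw_level = query.get("level", [None])[0]
    # Assumption: no level parameter means the "All" view.
    level = LEVEL_NAMES.get(int(raw_level), "All") if raw_level else "All"
    focus = query.get("focus", [None])[0]
    # focus is systemId at L2 and systemId.containerId at L3.
    system_id, _, container_id = (focus or "").partition(".")
    return {
        "level": level,
        "system": system_id or None,
        "container": container_id or None,
        "code": query.get("code", [None])[0],  # parse_qs already URL-decodes the value
    }


# example.invalid is a placeholder host.
print(parse_view("https://example.invalid/viewer?level=3&focus=Shop.API"))
```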

Studio Experience

  • Diagram‑first: Studio opens with the diagram; a Design Mode overlay guides steps
  • Contextual palette: add containers at L2 (focused system), components at L3 (focused container)
  • Autosave on close: resume drafts; share per‑layer links from the toolbar

Viewer Experience

  • Use level buttons and focus to tailor the view
  • Dimming clarifies what's relevant at each depth
  • Share via copied URL (includes level, focus, and DSL)

See Also

Demo script


Demo Script: Quick 10-Minute Walkthrough

This tutorial provides a quick 10-minute walkthrough to demonstrate Sruja's core capabilities: modeling, validation, and export.

1) Model (2 minutes)

Create a simple e-commerce architecture:

import { * } from 'sruja.ai/stdlib'


User = person "User"
Shop = system "Shop" {
  WebApp = container "Web App"
  API = container "API"
  DB = datastore "Database"
}

User -> Shop.WebApp "Uses"
Shop.WebApp -> Shop.API "Calls"
Shop.API -> Shop.DB "Reads/Writes"

view index {
include *
}

2) Validate (2 minutes)

Format and validate your model:

sruja fmt architecture.sruja
sruja lint architecture.sruja

3) Add Targets (3 minutes)

Add SLOs and scaling configuration:

import { * } from 'sruja.ai/stdlib'


Shop = system "Shop" {
API = container "API" {
  scale {
    metric "req/s"
    min 200
    max 2000
  }

  slo {
    availability {
      target "99.9%"
      window "30 days"
    }
    latency {
      p95 "200ms"
      window "7 days"
    }
    errorRate {
      target "< 0.1%"
      window "30 days"
    }
  }
}
}

view index {
include *
}
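
An availability target is easier to reason about as a downtime budget. This Python sketch converts the 99.9% over 30 days target above into minutes; it is plain arithmetic, not something the Sruja CLI computes as far as this tutorial shows, and `error_budget_minutes` is a made-up helper name.

```python
def error_budget_minutes(target_percent: float, window_days: int) -> float:
    """Minutes of allowed unavailability for an availability target over a window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - target_percent / 100)


budget = error_budget_minutes(99.9, 30)
print(f"{budget:.1f} minutes")  # 43.2 minutes
```

So a 99.9% target over a 30-day window leaves about 43 minutes of downtime before the SLO is missed.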

4) Export (3 minutes)

Export to various formats:

sruja export markdown architecture.sruja
sruja export mermaid architecture.sruja
sruja export svg architecture.sruja

Outcome: Living docs and diagrams generated from the model.


Note: Sruja is free and open source (Apache 2.0 licensed). Need help with adoption? Professional consulting services are available. Contact the team through GitHub Discussions to learn more.

Deployment modeling


Deployment Modeling

Model production environments and map containers onto infrastructure nodes.

import { * } from 'sruja.ai/stdlib'


WebServer = container "Nginx"
AppServer = container "Python App"
Database = database "Postgres"


deployment Production "Production" {
  node AWS "AWS" {
    node USEast1 "US-East-1" {
      node EC2 "EC2 Instance" {
        containerInstance WebServer
        containerInstance AppServer
      }
      node RDS "RDS" {
        containerInstance Database
      }
    }
  }
}

view index {
include *
}
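
The nesting of nodes determines where each container instance ends up. The Python sketch below mirrors the deployment block as a nested dictionary (a hypothetical structure chosen for illustration, not Sruja's export format) and walks it to list each container with its node path.

```python
# Hypothetical mirror of the deployment block above:
# node name -> {"children": {...}, "instances": [...]}
production = {
    "AWS": {"instances": [], "children": {
        "USEast1": {"instances": [], "children": {
            "EC2": {"instances": ["WebServer", "AppServer"], "children": {}},
            "RDS": {"instances": ["Database"], "children": {}},
        }},
    }},
}


def placements(nodes, prefix=()):
    """Yield (container, node path) pairs by walking the node tree depth-first."""
    for name, node in nodes.items():
        path = prefix + (name,)
        for instance in node["instances"]:
            yield instance, "/".join(path)
        yield from placements(node["children"], path)


for container, where in placements(production):
    print(f"{container} -> {where}")
```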

CI/CD integration


CI/CD Integration

Integrate Sruja into your CI/CD pipeline to automatically validate architecture, enforce standards, and generate documentation on every commit.

Why CI/CD Integration?

For DevOps teams:

  • Catch architecture violations before they reach production
  • Automate documentation generation
  • Enforce architectural standards across teams
  • Reduce manual review overhead

For software architects:

  • Ensure architectural decisions are documented
  • Prevent architectural drift
  • Scale governance across multiple teams

For product teams:

  • Keep architecture docs up-to-date automatically
  • Track architecture changes over time
  • Ensure compliance with requirements

Real-World Scenario

Challenge: A team of 50 engineers across 10 microservices. Architecture documentation is outdated, and violations happen frequently.

Solution: Integrate Sruja validation into CI/CD to:

  • Validate architecture on every PR
  • Generate updated documentation automatically
  • Block merges if constraints are violated
  • Track architecture changes over time

GitHub Actions Integration

Sruja’s CLI is written in Rust. In CI you can either build from source in this repo or install from the Git repo with cargo install. A reusable composite action is available in the Sruja repo for building and validating.

Using the Sruja repo reusable action (this repository)

If your workflow runs inside the sruja repo, use the composite action so the CLI is built once and lint/export run on your files:

- uses: actions/checkout@v4
- uses: ./.github/actions/sruja-validate
  with:
    working-directory: .
    files: "examples/**/*.sruja" # or '**/*.sruja'
    run-export: "false"

Basic setup (any repository)

Install the CLI from the Sruja Git repo with Cargo, then run sruja lint and sruja export:

name: Architecture Validation

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

jobs:
  validate-architecture:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Install Sruja CLI
        run: cargo install sruja-cli --git https://github.com/sruja-ai/sruja

      - name: Validate Architecture
        run: sruja lint architecture.sruja

      - name: Export Documentation
        run: |
          sruja export markdown architecture.sruja > architecture.md
          sruja export json architecture.sruja > architecture.json

      - name: Upload Artifacts
        uses: actions/upload-artifact@v4
        with:
          name: architecture-docs
          path: |
            architecture.md
            architecture.json

Advanced: Enforce Constraints

name: Architecture Governance

on: [pull_request]

jobs:
  enforce-architecture:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Full history for diff

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Install Sruja CLI
        run: cargo install sruja-cli --git https://github.com/sruja-ai/sruja

      - name: Validate Architecture
        id: validate
        run: |
          # Steps run with `bash -e`, so capture the exit code without aborting the step.
          exit_code=0
          sruja lint architecture.sruja > lint-output.txt 2>&1 || exit_code=$?
          echo "exit_code=$exit_code" >> "$GITHUB_OUTPUT"
          cat lint-output.txt

      - name: Check for Constraint Violations
        if: steps.validate.outputs.exit_code != 0
        run: |
          echo "❌ Architecture validation failed!"
          echo "Please fix the errors before merging."
          exit 1

      - name: Comment PR with Validation Results
        if: always()
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const lintOutput = fs.readFileSync('lint-output.txt', 'utf8');
            // Await the API call so the step fails visibly if the comment cannot be posted.
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Architecture Validation Results\n\n\`\`\`\n${lintOutput}\n\`\`\``
            });

Multi-Architecture Validation

For monorepos with multiple architecture files:

name: Validate All Architectures

on: [push, pull_request]

jobs:
  validate-all:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        architecture:
          - architecture.sruja
          - services/payment-service.sruja
          - services/order-service.sruja
          - services/user-service.sruja
    steps:
      - uses: actions/checkout@v4

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Install Sruja CLI
        run: cargo install sruja-cli --git https://github.com/sruja-ai/sruja

      - name: Validate ${{ matrix.architecture }}
        run: sruja lint ${{ matrix.architecture }}
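
The matrix above lists every file by hand. As an alternative, a single job can discover the files itself. The following is a minimal sketch, assuming only that `sruja lint <file>` exits non-zero on failure (which the workflows in this chapter already rely on). The `lint_all` function and the `SRUJA_BIN` override are hypothetical names introduced for illustration, not part of Sruja.

```shell
#!/bin/sh
# Lint every .sruja file under a directory and report all failures at the end.
# SRUJA_BIN is a hypothetical override (not a Sruja feature) for swapping the binary.

lint_all() {
    root="${1:-.}"
    list=$(mktemp)
    failed=0

    # Collect paths first so the loop runs in the current shell rather than
    # in a pipeline subshell, where the failure flag would be lost.
    find "$root" -type f -name '*.sruja' | sort > "$list"

    while IFS= read -r file; do
        # Redirect stdin so the linter cannot consume the file list.
        if "${SRUJA_BIN:-sruja}" lint "$file" < /dev/null; then
            echo "ok:   $file"
        else
            echo "FAIL: $file"
            failed=1
        fi
    done < "$list"

    rm -f "$list"
    return "$failed"
}
```

After sourcing the script, a workflow step could call `lint_all services`. Compared with a matrix, this reports every failing file in one job log but gives up per-file job status in the checks UI.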

GitLab CI Integration

stages:
  - validate

validate-architecture:
  stage: validate
  image: rust:1.70
  before_script:
    - cargo install sruja-cli --git https://github.com/sruja-ai/sruja
  script:
    - sruja lint architecture.sruja
    - sruja export markdown architecture.sruja > architecture.md
    - sruja export json architecture.sruja > architecture.json
  artifacts:
    paths:
      - architecture.md
      - architecture.json
    expire_in: 30 days
  only:
    - merge_requests
    - main

Jenkins Integration

pipeline {
    agent any

    stages {
        stage('Install Sruja CLI') {
            steps {
                sh 'cargo install sruja-cli --git https://github.com/sruja-ai/sruja'
            }
        }
        stage('Validate Architecture') {
            steps {
                sh 'sruja lint architecture.sruja'
            }
        }

        stage('Generate Documentation') {
            steps {
                sh '''
                    sruja export markdown architecture.sruja > architecture.md
                    sruja export json architecture.sruja > architecture.json
                '''
            }
        }

        stage('Archive Documentation') {
            steps {
                archiveArtifacts artifacts: 'architecture.*', fingerprint: true
            }
        }
    }

    post {
        failure {
            emailext (
                subject: "Architecture Validation Failed: ${env.JOB_NAME} - ${env.BUILD_NUMBER}",
                body: "Architecture validation failed. Please check the build logs.",
                to: "${env.CHANGE_AUTHOR_EMAIL}"
            )
        }
    }
}

CircleCI Integration

version: 2.1

jobs:
  validate-architecture:
    docker:
      - image: rust:1.70
    steps:
      - checkout
      - run:
          name: Install Sruja CLI
          command: cargo install sruja-cli --git https://github.com/sruja-ai/sruja
      - run:
          name: Validate
          command: sruja lint architecture.sruja
      - run:
          name: Generate Docs
          command: sruja export markdown architecture.sruja > architecture.md
      - store_artifacts:
          path: architecture.md

workflows:
  version: 2
  validate:
    jobs:
      - validate-architecture

Pre-commit Hooks

Validate before every commit locally. Ensure the Sruja CLI is on your PATH (e.g. cargo install sruja-cli --git https://github.com/sruja-ai/sruja or build from the Sruja repo):

#!/bin/sh
# .git/hooks/pre-commit

if ! command -v sruja > /dev/null 2>&1; then
    echo "Sruja CLI not found. Install with: cargo install sruja-cli --git https://github.com/sruja-ai/sruja"
    exit 1
fi

sruja lint architecture.sruja
if [ $? -ne 0 ]; then
    echo "❌ Architecture validation failed. Fix errors before committing."
    exit 1
fi

sruja fmt architecture.sruja > architecture.formatted.sruja
mv architecture.formatted.sruja architecture.sruja
git add architecture.sruja

exit 0

Or use the pre-commit framework (requires Sruja on PATH):

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: sruja-lint
        name: Sruja Lint
        entry: sruja lint
        language: system
        files: \.sruja$
        pass_filenames: true

Automated Documentation Updates

Generate and commit documentation automatically:

name: Update Architecture Docs

on:
  push:
    branches: [main]
    paths:
      - "architecture.sruja"

jobs:
  update-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Install Sruja CLI
        run: cargo install sruja-cli --git https://github.com/sruja-ai/sruja

      - name: Generate Documentation
        run: |
          sruja export markdown architecture.sruja > docs/architecture.md
          sruja export json architecture.sruja > docs/architecture.json

      - name: Commit Changes
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add docs/architecture.*
          git diff --staged --quiet || git commit -m "docs: update architecture documentation"
          git push

Architecture Change Tracking

Track architecture changes over time:

name: Track Architecture Changes

on:
  pull_request:
    paths:
      - "architecture.sruja"

jobs:
  track-changes:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Install Sruja CLI
        run: cargo install sruja-cli --git https://github.com/sruja-ai/sruja

      - name: Compare Architectures
        run: |
          git show origin/${{ github.base_ref }}:architecture.sruja > base.sruja
          sruja export json base.sruja > base.json
          sruja export json architecture.sruja > current.json
          echo "## Architecture Changes" >> $GITHUB_STEP_SUMMARY
          echo "Comparing base and current architecture..." >> $GITHUB_STEP_SUMMARY

      - name: Comment Changes
        uses: actions/github-script@v7
        with:
          script: |
            // Await the API call so the step fails visibly if the comment cannot be posted.
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '## Architecture Changes Detected\n\nReview the architecture changes in this PR.'
            });
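
The Compare Architectures step writes base.json and current.json but stops short of comparing them. One way to fill that gap is a plain text diff, sketched below under the assumption that `sruja export json` produces stable, multi-line output (not confirmed here; if the export is a single line, the diff will be coarse). `summarize_changes` is a hypothetical helper name.

```shell
#!/bin/sh
# Append a unified diff of two JSON exports to a Markdown summary file.
# Assumes the exports are deterministic, so identical models give identical files.

summarize_changes() {
    base="$1"
    current="$2"
    summary="$3"
    patch=$(mktemp)
    rc=0

    # diff exits 0 when the files match, 1 when they differ, and >1 on error.
    diff -u "$base" "$current" > "$patch" || rc=$?

    case "$rc" in
        0)
            echo "No differences between base and current exports." >> "$summary"
            ;;
        1)
            {
                echo "Differences between base and current exports:"
                echo
                # Indent by four spaces so Markdown renders the diff as a code block.
                sed 's/^/    /' "$patch"
            } >> "$summary"
            ;;
        *)
            echo "diff failed for $base and $current" >&2
            ;;
    esac

    rm -f "$patch"
    # Only a diff error counts as failure; "files differ" is an expected result.
    [ "$rc" -le 1 ]
}
```

In the workflow above this could be called as `summarize_changes base.json current.json "$GITHUB_STEP_SUMMARY"` once both exports exist.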

Real-World Example: Microservices Platform

Complete CI/CD setup for a microservices platform:

name: Architecture Governance

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

jobs:
  validate-architecture:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service:
          - payment-service
          - order-service
          - user-service
          - inventory-service
    steps:
      - uses: actions/checkout@v4

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Install Sruja CLI
        run: cargo install sruja-cli --git https://github.com/sruja-ai/sruja

      - name: Validate ${{ matrix.service }}
        run: sruja lint services/${{ matrix.service }}/architecture.sruja

      - name: Generate Service Docs
        run: sruja export markdown services/${{ matrix.service }}/architecture.sruja > docs/services/${{ matrix.service }}.md

  validate-platform:
    runs-on: ubuntu-latest
    needs: validate-architecture
    steps:
      - uses: actions/checkout@v4

      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Install Sruja CLI
        run: cargo install sruja-cli --git https://github.com/sruja-ai/sruja

      - name: Validate Platform Architecture
        run: sruja lint platform-architecture.sruja

      - name: Generate Platform Docs
        run: |
          sruja export markdown platform-architecture.sruja > docs/platform.md
          sruja export json platform-architecture.sruja > docs/platform.json

      - name: Upload Documentation
        uses: actions/upload-artifact@v4
        with:
          name: architecture-docs
          path: docs/

Key Takeaways

  1. Automate everything: Don't rely on manual validation
  2. Fail fast: Block merges if constraints are violated
  3. Generate docs automatically: Keep documentation up-to-date
  4. Track changes: Monitor architecture evolution over time
  5. Scale governance: Use CI/CD to enforce standards across teams

Exercise: Set Up CI/CD Integration

Tasks:

  1. Choose a CI/CD platform (GitHub Actions, GitLab CI, etc.)
  2. Create a workflow that validates architecture on every PR
  3. Add documentation generation
  4. Test the workflow with a sample architecture file

Time: 20 minutes

Agentic AI modeling


title: "Agentic AI Modeling"
weight: 40
summary: "Model agent orchestration, tools, and memory using Sruja DSL."
tags: ["ai", "agents", "rag", "modeling"]
difficulty: "advanced"
estimatedTime: "30 min"

Agentic AI Modeling

This tutorial shows how to model agent-based systems with orchestrators, planners, executors, tools, and memory.

Core Structure

import { * } from 'sruja.ai/stdlib'


AgentSystem = system "Agentic System" {
Orchestrator = container "Agent Orchestrator"
Planner = container "Planner"
Executor = container "Executor"
Tools = container "Tooling API"
Memory = database "Long-Term Memory"
}

User = person "User"

User -> AgentSystem.Orchestrator "Requests task"
AgentSystem.Orchestrator -> AgentSystem.Planner "Plans steps"
AgentSystem.Orchestrator -> AgentSystem.Executor "Delegates execution"
AgentSystem.Executor -> AgentSystem.Tools "Calls tools"
AgentSystem.Executor -> AgentSystem.Memory "Updates state"

view index {
include *
}

Add Governance

Guardrails = policy "Safety Policies" {
  description "Limit tool calls, enforce approvals, track risky operations"
}

R1 = requirement functional "Explain actions"
R2 = requirement constraint "No PII exfiltration"

Integrate RAG

import { * } from 'sruja.ai/stdlib'


AgentSystem = system "Agent System" {
  Executor = container "Executor"
}

RAG = system "Retrieval-Augmented Generation" {
  Retriever = container "Retriever"
  Generator = container "Generator"
  VectorDB = database "VectorDB"
}

AgentSystem.Executor -> RAG.Retriever "Fetch contexts"
RAG.Retriever -> RAG.VectorDB "Search"
RAG.Generator -> AgentSystem.Executor "Produce answer"

Next Steps

  • Explore examples/pattern_agentic_ai.sruja and examples/pattern_rag_pipeline.sruja
  • Add scenarios to capture common workflows
  • Use views to present developer vs. executive perspectives

Extending the CLI (Rust)


title: "Extending the CLI (Rust)"
weight: 80
summary: "The CLI is implemented in Rust with clap. Add or change subcommands in crates/sruja-cli."
tags: ["cli", "rust", "advanced"]

Extending the CLI (Rust)

Sruja's CLI lives in crates/sruja-cli and uses clap for argument parsing. To add or change subcommands:

  1. Open crates/sruja-cli/src/main.rs (or the relevant module) and see how existing commands (e.g. lint, export) are defined.
  2. Add a subcommand using clap's Subcommand enum and match on it in the main entrypoint; run your logic and return Result with ? for errors.
  3. Run and test with cargo run -p sruja-cli -- <subcommand> ....

Shell completions are available:

sruja completion bash
sruja completion zsh
sruja completion fish
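
To load completions automatically, the generated script usually has to be saved where the shell looks for it. The following is a minimal sketch, assuming `sruja completion <shell>` prints the script to standard output (typical for clap-based CLIs, but not confirmed here). `completion_path`, `install_completion`, and `SRUJA_BIN` are hypothetical names, and the zsh directory is only an example that must be on your `fpath`.

```shell
#!/bin/sh
# Save a generated completion script to a conventional per-user location.
# SRUJA_BIN is a hypothetical override (not a Sruja feature) for swapping the binary.

completion_path() {
    case "$1" in
        bash) echo "${XDG_DATA_HOME:-$HOME/.local/share}/bash-completion/completions/sruja" ;;
        zsh)  echo "$HOME/.zsh/completions/_sruja" ;;  # example directory; add it to fpath
        fish) echo "${XDG_CONFIG_HOME:-$HOME/.config}/fish/completions/sruja.fish" ;;
        *)    return 1 ;;
    esac
}

install_completion() {
    dest=$(completion_path "$1") || {
        echo "unsupported shell: $1" >&2
        return 1
    }
    mkdir -p "$(dirname "$dest")"
    # Assumption: the completion script is written to standard output.
    "${SRUJA_BIN:-sruja}" completion "$1" > "$dest"
}
```

For example, `install_completion fish` would write the script under the fish completions directory, after which a new shell session should pick it up.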

For patterns and conventions, see the repo's AGENTS.md (Rust skills) and docs/CODING_GUIDELINES.md.

Challenges

Hands-on exercises to practice Sruja. Each challenge has a goal and optional hints.

| Challenge | Focus |
| --- | --- |
| Add Component | Add a component to a system |
| Deployment Architecture | Model deployment |
| External Service | Integrate an external system |
| Fix Relations | Correct relation definitions |
| Missing Relations | Find and add missing relations |
| Queue Worker | Model a queue-based flow |
| Syntax Error | Fix a syntax error |

See the Beginner path for a suggested order with tutorials and courses.

Add Component


title: "Social Feed: Add Recommendation Engine"
summary: "Your social media platform needs personalized content! Add a Recommendation component to the FeedService container that suggests posts based on user interests."
difficulty: beginner
topic: components
estimatedTime: "5-10 min"
initialDsl: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  component = kind "Component"
  database = kind "Database"

  User = person "Social Media User"

  SocialApp = system "Social App" {
    FeedService = container "Feed Service" {
      technology "Python, FastAPI"
      // TODO: Add component Recommendation "Recommendation Engine" here
    }
    UserDB = database "MongoDB"
  }

  User -> SocialApp.FeedService "Views feed"
  SocialApp.FeedService -> SocialApp.UserDB "Queries user data"
checks:
  - type: noErrors
    message: "DSL parsed successfully"
  - type: elementExists
    name: Recommendation
    message: "Create Recommendation component in FeedService"
hints:
  - "Components are defined inside containers using curly braces"
  - 'Add component Recommendation "Recommendation Engine" inside the FeedService block'
  - "Make sure to open the FeedService container block with { before adding the component"
solution: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  component = kind "Component"
  database = kind "Database"

  User = person "Social Media User"

  SocialApp = system "Social App" {
    FeedService = container "Feed Service" {
      technology "Python, FastAPI"
      Recommendation = component "Recommendation Engine"
    }
    UserDB = database "MongoDB"
  }

  User -> SocialApp.FeedService "Views feed"
  SocialApp.FeedService -> SocialApp.UserDB "Queries user data"

Deployment Architecture


title: "CDN Architecture: Add Cache Layer"
summary: "Your content delivery network needs a caching layer! Add a Cache datastore and connect it to both the API and CDN edge servers for faster content delivery."
difficulty: intermediate
topic: deployment
estimatedTime: "15-20 min"
initialDsl: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  datastore = kind "Datastore"

  Viewer = person "Content Viewer"

  CDN = system "CDN" {
    EdgeServer = container "Edge Server"
    API = container "Origin API"
    OriginDB = datastore "Origin Database"
  }

  Viewer -> CDN.EdgeServer "Requests content"
  CDN.EdgeServer -> CDN.API "Fetches from origin"
  CDN.API -> CDN.OriginDB "Reads content"

  // TODO: Add Cache datastore and connect EdgeServer -> Cache and API -> Cache
checks:
  - type: noErrors
    message: "DSL parsed successfully"
  - type: elementExists
    name: Cache
    message: "Add Cache datastore"
  - type: relationExists
    source: EdgeServer
    target: Cache
    message: "Add relation EdgeServer -> Cache"
  - type: relationExists
    source: API
    target: Cache
    message: "Add relation API -> Cache"
hints:
  - 'Add datastore Cache "Redis Cache" inside the CDN system'
  - 'EdgeServer reads from cache: EdgeServer -> Cache "Reads cached content"'
  - 'API writes to cache: API -> Cache "Writes content"'
  - "Cache sits between EdgeServer and API to reduce origin load"
solution: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  datastore = kind "Datastore"

  Viewer = person "Content Viewer"

  CDN = system "CDN" {
    EdgeServer = container "Edge Server"
    API = container "Origin API"
    Cache = datastore "Redis Cache"
    OriginDB = datastore "Origin Database"
  }

  Viewer -> CDN.EdgeServer "Requests content"
  CDN.EdgeServer -> CDN.API "Fetches from origin"
  CDN.EdgeServer -> CDN.Cache "Reads cached content"
  CDN.API -> CDN.OriginDB "Reads content"
  CDN.API -> CDN.Cache "Writes content"

External Service


title: "Weather App: Integrate Weather API"
summary: "Your weather app needs real-time weather data! Integrate an external weather service (like OpenWeatherMap) that your API can query for current conditions."
difficulty: intermediate
topic: integration
estimatedTime: "10-15 min"
initialDsl: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  database = kind "Database"

  User = person "App User"

  WeatherApp = system "Weather App" {
    MobileApp = container "Mobile App"
    WeatherAPI = container "Weather Service API"
    UserPrefs = database "User Preferences DB"
  }

  User -> WeatherApp.MobileApp "Checks weather"
  WeatherApp.MobileApp -> WeatherApp.WeatherAPI "Requests forecast"
  WeatherApp.WeatherAPI -> WeatherApp.UserPrefs "Loads preferences"

  // TODO: Add external WeatherService system with tags and connect WeatherAPI -> WeatherService
checks:
  - type: noErrors
    message: "DSL parsed successfully"
  - type: elementExists
    name: WeatherService
    message: "Add external WeatherService"
  - type: relationExists
    source: WeatherAPI
    target: WeatherService
    message: "Connect WeatherAPI -> WeatherService"
hints:
  - 'External services are represented as systems with tags ["external"]'
  - 'Use: WeatherService = system "OpenWeatherMap" { tags ["external"] }'
  - "External services represent third-party APIs you don't control"
  - 'Add relation: WeatherApp.WeatherAPI -> WeatherService "Fetches weather data"'
solution: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  database = kind "Database"

  User = person "App User"

  WeatherApp = system "Weather App" {
    MobileApp = container "Mobile App"
    WeatherAPI = container "Weather Service API"
    UserPrefs = database "User Preferences DB"
  }

  WeatherService = system "OpenWeatherMap" {
    tags ["external"]
  }

  User -> WeatherApp.MobileApp "Checks weather"
  WeatherApp.MobileApp -> WeatherApp.WeatherAPI "Requests forecast"
  WeatherApp.WeatherAPI -> WeatherApp.UserPrefs "Loads preferences"
  WeatherApp.WeatherAPI -> WeatherService "Fetches weather data"

Fix Relations


title: "Microservices: Connect Service Mesh"
summary: "You're building a microservices architecture! Connect the UserService, OrderService, and PaymentService so they can communicate. Each service has its own database."
difficulty: beginner
topic: relations
estimatedTime: "5-10 min"
initialDsl: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  datastore = kind "Datastore"

  Customer = person "Online Customer"

  ECommerce = system "E-Commerce" {
    UserService = container "User Management"
    OrderService = container "Order Processing"
    PaymentService = container "Payment Processing"
    UserDB = datastore "User Database"
    OrderDB = datastore "Order Database"
    PaymentDB = datastore "Payment Database"
  }

  Customer -> ECommerce.UserService "Logs in"

  // TODO: Connect services in order flow: UserService -> OrderService -> PaymentService
  // TODO: Connect each service to its database
checks:
  - type: noErrors
    message: "DSL parsed successfully"
  - type: relationExists
    source: UserService
    target: OrderService
    message: "Add relation UserService -> OrderService"
  - type: relationExists
    source: OrderService
    target: PaymentService
    message: "Add relation OrderService -> PaymentService"
  - type: relationExists
    source: UserService
    target: UserDB
    message: "Add relation UserService -> UserDB"
  - type: relationExists
    source: OrderService
    target: OrderDB
    message: "Add relation OrderService -> OrderDB"
  - type: relationExists
    source: PaymentService
    target: PaymentDB
    message: "Add relation PaymentService -> PaymentDB"
hints:
  - 'Order flow: UserService -> OrderService "Creates order"'
  - 'Then: OrderService -> PaymentService "Processes payment"'
  - 'Each service connects to its own DB: Service -> DB "Reads/Writes"'
  - "Use descriptive labels for each relation"
solution: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  datastore = kind "Datastore"

  Customer = person "Online Customer"

  ECommerce = system "E-Commerce" {
    UserService = container "User Management"
    OrderService = container "Order Processing"
    PaymentService = container "Payment Processing"
    UserDB = datastore "User Database"
    OrderDB = datastore "Order Database"
    PaymentDB = datastore "Payment Database"
  }

  Customer -> ECommerce.UserService "Logs in"
  ECommerce.UserService -> ECommerce.OrderService "Creates order"
  ECommerce.OrderService -> ECommerce.PaymentService "Processes payment"
  ECommerce.UserService -> ECommerce.UserDB "Reads/Writes"
  ECommerce.OrderService -> ECommerce.OrderDB "Reads/Writes"
  ECommerce.PaymentService -> ECommerce.PaymentDB "Reads/Writes"

Missing Relations


title: "Healthcare Portal: Connect Patient to System"
summary: "A patient needs to book an appointment! Model the complete flow: Patient uses the Portal, Portal calls the Appointment API, and API stores data in the database."
difficulty: beginner
topic: relations
estimatedTime: "5-10 min"
initialDsl: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  datastore = kind "Datastore"

  Patient = person "Healthcare Patient"

  HealthPortal = system "Healthcare Portal" {
    Portal = container "Patient Portal"
    AppointmentAPI = container "Appointment Service"
    RecordsDB = datastore "Patient Records Database"
  }

  // TODO: Connect Patient -> Portal -> AppointmentAPI -> RecordsDB
  // Think about what each interaction represents
checks:
  - type: noErrors
    message: "DSL parsed successfully"
  - type: relationExists
    source: Patient
    target: Portal
    message: "Add relation Patient -> Portal"
  - type: relationExists
    source: Portal
    target: AppointmentAPI
    message: "Add relation Portal -> AppointmentAPI"
  - type: relationExists
    source: AppointmentAPI
    target: RecordsDB
    message: "Add relation AppointmentAPI -> RecordsDB"
hints:
  - 'Start with Patient -> Portal "Books appointment"'
  - 'Then Portal -> AppointmentAPI "Requests appointment"'
  - 'Finally AppointmentAPI -> RecordsDB "Stores appointment"'
  - "Remember: all relation labels must be in quotes"
solution: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  datastore = kind "Datastore"

  Patient = person "Healthcare Patient"

  HealthPortal = system "Healthcare Portal" {
    Portal = container "Patient Portal"
    AppointmentAPI = container "Appointment Service"
    RecordsDB = datastore "Patient Records Database"
  }

  Patient -> HealthPortal.Portal "Books appointment"
  HealthPortal.Portal -> HealthPortal.AppointmentAPI "Requests appointment"
  HealthPortal.AppointmentAPI -> HealthPortal.RecordsDB "Stores appointment"

Queue Worker


title: "Email Notification System: Build Async Processor"
summary: "Your app sends too many emails synchronously, causing slow responses! Create an async email processing system with a queue and worker to handle notifications in the background."
difficulty: intermediate
topic: async
estimatedTime: "10-15 min"
initialDsl: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  component = kind "Component"
  queue = kind "Queue"

  App = system "App" {
    API = container "Main API" {
      // TODO: Add component EmailWorker "Email Processor" here
    }
    EmailQueue = queue "RabbitMQ"
  }

  // TODO: Connect App.EmailQueue -> App.API.EmailWorker for async processing
  // TODO: Add external EmailService system with tags and connect App.API.EmailWorker -> EmailService
checks:
  - type: noErrors
    message: "DSL parsed successfully"
  - type: elementExists
    name: EmailWorker
    message: "Create EmailWorker component"
  - type: relationExists
    source: EmailQueue
    target: EmailWorker
    message: "Connect EmailQueue -> EmailWorker"
  - type: elementExists
    name: EmailService
    message: "Add external EmailService"
  - type: relationExists
    source: EmailWorker
    target: EmailService
    message: "Connect EmailWorker -> EmailService"
hints:
  - 'Add component EmailWorker "Email Processor" inside the API container'
  - 'Queues deliver messages to workers: App.EmailQueue -> App.API.EmailWorker "Delivers email job"'
  - 'Add external EmailService as a system with tags ["external"] outside the App system block'
  - 'Worker sends emails: App.API.EmailWorker -> EmailService "Sends email"'
solution: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  component = kind "Component"
  queue = kind "Queue"

  App = system "App" {
    API = container "Main API" {
      EmailWorker = component "Email Processor"
    }
    EmailQueue = queue "RabbitMQ"
  }

  EmailService = system "SendGrid" {
    tags ["external"]
  }

  App.EmailQueue -> App.API.EmailWorker "Delivers email job"
  App.API.EmailWorker -> EmailService "Sends email"

Syntax Error


title: "Code Review: Fix the Ride-Sharing App"
summary: "A junior developer wrote this code for a ride-sharing app, but it has syntax errors. Find and fix all the issues to get it compiling!"
difficulty: beginner
topic: validation
estimatedTime: "5-8 min"
initialDsl: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  datastore = kind "Datastore"

  Rider = person "App User"

  RideApp = system "RideApp" {
    MobileApp = container "Mobile Application"
    MatchingService = container "Driver Matching"
    LocationDB = datastore "Location Database"
  }

  Rider -> RideApp.MobileApp "Requests ride"
  RideApp.MobileApp -> RideApp.MatchingService Finds driver  // Missing quotes
  RideApp.MatchingService -> RideApp.LocationDB "Queries locations"
checks:
  - type: noErrors
    message: "DSL parsed successfully"
  - type: relationExists
    source: MobileApp
    target: MatchingService
    message: "Fix relation MobileApp -> MatchingService"
  - type: relationExists
    source: MatchingService
    target: LocationDB
    message: "Ensure relation MatchingService -> LocationDB exists"
hints:
  - "Check the MatchingService container definition - is it properly closed?"
  - "Look at the relation 'MobileApp -> MatchingService Finds driver' - what's missing?"
  - "All relation labels must be wrapped in double quotes"
  - "Container definitions need proper closing braces"
solution: |
  person = kind "Person"
  system = kind "System"
  container = kind "Container"
  datastore = kind "Datastore"

  Rider = person "App User"

  RideApp = system "RideApp" {
    MobileApp = container "Mobile Application"
    MatchingService = container "Driver Matching"
    LocationDB = datastore "Location Database"
  }

  Rider -> RideApp.MobileApp "Requests ride"
  RideApp.MobileApp -> RideApp.MatchingService "Finds driver"
  RideApp.MatchingService -> RideApp.LocationDB "Queries locations"

CLI reference

Core commands:

| Command | Description |
| --- | --- |
| sruja lint <file> | Validate .sruja file |
| sruja fmt <file> | Format DSL |
| sruja tree <file> | Print element tree |
| sruja export json <file> | Export to JSON |
| sruja export markdown <file> | Export to Markdown |
| sruja export mermaid <file> | Export to Mermaid |

Run sruja --help for full options.
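
These commands compose naturally in a script. The following is a minimal sketch, assuming the export subcommands write to standard output (as the CI examples in this book do by redirecting them to files). `export_all`, the `SRUJA_BIN` override, and the output file names are hypothetical choices made for illustration.

```shell
#!/bin/sh
# Validate a model, then export it to every documented format.
# SRUJA_BIN is a hypothetical override (not a Sruja feature) for swapping the binary.

export_all() {
    model="$1"
    out_dir="${2:-docs}"
    bin="${SRUJA_BIN:-sruja}"

    # Stop before writing anything if the model does not validate.
    "$bin" lint "$model" || return 1

    # Create the output directory so the redirections below cannot fail on it.
    mkdir -p "$out_dir"

    for format in json markdown mermaid; do
        case "$format" in
            json)     ext=json ;;
            markdown) ext=md ;;
            mermaid)  ext=mmd ;;
        esac
        "$bin" export "$format" "$model" > "$out_dir/architecture.$ext" || return 1
    done
}
```

For example, `export_all architecture.sruja docs` would leave architecture.json, architecture.md, and architecture.mmd in docs/ when validation passes.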

Language reference

The Sruja DSL defines architecture using kinds (e.g. person, system, container) and relationships (e.g. ->).

For the full specification, see the Language specification in this book.

Sruja Language Specification

This document provides a complete specification of the Sruja architecture-as-code language for AI code assistants and developers.

Overview

Sruja is a domain-specific language (DSL) for defining software architecture models. It supports C4 model concepts (systems, containers, components), requirements, ADRs, scenarios, flows, policies, SLOs, and more.

Language Grammar

File Structure

Sruja uses a flat syntax — all declarations are top-level, no wrapper blocks required.

// Elements
User = person "User"
Shop = system "E-commerce Shop"

// Relationships
User -> Shop "uses"

// Governance
R1 = requirement functional "Must handle 10k users"
SecurityPolicy = policy "Encrypt all data" category "security"

Element Kinds

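Before using element types such as person, system, or container, you must make them available as kinds, either by declaring them yourself or by importing them (see Imports). Kinds establish the vocabulary of element types available in your architecture.

// Standard C4 kinds (declare at the top of the file unless you import them)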
person = kind "Person"
system = kind "System"
container = kind "Container"
component = kind "Component"
database = kind "Database"
datastore = kind "Datastore"  // Alias for 'database', but 'database' is the preferred standard
queue = kind "Queue"

Why kinds? This allows Sruja to:

  • Validate that you're using recognized element types
  • Enable custom element types for domain-specific modeling
  • Provide LSP autocompletion for your declared kinds

Custom Kinds

You can define custom element types for your domain:

// Custom kinds for microservices
microservice = kind "Microservice"
eventBus = kind "Event Bus"
gateway = kind "API Gateway"

// Now use them
Catalog = microservice "Catalog Service"
Kafka = eventBus "Kafka Cluster"

Imports

Import kinds and tags from the standard library or other Sruja files.

Standard Library Import

// Import all from stdlib
import { * } from 'sruja.ai/stdlib'

// Now you can use person, system, container, etc. without defining them
User = person "User"
Shop = system "Shop"

Named Imports

// Import specific kinds only
import { person, system, container } from 'sruja.ai/stdlib'

User = person "User"
Shop = system "Shop"

Relative Imports

// Import from a local file
import { * } from './shared-kinds.sruja'

Note: When using imports, you don't need to redeclare the imported kinds.
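To make the shape of the three import forms concrete, the sketch below pulls the imported names and the source out of an import line. The regular expression is written for this example from the forms shown above and is not Sruja's grammar; `parse_import` is a hypothetical name.

```python
from __future__ import annotations

import re

# import { <names> } from '<source>' -- pattern derived from the examples above.
IMPORT_RE = re.compile(r"^\s*import\s*\{\s*([^}]*)\}\s*from\s*'([^']+)'\s*$")


def parse_import(line: str) -> tuple[list[str], str]:
    """Return (imported names, source); a wildcard import yields ['*']."""
    match = IMPORT_RE.match(line)
    if match is None:
        raise ValueError(f"not an import line: {line!r}")
    names = [name.strip() for name in match.group(1).split(",") if name.strip()]
    return names, match.group(2)
```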

Elements

Persons

User = person "User" {
    description "End user of the system"
}

Systems

MySystem = system "My System" {
    description "Optional description"
    metadata {
        key "value"
        tags ["tag1", "tag2"]
    }
    slo {
        availability {
            target "99.9%"
            window "30 days"
            current "99.95%"
        }
    }
}

Containers

MyContainer = container "My Container" {
    technology "Technology stack"
    description "Optional description"
    version "1.0.0"
    tags ["api", "backend"]
    scale {
        min 3
        max 10
        metric "cpu > 80%"
    }
    slo {
        latency {
            p95 "200ms"
            p99 "500ms"
        }
    }
}

Components

MyComponent = component "My Component" {
    technology "Technology"
    description "Optional description"
    scale {
        min 1
        max 5
    }
}

Data Stores

MyDB = database "My Database" {
    technology "PostgreSQL"
    description "Optional description"
}

Queues

MyQueue = queue "My Queue" {
    technology "RabbitMQ"
    description "Optional description"
}

Relationships

// Basic relationship
From -> To "Label"

// Nested element references use dot notation
System.Container -> System.Container.Component "calls"

// With tags
From -> To "Label" [tag1, tag2]

Requirements

R1 = requirement functional "Description"
R2 = requirement nonfunctional "Description"
R3 = requirement constraint "Description"
R4 = requirement performance "Description"
R5 = requirement security "Description"

// With body block
R6 = requirement functional "Description" {
    description "Detailed description"
    metadata {
        priority "high"
    }
}

ADRs (Architectural Decision Records)

ADR001 = adr "Title" {
    status "accepted"
    context "What situation led to this decision"
    decision "What was decided"
    consequences "Trade-offs, gains, and losses"
}

Scenarios and Flows

Scenarios

MyScenario = scenario "Scenario Title" {
    step User -> System.WebApp "Credentials"
    step System.WebApp -> System.DB "Verify"
}

// 'story' is an alias for 'scenario'
CheckoutStory = story "User Checkout Flow" {
    step User -> ECommerce.CartPage "adds item to cart"
}

Note: The step keyword is recommended for clarity, but optional. Both syntaxes work:

  • With step: step User -> System.WebApp "action"
  • Without step: User -> System.WebApp "action" (inside scenario block)
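The equivalence of the two forms can be illustrated with a small parser in which the step keyword is optional. The pattern below is an assumption built from the examples above, not Sruja's actual grammar, and `parse_step` is a hypothetical name.

```python
from __future__ import annotations

import re

# [step] <From> -> <To> "<label>" -- the leading keyword is optional.
STEP_RE = re.compile(r'^\s*(?:step\s+)?([\w.]+)\s*->\s*([\w.]+)\s+"([^"]*)"\s*$')


def parse_step(line: str) -> tuple[str, str, str]:
    """Return (source, target, label) for a scenario or flow step line."""
    match = STEP_RE.match(line)
    if match is None:
        raise ValueError(f"not a step line: {line!r}")
    return match.group(1), match.group(2), match.group(3)


with_keyword = parse_step('step User -> System.WebApp "action"')
without_keyword = parse_step('User -> System.WebApp "action"')
# Both forms produce the same (source, target, label) triple.
```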

Flows (DFD-style data flows)

OrderProcess = flow "Order Processing" {
    step Customer -> Shop.WebApp "Order Details"
    step Shop.WebApp -> Shop.Database "Save Order"
    step Shop.Database -> Shop.WebApp "Confirmation"
}

Note: Flows use the same syntax as scenarios. The step keyword is recommended for clarity.

Metadata

metadata {
    key "value"
    anotherKey "another value"
    tags ["tag1", "tag2"]
}

Overview Block

overview {
    summary "High-level summary of the architecture"
    audience "Target audience for this architecture"
    scope "What is covered in this architecture"
    goals ["Goal 1", "Goal 2"]
    nonGoals ["What is explicitly out of scope"]
    risks ["Risk 1", "Risk 2"]
}

SLO (Service Level Objectives)

slo {
    availability {
        target "99.9%"
        window "30 days"
        current "99.95%"
    }
    latency {
        p95 "200ms"
        p99 "500ms"
        window "7 days"
        current {
            p95 "180ms"
            p99 "420ms"
        }
    }
    errorRate {
        target "0.1%"
        window "7 days"
        current "0.08%"
    }
    throughput {
        target "10000 req/s"
        window "peak hour"
        current "8500 req/s"
    }
}

SLO blocks can be defined at:

  • Architecture level (top-level)
  • System level
  • Container level
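The target and current values are plain strings, so a consumer of the exported model has to interpret them. A minimal sketch of such a check for percentage values follows; it assumes values formatted like those above and is not a documented Sruja feature. The function names are hypothetical.

```python
from __future__ import annotations


def parse_percent(text: str) -> float:
    """Convert a string such as '99.9%' to the number 99.9."""
    return float(text.strip().rstrip("%"))


def availability_met(target: str, current: str) -> bool:
    """Availability is met when the current value is at least the target."""
    return parse_percent(current) >= parse_percent(target)


def error_rate_met(target: str, current: str) -> bool:
    """An error rate is met when the current value is at most the target."""
    return parse_percent(current) <= parse_percent(target)
```

With the values from the block above, `availability_met("99.9%", "99.95%")` and `error_rate_met("0.1%", "0.08%")` both hold.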

Scale Block

scale {
    min 3
    max 10
    metric "cpu > 80%"
}

Scale blocks can be defined at:

  • Container level
  • Component level

Deployment

deployment Prod "Production" {
    node AWS "AWS" {
        node USEast1 "US-East-1" {
            infrastructure LB "Load Balancer"
            containerInstance Shop.API
        }
    }
}

Governance

Policies

SecurityPolicy = policy "Enforce TLS 1.3" category "security" enforcement "required"

// Or with body block
DataRetentionPolicy = policy "Retain data for 7 years" {
    category "compliance"
    enforcement "required"
    description "Detailed policy description"
}

Constraints

constraints {
    "Constraint description"
    "Another constraint"
}

Conventions

conventions {
    "Convention description"
    "Another convention"
}

Views (Optional)

Views are optional — if not specified, standard C4 views are automatically generated.

view index {
    title "System Context"
    include *
}

view container_view of Shop {
    title "Shop Containers"
    include Shop.*
    exclude Shop.WebApp
    autolayout lr
}

styles {
    element "Database" {
        shape "cylinder"
        color "#ff0000"
    }
}

View Types

  • index - System context view (C4 L1)
  • container - Container view (C4 L2)
  • component - Component view (C4 L3)
  • deployment - Deployment view

View Expressions

  • include * - Include all elements in scope
  • include Element1 Element2 - Include specific elements
  • exclude Element1 - Exclude specific elements
  • autolayout "lr"|"tb"|"auto" - Layout direction hint
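As an illustration of how include and exclude combine, the sketch below filters a list of element paths with glob-style patterns. Treating * as a glob wildcard and the function name `select_elements` are assumptions made for this example; it does not describe Sruja's view engine.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def select_elements(elements: list[str], include: list[str], exclude: list[str] = ()) -> list[str]:
    """Keep elements matching any include pattern, then drop those matching any exclude pattern."""
    included = [e for e in elements if any(fnmatchcase(e, p) for p in include)]
    return [e for e in included if not any(fnmatchcase(e, p) for p in exclude)]


elements = ["Customer", "Shop", "Shop.WebApp", "Shop.API", "Shop.DB"]

# Mirrors: include Shop.*  /  exclude Shop.WebApp
container_view = select_elements(elements, ["Shop.*"], ["Shop.WebApp"])

# Mirrors: include *
index_view = select_elements(elements, ["*"])
```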

Implied Relationships

When a relationship targets a nested element, Sruja automatically infers the corresponding relationship to its parent:

User -> API.WebApp "Uses"
// Automatically infers: User -> API

You therefore declare only the most specific relationship; the parent-level one does not need to be written out.
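The inference can be sketched by walking up the dotted paths of both endpoints. This is a model of the idea, not Sruja's algorithm: lifting the source side as well as the target side, and skipping pairs where one endpoint contains the other, are assumptions made for the example.

```python
from __future__ import annotations


def ancestors_and_self(path: str) -> list[str]:
    """'API.WebApp' -> ['API.WebApp', 'API'] (most specific first)."""
    parts = path.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]


def implied_relationships(src: str, dst: str) -> list[tuple[str, str]]:
    """Return the parent-level relationships implied by src -> dst."""
    implied = []
    for s in ancestors_and_self(src):
        for d in ancestors_and_self(dst):
            if (s, d) == (src, dst):
                continue  # the declared relationship itself
            if s == d or d.startswith(s + ".") or s.startswith(d + "."):
                continue  # an element does not imply a relationship with itself or its own children
            implied.append((s, d))
    return implied
```

Under these assumptions, `implied_relationships("User", "API.WebApp")` yields the single pair `("User", "API")`, matching the example above, while a relationship between two containers of the same system implies nothing further.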

Complete Example

// Element Kinds (required)
person = kind "Person"
system = kind "System"
container = kind "Container"
component = kind "Component"
database = kind "Database"

// Overview
overview {
    summary "E-commerce platform architecture"
    audience "Development team"
    scope "Core shopping and payment functionality"
}

// Elements
Customer = person "Customer"
Admin = person "Administrator"

Shop = system "E-commerce Shop" {
    description "High-performance e-commerce platform"

    WebApp = container "Web Application" {
        technology "React"
        Cart = component "Shopping Cart"
        Checkout = component "Checkout Service"
    }

    API = container "API Gateway" {
        technology "Node.js"
        scale {
            min 3
            max 10
        }
        slo {
            latency {
                p95 "200ms"
                p99 "500ms"
            }
        }
    }

    DB = database "PostgreSQL Database" {
        technology "PostgreSQL 14"
    }
}

// Relationships
Customer -> Shop.WebApp "Browses"
Shop.WebApp -> Shop.API "Calls"
Shop.API -> Shop.DB "Reads/Writes"

// Requirements
R1 = requirement functional "Must support 10k concurrent users"
R2 = requirement constraint "Must use PostgreSQL"

// ADRs
ADR001 = adr "Use microservices architecture" {
    status "accepted"
    context "Need to scale different parts independently"
    decision "Adopt microservices architecture"
    consequences "Gain: Independent scaling. Trade-off: Increased complexity"
}

// Policies
SecurityPolicy = policy "Enforce TLS 1.3" {
    category "security"
    enforcement "required"
}

// Constraints and Conventions
constraints {
    "All APIs must use HTTPS"
    "Database must be encrypted at rest"
}

conventions {
    "Use RESTful API design"
    "Follow semantic versioning"
}

// Scenarios
PurchaseScenario = scenario "User purchases item" {
    step Customer -> Shop.WebApp "Adds item to cart"
    step Shop.WebApp -> Shop.API "Submits order"
    step Shop.API -> Shop.DB "Saves order"
}

// Views (optional - auto-generated if omitted)
view index {
    title "System Context"
    include *
}

view container_view of Shop {
    title "Shop Containers"
    include Shop.*
}

Key Rules

  1. Flat Syntax: All declarations are top-level, no specification {}, model {}, or views {} wrapper blocks
  2. IDs: Must be unique within their scope
  3. References: Use dot notation (e.g., System.Container)
  4. Relations: Can be defined anywhere (implied relationships are automatically inferred)
  5. Metadata: Freeform key-value pairs
  6. Descriptions: Optional string values
  7. Views: Optional — C4 views are automatically generated if not specified
  8. SLOs: Can be defined at architecture, system, or container level
  9. Scale: Can be defined at container or component level
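Rules 2 and 3 can be modeled by treating every declaration as a fully qualified dotted path. The sketch below is an illustration under that assumption, with hypothetical function names; it is not how sruja lint is implemented.

```python
from __future__ import annotations

from collections import Counter


def duplicate_ids(declarations: list[str]) -> list[str]:
    """Return fully qualified IDs that are declared more than once.

    The same short ID in different scopes (Shop.API, Billing.API) is not a duplicate.
    """
    counts = Counter(declarations)
    return sorted(path for path, n in counts.items() if n > 1)


def unresolved_references(declarations: list[str], relationships: list[tuple[str, str]]) -> list[str]:
    """Return relationship endpoints that do not match any declared path."""
    known = set(declarations)
    return sorted({end for rel in relationships for end in rel if end not in known})


declarations = ["Shop", "Shop.API", "Billing", "Billing.API", "Shop.API"]
relationships = [("Shop.API", "Billing.API"), ("Shop.API", "Shop.DB")]

duplicates = duplicate_ids(declarations)                           # Shop.API is declared twice
missing = unresolved_references(declarations, relationships)       # Shop.DB is never declared
```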

Common Patterns

C4 Model Levels

  • Level 1 (System Context): Systems and persons
  • Level 2 (Container): Containers within systems
  • Level 3 (Component): Components within containers
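The level mapping above can be expressed as a lookup from kind to level. In this sketch, kinds not named in the list (for example database or queue) are deliberately left unclassified, and the function name `group_by_level` is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Mapping taken from the list above; other kinds are left out on purpose.
C4_LEVEL_BY_KIND = {"person": 1, "system": 1, "container": 2, "component": 3}


def group_by_level(elements: dict[str, str]) -> dict[int, list[str]]:
    """Group element IDs by C4 level; `elements` maps element ID to kind."""
    levels = defaultdict(list)
    for element_id, kind in elements.items():
        level = C4_LEVEL_BY_KIND.get(kind)
        if level is not None:
            levels[level].append(element_id)
    return {level: sorted(ids) for level, ids in sorted(levels.items())}
```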

Resources