Following the Trail: Data Flow Diagrams
Think of an oil pipeline. Crude oil goes in one end, flows through refineries where it's heated, distilled, and chemically treated, and comes out the other end as gasoline, diesel, or jet fuel. At each stage, the oil transforms into something more valuable.
Data flows work the same way. Raw data enters your system, flows through transformations where it's validated, normalized, enriched, and aggregated, and comes out as insights, reports, or visualizations.
In this lesson, you'll learn to create DFD-style data flows in Sruja. You'll discover how to track data lineage, document transformations, and model the pipelines that power your analytics and reporting.
Let's start by understanding what data flows are and why they matter.
Learning Goals
By the end of this lesson, you'll be able to:
- Create DFD-style data flows in Sruja
- Model data lineage from source to destination
- Document data transformations and how data changes shape
- Design ETL and analytics pipelines
- Track where data comes from and where it ultimately goes
What Are Data Flow Diagrams, Really?
Data Flow Diagrams (DFDs) show how data moves through your system—where it originates, how it's stored, how it transforms, and where it ends up.
Think of it like tracing a river's path:
- Source: Where the river starts (a spring, a mountain lake)
- Flow: The river's journey through valleys and cities
- Transformations: Tributaries joining, diversions splitting, dams changing flow
- Destination: Where the river ends (ocean, another river)
In data terms:
- Source: Where data originates (user input, database, API, file)
- Flow: The path data takes through your system
- Transformations: Validation, normalization, enrichment, aggregation
- Destination: Where data ultimately goes (warehouse, dashboard, report)
Why Data Flows Matter: The Real Benefits
I've built countless data systems over the years, and data flows are always the first thing I create. Here's why.
1. Data Lineage: Where Did This Come From?
Data flows tell you the complete history of data—where it started and every transformation it went through.
CustomerAnalyticsFlow = flow "Customer Data Lineage" {
// Source: Where data starts
Customer -> CRMSystem "Creates customer profile"
// Transformation 1: Data extraction
CRMSystem -> ETLService "Extracts customer records"
// Transformation 2: Data normalization
ETLService -> NormalizedData "Cleans and standardizes formats"
// Transformation 3: Data enrichment
NormalizedData -> EnrichmentService "Adds behavioral data from clickstream"
// Transformation 4: Aggregation
EnrichmentService -> AggregatedData "Creates daily customer segments"
// Destination: Where data ends up
AggregatedData -> DataWarehouse "Stores for reporting"
DataWarehouse -> BusinessDashboard "Displays customer segments"
}
This flow tells you the complete story: customer data originates in CRM, gets extracted by ETL, normalized (cleaned up), enriched with behavioral data, aggregated into segments, stored in the warehouse, and ultimately shows up on a business dashboard.
Without this flow, would you know customer data comes from the CRM? Would you know it gets enriched with clickstream data? Would you know it's aggregated daily? Probably not.
I once worked on a project where nobody knew where analytics data came from. We spent weeks tracking down data lineage every time we found an issue. We added data flows, and suddenly everyone knew the complete path.
2. Process Understanding: What Actually Happens?
Data flows reveal the processing steps your data goes through—the "how" not just the "what."
ETLPipelineFlow = flow "ETL Pipeline Steps" {
// Step 1: Extraction
SourceDatabase -> IngestionService "Pulls raw transactions"
// Step 2: Validation
IngestionService -> ValidationService "Validates schema and data types"
// Step 3: Transformation
ValidationService -> TransformationService "Normalizes dates, currencies, formats"
// Step 4: Loading
TransformationService -> DataWarehouse "Loads transformed data"
}
This shows you the complete ETL process: extract, validate, transform, load. You can see exactly what each service does and in what order.
When something breaks—a data quality issue, a failed load, a malformed record—you know exactly where to look. Is it in ingestion? In validation? In transformation? The flow tells you.
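The four stages above can be sketched as plain functions, which makes the "where to look" point concrete: each stage is separate, so a bad record surfaces at exactly one of them. This is an illustrative Python sketch, not Sruja syntax, and the record shapes are invented for the example.

```python
# Minimal ETL sketch: each stage is its own function, so a failure
# can be traced to exactly one stage. Record shapes are illustrative.

def extract(source):
    """Pull raw records from the source (a list standing in for a DB)."""
    return list(source)

def validate(records):
    """Keep only records whose amount is actually numeric."""
    return [r for r in records if isinstance(r.get("amount"), (int, float))]

def transform(records):
    """Normalize formats, e.g. upper-case the currency codes."""
    return [{**r, "currency": r["currency"].upper()} for r in records]

def load(records, warehouse):
    """Append transformed records to the destination store."""
    warehouse.extend(records)
    return warehouse

raw = [
    {"amount": 10.0, "currency": "usd"},
    {"amount": "oops", "currency": "eur"},  # malformed: dropped in validate
    {"amount": 5, "currency": "gbp"},
]
warehouse = []
load(transform(validate(extract(raw))), warehouse)
print(warehouse)  # two valid records, currencies upper-cased
```

Because the malformed record disappears in `validate`, not in `transform` or `load`, you know which stage to inspect when counts come up short.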
3. Transformation Documentation: How Does Data Change?
Data flows document how data transforms at each step—what shape it takes, what format it's in.
TransformationFlow = flow "Data Transformations" {
RawSource -> ETLService "Raw CSV file"
// Transformation 1: Validation
ETLService -> ValidatedData "Validated (removed invalid records)"
// Transformation 2: Normalization
ValidatedData -> NormalizedData "Normalized (standardized formats)"
// Transformation 3: Enrichment
NormalizedData -> EnrichedData "Enriched (added location data)"
// Transformation 4: Aggregation
EnrichedData -> FinalData "Aggregated (daily metrics)"
}
Each arrow shows a transformation:
- Raw CSV → Validated (invalid records removed)
- Validated → Normalized (formats standardized)
- Normalized → Enriched (location data added)
- Enriched → Aggregated (metrics computed)
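The same chain can be mimicked in a few lines of Python. This is a hypothetical sketch: the CSV columns, the location lookup table, and the daily metric are all invented for illustration.

```python
import csv
import io
from collections import defaultdict

raw_csv = "user,amount,date\nalice,10,2024-01-01\nbob,,2024-01-01\nalice,5,2024-01-01\n"

# Validate: drop rows with a missing amount
rows = [r for r in csv.DictReader(io.StringIO(raw_csv)) if r["amount"]]

# Normalize: parse amount strings into numbers
for r in rows:
    r["amount"] = float(r["amount"])

# Enrich: add a location from a (made-up) lookup table
locations = {"alice": "US", "bob": "DE"}
for r in rows:
    r["location"] = locations.get(r["user"], "unknown")

# Aggregate: daily totals per user
daily = defaultdict(float)
for r in rows:
    daily[(r["user"], r["date"])] += r["amount"]

print(dict(daily))  # {('alice', '2024-01-01'): 15.0}
```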
This documentation is invaluable. When someone asks, "What happened to this data?" you can point to the flow and show them each transformation step.
I once inherited a system where nobody documented data transformations. We found mysterious records in the warehouse—dates in the wrong format, currencies mixed up, values that made no sense. We spent months reverse-engineering what transformations were happening. Document it upfront.
4. Bottleneck Identification: Where Will Things Slow Down?
Data flows make bottlenecks obvious—where processing might slow down, where queues might form, where latency will be worst.
AnalyticsFlow = flow "Analytics Pipeline" {
UserActions -> TrackingService "Captures events" [fast]
TrackingService -> EventStream "Publishes events" [fast]
EventStream -> BatchProcessor "Consumes and processes" [slow, batch job]
BatchProcessor -> DataWarehouse "Loads aggregated data" [medium]
DataWarehouse -> Dashboard "Queries for display" [fast]
}
Look at the labels: fast, fast, slow, medium, fast. The batch processor is marked as slow because it's a scheduled job that runs once daily. This tells you immediately: if you're looking for real-time analytics, you'll be disappointed. The bottleneck is the batch processor.
When users complain about stale data ("why does the dashboard show yesterday's numbers?"), you know exactly why. The flow tells you.
Creating Data Flows in Sruja
Sruja gives you the flow keyword for creating DFD-style data flows. It's designed specifically for data-oriented flows.
Using flow for Data Pipelines
OrderDataFlow = flow "Order Data Processing" {
Customer -> WebApp "Order form submission"
WebApp -> API "Order JSON payload"
API -> Database "Persist order record"
Database -> AnalyticsExtractor "Extract order events"
AnalyticsExtractor -> EventStream "Publish to analytics"
EventStream -> DataWarehouse "Aggregate and store"
DataWarehouse -> ReportingTool "Query for reports"
}
Using Metadata for Transformations
You can add metadata to document what each step does:
ETLService = container "ETL Service" {
metadata {
transformations [
"Validate schema and data types",
"Normalize dates to ISO 8601",
"Standardize currency codes to ISO 4217",
"Remove invalid or corrupt records"
]
output_format "JSON"
output_schema "v2.1"
batch_window "Daily at 2AM UTC"
}
}
This metadata tells anyone reading the flow:
- What transformations happen
- What output format to expect
- What schema version
- When the batch runs
Common Data Flow Patterns
After building data systems for years, I've noticed patterns that repeat constantly. Let me show you the ones I see most often.
Pattern 1: ETL Pipeline
Extract, Transform, Load—the classic pattern for moving data from operational systems to analytics.
ETLPipelineFlow = flow "Classic ETL Pipeline" {
// Extract: Pull from source systems
TransactionDB -> DataCollector "Extracts daily transactions"
CustomerDB -> DataCollector "Extracts customer profiles"
// Transform: Clean and normalize
DataCollector -> ValidationService "Validates schemas"
ValidationService -> CleaningService "Removes duplicates and errors"
CleaningService -> TransformationService "Normalizes formats"
// Load: Push to warehouse
TransformationService -> DataWarehouse "Loads transformed data"
DataWarehouse -> ReportingEngine "Available for queries"
}
Characteristics:
- Scheduled batch processing (daily, hourly)
- Source systems are OLTP (transactional)
- Destination is OLAP (analytics)
- Focus on data quality and consistency
Use when: Building traditional data warehouses, moving from transactional systems to analytics.
Pattern 2: Event Sourcing
Every change to data is captured as an event, and different services project events into read models.
EventSourcingFlow = flow "Event Sourcing Pattern" {
// Events captured
OrderAPI -> EventStore "Persist OrderCreated event"
OrderAPI -> EventStore "Persist OrderPaid event"
OrderAPI -> EventStore "Persist OrderShipped event"
// Multiple projections
EventStore -> OrderReadModel "Project to order summary view"
EventStore -> CustomerReadModel "Project to customer order history view"
EventStore -> AnalyticsReadModel "Project to order metrics view"
// Read models queried
OrderReadModel -> OrderService "Fetch order details"
CustomerReadModel -> CustomerService "Fetch customer orders"
AnalyticsReadModel -> AnalyticsService "Fetch order metrics"
}
Characteristics:
- Events are immutable (never change)
- Multiple read models for different use cases
- Rebuildable (can replay events)
- Eventually consistent
Use when: Building systems where audit trails matter, where you need multiple views of the same data, or where rebuildability is important.
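A minimal Python sketch of the idea, with invented event and order shapes: the log is append-only, and any read model can be rebuilt at any time by replaying it.

```python
# Event-sourcing sketch: an append-only event log plus a projection
# that builds a read model by replaying events. Shapes are illustrative.

events = []  # the log; immutable by convention (append-only, never edited)

def append(event):
    events.append(event)

def project_order_summary(log):
    """Replay the full log into a per-order status read model."""
    summary = {}
    for e in log:
        if e["type"] == "OrderCreated":
            summary[e["order_id"]] = "created"
        elif e["type"] == "OrderPaid":
            summary[e["order_id"]] = "paid"
        elif e["type"] == "OrderShipped":
            summary[e["order_id"]] = "shipped"
    return summary

append({"type": "OrderCreated", "order_id": 1})
append({"type": "OrderPaid", "order_id": 1})
append({"type": "OrderShipped", "order_id": 1})
append({"type": "OrderCreated", "order_id": 2})

read_model = project_order_summary(events)
print(read_model)  # {1: 'shipped', 2: 'created'}
```

Adding a second projection (say, per-customer history) needs no change to the log: you write another replay function over the same events.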
Pattern 3: Real-Time Analytics Pipeline
Events flow through a real-time processing pipeline for immediate insights.
RealTimeAnalyticsFlow = flow "Real-Time Analytics Pipeline" {
// Events captured
UserActions -> EventCollector "Captures clickstream events"
EventCollector -> KafkaStream "Publishes to Kafka"
// Real-time processing
KafkaStream -> StreamProcessor "Processes events in real-time"
StreamProcessor -> RedisCache "Updates user session data"
StreamProcessor -> Elasticsearch "Indexes events for search"
// Real-time consumption
RedisCache -> WebApp "Serves session data"
Elasticsearch -> Dashboard "Shows real-time user activity"
}
Characteristics:
- Real-time (seconds to minutes latency)
- Stream processing (Kafka, Kinesis, Pulsar)
- Eventually consistent (some delay acceptable)
- Focus on speed and availability
Use when: Building real-time dashboards, fraud detection, personalized recommendations, live monitoring.
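The incremental nature of stream processing can be shown in a tiny Python sketch. The in-memory dict stands in for Redis and the plain loop stands in for a Kafka consumer; both are assumptions for illustration.

```python
from collections import defaultdict

# Live state a dashboard would read; a stand-in for Redis.
session_clicks = defaultdict(int)

def handle_event(event):
    """Consume one clickstream event and update live state immediately."""
    session_clicks[event["session"]] += 1

stream = [
    {"session": "s1", "page": "/home"},
    {"session": "s1", "page": "/cart"},
    {"session": "s2", "page": "/home"},
]
for event in stream:  # in a real system this loop is a Kafka consumer
    handle_event(event)

print(dict(session_clicks))  # {'s1': 2, 's2': 1}
```

The contrast with the batch processor above: state is current after every event, rather than once per scheduled run.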
Pattern 4: Lambda Architecture
Batch processing for comprehensive analytics plus real-time for speed.
LambdaArchitectureFlow = flow "Lambda Architecture" {
// Speed layer: Real-time
Events -> StreamProcessing "Real-time processing"
StreamProcessing -> SpeedLayer "Serves fast views"
// Batch layer: Comprehensive
Events -> BatchProcessing "Comprehensive processing"
BatchProcessing -> BatchLayer "Serves accurate views"
// Serving layer: Merges both
SpeedLayer -> QueryService "Provides fast results"
BatchLayer -> QueryService "Provides accurate results"
QueryService -> API "Serves merged views"
}
Characteristics:
- Two paths: fast (speed layer) and accurate (batch layer)
- Speed layer provides quick but possibly incomplete results
- Batch layer provides comprehensive but delayed results
- Query service merges both for best of both worlds
Use when: You need both real-time responsiveness and comprehensive accuracy.
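The merge step is easy to sketch in Python. The numbers and metric name are invented; the point is that the query layer adds the speed layer's recent delta on top of the batch layer's accurate but stale total.

```python
# Lambda-architecture serving sketch: merge accurate batch totals with
# fast speed-layer counts covering events since the last batch run.

batch_totals = {"orders": 1000}  # recomputed nightly: accurate, but stale
speed_counts = {"orders": 12}    # events seen since the last batch run

def query(metric):
    """Serve the batch value plus the real-time delta on top."""
    return batch_totals.get(metric, 0) + speed_counts.get(metric, 0)

print(query("orders"))  # 1012
```

When the next batch run completes, its total absorbs the recent events and the speed-layer counter resets, so the merged answer stays consistent.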
Documenting Data Transformations
One of the most important things you can do in data flows is document transformations clearly.
Using Relationship Labels
TransformFlow = flow "Data Transformations" {
RawSource -> ETLService "Raw CSV data"
ETLService -> ValidatedData "Validated (removed invalids)"
ValidatedData -> NormalizedData "Normalized (standardized)"
NormalizedData -> EnrichedData "Enriched (added location)"
EnrichedData -> FinalData "Aggregated (daily metrics)"
}
Each label describes what transformation happened at that step.
Adding Metadata
ETLService = container "ETL Service" {
metadata {
transformations [
"Remove duplicate records",
"Normalize phone numbers to E.164 format",
"Standardize dates to ISO 8601 (UTC)",
"Geocode addresses to lat/long"
]
input_format "CSV"
output_format "JSON"
output_schema "v2.1"
}
}
This metadata provides complete documentation of what transformations happen.
Complete Data Flow Example
Let me show you a complete example that brings everything together.
import { * } from 'sruja.ai/stdlib'
// People
Customer = person "Customer"
// Systems
Shop = system "Shop" {
WebApp = container "Web Application"
API = container "API Service"
Database = database "PostgreSQL"
}
Analytics = system "Analytics Platform" {
Ingestion = container "Data Ingestion"
Processing = container "Data Processing"
Warehouse = database "Data Warehouse"
Reporting = container "Reporting Engine"
}
Dashboard = system "Analytics Dashboard" {
UI = container "Dashboard UI"
}
// Complete data flow: Order to analytics
OrderAnalyticsFlow = flow "Order Analytics Pipeline" {
// Source: Customer creates order
Customer -> Shop.WebApp "Submits order"
Shop.WebApp -> Shop.API "Order data"
Shop.API -> Shop.Database "Persist order"
// Extraction: Pull orders for analytics
Shop.Database -> Analytics.Ingestion "Extract order events"
// Transformation: Validate and enrich
Analytics.Ingestion -> Analytics.Processing "Validate and normalize"
Analytics.Processing -> Analytics.Processing "Enrich with customer data"
Analytics.Processing -> Analytics.Processing "Aggregate metrics"
// Loading: Store in warehouse
Analytics.Processing -> Analytics.Warehouse "Store aggregated data"
// Consumption: Query and display
Dashboard.UI -> Analytics.Reporting "Query metrics"
Analytics.Reporting -> Analytics.Warehouse "Fetch data"
Analytics.Reporting -> Dashboard.UI "Return results"
}
view index {
include *
}
This flow shows the complete path from customer action to analytics dashboard. Anyone reading this diagram understands how data moves through the system.
What to Remember
Data flows tell the story of how data moves through your system—from origin to destination, including every transformation along the way. When you create data flows:
- Document lineage — Where data comes from and where it goes
- Show transformations — How data changes shape at each step
- Use metadata — Document what each service actually does
- Identify bottlenecks — Mark slow steps and understand their impact
- Choose the right pattern — ETL, event sourcing, real-time, or lambda
- Track both paths — Success and failure paths
If you take away one thing, let it be this: data flows are your best documentation of how data actually moves through your system. When someone asks, "Where did this data come from?" or "What happened to this data?" your data flow has the answer.
Check Your Understanding
Let's see if you've got this. Here are a couple of questions to test your understanding.
Question 1
You're modeling a fitness tracking app's data flow. Here's the scenario:
"Users log workouts on their phones. Workout data syncs to the cloud API, gets stored in the main database. Every night at 2 AM, an ETL job pulls all workouts from the database, calculates aggregated metrics (total calories, total minutes, workout counts per user), and loads the results into a data warehouse. Business analysts query the warehouse for reports on user engagement and app usage."
Which flow type is most appropriate?
A) User Journey / Scenario
B) Control Flow
C) Data Flow (DFD Style)
D) Event Flow
Click to see the answer
Answer: C) Data Flow (DFD Style)
Let's analyze each option:
A) Incorrect. A user journey shows how a user interacts with a system to achieve a goal. This scenario describes data movement and transformations, not user interactions. The user creates a workout, but the rest of the flow (syncing, extracting, calculating, aggregating, loading) happens automatically without user involvement. A user journey wouldn't capture these data processing steps effectively.
B) Incorrect. A control flow shows decision points and branching logic (if/else). This scenario describes a pipeline where data flows through sequential steps (sync → store → extract → calculate → aggregate → load). There's no conditional logic—every workout follows the same path through the ETL pipeline. Control flows are better for modeling things like "if the workout is type A, calculate calories differently" or "if the user is premium, store additional metrics."
C) Correct! A data flow (DFD-style) is the right choice here because:
- The scenario describes data lineage — where data starts (user workout), where it goes (cloud API, database, data warehouse), and what happens along the way
- It shows data transformations — raw workout → synced workout → extracted workout → calculated metrics → aggregated metrics
- It models a pipeline — sequential steps that process data (extract, transform, load)
- It's focused on data movement, not user actions or business logic
Data flows (DFD-style) are perfect for showing how data moves through a system, including where it originates, how it's stored, how it's transformed, and where it ultimately goes. This scenario is a classic ETL pipeline—extract from operational database, transform (calculate metrics), load into data warehouse.
D) Incorrect. An event flow would show how events propagate through an event-driven system (pub/sub patterns). This scenario describes a scheduled batch ETL job, not real-time event propagation. The ETL job pulls data daily at 2 AM, transforms it, and loads it. There's no event bus, no event streaming, no multiple consumers processing the same event. Event flows would show things like "workout completed event published → analytics service consumes event → notification service consumes event → recommendation service consumes event."
Key insight: Choose flow types based on what you're modeling. Showing where data comes from, how it transforms, and where it ends up? Use a data flow. Modeling user interactions and experience? Use a user journey. Modeling decision logic and branches? Use a control flow. Designing event-driven architecture? Use an event flow.
Question 2
You're creating a data flow for an e-commerce platform. Which structure best documents the data transformations that happen in the ETL pipeline?
A)
ETLPipeline = flow "ETL Pipeline" {
TransactionDB -> ETLService "Extract"
ETLService -> DataWarehouse "Load"
}
B)
ETLPipeline = flow "ETL Pipeline" {
TransactionDB -> ETLService "Extract transactions"
ETLService -> ValidatedData "Validate and remove errors"
ValidatedData -> NormalizedData "Normalize formats"
NormalizedData -> EnrichedData "Add customer data"
EnrichedData -> DataWarehouse "Load to warehouse"
}
C)
ETLPipeline = flow "ETL Pipeline" {
TransactionDB -> DataWarehouse "Move data"
}
D)
ETLPipeline = flow "ETL Pipeline" {
TransactionDB -> ETLService "Extract"
ETLService -> ValidatedData "?"
ValidatedData -> NormalizedData "?"
NormalizedData -> EnrichedData "?"
EnrichedData -> DataWarehouse "?"
}
Click to see the answer
Answer: B) Shows each transformation step clearly
Let's analyze each option:
A) Incorrect. This flow has only two steps: extract and load. It completely skips the transformation step—the T in the middle of ETL! This flow doesn't show what transformations happen. Are transactions validated? Are formats normalized? Is data enriched? The flow provides no information about these crucial steps. Anyone reading this diagram wouldn't understand what actually happens to the data.
B) Correct! This flow documents each transformation step clearly:
- Extract transactions — Pulls raw data from transactional database
- Validate and remove errors — First transformation: validates data quality, removes corrupt or invalid records
- Normalize formats — Second transformation: standardizes dates, currencies, phone numbers, etc. to consistent formats
- Add customer data — Third transformation: enriches transactions with customer information (name, tier, location, etc.)
- Load to warehouse — Final step: loads transformed, enriched data into warehouse
Each relationship label describes what transformation happens at that step. Anyone reading this diagram understands the complete ETL process and what happens to the data at each stage.
C) Incorrect. This is too abstract. "Move data" tells you nothing about what happens. Is data validated? Is it transformed? Is it enriched? How does the format change? What transformations are applied? The flow provides no useful information. It's the equivalent of saying "data goes from point A to point B" without explaining the journey.
D) Incorrect. While this has the right number of steps, the labels are meaningless ("?"). What does the first "?" mean? What about the second "?"? The third "?"? The fourth "?"? These labels provide no information about what transformations are happening. Each step is a black box—you know there are transformations, but you don't know what they are.
Key insight: Document transformations clearly using descriptive relationship labels. Don't just show that data moves—show how it transforms. Label each step with what actually happens: "validate," "normalize," "enrich," "aggregate," "calculate." This makes your data flows informative and useful, not just correct.
What's Next?
Now you understand how to create data flow diagrams. You can model data lineage, document transformations, and design ETL and analytics pipelines.
But data flows are just one type of flow. There's another crucial type—user journeys (or behavioral flows)—which show how users interact with your system from their perspective.
In the next lesson, you'll learn about user journeys. You'll discover how to model BDD-style scenarios, document happy paths and error paths, and capture the complete user experience from start to finish.
See you there!