Development Roadmap¶
Core Philosophy¶
apflow = Pure orchestration library + Optional framework components
- Core: Zero framework dependencies, embeddable in any project
- Optional: A2A/MCP servers, CLI tools, protocol adapters
- Goal: Easy integration, easy extension, can coexist with competitors
Completed Features (Summary) ✅¶
- Pure Python orchestration core, embeddable and framework-free
- Flexible task model with dependency trees, custom fields, and priority-based execution
- Pluggable extension system for executors, storage, hooks, and tools
- Built-in executors: system, network (REST, WebSocket, gRPC), infrastructure (SSH, Docker), and AI/LLM (CrewAI, LiteLLM, MCP)
- Unified API: A2A, MCP, JSON-RPC, with real-time streaming and protocol adapters
- CLI tools for full task and config management, supporting both local and remote API modes
- Robust configuration management (ConfigManager), multi-location and type-safe
- Advanced features: task copy, validation, idempotency, hooks, streaming, demo mode
- Comprehensive test suite (800+ tests), strict typing and linting, and CI/CD integration
Recent Major Changes (from CHANGELOG) ✅¶
- Task model extended: task_tree_id, origin_type, and schema migration tracking
- Executor access control: environment-based filtering, API/CLI enforcement, and permission checks
- Extension management refactored for better modularity and security
- Improved task execution logic: priority grouping, error handling, and tree retrieval
- Database schema management: simplified migration, improved reliability
- CLI documentation and usability enhancements
- TaskCreator now supports multiple origin types (link, copy, archive, mixed)
Development Priorities¶
Priority 1: Fluent API (TaskBuilder) ✅¶
Goal: Type-safe, chainable task creation API
Implementation:
# New file: src/apflow/core/builders.py
result = await (
    TaskBuilder(manager, "rest_executor")
    .with_name("fetch_data")
    .with_input("url", "https://api.example.com")
    .depends_on("task_auth")
    .execute()
)
Deliverables:
- Type-safe builder with generics
- Support for all task properties
- Documentation with examples
- Integration with existing TaskManager
Why:
- Zero breaking changes
- Immediate DX improvement
- Competitive advantage over Dagster/Prefect
- Foundation for future enhancements
Priority 2: CLI → API Gateway Architecture ✅¶
Goal: Enable CLI commands to access API-managed data, ensuring data consistency and supporting concurrent access patterns.
Problem Statement:
- CLI currently queries the database directly, causing data inconsistency when the API is running
- DuckDB doesn't support concurrent writes, creating conflicts between CLI and API
- No support for remote API servers or multi-instance deployments
Implementation:
# New module: src/apflow/cli/api_client.py
# HTTP client for CLI to communicate with API
from typing import Optional

class APIClient:
    def __init__(self, server_url: str, auth_token: Optional[str] = None):
        self.server_url = server_url
        self.auth_token = auth_token

    async def execute_task(self, task_id: str) -> dict: ...
    async def get_task_status(self, task_id: str) -> dict: ...
    async def list_tasks(self, **filters) -> list: ...
    async def cancel_task(self, task_id: str) -> dict: ...
# ConfigManager extended with:
# - api_server_url (address, port)
# - api_auth_token (optional, for auth with running API)
# - use_local_db (bool, bypass API for direct local queries if needed)
# - api_timeout (seconds)
# - api_retry_policy (exponential backoff)
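As a sketch, one of these methods could be implemented with an async HTTP client such as httpx; the endpoint path and Bearer auth scheme below are assumptions, not the final API:

import httpx

# Illustrative body for APIClient.execute_task; assumes httpx and a
# hypothetical /tasks/{id}/execute endpoint on the running API server
async def execute_task(self, task_id: str) -> dict:
    headers = {}
    if self.auth_token:
        headers["Authorization"] = f"Bearer {self.auth_token}"
    async with httpx.AsyncClient(base_url=self.server_url, timeout=30.0) as client:
        response = await client.post(f"/tasks/{task_id}/execute", headers=headers)
        response.raise_for_status()  # Surface API errors to the CLI layer
        return response.json()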
CLI Integration:
# Configure API server
apflow config set api_server_url http://localhost:8000
apflow config set api_auth_token <token>
# CLI commands automatically use API when configured
apflow tasks list # Routes to API instead of local DB
apflow tasks execute task-123
apflow tasks cancel task-456
# Fallback behavior: if API unreachable, use local DB (configurable)
apflow tasks list --local-only # Force local database access
Deliverables:
- HTTP client layer (src/apflow/cli/api_client.py) with request/response handling
- ConfigManager extension for API configuration (URL, auth, timeouts, retry policy)
- CLI command layer refactored to use APIClient by default when configured
- Graceful fallback to local DB if API unavailable (with warning)
- Request middleware for auth token injection
- Error handling for network timeouts and API errors
- Documentation on API + CLI co-deployment patterns
- Integration tests for CLI → API workflows
Why:
- Solves data consistency problem between API and CLI (single source of truth)
- Unblocks DuckDB concurrent write limitations (all writes go through API)
- Foundation for all future protocol adapters (CLI, GraphQL, MQTT, WebSocket all use same HTTP layer)
- Enterprise requirement (API gateway pattern for multi-instance deployments)
- Prerequisite for distributed deployments and remote API servers
- Enables CLI to work with centralized API without direct database access
Priority 3: Distributed Core Enablement ⭐⭐⭐¶
Goal: Multi-node/instance orchestration with centralized coordination
Problem Statement:
- Current single-node limitation: only one API or CLI instance can safely write to DuckDB
- No distributed task assignment across nodes
- Cannot leverage multiple machines for horizontal scaling
- No support for high availability and fault tolerance
- Tasks must run on the same machine as the TaskManager instance
For detailed design rationale and architecture decisions, see Distributed Orchestration Design.
Implementation:
Node Registry & Management (src/apflow/core/distributed/)
from dataclasses import dataclass

class NodeRegistry:
    async def register_node(
        self,
        node_id: str,
        capabilities: dict,  # CPU/GPU/memory/labels/executor_types
        executor_types: list[str],
    ) -> None: ...
    async def heartbeat(self, node_id: str) -> None: ...
    async def deregister_node(self, node_id: str) -> None: ...
    async def list_healthy_nodes(self) -> list[NodeInfo]: ...

@dataclass
class PlacementConstraints:
    requires_executors: list[str]  # Must have one of these
    requires_capabilities: dict    # e.g., {"gpu": "nvidia", "memory_gb": 16}
    forbidden_nodes: set[str]      # Blacklist specific nodes
    max_parallel_per_node: int = 1
Task Leasing & Idempotency (src/apflow/core/distributed/leasing.py)
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TaskLease:
    task_id: str
    node_id: str
    lease_token: str
    acquired_at: datetime
    expires_at: datetime

    async def renew(self, duration: timedelta) -> None: ...
    async def release(self) -> None: ...

@dataclass
class ExecutionIdempotency:
    idempotency_key: str  # Unique per (task_id, execution_attempt)
    result_cache: dict    # Store result to return on retry
Distributed TaskManager (src/apflow/core/distributed/manager.py)
class DistributedTaskManager(TaskManager):
    async def acquire_lease(
        self,
        task_id: str,
        node_id: str,
        constraints: PlacementConstraints,
    ) -> TaskLease: ...

    async def find_executable_tasks(
        self,
        node_id: str,
    ) -> list[Task]: ...

    async def renew_lease(self, lease: TaskLease) -> None: ...

    async def report_completion(
        self,
        task_id: str,
        node_id: str,
        result: dict,
        idempotency_key: str,
    ) -> None: ...
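To show how these pieces compose, here is a hypothetical worker loop; run_task stands in for the node's actual executor dispatch, task.id is assumed from the task model, and the polling interval is arbitrary:

import asyncio
import uuid

async def worker_loop(
    manager: DistributedTaskManager,
    node_id: str,
    constraints: PlacementConstraints,
) -> None:
    while True:
        for task in await manager.find_executable_tasks(node_id):
            lease = await manager.acquire_lease(task.id, node_id, constraints)
            idempotency_key = f"{task.id}:{uuid.uuid4()}"  # unique per attempt
            try:
                result = await run_task(task)  # placeholder for executor dispatch
                await manager.report_completion(task.id, node_id, result, idempotency_key)
            finally:
                await lease.release()
        await asyncio.sleep(5)  # naive poll; real workers would back off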
Storage Layer Extensions
# Extend task model with:
- lease_id: Optional[str]
- lease_expires_at: Optional[datetime]
- placement_constraints: dict
- idempotency_key: Optional[str]
- last_heartbeat_from: Optional[str]
# New database operations:
- acquire_lease(task_id, node_id, lease_duration)
- release_lease(task_id)
- find_tasks_by_placement(node_id, constraints)
- record_completion(task_id, idempotency_key, result)
Deployment Configuration
# Mode 1: Single-node (default, no distributed)
apflow serve --port 8000
# Mode 2: Distributed coordinator (central write authority)
apflow serve --port 8000 --distributed-mode coordinator --database-url postgresql://...
# Mode 3: Distributed worker (executes tasks from coordinator)
apflow serve --node-id worker-1 --coordinator-url http://coordinator:8000
Deliverables:
- Node registry with health checks and capability tracking
- Task leasing mechanism with automatic expiry and cleanup
- Idempotent task execution with result caching
- Placement constraints (executor type, labels, resource requirements)
- Distributed TaskManager with task assignment APIs
- PostgreSQL-based distributed locking (DuckDB remains read-only in distributed mode)
- Heartbeat/health check system with stale lease detection
- Task recovery on node failure (automatic reassignment)
- Comprehensive test suite (25+ distributed scenarios)
- Deployment documentation: topology, node setup, failover patterns
- Migration guide: single-node to distributed mode
Key Decisions:
- Single writer (API/central coordinator) with optional read replicas
- Lease-based (not lock-based) for graceful node failure handling
- Optional feature (backward compatible single-node mode)
- PostgreSQL support (existing dependency)
- Coordinator can run in same process as API or standalone
Why:
- Unlocks multi-node deployments without architectural rework
- Foundation for all protocol adapters (each can run on distributed node)
- Prerequisite for horizontal scaling and high availability
- Enables load distribution across machines/containers
- Competitive with Celery/Prefect distributed capabilities
- Enterprise requirement for production deployments
- Solves DuckDB concurrency limitations definitively
Priority 4: Protocol Adapter Abstraction Layer ⭐⭐⭐¶
Goal: Unified protocol interface, framework-agnostic
Implementation:
# New module: src/apflow/core/protocols/
from typing import Protocol

class ProtocolAdapter(Protocol):
    async def handle_execute_request(self, request: dict) -> dict: ...
    async def handle_status_request(self, request: dict) -> dict: ...
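For illustration, a minimal in-process adapter satisfying this Protocol might look like the following; the TaskManager method names here are assumptions for the sketch:

class InProcessAdapter:
    def __init__(self, task_manager):
        self.task_manager = task_manager

    async def handle_execute_request(self, request: dict) -> dict:
        # Assumed TaskManager API; real method names may differ
        result = await self.task_manager.execute_task(request["task_id"])
        return {"task_id": request["task_id"], "result": result}

    async def handle_status_request(self, request: dict) -> dict:
        status = await self.task_manager.get_task_status(request["task_id"])
        return {"task_id": request["task_id"], "status": status}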
Deliverables:
- Base protocol adapter interface
- Refactor existing A2A/MCP adapters to use abstraction
- Protocol adapter documentation
- Testing framework for protocol adapters
Why:
- Foundation for multi-protocol support (built on distributed core)
- Enables GraphQL/MQTT/WebSocket additions
- Improves testability
- Each protocol can run on distributed nodes
- No competitor has this abstraction
Priority 5: GraphQL Protocol Adapter ⭐⭐⭐¶
Goal: GraphQL query interface for complex task trees
Implementation:
# New: src/apflow/core/protocols/graphql.py
# Optional dependency: strawberry-graphql
schema = create_graphql_schema()
# Users integrate with any GraphQL server
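As a sketch of what create_graphql_schema() might assemble with strawberry-graphql; the type and field names are illustrative, not the final schema:

from typing import Optional
import strawberry

@strawberry.type
class TaskType:
    id: str
    name: str
    status: str

@strawberry.type
class Query:
    @strawberry.field
    def task(self, id: str) -> Optional[TaskType]:
        # The real resolver would look the task up via TaskManager
        return None

def create_graphql_schema() -> strawberry.Schema:
    return strawberry.Schema(query=Query)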
Deliverables:
- GraphQL schema for tasks, task trees, execution
- Strawberry-based implementation
- Examples for FastAPI, Starlette integration
- GraphQL Playground documentation
Why:
- Competitors don't have GraphQL support
- Natural fit for task tree relationships
- Developer-friendly (great tooling ecosystem)
- Library-level (no HTTP server required)
Update pyproject.toml:
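[project.optional-dependencies]
graphql = ["strawberry-graphql>=0.219.0"]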
Priority 6: MQTT Protocol Adapter ⭐⭐¶
Goal: IoT/Edge AI agent communication
Implementation:
# New: src/apflow/core/protocols/mqtt.py
mqtt_adapter = MQTTProtocolAdapter(task_manager)
result = await mqtt_adapter.handle_mqtt_message(topic, payload)
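A rough sketch of wiring this adapter to a broker with aiomqtt (one of the example clients listed below); the broker host and topic names are placeholders:

import aiomqtt

async def run_mqtt_bridge(mqtt_adapter) -> None:
    async with aiomqtt.Client("broker.local") as client:
        await client.subscribe("tasks/execute/#")
        async for message in client.messages:
            result = await mqtt_adapter.handle_mqtt_message(
                str(message.topic), message.payload
            )
            # Publish the outcome so requesting agents can observe it
            await client.publish("tasks/status/result", payload=str(result))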
Deliverables:
- MQTT message handler (library function)
- Topic routing (tasks/execute/, tasks/status/)
- Examples with paho-mqtt and aiomqtt
- IoT agent orchestration guide
Why:
- Unique capability (no competitor has this)
- Growing IoT/edge AI market
- Lightweight implementation
- Complements existing protocols
Update pyproject.toml:
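[project.optional-dependencies]
mqtt = ["paho-mqtt>=1.6.1"]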
Priority 7: Observability Hook System ⭐⭐¶
Goal: Pluggable metrics collection, user-chosen backends
Implementation:
# New: src/apflow/core/observability/
from typing import Protocol

class MetricsCollector(Protocol):
    async def record_task_start(self, task_id: str) -> None: ...
    async def record_task_complete(self, task_id: str, duration: float) -> None: ...

tracer = TaskTracer()
tracer.register_collector(PrometheusCollector())  # User provides
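For example, a user-provided Prometheus collector could be as small as this; the metric names are illustrative, and prometheus_client is the user's choice rather than a core dependency:

from prometheus_client import Counter, Histogram

class PrometheusCollector:
    def __init__(self) -> None:
        self.tasks_started = Counter("apflow_tasks_started_total", "Tasks started")
        self.task_duration = Histogram("apflow_task_duration_seconds", "Task duration")

    async def record_task_start(self, task_id: str) -> None:
        self.tasks_started.inc()

    async def record_task_complete(self, task_id: str, duration: float) -> None:
        self.task_duration.observe(duration)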
Deliverables:
- Metrics collector protocol
- TaskTracer with plugin system
- Examples: Prometheus, Datadog, OpenTelemetry
- Performance impact documentation
Why:
- Close gap with Dagster's observability
- Maintains library purity (no forced backend)
- Enterprise requirement
- Foundation for dashboard/UI
Priority 8: Workflow Patterns Library ⭐⭐¶
Goal: Common orchestration patterns as reusable functions
Implementation:
# New: src/apflow/patterns/
result = await map_reduce(
    items=urls,
    map_executor="rest_executor",
    reduce_executor="aggregate_results_executor",
)
Deliverables:
- Map-Reduce pattern
- Fan-Out/Fan-In pattern
- Circuit Breaker pattern
- Retry with exponential backoff (sketched below)
- Pattern documentation with real-world examples
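The retry pattern, for instance, can be expressed as a standalone helper; this sketch is illustrative and not tied to apflow internals:

import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def retry_with_backoff(
    fn: Callable[[], Awaitable[T]],
    *,
    max_attempts: int = 5,
    base_delay: float = 0.5,
) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter: 0.5s, 1s, 2s, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))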
Why:
- Improves ease of use
- Built on existing core (no new infrastructure)
- Competitive with Prefect/Dagster patterns
- Demonstrates library power
Priority 9: VS Code Extension ⭐¶
Goal: Task tree visualization in editor
Deliverables:
- Task tree graph view
- Real-time execution status
- Jump to task definition
- Debug console integration
Why:
- Significant DX improvement
- Competitive advantage
- Separate project (no core impact)
- Community contribution opportunity
Priority 10: Testing Utilities ⭐¶
Goal: Make workflow testing easy
Implementation:
# New: src/apflow/testing/
mocker = TaskMocker()
mocker.mock_executor("rest_executor", return_value={"status": "ok"})
result = await simulate_workflow(task_tree, speed_factor=10.0)
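In a test suite this might read as follows (pytest-asyncio assumed; TaskMocker and simulate_workflow are roadmap names, not a shipped API):

import pytest

@pytest.mark.asyncio
async def test_fetch_workflow(task_tree):
    mocker = TaskMocker()
    mocker.mock_executor("rest_executor", return_value={"status": "ok"})

    result = await simulate_workflow(task_tree, speed_factor=10.0)
    assert result["status"] == "ok"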
Deliverables:
- TaskMocker for unit tests
- Workflow simulation with time compression
- Assertion helpers
- Testing best practices guide
Why:
- Developer confidence
- Test-friendly library design
- Competitive requirement
- Enables better community contributions
Priority 11: Hot Reload Development Mode ⭐¶
Goal: Auto-reload on code changes
Implementation:
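No design is fixed yet; one plausible shape for the watcher piece, assuming the watchdog library and a registry object exposing a refresh hook (both assumptions):

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class ExecutorReloadHandler(FileSystemEventHandler):
    def __init__(self, registry):
        self.registry = registry  # assumed registry exposing refresh()

    def on_modified(self, event):
        if event.src_path.endswith(".py"):
            self.registry.refresh()  # hypothetical re-import of changed executors

def start_watching(registry, path: str = "src/") -> Observer:
    observer = Observer()
    observer.schedule(ExecutorReloadHandler(registry), path=path, recursive=True)
    observer.start()
    return observer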
Deliverables:
- File watcher for task/executor files
- Automatic registry refresh
- Development mode CLI command
- Error reporting on reload failures
Why:
- Faster development iteration
- Competitive with modern frameworks
- Small implementation scope
- High developer satisfaction impact
Priority 12: Bidirectional WebSocket Server ⭐¶
Goal: Real-time agent-to-agent collaboration
Implementation:
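Again, nothing is settled; a minimal server built on the websockets package (already the planned optional dependency below) could look like this, with route_agent_message standing in for the routing layer:

import asyncio
import json
import websockets

async def handle_agent(websocket):
    async for raw in websocket:
        message = json.loads(raw)
        reply = await route_agent_message(message)  # placeholder routing layer
        await websocket.send(json.dumps(reply))

async def main() -> None:
    async with websockets.serve(handle_agent, "0.0.0.0", 8765):
        await asyncio.Future()  # run until cancelled

asyncio.run(main())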
Deliverables:
- WebSocket server adapter
- Agent registry and discovery
- Bidirectional message routing
- Real-time collaboration examples
Why:
- Advanced use case
- Complements existing websocket_executor (client)
- Unique capability
- Foundation for agent marketplace
Update pyproject.toml:
[project.optional-dependencies]
websocket-server = ["websockets>=12.0"]
protocols = ["apflow[graphql,mqtt,websocket-server]"]
Unified Configuration Management (ConfigManager)¶
Goal:
Introduce a project-wide ConfigManager as the single source of truth for all configuration (CLI, daemon, business logic, testing, etc.), replacing scattered config file access and .env reliance.
Motivation:
- Eliminate configuration pollution and inconsistency between CLI, daemon, and tests.
- Support dynamic configuration reload, project/global scope, and future API-based config management.
- Enable type-safe, maintainable, and testable configuration access across the entire codebase.
Key Steps:
1. Implement a ConfigManager singleton with type-safe get/set/reload methods (see the sketch below).
2. Refactor all code (CLI, daemon, business logic, tests) to access configuration exclusively via ConfigManager.
3. Remove direct reads/writes to config files and .env for business parameters (except for secrets).
4. Ensure all configuration changes (including duckdb_read_only and future options) are managed through ConfigManager.
5. For daemon mode, expose configuration management APIs; CLI config commands interact with the daemon via HTTP API when running.
6. Add unit tests for ConfigManager and all configuration-dependent logic.
7. Document configuration conventions and migration steps for contributors.
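A minimal sketch of the singleton surface from step 1; method names beyond get/set/reload are assumptions:

from typing import Any, Optional

class ConfigManager:
    _instance: Optional["ConfigManager"] = None

    def __new__(cls) -> "ConfigManager":
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._values = {}
        return cls._instance

    def get(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)

    def set(self, key: str, value: Any) -> None:
        self._values[key] = value

    def reload(self) -> None:
        # Re-read configuration sources (files, environment) into _values
        self._values.clear()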
Benefits:
- Consistent configuration state across all entrypoints.
- Easy support for project/global profiles, plugin configs, and hot-reload.
- Simplifies testing and avoids cross-test pollution.
- Lays the foundation for future features like multi-profile, plugin, and remote config management.
Success Metrics¶
Library-First Success Criteria¶
- ✅ Core has zero HTTP/CLI dependencies
- ✅ Can embed in any Python project without the [a2a] or [cli] extras
- ✅ Protocol adapters are pure functions (no server coupling)
- ✅ All "batteries" are optional extensions
Developer Experience Success Criteria¶
- ✅ Fluent API reduces boilerplate by 50% (TaskBuilder implemented)
- ⏳ GraphQL queries 30% faster than REST for complex trees (not yet implemented)
- ⏳ Hot reload reduces iteration time by 70% (not yet implemented)
- ✅ Testing utilities enable 90%+ test coverage (comprehensive test suite with 800+ tests)
Competitive Success Criteria¶
- ✅ Multi-protocol support (A2A, MCP, JSON-RPC, WebSocket; GraphQL/MQTT pending)
- ⏳ Observable (like Dagster, but for agents) (basic hooks implemented, full observability pending)
- ✅ Lightweight (DuckDB → PostgreSQL)
- ✅ Can coexist with Dagster, Prefect, Celery
Implementation Status Summary¶
- Completed Features: 15+ major components implemented
- Test Coverage: 800+ tests passing
- Documentation: Comprehensive guides and API references
- CLI Tools: Full-featured command-line interface
- API Protocols: A2A, MCP, JSON-RPC support
- Executors: 12+ built-in executors for various use cases
- Storage: DuckDB + PostgreSQL support
- Extensions: Plugin system with 50+ extensions
Package Structure Updates¶
[project.optional-dependencies]
# New protocols
graphql = ["strawberry-graphql>=0.219.0"]
mqtt = ["paho-mqtt>=1.6.1"]
websocket-server = ["websockets>=12.0"]
# Protocol development bundle
protocols = ["apflow[graphql,mqtt,websocket-server]"]
# Observability (user chooses backend)
observability = [
"opentelemetry-api>=1.20.0",
"opentelemetry-sdk>=1.20.0",
]
# Updated all
all = [
"apflow[crewai,a2a,cli,postgres,llm-key-config,ssh,docker,grpc,mcp,llm,protocols,observability]",
]
Explicitly NOT Planned¶
The following are NOT core features and will NOT be implemented in the library:
- ❌ User Management - Application-level concern
- ❌ Authentication/Authorization - Application-level concern
- ❌ Multi-Tenancy - Application-level concern
- ❌ RBAC - Application-level concern
- ❌ Audit Logging - Application-level concern (observability hooks enable this)
- ❌ Dashboard UI - Separate project (apflow-webapp)
- ❌ Secret Management - Use external solutions (Vault, AWS Secrets Manager)
Rationale: These are application/business concerns, not orchestration concerns. Users should implement these in their own projects (like apflow-demo) using the extension system.
How Users Add These:
- Extend TaskRoutes naturally (demo project shows pattern)
- Use hook system for audit logging
- Implement custom middleware for auth
- Examples provided in examples/extensions/ (reference only)
Documentation Priorities¶
Core Library Documentation¶
- "Library-First Architecture" - Philosophy and design principles
- "Protocol Adapter Guide" - Building custom protocol adapters
- "Fluent API Reference" - TaskBuilder complete guide
- "Embedding Guide" - Using apflow in your project
Protocol Documentation¶
- "GraphQL Integration" - Schema reference and examples
- "MQTT for Edge AI" - IoT agent orchestration guide
- "Multi-Protocol Comparison" - When to use which protocol
- "Observability Best Practices" - Metrics, tracing, logging
Advanced Guides¶
- "Testing Agent Workflows" - Comprehensive testing guide
- "Coexistence Patterns" - Using with Dagster, Prefect, Celery
- "VS Code Extension Guide" - Developer tooling
- "Production Deployment" - Scaling and operations
Competitive Positioning¶
Unique Value Proposition¶
"The Protocol-First AI Agent Orchestration Library"
- ✅ A2A Protocol (agent-to-agent communication)
- ✅ Multi-Protocol (GraphQL, MQTT, MCP, JSON-RPC, WebSocket)
- ✅ Library-First (embed anywhere, no framework lock-in)
- ✅ Observable (pluggable metrics, like Dagster)
- ✅ Lightweight (DuckDB → PostgreSQL)
- ✅ Developer-Friendly (fluent API, hot reload, VS Code)
Key Differentiators¶
vs. Dagster/Prefect:
- AI agent-first design (not retrofitted from data pipelines)
- Multi-protocol support (they only have HTTP)
- Library-first (they're frameworks)
- Lightweight embedded mode (DuckDB)

vs. LangGraph:
- Less opinionated, more flexible
- Multi-protocol support
- A2A protocol for agent communication
- Can integrate with LangGraph workflows

vs. Task Queues (Celery/Dramatiq/Taskiq):
- Full orchestration (DAG support, dependencies)
- State persistence
- AI agent native features
- Multi-executor types
This roadmap focuses on what makes apflow unique: protocol-first, library-first AI agent orchestration that can be embedded anywhere and extended naturally.