AI Project Manager — Production-Ready Agentic System
A fully agentic AI system that autonomously plans, researches, and executes project tasks. Built with LangGraph, NVIDIA NIM, and real-world integrations: Notion, Jira, and Slack.
The Problem: AI Assistants Don’t Finish Tasks
We’ve all been there: you ask ChatGPT to help plan a project, and it gives you brilliant advice. But then what? You still have to manually create the Notion page, file the Jira tickets, and notify your team on Slack. The AI stops at the “thinking” phase—leaving the execution to you.
What if AI could actually finish the job?
That’s exactly what I built: AI Project Manager, a production-ready system that doesn’t just plan your projects—it researches solutions, generates reports, creates tickets, and notifies your team. All with a human approval checkpoint to keep things safe.
The Architecture: Beyond Simple LLM Calls
Most AI applications follow a simple pattern: send a prompt, get a response. But complex workflows need more than that—they need state management, conditional routing, and human-in-the-loop patterns.
Enter LangGraph: State Machines for AI
graph LR
A[User Request] --> B[Task Planner]
B --> C[Research]
C --> D[Summarizer]
D --> E{Human Approval}
E -->|Approved| F[Integrator]
E -->|Rejected| G[END]
F --> H[Notion]
F --> I[Jira]
F --> J[Slack]
Instead of a single LLM call, I built a StateGraph—a directed graph where each node performs a specific function:
- Task Planner: Decomposes your request into 3-5 actionable subtasks
- Research: Searches the web for each subtask (concurrently!)
- Summarizer: Generates a structured markdown report
- Human Checkpoint: Pauses for your approval
- Integrator: Creates Notion pages, Jira tickets, Slack notifications
Here’s how the graph definition looks in code (src/agent/graph.py):
def build_graph(checkpointer=None) -> StateGraph:
    workflow = StateGraph(AgentState)

    # Add nodes
    workflow.add_node("task_planner", task_planner_node)
    workflow.add_node("research", research_node)
    workflow.add_node("summarizer", summarizer_node)
    workflow.add_node("integrator", integrator_node)
    workflow.add_node("error_handler", error_handler_node)

    # Define flow with conditional edges
    workflow.set_entry_point("task_planner")
    workflow.add_conditional_edges("task_planner", _should_continue, ...)
    workflow.add_conditional_edges("research", _should_summarize, ...)
    workflow.add_conditional_edges("summarizer", _check_approval, ...)
    workflow.add_conditional_edges("error_handler", _should_retry, ...)

    # Compile with checkpointing for human-in-the-loop
    return workflow.compile(
        checkpointer=checkpointer,
        interrupt_before=["integrator"]  # Pause here!
    )
Key insight: The interrupt_before=["integrator"] line is what enables human oversight. The workflow pauses before taking any real-world action.
The State: Your Workflow’s Memory
Every workflow needs to track what’s happening. LangGraph uses a TypedDict to define state that flows through all nodes:
class AgentState(TypedDict):
    run_id: str
    task: str
    subtasks: list[str]
    research_results: dict[str, str]
    final_report: str
    approved: bool
    notion_page_url: str
    jira_issue_keys: list[str]
    slack_sent: bool
    error: str | None
    retry_count: int
Each node receives this state, modifies it, and returns updates. The state persists across async operations and API calls—crucial for long-running workflows.
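To make that update pattern concrete, here is a minimal stdlib-only sketch (no LangGraph involved) of how a node consumes the state and returns a partial update that the graph runtime merges back in. The fake decomposition logic is purely illustrative; the field names mirror the `AgentState` above:

```python
from typing import TypedDict

class AgentState(TypedDict, total=False):
    run_id: str
    task: str
    subtasks: list[str]
    retry_count: int

def task_planner_node(state: AgentState) -> dict:
    # A real node would call the LLM here; we fake the decomposition.
    subtasks = [f"{state['task']} - step {i}" for i in range(1, 4)]
    return {"subtasks": subtasks}  # a partial update, not the whole state

# The graph runtime merges each node's return value into the state:
state: AgentState = {"run_id": "r1", "task": "Plan auth", "retry_count": 0}
state.update(task_planner_node(state))
```

Returning partial updates rather than a whole new state is what lets nodes stay small and composable: each one only touches the fields it owns.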
LLM Flexibility: Why I Support Multiple Providers
Here’s something most tutorials don’t cover: vendor lock-in is real. I originally built this with OpenAI in mind, but API costs add up fast when you’re iterating on prompts. The solution? A factory pattern for LLM providers:
class LLMProvider:
    @staticmethod
    def get_llm():
        provider = config.settings.MODEL_PROVIDER.lower()
        if provider == "nvidia":
            return ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")
        elif provider == "ollama":
            # Ollama exposes an OpenAI-compatible API, so ChatOpenAI works
            return ChatOpenAI(
                model="mixtral",
                openai_api_base="http://localhost:11434/v1",
                openai_api_key="not-needed"  # Ollama ignores the key
            )
        elif provider == "openai":
            return ChatOpenAI(model="gpt-4o")
        raise ValueError(f"Unsupported MODEL_PROVIDER: {provider}")
Three options, same interface:
- Ollama (local): Zero API costs, runs on your GPU, completely private
- NVIDIA NIM (cloud): Pay-per-use, auto-scaling, no maintenance
- OpenAI: The familiar fallback
The best part? Switching providers is just one environment variable:
MODEL_PROVIDER=ollama # Development, free
MODEL_PROVIDER=nvidia # Production, scalable
MODEL_PROVIDER=openai # Fallback option
Human-in-the-Loop: The Safety Net
AI agents can do impressive things—and dangerously wrong things. That’s why the checkpoint pattern is essential.
Here’s how it works:
- The workflow runs through planning, research, and summarization
- It pauses at the integrator node (because of interrupt_before=["integrator"])
- The state is saved to SQLite (yes, you can inspect it!)
- A human reviews the generated report via API
- If approved, the workflow resumes and creates tickets
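LangGraph’s checkpointer handles the persistence for you; conceptually, the pattern is just “save the state at the interrupt, reload it on approval.” Here is a stdlib-only sketch of that idea using SQLite, the same backing store the project uses. This is not the LangGraph API, just the shape of the pattern:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE checkpoints (run_id TEXT PRIMARY KEY, state TEXT)")

def pause(run_id: str, state: dict) -> None:
    # Persist the full state right before the side-effecting step
    db.execute("INSERT OR REPLACE INTO checkpoints VALUES (?, ?)",
               (run_id, json.dumps(state)))

def resume(run_id: str, approved: bool) -> dict:
    (raw,) = db.execute("SELECT state FROM checkpoints WHERE run_id = ?",
                        (run_id,)).fetchone()
    state = json.loads(raw)
    state["approved"] = approved
    return state  # the graph would now continue from the integrator node

pause("r1", {"final_report": "# Plan", "approved": False})
resumed = resume("r1", approved=True)
```

Because the checkpoint lives in a real database rather than process memory, the approval can arrive minutes or days later, from a different API worker, and the run still resumes cleanly.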
The API exposes this as a simple flow:
# Start a task
curl -X POST http://localhost:8000/runs \
  -H 'Content-Type: application/json' \
  -d '{"task": "Plan authentication system"}'

# Get run_id from response, then approve
curl -X POST http://localhost:8000/runs/{run_id}/approve \
  -H 'Content-Type: application/json' \
  -d '{"approved": true}'
Real-Time Updates: SSE Streaming
Nobody likes staring at a loading spinner. That’s why I built Server-Sent Events streaming:
@app.get("/runs/{run_id}/stream")
async def stream_run(run_id: str):
    # thread_id ties this stream to the checkpointed run
    config = {"configurable": {"thread_id": run_id}}

    async def event_generator():
        async for chunk in graph.astream(None, config):
            for node_name, node_state in chunk.items():
                yield f"data: {json.dumps({node_name: node_state})}\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream"
    )
Now clients can watch progress in real-time:
data: {"task_planner": {"subtasks": ["Research OAuth providers", ...]}}
data: {"research": {"research_results": {...}}}
data: {"summarizer": {"final_report": "# Project Plan..."}}
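On the client side, each SSE frame is just a `data:` line followed by a blank line, so parsing the stream takes a few lines of stdlib code. A minimal sketch (the frame format shown above is the only assumption):

```python
import json

def parse_sse(raw: str) -> list[dict]:
    """Parse a text/event-stream body into a list of JSON payloads."""
    events = []
    for frame in raw.split("\n\n"):          # frames are blank-line delimited
        for line in frame.splitlines():
            if line.startswith("data: "):    # ignore comments and other fields
                events.append(json.loads(line[len("data: "):]))
    return events

stream = (
    'data: {"task_planner": {"subtasks": ["Research OAuth providers"]}}\n\n'
    'data: {"summarizer": {"final_report": "# Project Plan"}}\n\n'
)
events = parse_sse(stream)
```

In a browser you would use the built-in EventSource API instead; this parser is handy for CLI clients and tests.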
Error Handling: Because LLMs Fail
Here’s the uncomfortable truth: LLM calls fail. Rate limits, timeout errors, hallucinated JSON—you name it.
I built retry logic with exponential backoff:
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    retry=retry_if_exception_type((Exception,))
)
async def _call_llm_with_retry(prompt: str) -> str:
    llm = _get_llm()
    return await llm.ainvoke([HumanMessage(content=prompt)])
And an error handler node that decides whether to retry or give up:
async def error_handler_node(state: AgentState) -> dict:
    if state["retry_count"] < MAX_RETRIES:
        return {"retry_count": state["retry_count"] + 1}
    # Max retries exceeded, workflow ends
    return {}
Observability: You Can’t Fix What You Can’t See
Production systems need monitoring. I integrated three layers:
1. Prometheus Metrics
runs_total = Counter("runs_total", "Total runs", ["status"])
run_latency_seconds = Histogram("run_latency_seconds", "Run duration")
active_runs = Gauge("active_runs", "Currently running tasks")
integration_calls_total = Counter("integration_calls_total", ...)
tavily_api_calls_total = Counter("tavily_api_calls_total", ...)
sources_per_task = Histogram("sources_per_task", ...)
hitl_decisions_total = Counter("hitl_decisions_total", ...)
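With prometheus_client, timing a run is a context manager around the call; to keep this sketch dependency-free, here is the same pattern as a plain decorator recording into a dict. The metric names match the ones above; everything else is illustrative:

```python
import time
from functools import wraps

metrics = {"run_latency_seconds": [], "runs_total": {"completed": 0, "failed": 0}}

def timed(fn):
    """Record latency and a success/failure count for each call."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            metrics["runs_total"]["completed"] += 1
            return result
        except Exception:
            metrics["runs_total"]["failed"] += 1
            raise
        finally:
            metrics["run_latency_seconds"].append(time.perf_counter() - start)
    return wrapper

@timed
def run_workflow(task: str) -> str:
    return f"report for {task}"

run_workflow("Plan auth")
```

The `try/finally` shape matters: latency is recorded whether the run succeeds or raises, so failed runs don’t silently vanish from the histogram.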
2. LangSmith Tracing
Every LLM call is traced: prompts, responses, token usage, latency. Debug failed runs visually.
3. Structured Logging
logger.info(f"[{run_id}] Starting task_planner_node")
logger.error(f"[{run_id}] task_planner_node failed: {e}")
Every log includes run_id—essential when multiple workflows run concurrently.
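One way to guarantee the prefix never gets forgotten is a `logging.LoggerAdapter` that injects the run_id automatically, instead of formatting it by hand at every call site. A small sketch (the post’s code formats the prefix inline; this is an alternative, not the project’s actual implementation):

```python
import logging

class RunLogger(logging.LoggerAdapter):
    """Prefix every message with the workflow's run_id."""
    def process(self, msg, kwargs):
        return f"[{self.extra['run_id']}] {msg}", kwargs

logger = RunLogger(logging.getLogger("agent"), {"run_id": "run-42"})
# process() is what the adapter applies before each log call:
msg, _ = logger.process("Starting task_planner_node", {})
```

Each workflow gets its own adapter at startup, and every `logger.info(...)` inside the nodes carries the run_id for free.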
Integration: Where AI Meets Reality
The real power is in integrations. After approval, the agent:
- Creates a Notion page with the full report
- Files Jira tickets for each subtask
- Sends a Slack notification with links
Notion and Jira run concurrently via Python’s asyncio; the Slack notification follows, since its message links to the results:
async def integrator_node(state: AgentState) -> dict:
    notion_url, issue_keys = await asyncio.gather(
        notion_client.create_page(state["task"], state["final_report"]),
        jira_client.create_issues(state["subtasks"]),
        return_exceptions=True  # Don't fail if one integration fails
    )
    slack_sent = await slack_client.send_notification(
        state["task"], notion_url, issue_keys
    )
    return {
        "notion_page_url": notion_url,
        "jira_issue_keys": issue_keys,
        "slack_sent": slack_sent,
    }
What I Learned
- Agentic patterns are the future. Simple LLM calls can’t handle complex workflows—you need state machines.
- Human-in-the-loop isn’t optional. AI will make mistakes; checkpoints let you catch them.
- Provider flexibility matters. Locking into one LLM vendor is a strategic risk.
- Observability is non-negotiable. You can’t debug what you can’t see.
- Async Python is powerful. Concurrent research and integrations make the system feel fast.
Try It Yourself
The code is open source. Clone it, run it locally with Ollama (no API key needed):
git clone https://github.com/yourusername/ai-project-manager.git
cd ai-project-manager
poetry install
# Set up Ollama
ollama pull mixtral
ollama serve
# Run the API
MODEL_PROVIDER=ollama poetry run uvicorn src.api.main:app --reload
Open http://localhost:8000/docs and try the interactive API.
What’s Next?
I’m exploring:
- RAG integration for domain-specific knowledge
- More integrations: GitHub Issues, Linear, Asana
- Web UI for non-technical users
- Agent customization: Let users define their own workflows
The future of AI isn’t just chatbots—it’s agents that actually do things. This project is a step in that direction.
Have questions? Open an issue on GitHub or reach out. I’d love to hear what you’re building.