AI Project Manager — Production-Ready Agentic System
A fully agentic AI system that autonomously plans, researches, and executes project tasks. Built with LangGraph, NVIDIA NIM, and real-world integrations: Notion, Jira, and Slack.
The Problem: AI Assistants Don’t Finish Tasks
We’ve all been there: you ask ChatGPT to help plan a project, and it gives you brilliant advice. But then what? You still have to manually create the Notion page, file the Jira tickets, and notify your team on Slack. The AI stops at the “thinking” phase—leaving the execution to you.
What if AI could actually finish the job?
That’s exactly what I built: AI Project Manager, a production-ready system that doesn’t just plan your projects—it researches solutions, generates reports, creates tickets, and notifies your team. All with a human approval checkpoint to keep things safe.
The Architecture: Beyond Simple LLM Calls
Most AI applications follow a simple pattern: send a prompt, get a response. But complex workflows need more than that—they need state management, conditional routing, and human-in-the-loop patterns.
Enter LangGraph: State Machines for AI
graph LR
A[User Request] --> B[Task Planner]
B --> C[Research]
C --> D[Summarizer]
D --> E{Human Approval}
E -->|Approved| F[Integrator]
E -->|Rejected| G[END]
F --> H[Notion]
F --> I[Jira]
F --> J[Slack]
Instead of a single LLM call, I built a StateGraph—a directed graph where each node performs a specific function:
- Task Planner: Decomposes your request into 3-5 actionable subtasks
- Research: Searches the web for each subtask (concurrently!)
- Summarizer: Generates a structured markdown report
- Human Checkpoint: Pauses for your approval
- Integrator: Creates Notion pages, Jira tickets, Slack notifications
Here’s how the graph definition looks in code (src/agent/graph.py):
def build_graph(checkpointer=None) -> StateGraph:
    workflow = StateGraph(AgentState)

    # Add nodes
    workflow.add_node("task_planner", task_planner_node)
    workflow.add_node("research", research_node)
    workflow.add_node("summarizer", summarizer_node)
    workflow.add_node("integrator", integrator_node)
    workflow.add_node("error_handler", error_handler_node)

    # Define flow with conditional edges
    workflow.set_entry_point("task_planner")
    workflow.add_conditional_edges("task_planner", _should_continue, ...)
    workflow.add_conditional_edges("research", _should_summarize, ...)
    workflow.add_conditional_edges("summarizer", _check_approval, ...)
    workflow.add_conditional_edges("error_handler", _should_retry, ...)

    # Compile with checkpointing for human-in-the-loop
    return workflow.compile(
        checkpointer=checkpointer,
        interrupt_before=["integrator"]  # Pause here!
    )
Key insight: The interrupt_before=["integrator"] line is what enables human oversight. The workflow pauses before taking any real-world action.
The State: Your Workflow’s Memory
Every workflow needs to track what’s happening. LangGraph uses a TypedDict to define state that flows through all nodes:
class AgentState(TypedDict):
    run_id: str
    task: str
    subtasks: list[str]
    research_results: dict[str, str]
    final_report: str
    approved: bool
    notion_page_url: str
    jira_issue_keys: list[str]
    slack_sent: bool
    error: str | None
    retry_count: int
Each node receives this state, modifies it, and returns updates. The state persists across async operations and API calls—crucial for long-running workflows.
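To make that update pattern concrete, here is a minimal stdlib-only sketch (no LangGraph involved) of how a node consumes the state and returns a partial update that the graph runtime merges back in. The fake decomposition logic is purely illustrative; the field names mirror the `AgentState` above:

```python
from typing import TypedDict

class AgentState(TypedDict, total=False):
    run_id: str
    task: str
    subtasks: list[str]
    retry_count: int

def task_planner_node(state: AgentState) -> dict:
    # A real node would call the LLM here; we fake the decomposition.
    subtasks = [f"{state['task']} - step {i}" for i in range(1, 4)]
    return {"subtasks": subtasks}  # a partial update, not the whole state

# The graph runtime merges each node's return value into the state:
state: AgentState = {"run_id": "r1", "task": "Plan auth", "retry_count": 0}
state.update(task_planner_node(state))
```

Returning partial updates rather than a whole new state is what lets nodes stay small and composable: each one only touches the fields it owns.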
LLM Flexibility: Why I Support Multiple Providers
Here’s something most tutorials don’t cover: vendor lock-in is real. I originally built this with OpenAI in mind, but API costs add up fast when you’re iterating on prompts. The solution? A factory pattern for LLM providers:
class LLMProvider:
    @staticmethod
    def get_llm():
        provider = config.settings.MODEL_PROVIDER.lower()
        if provider == "nvidia":
            return ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")
        elif provider == "ollama":
            # Ollama exposes an OpenAI-compatible API, so ChatOpenAI works
            return ChatOpenAI(
                model="mixtral",
                openai_api_base="http://localhost:11434/v1",
                openai_api_key="not-needed"  # Ollama ignores the key
            )
        elif provider == "openai":
            return ChatOpenAI(model="gpt-4o")
        raise ValueError(f"Unsupported MODEL_PROVIDER: {provider}")
Three options, same interface:
- Ollama (local): Zero API costs, runs on your GPU, completely private
- NVIDIA NIM (cloud): Pay-per-use, auto-scaling, no maintenance
- OpenAI: The familiar fallback
The best part? Switching providers is just one environment variable:
MODEL_PROVIDER=ollama # Development, free
MODEL_PROVIDER=nvidia # Production, scalable
MODEL_PROVIDER=openai # Fallback option
Human-in-the-Loop: The Safety Net
AI agents can do impressive things—and dangerously wrong things. That’s why the checkpoint pattern is essential.
Here’s how it works:
- The workflow runs through planning, research, and summarization
- It pauses at the integrator node (because of interrupt_before=["integrator"])
- The state is saved to SQLite (yes, you can inspect it!)
- A human reviews the generated report via API
- If approved, the workflow resumes and creates tickets
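LangGraph’s checkpointer handles the persistence for you; conceptually, the pattern is just “save the state at the interrupt, reload it on approval.” Here is a stdlib-only sketch of that idea using SQLite, the same backing store the project uses. This is not the LangGraph API, just the shape of the pattern:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE checkpoints (run_id TEXT PRIMARY KEY, state TEXT)")

def pause(run_id: str, state: dict) -> None:
    # Persist the full state right before the side-effecting step
    db.execute("INSERT OR REPLACE INTO checkpoints VALUES (?, ?)",
               (run_id, json.dumps(state)))

def resume(run_id: str, approved: bool) -> dict:
    (raw,) = db.execute("SELECT state FROM checkpoints WHERE run_id = ?",
                        (run_id,)).fetchone()
    state = json.loads(raw)
    state["approved"] = approved
    return state  # the graph would now continue from the integrator node

pause("r1", {"final_report": "# Plan", "approved": False})
resumed = resume("r1", approved=True)
```

Because the checkpoint lives in a real database rather than process memory, the approval can arrive minutes or days later, from a different API worker, and the run still resumes cleanly.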
The API exposes this as a simple flow:
# Start a task
curl -X POST http://localhost:8000/runs \
  -H 'Content-Type: application/json' \
  -d '{"task": "Plan authentication system"}'

# Get run_id from response, then approve
curl -X POST http://localhost:8000/runs/{run_id}/approve \
  -H 'Content-Type: application/json' \
  -d '{"approved": true}'
Real-Time Updates: SSE Streaming
Nobody likes staring at a loading spinner. That’s why I built Server-Sent Events streaming:
@app.get("/runs/{run_id}/stream")
async def stream_run(run_id: str):
    # thread_id ties this stream to the checkpointed run
    config = {"configurable": {"thread_id": run_id}}

    async def event_generator():
        async for chunk in graph.astream(None, config):
            for node_name, node_state in chunk.items():
                yield f"data: {json.dumps({node_name: node_state})}\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream"
    )
Now clients can watch progress in real-time:
data: {"task_planner": {"subtasks": ["Research OAuth providers", ...]}}
data: {"research": {"research_results": {...}}}
data: {"summarizer": {"final_report": "# Project Plan..."}}
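On the client side, each SSE frame is just a `data:` line followed by a blank line, so parsing the stream takes a few lines of stdlib code. A minimal sketch (the frame format shown above is the only assumption):

```python
import json

def parse_sse(raw: str) -> list[dict]:
    """Parse a text/event-stream body into a list of JSON payloads."""
    events = []
    for frame in raw.split("\n\n"):          # frames are blank-line delimited
        for line in frame.splitlines():
            if line.startswith("data: "):    # ignore comments and other fields
                events.append(json.loads(line[len("data: "):]))
    return events

stream = (
    'data: {"task_planner": {"subtasks": ["Research OAuth providers"]}}\n\n'
    'data: {"summarizer": {"final_report": "# Project Plan"}}\n\n'
)
events = parse_sse(stream)
```

In a browser you would use the built-in EventSource API instead; this parser is handy for CLI clients and tests.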
Error Handling: Because LLMs Fail
Here’s the uncomfortable truth: LLM calls fail. Rate limits, timeout errors, hallucinated JSON—you name it.
I built retry logic with exponential backoff:
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    retry=retry_if_exception_type((Exception,))
)
async def _call_llm_with_retry(prompt: str) -> str:
    llm = _get_llm()
    return await llm.ainvoke([HumanMessage(content=prompt)])
And an error handler node that decides whether to retry or give up:
async def error_handler_node(state: AgentState) -> dict:
    if state["retry_count"] < MAX_RETRIES:
        return {"retry_count": state["retry_count"] + 1}
    # Max retries exceeded, workflow ends
    return {}
Observability: You Can’t Fix What You Can’t See
Production systems need monitoring. I integrated three layers:
1. Prometheus Metrics
runs_total = Counter("runs_total", "Total runs", ["status"])
run_latency_seconds = Histogram("run_latency_seconds", "Run duration")
active_runs = Gauge("active_runs", "Currently running tasks")
integration_calls_total = Counter("integration_calls_total", ...)
tavily_api_calls_total = Counter("tavily_api_calls_total", ...)
sources_per_task = Histogram("sources_per_task", ...)
hitl_decisions_total = Counter("hitl_decisions_total", ...)
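With prometheus_client, timing a run is a context manager around the call; to keep this sketch dependency-free, here is the same pattern as a plain decorator recording into a dict. The metric names match the ones above; everything else is illustrative:

```python
import time
from functools import wraps

metrics = {"run_latency_seconds": [], "runs_total": {"completed": 0, "failed": 0}}

def timed(fn):
    """Record latency and a success/failure count for each call."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            metrics["runs_total"]["completed"] += 1
            return result
        except Exception:
            metrics["runs_total"]["failed"] += 1
            raise
        finally:
            metrics["run_latency_seconds"].append(time.perf_counter() - start)
    return wrapper

@timed
def run_workflow(task: str) -> str:
    return f"report for {task}"

run_workflow("Plan auth")
```

The `try/finally` shape matters: latency is recorded whether the run succeeds or raises, so failed runs don’t silently vanish from the histogram.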
2. LangSmith Tracing
Every LLM call is traced: prompts, responses, token usage, latency. Debug failed runs visually.
3. Structured Logging
logger.info(f"[{run_id}] Starting task_planner_node")
logger.error(f"[{run_id}] task_planner_node failed: {e}")
Every log includes run_id—essential when multiple workflows run concurrently.
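One way to guarantee the prefix never gets forgotten is a `logging.LoggerAdapter` that injects the run_id automatically, instead of formatting it by hand at every call site. A small sketch (the post’s code formats the prefix inline; this is an alternative, not the project’s actual implementation):

```python
import logging

class RunLogger(logging.LoggerAdapter):
    """Prefix every message with the workflow's run_id."""
    def process(self, msg, kwargs):
        return f"[{self.extra['run_id']}] {msg}", kwargs

logger = RunLogger(logging.getLogger("agent"), {"run_id": "run-42"})
# process() is what the adapter applies before each log call:
msg, _ = logger.process("Starting task_planner_node", {})
```

Each workflow gets its own adapter at startup, and every `logger.info(...)` inside the nodes carries the run_id for free.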
Integration: Where AI Meets Reality
The real power is in integrations. After approval, the agent:
- Creates a Notion page with the full report
- Files Jira tickets for each subtask
- Sends a Slack notification with links
Notion and Jira run concurrently via Python’s asyncio; the Slack notification follows, since its message links to the results:
async def integrator_node(state: AgentState) -> dict:
    notion_url, issue_keys = await asyncio.gather(
        notion_client.create_page(state["task"], state["final_report"]),
        jira_client.create_issues(state["subtasks"]),
        return_exceptions=True  # Don't fail if one integration fails
    )
    slack_sent = await slack_client.send_notification(
        state["task"], notion_url, issue_keys
    )
    return {
        "notion_page_url": notion_url,
        "jira_issue_keys": issue_keys,
        "slack_sent": slack_sent,
    }
What I Learned
- Agentic patterns are the future. Simple LLM calls can’t handle complex workflows—you need state machines.
- Human-in-the-loop isn’t optional. AI will make mistakes; checkpoints let you catch them.
- Provider flexibility matters. Locking into one LLM vendor is a strategic risk.
- Observability is non-negotiable. You can’t debug what you can’t see.
- Async Python is powerful. Concurrent research and integrations make the system feel fast.
Try It Yourself
The code is open source. Clone it, run it locally with Ollama (no API key needed):
git clone https://github.com/yourusername/ai-project-manager.git
cd ai-project-manager
poetry install
# Set up Ollama
ollama pull mixtral
ollama serve
# Run the API
MODEL_PROVIDER=ollama poetry run uvicorn src.api.main:app --reload
Open http://localhost:8000/docs and try the interactive API.
What’s Next?
I’m exploring:
- RAG integration for domain-specific knowledge
- More integrations: GitHub Issues, Linear, Asana
- Web UI for non-technical users
- Agent customization: Let users define their own workflows
The future of AI isn’t just chatbots—it’s agents that actually do things. This project is a step in that direction.
Have questions? Open an issue on GitHub or reach out. I’d love to hear what you’re building.