The Research Agent: notes from building a 5-persona LangGraph thing

A few weekends ago I had 14 tabs open about Hopfield networks for combinatorial optimization and hadn't finished any of them. That's a recurring pattern. I read three articles, get pulled into something else, lose track of the first one, start over the next Sunday. By Sunday night I have nothing to show for it except worse posture.

So I built Research Agent. The repo is at github.com/RobertoDeLaCamara/Research-Agent. It's a LangGraph orchestration over five sub-agents, ChromaDB for local embeddings, Tavily for web search, and a small Streamlit front-end. 37 tests, all green on my machine. They'll be green on yours too if you have Ollama running locally or an OpenAI key handy.

This is roughly how it ended up looking the way it does, including the parts I'd undo.

Why LangGraph

I wanted a state machine, not a chain. Chains hide control flow inside the prompt, which is fine until something misbehaves and you can't tell which step caused it. LangGraph makes the topology explicit: each node is a step, edges are transitions, and the state is just a Python dict you can print at any node.

That last bit mattered more than I expected. The first time the agent gave me a confidently wrong answer, I dropped a print into the retrieval node and saw that the vector search had returned three irrelevant chunks because my similarity threshold was too loose. Without state-as-data I'd have spent the evening grepping logs.
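To make "state is just a dict" concrete, here's a minimal sketch in plain Python rather than LangGraph itself. The node names, the canned chunks, and the run helper are all made up for illustration; the real graph in the repo will look different. The point is only that every step takes the state and returns a new one, so a print between steps shows exactly what retrieval handed to the next node.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def retrieve(state: State) -> State:
    # Hypothetical stand-in for a vector search: canned (text, score) pairs.
    chunks = [("hopfield energy function", 0.82), ("unrelated recipe", 0.31)]
    return {**state, "chunks": chunks}


def answer(state: State) -> State:
    # Hypothetical stand-in for an LLM call: reports how many chunks it saw.
    return {**state, "answer": f"used {len(state['chunks'])} chunks"}


def run(nodes: List[Node], state: State, debug: bool = False) -> State:
    """Run nodes in order, optionally printing the state after each one."""
    for node in nodes:
        state = node(state)
        if debug:
            print(f"after {node.__name__}: {state}")
    return state


final = run([retrieve, answer], {"query": "hopfield networks"}, debug=True)
```

With debug on, the low-scoring chunk is visible right after the retrieve step, which is the kind of thing that's hard to spot when the control flow lives inside a prompt.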

Shape of the graph:

Query -> Router -> [some subset of the other four personas, in parallel] -> Synthesizer -> Response

Router decides which personas to call. Synthesizer fuses their outputs.

The five personas

Calling them "personas" is mostly a naming convenience. Under the hood they're functions in a state machine, each with its own system prompt and retrieval policy.

Academic prefers ChromaDB and goes after papers and citations. Practitioner looks for working code and how-to material. Critic actively hunts for limitations and counterarguments. Scout is a plain Tavily call and only fires when the local store doesn't have enough. Synthesizer reads what the others produced and writes the final answer.

Not everything runs every time. "How do I install X" doesn't need Academic. "What are the theoretical foundations of Y" doesn't need Practitioner. Router does that gating with a small classifier prompt that, judging by the eval set I bothered to build, is right about 85% of the time.
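A rough sketch of what that gating amounts to. The real Router uses a classifier prompt; this version swaps in keyword matching so it runs without a model. The keyword lists, the route function name, and the "call everyone when unsure" default are assumptions for illustration, not what the repo does.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword hints per persona. The actual Router asks an LLM instead.
HINTS: Dict[str, Tuple[str, ...]] = {
    "academic": ("theor", "foundation", "paper", "proof"),
    "practitioner": ("install", "how do i", "example", "code"),
    "critic": ("limitation", "drawback", "fail", "risk"),
}


def route(query: str) -> List[str]:
    """Return the personas worth calling for this query."""
    q = query.lower()
    selected = [name for name, words in HINTS.items()
                if any(word in q for word in words)]
    # Assumption: when nothing matches, fall back to calling every persona.
    return selected or list(HINTS)


print(route("How do I install X"))
print(route("What are the theoretical foundations of Y"))
```

Scout is left out on purpose here, since it only fires when local retrieval comes up short rather than being chosen up front.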

Critic is the one I keep coming back to. For years I'd been frustrated by single-agent setups where you ask "also consider the limitations" and get a half-hearted "well, it might be slow." Letting Critic run as its own pass, with no goal other than poking holes, produced visibly different output. The other personas would say "X works because of Y" and Critic would come back with "X breaks when the input is empty, when the store is cold, or when the embedding model differs between writer and reader." Hard to extract that from a single prompt.

Retrieval

Chroma sits on disk in a folder I don't back up because I can rebuild it in five minutes. Documents I drop into data/ get chunked and embedded with nomic-embed-text via Ollama. Persona retrieval queries Chroma first; if none of the top-k results reaches a cosine similarity of 0.7, the persona falls back to Tavily.
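The fallback rule is easier to see in code. This is a minimal sketch under assumptions: an in-memory list of (text, vector) pairs stands in for Chroma, a plain callable stands in for the Tavily client, and the retrieve name and k=3 default are made up. Only the 0.7 threshold comes from the setup described above.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query: str,
    query_vec: Vector,
    store: List[Tuple[str, Vector]],        # stand-in for the Chroma collection
    web_search: Callable[[str], List[str]],  # stand-in for a Tavily call
    k: int = 3,                              # assumed value, not from the repo
    threshold: float = 0.7,
) -> Tuple[str, List[str]]:
    """Return ("local", chunks) or ("web", results) depending on similarity."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for text, vec in store),
        reverse=True,
    )[:k]
    # Fall back only when every top-k hit is below the threshold,
    # i.e. when even the best one is.
    if not scored or scored[0][0] < threshold:
        return "web", web_search(query)
    return "local", [text for _, text in scored]
```

One choice the sketch makes that the description above leaves open: when the best hit clears the threshold, it returns all top-k chunks, including weaker ones.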

Tavily was a small win. It's a search API built for LLM consumption: clean, structured results instead of HTML soup. Worth the few dollars a month for the scraping logic I didn't have to write.

What I'd undo

Latency. Five sub-agents with their own prompts and retrieval calls add up to 30 to 40 seconds end-to-end on Ollama, less on OpenAI. Parallelizing the independent personas with LangGraph's parallel node execution helped but didn't fix it. The system still doesn't feel snappy and probably never will at five personas.
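Why parallelizing helps but doesn't fix it, sketched with a thread pool instead of LangGraph's own parallel execution. The persona factory, the names, and the tiny sleep durations are placeholders, not measurements. With independent personas running side by side, the wait is roughly the slowest persona rather than the sum, but Synthesizer still can't start until all of them have finished.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Persona = Callable[[str], str]


def make_persona(name: str, seconds: float) -> Persona:
    """Build a fake persona that sleeps to mimic prompt plus retrieval time."""
    def persona(query: str) -> str:
        time.sleep(seconds)  # placeholder latency, not a real measurement
        return f"{name}: notes on {query}"
    return persona


def run_parallel(personas: List[Persona], query: str) -> List[str]:
    """Run independent personas concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(personas))) as pool:
        return list(pool.map(lambda p: p(query), personas))


personas = [
    make_persona("academic", 0.02),
    make_persona("practitioner", 0.01),
    make_persona("critic", 0.03),
]
outputs = run_parallel(personas, "hopfield networks")
# Synthesis would start here, after the slowest persona has returned.
```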

Context-window pressure is the second thing. When five personas each retrieve five or six chunks and Synthesizer tries to merge everything, the prompt gets long fast. I added a relevance filter before synthesis (cosine against the query, drop the bottom half), but it's blunt. The smarter version would weight by persona confidence too. Haven't done that.
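The blunt filter, as a minimal sketch. It assumes each chunk arrives as a (text, embedding) pair and that the query embedding is already computed; the function name and the rounding rule for odd counts (keep the middle chunk) are assumptions, not details from the repo.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_bottom_half(query_vec: Vector,
                     chunks: List[Tuple[str, Vector]]) -> List[str]:
    """Rank chunks by similarity to the query and keep the top half."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    keep = math.ceil(len(ranked) / 2)  # assumption: odd counts round up
    return [text for text, _ in ranked[:keep]]
```

Weighting by persona confidence would mean multiplying each score by a per-persona factor before sorting, which this version deliberately doesn't attempt.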

The Ollama versus OpenAI gap is bigger than I'd like. Local llama3 works for development. For real research sessions I switch to OpenAI because the synthesis quality is just better and the latency penalty stings less when answers are shorter. Closing that gap is on the list, but it's mostly a function of local models getting better.

Running it

streamlit run src/app.py

Or with Docker, which is what I do most days because the bind mounts give me live reload:

docker compose up -d

Tests:

python -m pytest tests/ -v

The test suite is the part I'm most proud of, more than the agent itself. I wrote it before most of the implementation. That forced the architecture to be modular in ways it probably wouldn't have been if I'd shipped first and tested later. Synthesizer would still be a 200-line god-function.

The code is at github.com/RobertoDeLaCamara/Research-Agent. Issues welcome, especially from anyone who has a real fix for the latency problem. I haven't found one.

Live demo: Research Agent - a Hugging Face Space by Bobcamgardo
