Do you need to file an issue?
Describe the bug
Deep Research can silently downgrade a failed subtopic into an empty ResearchedBlock and still continue into report generation.
In deeptutor/agents/research/pipeline.py, _drive_queue() runs block research through asyncio.gather(..., return_exceptions=True). If one block raises, the scheduler logs the exception, marks the block failed, and replaces it with ResearchedBlock(block=block, knowledge="") instead of surfacing the failure. _run_inner() then unconditionally passes the resulting block list into _write_report(...) and returns a normal success payload.
That means a user can receive a success-shaped Deep Research result even though one of the planned research blocks produced no evidence at all. In practice, this can silently degrade the final report into a partial report while making the run look successful to the caller.
Relevant source locations:
deeptutor/agents/research/pipeline.py:965-985
deeptutor/agents/research/pipeline.py:998-1003
deeptutor/agents/research/pipeline.py:517-547
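The pattern described above can be illustrated outside DeepTutor with a minimal standalone sketch. It is not the code in pipeline.py: `Block`, `Researched`, `research`, and `drive` are hypothetical stand-ins, and the only real API it relies on is `asyncio.gather(..., return_exceptions=True)`, which returns raised exceptions as ordinary values instead of propagating them.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Block:  # hypothetical stand-in, not DeepTutor's real block type
    block_id: str
    status: str = "pending"


@dataclass
class Researched:  # hypothetical stand-in for ResearchedBlock
    block: Block
    knowledge: str


async def research(block: Block) -> Researched:
    if block.block_id == "block_1":
        raise RuntimeError("synthetic block failure")
    block.status = "completed"
    return Researched(block, f"knowledge for {block.block_id}")


async def drive(blocks: list[Block]) -> list[Researched]:
    # return_exceptions=True hands exceptions back as ordinary return values.
    outcomes = await asyncio.gather(
        *(research(b) for b in blocks), return_exceptions=True
    )
    results = []
    for block, outcome in zip(blocks, outcomes):
        if isinstance(outcome, BaseException):
            block.status = "failed"
            # The failure becomes an empty placeholder and is never re-raised.
            outcome = Researched(block, "")
        results.append(outcome)
    return results


results = asyncio.run(drive([Block("block_1"), Block("block_2")]))
print([(r.block.block_id, r.block.status, r.knowledge) for r in results])
```

Any caller that only receives `results` cannot tell an empty-but-successful block from a failed one unless it inspects each block's status.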
Steps to reproduce
From the repository root, run the following probe after installing the project dependencies normally:
PYTHONPATH=. python - <<'PY'
import asyncio
import json
import types
from unittest.mock import patch

from deeptutor.agents.research.pipeline import ResearchPipeline, ResearchedBlock, SubTopicItem
from deeptutor.core.context import UnifiedContext
from deeptutor.core.stream_bus import StreamBus


class FakeLLM:
    binding = "openai"
    model = "gpt-x"
    api_key = "k"
    base_url = "u"
    api_version = None
    extra_headers = {}
    reasoning_effort = None


class FakeRegistry:
    def build_openai_schemas(self, _names):
        return []

    def build_prompt_text(self, _names, **_kwargs):
        return "- none"

    def get(self, _name):
        return None

    def get_enabled(self, _names):
        return []


async def main() -> None:
    with patch("deeptutor.agents.research.pipeline.get_llm_config", lambda: FakeLLM()), patch(
        "deeptutor.agents.research.pipeline.get_tool_registry", lambda: FakeRegistry()
    ):
        pipeline = ResearchPipeline(language="en", runtime_config={"queue": {"max_length": 5}})
        captured = {}

        async def fake_research_block(self, *, block, queue, citations, topic, context, stream, client):
            queue.mark_researching(block.block_id)
            if block.block_id == "block_1":
                raise RuntimeError("synthetic block failure")
            queue.mark_completed(block.block_id)
            return ResearchedBlock(block=block, knowledge=f"knowledge for {block.block_id}")

        async def fake_write_report(self, *, topic, blocks, citations, stream, client):
            captured["blocks"] = [
                {
                    "id": rb.block.block_id,
                    "status": getattr(rb.block.status, "value", str(rb.block.status)),
                    "knowledge": rb.knowledge,
                }
                for rb in blocks
            ]
            return "REPORT_OK"

        async def fake_emit(*args, **kwargs):
            return None

        pipeline._research_block = types.MethodType(fake_research_block, pipeline)
        pipeline._write_report = types.MethodType(fake_write_report, pipeline)

        with patch("deeptutor.agents.research.pipeline.emit_capability_result", fake_emit):
            result = await pipeline._run_inner(
                context=UnifiedContext(session_id="s1", user_message="research this"),
                topic="Research topic",
                image_attachments=[],
                confirmed_outline=[SubTopicItem(title="A"), SubTopicItem(title="B")],
                stream=StreamBus(),
                client=None,
            )

        print(json.dumps({"result": result, "captured_blocks": captured["blocks"]}, indent=2, default=str))


asyncio.run(main())
PY
Observed output:
{
  "result": {
    "response": "REPORT_OK",
    "output_dir": "",
    "metadata": {
      "mode": "agentic_research",
      "topic": "Research topic",
      "block_count": 2,
      "citation_count": 0
    }
  },
  "captured_blocks": [
    {
      "id": "block_1",
      "status": "failed",
      "knowledge": ""
    },
    {
      "id": "block_2",
      "status": "completed",
      "knowledge": "knowledge for block_2"
    }
  ]
}
Expected Behavior
If one research block fails, the top-level Deep Research run should not return a normal success response as if the report were complete.
Reasonable behaviors would be:
- fail the run and surface the block failure to the caller, or
- return an explicit partial-failure status, or
- stop before report generation unless the partial-result behavior is intentional and clearly marked in the result envelope
What should not happen is silently converting a failed block into empty knowledge and producing a success-shaped report with missing evidence.
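As a rough illustration of the first two options, here is a minimal standalone sketch of a result envelope that either raises or reports an explicit partial-failure status. Everything in it is an assumption made for illustration: `RunResult`, its `status` values, `failed_blocks`, and the `allow_partial` flag are hypothetical names, not part of DeepTutor's current payload or API.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class RunResult:  # hypothetical envelope, not DeepTutor's actual payload
    status: str  # "success" or "partial_failure"
    knowledge: dict[str, str] = field(default_factory=dict)
    failed_blocks: dict[str, str] = field(default_factory=dict)


async def research(block_id: str) -> str:
    if block_id == "block_1":
        raise RuntimeError("synthetic block failure")
    return f"knowledge for {block_id}"


async def run(block_ids: list[str], allow_partial: bool = True) -> RunResult:
    outcomes = await asyncio.gather(
        *(research(b) for b in block_ids), return_exceptions=True
    )
    result = RunResult(status="success")
    for block_id, outcome in zip(block_ids, outcomes):
        if isinstance(outcome, BaseException):
            # Record the failure instead of substituting empty knowledge.
            result.failed_blocks[block_id] = repr(outcome)
        else:
            result.knowledge[block_id] = outcome
    if result.failed_blocks:
        # Fail the run in strict mode, or when nothing usable was produced.
        if not allow_partial or not result.knowledge:
            raise RuntimeError(f"research blocks failed: {result.failed_blocks}")
        result.status = "partial_failure"
    return result


partial = asyncio.run(run(["block_1", "block_2"]))
print(partial.status, sorted(partial.failed_blocks))
```

With `allow_partial=False` the same call raises instead of returning, which corresponds to the "fail the run" option. Whether partial results should be allowed by default is a design decision for the maintainers.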
Related Module
Deep Research
Configuration Used
runtime_config = {"queue": {"max_length": 5}}
confirmed_outline = [SubTopicItem(title="A"), SubTopicItem(title="B")]
The repro above does not require any real model or tool backend. It patches get_llm_config, get_tool_registry, and the block/report methods only to isolate the scheduler/reporting behavior in the shipped ResearchPipeline.
Logs and screenshots
Console log from the same repro:
Block block_1 research failed: synthetic block failure
And the returned payload still reports success:
{
  "response": "REPORT_OK",
  "output_dir": "",
  "metadata": {
    "mode": "agentic_research",
    "topic": "Research topic",
    "block_count": 2,
    "citation_count": 0
  }
}
Additional Information
- DeepTutor Version: 1.4.12
- Operating System: Linux-5.15.0-173-generic-x86_64-with-glibc2.35
- Python Version: 3.13.13 in the repro environment
- Node.js Version:
- Browser (if applicable):
- Related Issues:
- Source checkout used for repro: commit 30b92df