[Bug]: Deep Research returns a success response even when a research block fails #595

Description

@bossjoker1

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

Deep Research can silently downgrade a failed subtopic into an empty ResearchedBlock and still continue into report generation.

In deeptutor/agents/research/pipeline.py, _drive_queue() runs block research through asyncio.gather(..., return_exceptions=True). If one block raises, the scheduler logs the exception, marks the block failed, and replaces it with ResearchedBlock(block=block, knowledge="") instead of surfacing the failure. _run_inner() then unconditionally passes the resulting block list into _write_report(...) and returns a normal success payload.

That means a user can receive a success-shaped Deep Research result even though one of the planned research blocks produced no evidence at all. In practice, this can silently degrade the final report into a partial report while making the run look successful to the caller.

Relevant source locations:

  • deeptutor/agents/research/pipeline.py:965-985
  • deeptutor/agents/research/pipeline.py:998-1003
  • deeptutor/agents/research/pipeline.py:517-547
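
The swallowing pattern described above can be illustrated in isolation. The following is a minimal sketch, not the actual DeepTutor code: `Block`, `Researched`, `research`, and `drive` are hypothetical stand-ins for the real block type, ResearchedBlock, _research_block(), and _drive_queue(). It only assumes the behavior stated in this report, namely that asyncio.gather(..., return_exceptions=True) is used and that a failed block is replaced with empty knowledge.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical stand-in for a planned research block."""
    block_id: str
    status: str = "pending"


@dataclass
class Researched:
    """Hypothetical stand-in for ResearchedBlock."""
    block: Block
    knowledge: str


async def research(block: Block) -> Researched:
    # Simulate one failing subtopic, as in the repro in this report.
    if block.block_id == "block_1":
        raise RuntimeError("synthetic block failure")
    block.status = "completed"
    return Researched(block=block, knowledge=f"knowledge for {block.block_id}")


async def drive(blocks: List[Block]) -> List[Researched]:
    # With return_exceptions=True, gather does not raise; it returns the
    # exception object in the position of the task that failed.
    outcomes = await asyncio.gather(
        *(research(b) for b in blocks), return_exceptions=True
    )
    results = []
    for block, outcome in zip(blocks, outcomes):
        if isinstance(outcome, BaseException):
            # The failure is recorded on the block but never reaches the
            # caller; an empty placeholder takes the place of real evidence.
            block.status = "failed"
            results.append(Researched(block=block, knowledge=""))
        else:
            results.append(outcome)
    return results


def run_demo() -> List[Researched]:
    return asyncio.run(drive([Block("block_1"), Block("block_2")]))
```

Any code that only looks at the list returned by `drive` cannot tell a failed block from one that legitimately found nothing, unless it also inspects each block's status.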

Steps to reproduce

From the repository root, run the following probe after installing the project dependencies normally:

PYTHONPATH=. python - <<'PY'
import asyncio
import json
import types
from unittest.mock import patch

from deeptutor.agents.research.pipeline import ResearchPipeline, ResearchedBlock, SubTopicItem
from deeptutor.core.context import UnifiedContext
from deeptutor.core.stream_bus import StreamBus


class FakeLLM:
    binding = "openai"
    model = "gpt-x"
    api_key = "k"
    base_url = "u"
    api_version = None
    extra_headers = {}
    reasoning_effort = None


class FakeRegistry:
    def build_openai_schemas(self, _names):
        return []

    def build_prompt_text(self, _names, **_kwargs):
        return "- none"

    def get(self, _name):
        return None

    def get_enabled(self, _names):
        return []


async def main() -> None:
    with patch("deeptutor.agents.research.pipeline.get_llm_config", lambda: FakeLLM()), patch(
        "deeptutor.agents.research.pipeline.get_tool_registry", lambda: FakeRegistry()
    ):
        pipeline = ResearchPipeline(language="en", runtime_config={"queue": {"max_length": 5}})

    captured = {}

    async def fake_research_block(self, *, block, queue, citations, topic, context, stream, client):
        queue.mark_researching(block.block_id)
        if block.block_id == "block_1":
            raise RuntimeError("synthetic block failure")
        queue.mark_completed(block.block_id)
        return ResearchedBlock(block=block, knowledge=f"knowledge for {block.block_id}")

    async def fake_write_report(self, *, topic, blocks, citations, stream, client):
        captured["blocks"] = [
            {
                "id": rb.block.block_id,
                "status": getattr(rb.block.status, "value", str(rb.block.status)),
                "knowledge": rb.knowledge,
            }
            for rb in blocks
        ]
        return "REPORT_OK"

    async def fake_emit(*args, **kwargs):
        return None

    pipeline._research_block = types.MethodType(fake_research_block, pipeline)
    pipeline._write_report = types.MethodType(fake_write_report, pipeline)

    with patch("deeptutor.agents.research.pipeline.emit_capability_result", fake_emit):
        result = await pipeline._run_inner(
            context=UnifiedContext(session_id="s1", user_message="research this"),
            topic="Research topic",
            image_attachments=[],
            confirmed_outline=[SubTopicItem(title="A"), SubTopicItem(title="B")],
            stream=StreamBus(),
            client=None,
        )

    print(json.dumps({"result": result, "captured_blocks": captured["blocks"]}, indent=2, default=str))


asyncio.run(main())
PY

Observed output:

{
  "result": {
    "response": "REPORT_OK",
    "output_dir": "",
    "metadata": {
      "mode": "agentic_research",
      "topic": "Research topic",
      "block_count": 2,
      "citation_count": 0
    }
  },
  "captured_blocks": [
    {
      "id": "block_1",
      "status": "failed",
      "knowledge": ""
    },
    {
      "id": "block_2",
      "status": "completed",
      "knowledge": "knowledge for block_2"
    }
  ]
}

Expected Behavior

If one research block fails, the top-level Deep Research run should not return a normal success response as if the report were complete.

Reasonable behaviors would be:

  • fail the run and surface the block failure to the caller, or
  • return an explicit partial-failure status, or
  • stop before report generation unless the partial-result behavior is intentional and clearly marked in the result envelope

What should not happen is silently converting a failed block into empty knowledge and producing a success-shaped report with missing evidence.
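
As one possible shape for the first two options, here is a hedged sketch of a result builder that either raises or marks the envelope as partial. All names in it (`BlockOutcome`, `ResearchBlockError`, `build_envelope`, `allow_partial`, and the `status` / `failed_blocks` metadata keys) are hypothetical and do not exist in DeepTutor; the sketch only shows the idea of making failure visible to the caller.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class BlockOutcome:
    """Hypothetical per-block summary; not a DeepTutor type."""
    block_id: str
    status: str  # "completed" or "failed"
    error: Optional[str] = None


class ResearchBlockError(RuntimeError):
    """Hypothetical error raised when partial results are not allowed."""


def build_envelope(
    report: str,
    outcomes: List[BlockOutcome],
    *,
    allow_partial: bool = True,
) -> Dict[str, Any]:
    failed = [o for o in outcomes if o.status == "failed"]
    if failed and not allow_partial:
        # Option 1: fail the run and surface the block failure.
        ids = ", ".join(o.block_id for o in failed)
        raise ResearchBlockError(f"research blocks failed: {ids}")
    # Option 2: return the report, but mark it explicitly as partial.
    return {
        "response": report,
        "metadata": {
            "status": "partial_failure" if failed else "success",
            "block_count": len(outcomes),
            "failed_blocks": [
                {"id": o.block_id, "error": o.error} for o in failed
            ],
        },
    }
```

With the outcomes from the repro, `build_envelope("REPORT_OK", outcomes)` would report `"status": "partial_failure"` and list `block_1`, while `allow_partial=False` would raise instead of returning a report.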

Related Module

Deep Research

Configuration Used

runtime_config = {"queue": {"max_length": 5}}
confirmed_outline = [SubTopicItem(title="A"), SubTopicItem(title="B")]

The repro above does not require any real model or tool backend. It patches get_llm_config, get_tool_registry, and the block/report methods only to isolate the scheduler/reporting behavior in the shipped ResearchPipeline.

Logs and screenshots

Console log from the same repro:

Block block_1 research failed: synthetic block failure

And the returned payload still reports success:

{
  "response": "REPORT_OK",
  "output_dir": "",
  "metadata": {
    "mode": "agentic_research",
    "topic": "Research topic",
    "block_count": 2,
    "citation_count": 0
  }
}

Additional Information

  • DeepTutor Version: 1.4.12
  • Operating System: Linux-5.15.0-173-generic-x86_64-with-glibc2.35
  • Python Version: 3.13.13 in the repro environment
  • Node.js Version:
  • Browser (if applicable):
  • Related Issues:
  • Source checkout used for repro: commit 30b92df
