paperxpaper: TOC-Guided Paper Connection Discovery
1. Introduction
The modern researcher faces an impossible task: the volume of AI/ML research has grown super-linearly, creating a dense web of latent relationships between papers that no human can fully survey. When practitioners need to understand how Paper A relates to Paper B—for literature review, derivative research, or competitive analysis—they typically prompt a frontier LLM with: "How are these two papers connected?"
This approach has a structural flaw. The LLM optimizes for a single plausible narrative and terminates. It does not exhaust the connection space.
The problem is not model capability. It is the absence of a throughput discipline. Without an explicit process for identifying which connection type is the current bottleneck and forcing the system to work through it, generation converges prematurely on the path of least resistance—typically methodological or citation connections—while leaving the most valuable connections (paradigm-level synthesis hypotheses) undiscovered.
Our contribution: We import Goldratt's Theory of Constraints (TOC)—a manufacturing optimization framework—into AI agent design. The result is paperxpaper, a minimal agent that:
- Formalizes 15 connection dimensions across Physical, Policy, and Paradigm categories
- Implements TOC's Five Focusing Steps as the core reasoning loop
- Uses Agentica SDK for type-safe agent orchestration with direct Paper object access
- Targets coverage of all 15 dimensions, where naive single-pass prompting stops at the first plausible narrative
2. Background: Theory of Constraints
Dr. Eliyahu Goldratt's Theory of Constraints (1984) holds that every process has at least one binding constraint at any moment, usually a single bottleneck, and that improving non-constraints yields negligible global throughput gains. The framework provides:
The Five Focusing Steps
| Step | Goal | paperxpaper Mapping |
|---|---|---|
| Identify | Find the bottleneck | Find lowest-coverage dimension |
| Exploit | Maximize bottleneck throughput | Allocate full budget to that dimension |
| Subordinate | Align upstream/downstream | Other dimensions produce partial results |
| Elevate | Break the constraint | Inject deeper reasoning |
| Repeat | Move to next bottleneck | Promote next-lowest-coverage dimension |
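The Identify, Exploit, and Subordinate rows of the table can be sketched as a small budget allocator. This is an illustration only, not the paperxpaper implementation: `identify`, `allocate_budget`, the 10% subordinate share, and the token figures are assumptions introduced here.

```python
def identify(coverage: dict[str, float]) -> str:
    """IDENTIFY: return the dimension with the lowest coverage score."""
    return min(coverage, key=coverage.get)


def allocate_budget(coverage: dict[str, float], total_tokens: int,
                    subordinate_share: float = 0.1) -> dict[str, int]:
    """EXPLOIT + SUBORDINATE: give the constraint most of the token budget.

    A small share (assumed 10% by default) is split evenly across the other
    dimensions; everything left over goes to the constraint.
    """
    constraint = identify(coverage)
    others = [d for d in coverage if d != constraint]
    side_budget = int(total_tokens * subordinate_share)
    per_other = side_budget // len(others) if others else 0
    budget = {d: per_other for d in others}
    budget[constraint] = total_tokens - per_other * len(others)
    return budget


coverage = {"D4": 0.9, "D6": 0.8, "D15": 0.2}  # made-up scores
budget = allocate_budget(coverage, total_tokens=1000)
print(identify(coverage), budget)
```

Setting `subordinate_share=0` reproduces the table's "allocate full budget" reading, in which the non-constraint dimensions receive nothing during that cycle.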
3. The 15 Connection Dimensions
3.1 Physical Dimensions (D1–D5)
Tangible shared artifacts
| ID | Dimension | Example |
|---|---|---|
| D1 | Shared Dataset | Both train on ImageNet |
| D2 | Shared Metric | Both report BLEU/Accuracy |
| D3 | Shared Architecture | Both use Transformer blocks |
| D4 | Citation Proximity | Direct citation or mutual refs |
| D5 | Author Overlap | Shared authors or institutions |
3.2 Policy Dimensions (D6–D10)
Methodological agreements and disagreements
| ID | Dimension | Example |
|---|---|---|
| D6 | Methodological Parallel | Both use RLHF/sparse attention |
| D7 | Sequential Dependency | B extends/ablates/rebuts A |
| D8 | Contradictory Finding | Incompatible empirical claims |
| D9 | Problem Formulation Equiv. | Isomorphic problems, different notation |
| D10 | Evaluation Protocol | Same experimental setup/baselines |
3.3 Paradigm Dimensions (D11–D15)
Conceptual and epistemic relationships
| ID | Dimension | Example |
|---|---|---|
| D11 | Theoretical Lineage | Both derive from PAC learning |
| D12 | Complementary Negative Space | What A ignores, B addresses |
| D13 | Domain Transfer | A's method applies to B's domain |
| D14 | Temporal/Epistemic | A asks question, B answers it |
| D15 | Synthesis Hypothesis | Novel research combining both |
D15 (Synthesis Hypothesis) is the highest-value dimension and typically the Drum: in TOC's Drum-Buffer-Rope scheduling, the constraint whose pace sets the pace of the whole system.
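The three tables can be carried in code as a single registry keyed by dimension ID. The sketch below is one possible encoding; the `Category`, `Dimension`, and `DIMENSIONS` names are assumptions, and only the IDs, names, and category boundaries come from the tables above.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    PHYSICAL = "Physical"
    POLICY = "Policy"
    PARADIGM = "Paradigm"


@dataclass(frozen=True)
class Dimension:
    id: str
    name: str
    category: Category


# Names in table order, D1 through D15.
_NAMES = [
    "Shared Dataset", "Shared Metric", "Shared Architecture",
    "Citation Proximity", "Author Overlap",
    "Methodological Parallel", "Sequential Dependency",
    "Contradictory Finding", "Problem Formulation Equivalence",
    "Evaluation Protocol",
    "Theoretical Lineage", "Complementary Negative Space",
    "Domain Transfer", "Temporal/Epistemic", "Synthesis Hypothesis",
]


def category_for(index: int) -> Category:
    """Map a 1-based dimension index to its category (D1-5, D6-10, D11-15)."""
    if index <= 5:
        return Category.PHYSICAL
    if index <= 10:
        return Category.POLICY
    return Category.PARADIGM


DIMENSIONS = {
    f"D{i}": Dimension(f"D{i}", name, category_for(i))
    for i, name in enumerate(_NAMES, start=1)
}

print(DIMENSIONS["D15"])
```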
4. Architecture
4.1 Agent via Agentica SDK

    from agentica import spawn

    agent = await spawn(
        system="You are paperxpaper, a paper connection discovery agent.",
        scope={"paper_a": paper_a, "paper_b": paper_b},
        model="anthropic:claude-sonnet-4",
    )
    # Agent can call paper_a.search(), paper_b.get_section(), etc.
    result = await agent.call("Find D15 synthesis hypotheses...")

Papers are passed as scope objects: the agent accesses .title, .sections, .search(), and .bibliography directly.
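How a dimension-specific request might be phrased for `agent.call` can be sketched with a stand-in agent, so the example runs without any model or SDK. `build_prompt`, `EchoAgent`, and the prompt wording are hypothetical; only the `call` method name comes from the snippet above, and its return type is assumed to be a string.

```python
import asyncio


def build_prompt(dim_id: str, dim_name: str) -> str:
    """Compose a request that confines the agent to one connection dimension."""
    return (
        f"Find every {dim_id} ({dim_name}) connection between paper_a and paper_b. "
        "Quote the supporting section from each paper."
    )


class EchoAgent:
    """Stand-in for a spawned agent; records prompts instead of calling a model."""

    def __init__(self) -> None:
        self.prompts: list[str] = []

    async def call(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return f"(no model attached) {prompt}"


async def exploit(agent, dim_id: str, dim_name: str) -> str:
    """Spend one call entirely on the chosen dimension."""
    return await agent.call(build_prompt(dim_id, dim_name))


agent = EchoAgent()
reply = asyncio.run(exploit(agent, "D15", "Synthesis Hypothesis"))
print(reply)
```

Keeping the prompt builder separate from the call makes it possible to test the wording per dimension without spending tokens.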
4.2 The Five-Step Loop
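
    async def run(paper_a, paper_b):
        agent = await spawn(system=SYSTEM_PROMPT, scope={"paper_a": paper_a, "paper_b": paper_b})
        coverage = {f"D{i}": 0.0 for i in range(1, 16)}  # score per dimension, D1..D15
        connections = []  # accumulated across iterations
        for iteration in range(MAX_ITERATIONS):
            # 1. IDENTIFY: lowest-coverage dimension
            dim = min(coverage, key=coverage.get)
            # 2. EXPLOIT: full extraction
            found = await exploit(agent, dim)
            # 3. SUBORDINATE: partial extraction on other dims
            #    (skipped for efficiency in minimal version)
            # 4. ELEVATE: if stalled, deeper reasoning
            #    ("stalled" is taken here to mean the exploit step found nothing)
            if not found:
                found = await elevate(agent, dim)
            connections.extend(found)
            # The coverage update rule is not given in this section;
            # update_coverage is a placeholder name for it.
            coverage[dim] = update_coverage(coverage[dim], found)
            # 5. REPEAT until converged
            if min(coverage.values()) >= THRESHOLD:
                break
        return deduplicate(connections)

5. Implementation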
5.1 Dependency Profile
| Component | Implementation |
|---|---|
| Agent framework | symbolica-agentica |
| Paper fetching | arxiv API |
| PDF parsing | pymupdf |
| HTTP | httpx |
| Total | ~150 LOC agent |
No LangChain. No LlamaIndex. No vector database.
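Without a vector database, retrieval inside a paper has to rely on something simpler, such as keyword overlap over its sections. The sketch below shows one such approach; `search_sections` and its ranking rule are assumptions for illustration, and the source does not describe how the Paper object's own search is implemented or what its returned strings contain.

```python
def search_sections(sections: dict[str, str], query: str) -> list[str]:
    """Return names of sections sharing words with the query, best match first."""
    terms = set(query.lower().split())
    scored = []
    for name, text in sections.items():
        words = set(text.lower().split())
        hits = len(terms & words)  # number of distinct query terms present
        if hits:
            scored.append((hits, name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Made-up section texts for demonstration.
sections = {
    "Introduction": "Recurrent models dominate sequence transduction",
    "Attention": "Scaled dot-product attention over queries keys and values",
    "Training": "We trained on the WMT 2014 dataset",
}
print(search_sections(sections, "attention keys"))
```

Whole-word matching of this kind misses synonyms and inflections, which is the trade-off for having no embedding index to build or host.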
5.2 Paper Object

    from dataclasses import dataclass

    @dataclass
    class Paper:
        arxiv_id: str
        title: str
        authors: list[str]
        abstract: str
        full_text: str
        sections: dict[str, str]
        bibliography: list[str]

        def search(self, query: str) -> list[str]: ...
        def get_section(self, name: str) -> str: ...

6. Usage

    # Install
    uv tool install paperxpaper
    # Run
    paperxpaper 1706.03762 2603.09229

Output:

    paperxpaper: 1706.03762 × 2603.09229
    [1/2] 1706.03762...
    Attention Is All You Need...
    [2/2] 2603.09229...
    Flash-KMeans: Efficient Scalable K-Means...
    [*] Analyzing connections...
    === RESULTS (3 iters, 4821 tokens) ===
    Physical (D1-D5):
    [D4] Both cite Johnson-Lindenstrauss lemma...
    Policy (D6-D10):
    [D6] Both replace O(n²) with sub-quadratic approximation...
    Paradigm (D11-D15):
    [D15] SketchAttention: centroid lookup on sketched keys...
    → paperxpaper_1706.03762_2603.09229.json

7. Why This Works
7.1 The Throughput Discipline
Naive prompting is a factory where every machine runs at uncoordinated capacity—the bottleneck receives no special attention and leaves work incomplete.
TOC's insight: system throughput equals the throughput of its constraint. The worst-covered dimension bounds overall quality. paperxpaper forces this dimension to receive disproportionate attention every cycle.
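The claim that throughput equals the throughput of the constraint can be checked with a few lines of arithmetic on a serial production line. The station names and capacities below are made up for illustration.

```python
def line_throughput(capacities: dict[str, int]) -> int:
    """Throughput of a serial line equals its slowest station's capacity."""
    return min(capacities.values())


stations = {"cutting": 120, "welding": 40, "painting": 90}  # units/hour, made up

baseline = line_throughput(stations)

# Doubling a non-constraint changes nothing for the line as a whole...
faster_cutting = line_throughput({**stations, "cutting": 240})
# ...while doubling the constraint lifts the whole line.
faster_welding = line_throughput({**stations, "welding": 80})

print(baseline, faster_cutting, faster_welding)  # prints: 40 40 80
```

In the paper-connection setting the stations correspond to the 15 dimensions and capacity to coverage, so effort spent on an already well-covered dimension plays the role of the faster cutting station.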
7.2 Breaking the Policy Constraint
The LLM's prior is a policy constraint in Goldratt's sense: it strongly favors citation links (D4) and methodological connections (D6–D7) and underproduces paradigm connections (D11–D15). The model cannot see this bias from inside a single generation pass.
paperxpaper breaks this constraint in three ways:
- Explicit coverage scoring exposes the constraint
- Forced elevation overrides the default generation policy
- Agentica scope access enables exhaustive section-by-section analysis
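The first of these, explicit coverage scoring, might look like the following. The scoring rule (coverage of a dimension is the highest confidence among its connections) is an assumption made for this sketch, as are the function names; the `dimension` and `confidence` field names follow the output JSON shown in the appendix.

```python
ALL_DIMENSIONS = [f"D{i}" for i in range(1, 16)]


def score_coverage(connections: list[dict]) -> dict[str, float]:
    """Coverage per dimension: best confidence found there, 0.0 if nothing."""
    best: dict[str, float] = {}
    for conn in connections:
        dim = conn["dimension"]
        best[dim] = max(best.get(dim, 0.0), conn["confidence"])
    return {dim: best.get(dim, 0.0) for dim in ALL_DIMENSIONS}


def current_constraint(coverage: dict[str, float]) -> str:
    """The constraint is the worst-covered dimension (first one on ties)."""
    return min(coverage, key=coverage.get)


# A made-up first pass that only produced physical and policy connections.
connections = [{"dimension": f"D{i}", "confidence": 0.8} for i in range(1, 11)]

coverage = score_coverage(connections)
print(current_constraint(coverage))
```

Once the scores are written down, the gap in D11 through D15 is visible as a row of zeros, and the loop has a concrete target for its next cycle.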
8. Conclusion
paperxpaper demonstrates that importing an industrial operations framework, Goldratt's Theory of Constraints, into AI agent design yields three benefits: more complete connection coverage, disciplined token spend, and systematic surfacing of non-obvious paradigm-level relationships.
The key insight: LLM generation without a throughput discipline will always converge on the path of least resistance. TOC's Five Focusing Steps provide exactly the corrective: identify the constraint, exploit it, subordinate everything else, elevate the constraint when it stalls, and repeat.
The Agentica SDK integration ensures type-safe agent orchestration with direct Paper object access. The result: a ~150-line agent that discovers synthesis hypotheses—novel research directions combining two papers—that single-pass prompting never surfaces.
References
- Goldratt, E. M., & Cox, J. (1984). The Goal: A Process of Ongoing Improvement. North River Press.
- Agentica SDK: https://docs.symbolica.ai
Appendix: SKILL.md

    ---
    name: paperxpaper
    description: >
      Connect two arXiv papers across all 15 connection dimensions
      using a TOC-guided agent loop via Agentica SDK.
    ---
    # Usage
    paperxpaper 1706.03762 2603.09229
    # Dependencies
    pip install symbolica-agentica pymupdf arxiv httpx
    # Output
    {
      "connections": [{
        "dimension": "D15",
        "dimension_name": "Synthesis Hypothesis",
        "description": "SketchAttention: centroid lookup on sketched keys...",
        "confidence": 0.93,
        "evidence_a": "Vaswani Section 3.2",
        "evidence_b": "Flash-KMeans Section 2.1"
      }],
      "coverage": {"D1": 1.0, ..., "D15": 0.93},
      "iterations": 3,
      "usage": {"total_tokens": 4821}
    }


