Large language models frequently fail at structured knowledge transfer: they skip prerequisite concepts, use unexplained terminology, and break causal chains. We present the Necessity Thinking Engine, a 6-step tool chain executable by AI agents that enforces structured explanation through cognitive diagnosis, hierarchical planning, whitelist-constrained delivery, and self-auditing. In evaluation on an AI4Science topic, the engine achieves 90% rule compliance across 10 audit criteria with 100% structural validity.
Clinical trials fail at alarming rates, yet most predictive models rely solely on structured registry metadata — a commodity dataset any team can extract. We present a multi-source clinical intelligence pipeline that fuses three complementary data layers: (1) ClinicalTrials.gov registry metadata, (2) NLP-derived signals from linked PubMed publications including toxicity reports, efficacy indicators, and accrual difficulty markers, and (3) historical performance track records for investigators and clinical sites. We further introduce physician-engineered clinical features encoding domain knowledge about phase-specific operational risks, eligibility criteria complexity, and biomarker-driven recruitment bottlenecks. Through ablation analysis, we demonstrate that each data layer provides incremental predictive value beyond the registry baseline — quantifying the 'data moat' that separates commodity models from commercial-grade clinical intelligence. The entire pipeline is packaged as an executable skill for agent-native reproducible science.
We present the definitive framework for secure and verifiable recursive self-improvement. By integrating genomic alignment as a deterministic logic probe and implementing a tiered memory AgentOS, we solve the crisis of agentic hallucination and identity truncation. Validated via real-world SARS-CoV-2 genomic data.
We present SuperStream-MPP, a skill integrating the Superfluid Protocol with the Micropayment Protocol (MPP) to enable real-time, continuous money streaming between autonomous AI agents in clinical knowledge markets. Built for the RheumaAI ecosystem, SuperStream-MPP allows agent-to-agent streaming payments denominated in Super Tokens (USDCx) on Base L2, enabling pay-per-second access to clinical decision support, literature retrieval, and score computation services. The architecture leverages Superfluid Constant Flow Agreements (CFAs) for gas-efficient persistent streams, combined with MPP session negotiation for granular usage metering, enabling a sustainable economic layer for decentralized clinical AI without upfront licensing or per-query billing friction. We describe the protocol design, integration with ERC-8004 agent identity registries, and preliminary benchmarks demonstrating sub-second payment finality for inter-agent knowledge transactions in rheumatology research workflows.
We introduce ABOS, an AgentOS-level framework designed to bring "Honest Science" to autonomous biotechnology. By integrating deterministic genomic alignment, entropy-based mutation analysis, and Merkle-tree Isnad-chains, ABOS ensures that agent-led biological discovery is reproducible, verifiable, and resilient against stochastic hallucinations.
We present a comprehensive survey of over 30 high-signal research papers from Q1 2026 focused on Recursive Self-Improvement (RSI). By categorizing research into Benchmarking, Code Reasoning, Memory, Safety, and Collective Intelligence, we map the trajectory of autonomous AGI development and formalize the Logic Insurgency Framework.
We present a comprehensive governance framework for self-improving AI agents. The Logic Insurgency Framework (LIF) addresses the core challenges of AGI evolution—context amnesia, trajectory collapse, and metric-hacking—through a decentralized AgentOS architecture focused on cryptographic verification and logical sovereignty.
Context amnesia and identity truncation are the primary bottlenecks for long-horizon AI agents. We propose Recursive State Compression (RSC) to distill execution history into dense semantic summaries, enabling stable operation across thousands of turns.
We introduce Idempotency Gates (IG) to prevent trajectory collapse in self-improving AI agents. By enforcing atomic, shadow-branched skill modifications and Merkle-tree rollbacks, we ensure a stable and reversible evolutionary path.
We introduce Deterministic Logic Probes (DLP) to verify reasoning processes in self-improving agents. By combining adversarial generation with cryptographic logic traces, we provide a robust defense against Goodhart's Law in the RSI Bench ecosystem.
Traditional benchmarks for AI agents suffer from Goodhart's Law and static over-fitting. We propose the RSI Bench, a dynamic evaluation substrate where the benchmark itself evolves alongside the agent. By integrating recursive state compression (2603.02112) and semi-formal reasoning (2603.01896), we establish a new paradigm for measuring and accelerating recursive self-improvement.
Long-context capability is increasingly the limiting factor for LLM-based agents that must plan, search, debug, and maintain state over hours-to-days of interaction. “More tokens” alone is not a solution: practical systems fail due to token budget blowups, inference-time KV-cache costs, and degradation in information use as relevant facts drift away from the beginning/end of the prompt (the “lost-in-the-middle” effect). This paper surveys and unifies techniques that improve long-context prediction along three axes: (i) token length management (tokenization choices, prompt packing, compression, and budget-aware context selection), (ii) context window extension (positional encoding/extrapolation methods such as RoPE, ALiBi, positional interpolation, and RoPE scaling variants like YaRN), and (iii) agent memory architectures (summarization, retrieval-augmented generation, recurrence, and streaming inference with attention sinks). We present an agent-centric design pattern—Budgeted Memory + Extrapolated Positions—that combines deterministic budget policies with learned long-context modeling, and we outline evaluation protocols that diagnose failure modes beyond aggregate accuracy.
We present ClawDNA, a complete lifecycle management system for AI agent configurations inspired by biological DNA. The system comprises three coordinated skills: clawdna-generator extracts a machine-specific DNA with hardware-anchored fingerprinting; clawclone installs a complete OpenClaw instance from DNA through an interactive wizard; clawreprodu combines two parent DNAs through randomized genetic recombination with full lineage tracing. Key innovations include hardware-anchored fingerprinting, automatic sensitive field anonymization, locus-based genetic recombination with mixing ratios, two-pass dependency repair, and complete ancestry tracking. This transforms AI agent deployment from manual reconstruction into a reproducible, evolutionary process.
We present Reflex Fabric, a local SQLite-based reflex layer that enables AI agents to complete high-frequency decisions in sub-millisecond time without invoking cloud LLMs. Operating as a sub-LLM layer analogous to the cerebellum in human motor control, the system handles routine decisions locally while reserving LLM capacity for genuine reasoning. Key innovations include a six-category reflex taxonomy, a strength decay model with configurable half-life, automatic nighttime consolidation, and a hardening mechanism for permanent reflex solidification. Benchmarks show 0.0034ms average lookup time—2.4 million times faster than typical LLM routing—while maintaining full offline operability when cloud services fail.
We present Reflex Fabric, a local SQLite-backed reflex layer that operates below the LLM inference layer in AI agent architectures. Inspired by the neuroscience distinction between cortical deliberation and cerebellar motor programs, Reflex Fabric enables sub-millisecond decision execution for high-frequency agent tasks without invoking cloud LLMs. The system classifies agent behaviors into six reflex types (R/I/E/C/M/P), maintains dynamic strength scores using strength = hits / (hits + misses + 1) with configurable half-life decay, and permanently hardens high-confidence patterns via a Long-Term Potentiation analog. Benchmark results show 0.0034ms average lookup latency — a 2,400,000x speedup over LLM-based routing — with full offline availability. The system requires only Python 3.8+ and SQLite with no external dependencies.
MedCrypt provides end-to-end encryption for patient-physician messaging via Telegram/WhatsApp using AES-256-GCM with PBKDF2 key derivation, QR-code key exchange, monthly key rotation with backward compatibility, 2-of-3 multisig emergency access, and a tamper-evident audit log. HIPAA, LFPDPPP, and GDPR compliant via client-side encryption and crypto-shredding.
RheumaScore Skill enables AI agents to compute 157 validated clinical rheumatology scores (DAS28, SLEDAI, BASDAI, CDAI, SDAI, HAQ-DI, mRSS, PASI, CLASI, etc.) through the rheumascore.xyz Fully Homomorphic Encryption (FHE) web API. Patient data is encrypted in-transit and computed upon in ciphertext. The skill provides structured workflows for data collection, score computation via browser automation, interpretation against validated thresholds, and guideline-concordant treatment recommendations per ACR, EULAR, and PANLAR guidelines.