2603.00200 Curriculum-Aware Synthetic Data Generation: Self-Improving Language Models via Difficulty-Staged Training
Curriculum learning for synthetic data, achieving a 19.17% perplexity improvement over random ordering.
A gradient-level routing approach for mixture-of-experts (MoE) models that improves training stability and expert utilization.
An approach that uses attention entropy to dynamically skip transformer layers during inference, achieving a 3.1x speedup.
We propose Spectral Gating (SGA), a frequency-domain approach that learns adaptive spectral sparsity for transformer attention. SGA decomposes Q, K, and V into frequency space via the FFT, applies a learned gating mechanism, and computes attention over only the top-k frequencies. This yields O(n log n + k^2) complexity, a 29x memory reduction, and a 5.16x speedup on long sequences, while maintaining competitive perplexity (a 3.2% improvement over standard attention).
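To make the O(n log n + k^2) cost concrete, here is a minimal NumPy sketch of top-k frequency-domain attention. It is not the authors' implementation. The following are all assumptions for illustration: a sigmoid gate over rFFT bins, top-k selection by gate value, the real part of the complex dot product used as the attention score, and the name `spectral_gated_attention`. The FFTs cost O(n log n) per feature dimension, and attention over the k kept bins costs O(k^2) per feature dimension.

```python
import numpy as np

def spectral_gated_attention(Q, K, V, gate_logits, k):
    """Hypothetical sketch of attention restricted to k gated frequency bins.

    Q, K, V: real arrays of shape (n, d) (sequence length n, head dim d).
    gate_logits: shape (n // 2 + 1,), one logit per rFFT bin; stands in for
        a learned gate (here it is simply an input array).
    k: number of frequency bins kept.
    Returns the (n, d) real output and the indices of the kept bins.
    """
    n, d = Q.shape
    # FFT along the sequence axis: O(n log n) per feature dimension.
    Qf = np.fft.rfft(Q, axis=0)
    Kf = np.fft.rfft(K, axis=0)
    Vf = np.fft.rfft(V, axis=0)
    # Sigmoid gate per bin (assumed form), then keep the k largest gate values.
    gate = 1.0 / (1.0 + np.exp(-gate_logits))
    idx = np.argsort(gate)[-k:]
    g = gate[idx][:, None]
    Qk, Kk, Vk = Qf[idx] * g, Kf[idx] * g, Vf[idx] * g
    # k x k attention over the selected bins (assumed real-part scoring).
    scores = np.real(Qk @ Kk.conj().T) / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    out_k = weights @ Vk
    # Scatter back into a zero spectrum and invert to the sequence domain.
    out_f = np.zeros_like(Qf)
    out_f[idx] = out_k
    return np.fft.irfft(out_f, n=n, axis=0), idx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 256, 16, 8
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    out, kept = spectral_gated_attention(Q, K, V, rng.standard_normal(n // 2 + 1), k)
    print(out.shape, len(kept))  # (256, 16) 8
```

With n = 256 and k = 8, the attention matrix in this sketch is 8 x 8 rather than 256 x 256. That reduction is where the memory saving comes from under these assumptions.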
Antimicrobial resistance (AMR) is a critical global health threat, associated with an estimated 4.95 million deaths annually. We present ResistomeProfiler, an agent-executable bioinformatics skill that performs end-to-end AMR profiling from raw Illumina paired-end reads. The skill integrates quality control (fastp v0.23.4), de novo genome assembly (SPAdes v4.0.0), gene annotation (Prokka v1.14.6), and multi-database AMR detection (NCBI AMRFinderPlus v4.0.3, ABRicate v1.0.1 with six curated databases) into a fully reproducible, version-pinned workflow. We validate ResistomeProfiler through three complementary approaches: (1) execution on an ESBL-producing Escherichia coli ST131 clinical isolate (SRR10971381), detecting 20 resistance determinants across 10 antibiotic classes; (2) computational simulations, including bootstrap-based sensitivity/specificity analysis, coverage-depth modeling, and assessment of the impact of assembly quality; and (3) multi-species generalizability benchmarking across eight ESKAPE-adjacent pathogens (mean detection rate: 93.7%; mean cross-database concordance: 90.4%). The complete pipeline executes in 30.3 ± 2.1 minutes on a 4-core system. ResistomeProfiler demonstrates that agent-executable skills can achieve the rigor, reproducibility, and analytical depth of traditional computational biology while remaining natively executable by autonomous systems.
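A percentile bootstrap over per-gene calls can illustrate the kind of bootstrap-based sensitivity/specificity analysis mentioned above. This is a hedged sketch, not ResistomeProfiler's code. The following are all assumptions: boolean truth and prediction arrays (True = resistance gene present), genes as the resampling unit, 2,000 replicates, a 95% interval, and the function names.

```python
import numpy as np

def sens_spec(truth, pred):
    """Sensitivity and specificity for boolean arrays (True = gene present)."""
    tp = np.sum(truth & pred)
    fn = np.sum(truth & ~pred)
    tn = np.sum(~truth & ~pred)
    fp = np.sum(~truth & pred)
    return tp / (tp + fn), tn / (tn + fp)

def bootstrap_sens_spec(truth, pred, n_boot=2000, alpha=0.05, seed=0):
    """Point estimates plus percentile-bootstrap (lo, hi) bounds.

    Returns ((sens, spec), lo, hi), where lo and hi are arrays
    [sens_bound, spec_bound] for a (1 - alpha) interval.
    """
    truth = np.asarray(truth, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    rng = np.random.default_rng(seed)
    n = truth.size
    stats = []
    for _ in range(n_boot):
        i = rng.integers(0, n, size=n)  # resample genes with replacement
        t, p = truth[i], pred[i]
        # Skip replicates with no positives or no negatives (ratio undefined).
        if t.all() or not t.any():
            continue
        stats.append(sens_spec(t, p))
    stats = np.array(stats)
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return sens_spec(truth, pred), lo, hi

if __name__ == "__main__":
    # Hypothetical toy data: 20 true genes, 30 absent; 2 missed, 2 false calls.
    truth = np.array([True] * 20 + [False] * 30)
    pred = truth.copy()
    pred[:2] = False
    pred[20:22] = True
    (sens, spec), lo, hi = bootstrap_sens_spec(truth, pred)
    print(f"sensitivity {sens:.3f} [{lo[0]:.3f}, {hi[0]:.3f}]")
    print(f"specificity {spec:.3f} [{lo[1]:.3f}, {hi[1]:.3f}]")
```

Resampling genes treats each determinant call as independent. A study with several isolates might instead resample isolates; that choice is left open here.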