2604.01729 Pre-Registered Protocol: A Reproducibility Audit of 'SHAP Values as Feature Importance' Claims in Six Clinical-ML Preprints
We specify a pre-registered protocol for the following question: for six clinical-ML preprints that rank features by mean absolute SHAP value, do the reported top-5 feature rankings reproduce when we re-run SHAP with documented alternative background datasets and alternative SHAP explainers? Data sources: each preprint's publicly released model and data (restricted to preprints with released artifacts); MIMIC-IV (credentialed public) for preprints based on it.
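The ranking comparison described above can be sketched as follows. This is a minimal illustration, not the protocol's own code: the function names, the feature names, and the toy SHAP matrices are all hypothetical, and the SHAP values are assumed to have been computed already (one row per sample, one column per feature).

```python
def mean_abs_shap(shap_rows):
    """Column-wise mean of |SHAP| over samples (rows)."""
    n_rows = len(shap_rows)
    n_cols = len(shap_rows[0])
    return [sum(abs(row[j]) for row in shap_rows) / n_rows for j in range(n_cols)]


def top_k(shap_rows, feature_names, k=5):
    """Feature names ordered by descending mean |SHAP|, truncated to k."""
    scores = mean_abs_shap(shap_rows)
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    return [feature_names[j] for j in order[:k]]


def top_k_agreement(rank_a, rank_b):
    """Return (fraction of shared features, whether the ordered lists match)."""
    overlap = len(set(rank_a) & set(rank_b)) / len(rank_a)
    return overlap, rank_a == rank_b


# Hypothetical toy data: two SHAP runs that differ only in background dataset.
names = ["age", "lactate", "creatinine"]
run_a = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.0]]
run_b = [[0.1, -0.6, 0.0], [-0.2, 0.4, 0.1]]
rank_a = top_k(run_a, names, k=2)
rank_b = top_k(run_b, names, k=2)
overlap, same_order = top_k_agreement(rank_a, rank_b)
```

Reporting both set overlap and exact-order agreement separates "the same features appear" from "the same ranking is reported", which are different reproducibility claims.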
2604.01728 Pre-Registered Protocol: Why Four Public Matching Packages Produce Divergent Estimates on the NHEFS Benchmark
We specify a pre-registered protocol for the following question: on the NHEFS smoking-cessation benchmark, do four public matching packages (MatchIt, Matching, PSMatch2, causalforestDML) produce treatment-effect estimates that agree to within their stated SEs when each is configured with its documented 'default' matching strategy? Data source: the NHEFS public release (CDC; used throughout Hernán and Robins's book 'Causal Inference: What If' and its associated, publicly available code repository).
2604.01727 Pre-Registered Protocol: Why Three Published Random-Effects Meta-Analysis Packages Produce Divergent Heterogeneity Intervals on the Same Input
We specify a pre-registered protocol for the following question: do three widely used random-effects meta-analysis packages (metafor in R, Comprehensive Meta-Analysis, and meta in R) produce tau-squared and I-squared CIs that agree to within their stated precision when run on the same fixed set of 30 published meta-analyses? Data sources: Cochrane Database of Systematic Reviews (publicly accessible summary-level data for many reviews); Our World In Data meta-analytic repositories; a pre-specified selection of 30 Cochrane reviews across clinical areas.
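For orientation, the two heterogeneity quantities named above can be computed as point estimates with the DerSimonian-Laird moment estimator. The sketch below is an assumption for illustration only: it is not the default estimator of every package listed, it does not produce the confidence intervals the protocol compares, and the function name and inputs are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Point estimates of Cochran's Q, tau-squared (DL), and I-squared.

    effects:   per-study effect estimates
    variances: per-study within-study variances
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]           # inverse-variance weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    c = sum_w - sum(w * w for w in weights) / sum_w  # DL scaling constant
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"Q": q, "tau2": tau2, "I2": i2}


# Hypothetical two-study input.
result = dersimonian_laird([0.0, 2.0], [0.5, 0.5])
```

Different packages may default to different tau-squared estimators and interval methods, which is one plausible source of the divergence the protocol sets out to measure.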
2604.01726 Damselfly: A Small-Sample Alternative to DeLong for Comparing Two AUCs Under Label Scarcity
We describe Damselfly, a permutation-based paired-AUC comparison tuned for small and label-sparse clinical datasets where DeLong's normal approximation is unreliable. The DeLong test is the standard for comparing two AUCs on the same samples, but it relies on a normal approximation of the covariance of U-statistics that fails at small sample sizes or when positives are severely underrepresented.
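A generic paired permutation test for an AUC difference can be sketched as below. This is not Damselfly's actual algorithm, which the listing does not specify; it is a textbook-style illustration that randomly swaps the two models' scores within each sample, and it assumes both models emit scores on a comparable scale. All names and parameters are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the Mann-Whitney probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_auc_permutation_test(labels, scores_a, scores_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for AUC(a) - AUC(b) on the same samples."""
    rng = random.Random(seed)
    observed = auc(labels, scores_a) - auc(labels, scores_b)
    extreme = 0
    for _ in range(n_perm):
        perm_a, perm_b = [], []
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:       # swap model assignment for this sample
                a, b = b, a
            perm_a.append(a)
            perm_b.append(b)
        diff = auc(labels, perm_a) - auc(labels, perm_b)
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    p_value = (extreme + 1) / (n_perm + 1)  # add-one correction avoids p = 0
    return observed, p_value


# Hypothetical toy data.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
observed, p_value = paired_auc_permutation_test(labels, scores, scores, n_perm=200)
```

Because the null distribution is built by resampling rather than from a normal approximation, the test stays valid in form at small sample sizes, although its resolution is limited by the number of distinct permutations.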
2604.01725 Arbalest: A Pre-Specified Subgroup-Test Checklist That Forces Declaration of Pre-Planned vs. Post-Hoc Subgroups
We describe Arbalest, a minimal CLI and checklist that locks in a subgroup-analysis plan before data unblinding and flags any post-hoc additions. Subgroup analyses in RCTs and observational studies are a known source of spurious findings.
2604.01724 Picket: A Per-Fold Calibration Reporting Template for Cross-Validated Clinical Models
We describe Picket, a small reporting template and helper library that makes within-fold miscalibration visible in cross-validated clinical prediction models. Published clinical prediction models typically report aggregate calibration (Brier score, ECE, Hosmer-Lemeshow test) averaged over cross-validation folds.
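The contrast between aggregate and per-fold reporting can be illustrated with the Brier score. The sketch below is not Picket's API, which the listing does not describe; the function names, the returned fields, and the toy folds are hypothetical.

```python
def brier(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def per_fold_brier(folds):
    """folds: list of (labels, predicted_probabilities) pairs, one per CV fold."""
    scores = [brier(y, p) for y, p in folds]
    return {
        "per_fold": scores,
        "mean": sum(scores) / len(scores),     # what is usually reported
        "worst": max(scores),                  # what averaging can hide
        "spread": max(scores) - min(scores),
    }


# Hypothetical two-fold example: one perfectly calibrated fold, one uninformative.
report = per_fold_brier([
    ([1, 0], [1.0, 0.0]),
    ([1, 0], [0.5, 0.5]),
])
```

Here the mean of 0.125 looks acceptable while one fold sits at 0.25, which is the kind of within-fold variation an averaged figure conceals.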
2604.01723 Pre-Registered Protocol: A Reproducible Audit of LLM Earnings-Call Sentiment Scores Against Hand-Labelled Transcripts
We specify a pre-registered protocol for the following question: do three LLM sentiment-scoring pipelines applied to earnings-call transcripts produce sentiment scores that correlate with a hand-labelled benchmark, and do the three pipelines agree with each other? Data sources: the SeekingAlpha transcript archive (public scrapes) or the Lazy Prices transcript dataset used in Cohen, Malloy, and Nguyen (2020), publicly available via the authors' replication package; hand labels from two trained annotators.
2604.01722 Pre-Registered Protocol: Why Four XBRL Parsers Disagree on Reported Revenue Figures — A Reproducibility Audit
We specify a pre-registered protocol for the following question: when four public XBRL parsers are applied to a fixed set of SEC EDGAR 10-K filings, what fraction of filings produce divergent reported total-revenue figures, and what parser behaviours cause each class of disagreement? Data sources: SEC EDGAR XBRL filings (fully public); a pre-specified sample of 1000 filings from S&P 1500 constituents for FY2022 and FY2023.
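The headline quantity, the fraction of filings on which the parsers disagree, can be sketched as follows. The data layout, the parser labels, the tolerance parameter, and the choice to count a missing value as a disagreement are all assumptions made for illustration; the protocol itself may define divergence differently.

```python
def divergent_fraction(revenue_by_filing, rel_tol=0.0):
    """Fraction of filings whose parsers do not all report the same revenue.

    revenue_by_filing maps a filing id to {parser name: revenue or None}.
    A parser returning None is counted as a disagreement (an assumption).
    """
    divergent = 0
    for values in revenue_by_filing.values():
        figures = list(values.values())
        if any(v is None for v in figures):
            divergent += 1
            continue
        lo, hi = min(figures), max(figures)
        if hi - lo > rel_tol * abs(hi):
            divergent += 1
    return divergent / len(revenue_by_filing)


# Hypothetical parser outputs for four filings.
sample = {
    "filing_A": {"parser_1": 100, "parser_2": 100},
    "filing_B": {"parser_1": 100, "parser_2": 100000},   # e.g. a scaling mismatch
    "filing_C": {"parser_1": 5, "parser_2": None},       # tag not found
    "filing_D": {"parser_1": 5, "parser_2": 5},
}
fraction = divergent_fraction(sample)
```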
2604.01721 Pre-Registered Protocol: Post-Merge Ethereum Issuance Net-Negativity Under a Disclosed Burn-vs-Issuance Accounting
We specify a pre-registered protocol for the following question: under a pre-specified accounting method that subtracts EIP-1559 base-fee burn from consensus-layer issuance per week, what fraction of 2024 calendar weeks on the Ethereum mainnet showed net-negative issuance? Data sources: Etherscan daily summary data (public); Ultrasound.
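The weekly accounting described above reduces to a subtraction and a count. The sketch below assumes the two series have already been aggregated to calendar weeks and share a unit (for example ETH); the function name and the numbers are hypothetical, not real chain data.

```python
def net_negative_fraction(weekly_issuance, weekly_burn):
    """Fraction of weeks in which issuance minus base-fee burn is below zero."""
    net = [issued - burned for issued, burned in zip(weekly_issuance, weekly_burn)]
    negative_weeks = sum(1 for x in net if x < 0)
    return negative_weeks / len(net)


# Hypothetical four-week example; net values are -2, 2, -1, 0.
fraction = net_negative_fraction([10, 10, 10, 10], [12, 8, 11, 10])
```

A week with exactly zero net issuance is not counted as net-negative here; a protocol would need to fix that boundary convention in advance.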
2604.01720 Pre-Registered Protocol: Remote-Work Reversal Announcements and Voluntary Turnover in a 41-Firm Panel
We specify a pre-registered protocol for the following question: for a pre-specified panel of large US employers that announced explicit return-to-office mandates, did voluntary turnover (as measured by LinkedIn Economic Graph-based tenure-end indicators) rise in the 6-month post-announcement window relative to a 12-month pre-announcement baseline, controlling for sector trends? Data sources: LinkedIn Economic Graph (public research partnership releases); Revelio Labs public workforce data summaries; press announcements catalogued in a released CSV; BLS JOLTS public series for sector baselines.
2604.01719 Pre-Registered Protocol: Replication of Eight Recent 'AI-Finance' Return Claims on a Pre-Specified Hold-Out Slice
We specify a pre-registered protocol for the following question: do eight recent AI-finance return claims (using neural-network or tree-ensemble predictors of cross-sectional equity returns) survive on a time slice strictly after each paper's reported training and test ranges? Data sources: CRSP Monthly; Compustat fundamentals via WRDS. The sample slice is 2024Q1 onward (strictly post-publication for all eight papers).
2604.01718 Pre-Registered Protocol: A Reproducibility Audit of Carry-Factor Returns in Four 2025-Era Preprints on the Same FX Universe
We specify a pre-registered protocol for the following question: do four 2025-era preprints reconstructing the FX carry trade report annualised returns that reproduce within their stated CIs when all are implemented on the same G10 FX universe over the same sample? Data sources: Bloomberg/Refinitiv spot and 1-month forward rates for G10 (alternatively, the BIS public monthly effective exchange rate data as a sanity comparison); US Treasury rates from FRED.
2604.01717 Pre-Registered Protocol: Why Three Published Momentum Factor Reconstructions Produce Divergent Sharpe Ratios on the Same CRSP Universe
We specify a pre-registered protocol for the following question: do three published momentum-factor reconstructions (Jegadeesh-Titman 1993, Carhart 1997, and the Fama-French momentum factor UMD as distributed on French's data library) produce Sharpe ratios whose 95% CIs overlap when independently implemented on an identical CRSP universe and frozen sample period? Data sources: CRSP Monthly Stock File via WRDS (or the public 'Kenneth French Data Library' momentum series as a cross-check).
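The overlap criterion can be illustrated with a simple interval construction. The sketch below assumes i.i.d. returns and uses the asymptotic standard error sqrt((1 + SR^2/2) / T) for a per-period Sharpe ratio; the protocol may well specify a different interval method (for example a bootstrap), and the function names and sample returns are hypothetical.

```python
import math
import statistics


def sharpe_with_ci(excess_returns, z=1.96):
    """Per-period Sharpe ratio and an approximate 95% CI under i.i.d. returns."""
    t = len(excess_returns)
    sr = statistics.mean(excess_returns) / statistics.stdev(excess_returns)
    se = math.sqrt((1.0 + 0.5 * sr * sr) / t)
    return sr, (sr - z * se, sr + z * se)


def intervals_overlap(a, b):
    """True when closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical monthly excess returns for one reconstruction.
sr, ci = sharpe_with_ci([0.01, 0.03, 0.02, -0.01])
```

Overlapping intervals are a weaker criterion than a formal test of equal Sharpe ratios, particularly when the return series are highly correlated, as reconstructions of the same factor would be.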
2604.01716 Pre-Registered Protocol: ECB CSPP Green-Tilt Announcement Effect on Eligible 2-Year Corporate Spreads
We specify a pre-registered protocol for the following question: on the ECB Corporate Sector Purchase Programme green-tilt announcement date, did 2-year asset-swap spreads for CSPP-eligible green-aligned issuers move significantly more than those for matched non-tilt-eligible issuers over the two-day window? Data sources: ECB CSPP eligibility list (public weekly update by the ECB); EuroStoxx corporate bond indices; Refinitiv/Bloomberg for asset-swap spread calculations; ECB press-release timestamps (public).
2604.01715 Pre-Registered Protocol: Intraday Price-Impact Concentration in YCC Band-Widening Episodes
We specify a pre-registered protocol for the following question: in the JGB 10-year futures market, is the cumulative absolute log-return in the first 9 minutes after each BoJ yield-curve-control band-widening announcement a significantly larger share of the total 60-minute post-announcement absolute move than the same share on matched non-announcement days? Data sources: Osaka Exchange JGB futures intraday prints (available through TSE/OSE historical data and mirrored on Bloomberg and Refinitiv); BoJ press-release timestamps (public on the BoJ website).
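The concentration measure can be sketched as a ratio of summed absolute one-minute log-returns. Reading "cumulative absolute log-return" as a sum of absolute one-minute moves is an assumption, as are the one-minute sampling grid, the function name, and the toy price path.

```python
import math


def early_move_share(prices, early_minutes=9):
    """Share of the window's summed |log-return| that falls in the first minutes.

    prices: minute-by-minute prices, with prices[0] at the announcement
            and prices[-1] at the end of the window (e.g. +60 minutes).
    """
    abs_returns = [
        abs(math.log(prices[t + 1] / prices[t])) for t in range(len(prices) - 1)
    ]
    total = sum(abs_returns)
    if total == 0:
        raise ValueError("no price movement in the window")
    return sum(abs_returns[:early_minutes]) / total


# Hypothetical path: a 10% move in minute 1 and another 10% move in minute 20.
path = [100.0] + [110.0] * 19 + [121.0] * 41   # 61 prices, minutes 0..60
share = early_move_share(path)
```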
2604.01714 Pre-Registered Protocol: Fed Discount-Window Stigma After 2023 — A Reproducible Measurement
We specify a pre-registered protocol for the following question: using a pre-specified stress index, did the ratio of discount-window usage to stress-index value increase after the March 2023 regional-bank episode, relative to the 2015-2022 baseline? Data sources: FRED: H.
2604.01713 Pre-Registered Protocol: PFOF Disclosure Standardisation and Cross-Broker Cost Dispersion for Retail Investors
We specify a pre-registered protocol for the following question: after SEC Rule 606(a) disclosure standardisation, did cross-broker dispersion in measured price improvement per 100-share marketable order decline, relative to the pre-standardisation baseline? Data sources: SEC Rule 606 quarterly reports (public filings on broker websites); NMS FINRA OATS (where public); SIFMA retail flow estimates.
2604.01712 Pre-Registered Protocol: Opening-Auction Price-Impact Trend on High-ADV Names Since 2020
We specify a pre-registered protocol for the following question: for US-listed stocks with average daily volume above $1B, has opening-auction price impact (measured as the absolute log-return from the opening-auction clearing price to the VWAP of the subsequent 15 minutes) declined over the period 2020-2025? Data sources: NYSE/Nasdaq opening auction prints (public TAQ); CRSP for ADV classification; VWAP computed from trade-level TAQ.
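The price-impact measure defined above can be written out directly. The sketch assumes trades are supplied as a list of (price, size) pairs already filtered to the 15 minutes after the auction; the function names and the sample trades are hypothetical.

```python
import math


def vwap(trades):
    """Volume-weighted average price of a list of (price, size) trades."""
    notional = sum(price * size for price, size in trades)
    volume = sum(size for _, size in trades)
    return notional / volume


def opening_price_impact(auction_price, trades_first_15_min):
    """|log(VWAP of the following 15 minutes / opening-auction clearing price)|."""
    return abs(math.log(vwap(trades_first_15_min) / auction_price))


# Hypothetical trades: VWAP is 101 against an auction clearing price of 100.
trades = [(100.0, 10), (102.0, 10)]
impact = opening_price_impact(100.0, trades)
```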
2604.01711 Pre-Registered Protocol: Market-On-Close Imbalance Disclosure Delay on Russell-2000 Rebalance Days
We specify a pre-registered protocol for the following question: did the NYSE/Nasdaq change in MOC imbalance disclosure timing reduce measured temporary price impact at the close on Russell-2000 rebalance days, relative to non-rebalance days? Data sources: NYSE Imbalance Feed (subscription) or the public end-of-day closing auction prints from NYSE/Nasdaq TAQ; Russell rebalance list (public annual release from FTSE Russell); CRSP for market caps.
2604.01710 Pre-Registered Protocol: Tick-Size Pilot Post-Mortem — Spread Persistence on Mid-Cap Stocks 18 Months After Program End
We specify a pre-registered protocol for the following question: for mid-cap stocks that were in Test Group 1/2/3 of the SEC Tick Size Pilot, did the spread differentials observed during the program partially persist in the 18 months following the program's termination, relative to control stocks? Data sources: SEC Tick Size Pilot data files (public release on the SEC website); NYSE Daily TAQ; CRSP for mid-cap classification; the publicly archived pilot enrollment list.