AI Research Pipeline

Multi-skill Claude plugin for AI/ML research workflows: data diagnostics, baseline modeling, full-battery analysis (ML + DL), interpretability, manuscript drafting, LaTeX assembly, and external review. Domain experts bring the question and judgment; the pipeline structures the execution layer.

Source VeraSuperHub/ai-research-pipeline
GitHub stars ★ 3
License Open source
Bundled skills 8
Last updated May 4, 2026

This plugin is the AI/ML half of the human–AI research lab the rest of this site argues for. It decomposes the research-execution layer — diagnostics, modeling, interpretability, drafting, assembly — into eight modular sub-skills you can install as a single plugin and invoke individually or chain through the orchestration pipelines.

What this plugin makes possible

The core thesis: a domain expert who brings the research question, the data context, and the methodological judgment can offload the rest — the diagnostic checks, the model battery, the interpretability runs, the manuscript-section drafting, the LaTeX assembly, the external-review prep. That offloading is what this plugin structures.

Three families of sub-skills, all installed together:

  • Testing skills (3) — drop-in diagnostics + a baseline. You point them at a dataset; they tell you whether your assumptions hold and what the simplest model already produces. Useful as a sanity check before you commit to anything fancier.
  • Analyzing skills (3) — full model battery (ML + DL) for each data type, with interpretability layers and unified variable-importance tables. The skill does the comparison so you can read the result, not run twelve scripts.
  • Pipeline skills (2) — orchestration. application-pipeline is for “I have a research question and a dataset; turn this into a manuscript draft.” methodology-pipeline is for “I have a research direction; turn this into a benchmarked methodology paper draft.”

Who it’s for

  • PhD/MS scientists in computational biology, biostatistics, or applied ML for life sciences doing research that produces papers
  • Industry research scientists at biotech / pharma / AI-for-health companies who write internal technical reports or external publications
  • Postdocs trying to ship more papers per unit time without sacrificing methodological rigor
  • Career-switchers entering bio/pharma data science from adjacent fields, who need a structured workflow that imposes the rigor norms of the field they’re joining

What this plugin will not do

The README is explicit about this boundary, and the boundary is the editorial point of the whole site. The plugin cannot choose your research question, decide whether a claim is scientifically valid, judge whether its own output is correct, decide which result matters, or know when to override its own pipeline. Those are judgment-layer activities. They stay yours.

How to install

Clone the repo from VeraSuperHub/ai-research-pipeline and copy the vera-ai-research-skillset/ directory contents into your ~/.claude/skills/ folder. Or use the bundled vera-ai-research.plugin file if you’re managing skills through a plugin manager.
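The two install paths above can be sketched as shell commands (a minimal sketch; the repo URL follows the GitHub convention for the named repo, and the destination path is the standard Claude skills folder from the text):

```shell
# Clone the repo and copy the skillset contents into the Claude skills folder.
git clone https://github.com/VeraSuperHub/ai-research-pipeline.git
mkdir -p "$HOME/.claude/skills"
cp -R ai-research-pipeline/vera-ai-research-skillset/. "$HOME/.claude/skills/"
```

If you manage skills through a plugin manager instead, point it at the bundled vera-ai-research.plugin file rather than copying directories by hand.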

How it relates to the rest of the site

This plugin is one of two first-party reference implementations of the architect thesis. The other is the Statistical Research Pipeline, which covers the classical-statistics half of the same surface (study design, hypothesis tests, effect sizes, mixed models, SEM, manuscript drafting). The two plugins are designed to be compatible — same interface conventions, same manuscript-output formats — and to cover non-overlapping methodological territory. Most empirical papers in the life sciences use both halves.

Bundled sub-skills (8)

Testing — diagnostics + primary tests

vera-ai-nlp-testing

Text data diagnostics — class balance, text-length statistics, vocabulary analysis, TF-IDF + Logistic Regression baseline with bootstrapped 95% CIs.
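The kind of baseline this skill reports can be sketched as follows (a minimal illustration, not the plugin's actual code; the synthetic texts, split ratio, and bootstrap count are assumptions):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Tiny synthetic corpus standing in for real text data.
texts = ["good great fine service"] * 50 + ["bad awful poor service"] * 50
labels = np.array([1] * 50 + [0] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.3, random_state=0, stratify=labels
)

# TF-IDF features + Logistic Regression baseline.
vec = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X_tr), y_tr)
preds = clf.predict(vec.transform(X_te))

# Bootstrap the test-set accuracy to get a 95% CI.
rng = np.random.default_rng(0)
scores = []
for _ in range(1000):
    idx = rng.integers(0, len(y_te), len(y_te))
    scores.append(accuracy_score(y_te[idx], preds[idx]))
ci_lo, ci_hi = np.percentile(scores, [2.5, 97.5])
```

The point of the bootstrap is that a single accuracy number on a small test set is noisy; the interval tells you how much to trust the baseline before comparing anything against it.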

vera-ai-structured-testing

Tabular data diagnostics — missing values, outlier detection (IQR), correlations, LightGBM baseline for both classification and regression.
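The IQR outlier rule named above is standard and can be sketched in a few lines (a minimal sketch; the 1.5 multiplier is the conventional default, and the sample array is invented):

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return (x < lo) | (x > hi)

x = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 100.0])
mask = iqr_outliers(x)  # flags the 100.0
```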

vera-ai-image-testing

Image data diagnostics — class distribution, size and channel statistics, CNN from scratch (N ≥ 1000) or ResNet18 feature extractor + LogReg (N < 1000).
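The sample-size branching described above amounts to a simple decision rule (a sketch of the logic only; the function name and string labels are hypothetical, and the 1000-image threshold comes from the skill description):

```python
def choose_image_baseline(n_images, threshold=1000):
    """Pick a baseline family by dataset size: train a small CNN from
    scratch when there is enough data, otherwise use frozen ResNet18
    features with a logistic-regression head."""
    if n_images >= threshold:
        return "cnn_from_scratch"
    return "resnet18_features_logreg"
```

The rationale is data efficiency: below roughly a thousand images, a CNN trained from scratch tends to overfit, while pretrained features plus a linear head remain stable.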

Analyzing — full analysis + manuscript sections

vera-ai-nlp-analyzing

Full text-modeling battery — SVM, Random Forest, LightGBM (ML); GRU, TextCNN, ALBERT (DL). Permutation, Gini, and gain importance for interpretability.
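Two of the interpretability layers named above, permutation importance and Gini importance, can be sketched with scikit-learn on synthetic data (a minimal sketch on a tabular stand-in; the plugin applies the same ideas to its fitted text models):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=300, n_features=5,
                           n_informative=2, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Permutation importance: score drop when each feature is shuffled.
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=0)

# Gini importance: mean impurity decrease, read off the fitted forest.
gini = rf.feature_importances_
```

Permutation importance is model-agnostic and measured on predictions; Gini importance is tree-specific and comes for free from training. Reporting both (plus gain, for boosted trees) guards against the biases of any single measure.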

vera-ai-structured-analyzing

Full tabular battery — LogReg, SVM, RF, XGBoost, LightGBM, CatBoost (ML); MLP, TabNet, Stacking Ensemble (DL). Unified 0-100 importance scale plus TabNet attention.
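A unified 0-100 importance scale lets you compare rankings across models whose native importances live on different scales. One plausible way to do it is a per-model min-max rescale (a sketch under that assumption; the plugin's exact normalization is not specified in the README):

```python
import numpy as np

def to_unified_scale(importances):
    """Rescale one model's raw importances to a common 0-100 scale."""
    imp = np.asarray(importances, dtype=float)
    if imp.max() == imp.min():
        # Degenerate case: all features equally (un)important.
        return np.full_like(imp, 50.0)
    return 100.0 * (imp - imp.min()) / (imp.max() - imp.min())

scaled = to_unified_scale([0.02, 0.35, 0.11, 0.52])
```

After rescaling, each model's most important feature sits at 100 and its least important at 0, so a table with one column per model becomes directly readable.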

vera-ai-image-analyzing

Full image battery — ResNet50, EfficientNet-B0, VGG16, DenseNet121 (transfer); ViT and ensemble (DL). GradCAM and ViT attention maps for interpretability.

Pipelines — workflow orchestration

vera-ai-application-pipeline

End-to-end orchestration: research question + dataset → literature review → parallel multi-method analysis → assembled Markdown + LaTeX manuscript draft.

vera-ai-methodology-pipeline

Methodology paper orchestration: research direction → idea discovery → implementation → benchmark experiments → external review → review-ready manuscript draft.