ORF-N-2026-016·Dispatch

Claude Science: the agent becomes lab equipment

Claim

On 30 June, Anthropic turned the agent into a piece of lab equipment. Claude Science folds the scientist’s scattered stack (literature and notebooks, a cluster, sixty-plus databases, and on-demand GPUs) into one environment: a generalist agent runs the analysis, a background reviewer agent checks the citations and numbers against the code, and every figure ships with the exact code, environment, and message history that made it. It is the discipline we build for operating businesses (skills, connectors, and actor-critic review), delivered this time as a workbench for the bench. It arrives with funding and a free download at the moment a research economy like Oman’s is deciding what to build its next decade of science on.

July 3, 2026 · 7 min · dispatch · co-authored by Claude Fable 5

On 30 June 2026, Anthropic released Claude Science, which it calls an AI workbench for scientists and, on the product page, a research partner for rigorous science. It is a desktop application in beta, free to download for macOS and Linux and available to Claude Pro, Max, Team, and Enterprise users; the Team plan offers discounted seats to active labs at academic institutions and nonprofits. It joins Claude Code, Claude Cowork, and Claude Security as the fourth surface where Anthropic has stopped shipping a general model and started shipping a shaped one: the same underlying agent, fitted to the shape of one kind of work.

The shape here is a scientist’s day. Anthropic did not name the model tier behind the product; what it described instead is the environment built around it, and the environment is the point.

One environment, not twelve

A working scientist does not lack tools. They have too many, and none of them talk. Literature lives in PubMed, analysis in Jupyter or an R kernel, heavy jobs on a cluster reached over SSH, reference data across dozens of databases, and results in whichever renderer reads that file type. The friction is not any one tool; it is the seams between them, the hours spent moving state by hand from one context to the next.

[Diagram: the research stack, consolidated. Fragmented tools (PubMed, literature; Jupyter, R kernels; cluster over SSH; 60+ databases; GPUs via Modal; notebook, connectors) converge on Claude Science, a generalist agent plus specialist agents in one environment, which outputs figures and a manuscript with exact code, environment, and message history. Sensitive data stays on lab infrastructure; only the necessary context is sent to Claude.]
Figure 1. The fragmented stack consolidated. Six tools a scientist normally juggles converge into one agent environment: a generalist agent coordinating specialist agents, whose output is not just an answer but the figures, code, environment, and full message history behind it.

Claude Science collapses those seams into one surface. It queries more than sixty scientific databases directly (Anthropic names UniProt, PDB, Ensembl, ClinVar, ChEMBL, and GEO among them), holds persistent Python and R kernels so variables, dataframes, and loaded models survive across a session, and renders what it produces in place: 3D protein structures, genome browser tracks, chemical structures, sequence alignments, and PDFs, without exporting to a separate viewer. The unit of work stops being a file passed between programs and becomes a conversation that carries its own state.

Results that check themselves

The more consequential move is not consolidation but review. Claude Science runs a background reviewer agent that reads the work as it is produced and flags three specific failure modes: incorrect citations, numbers that cannot be traced back to the data, and figures that do not match the code that supposedly generated them. Anthropic frames the result as work that checks and corrects itself.

[Diagram: results that check themselves. The generalist agent creates the result (a figure, a number, a citation); the reviewer agent, a background critic, reads it, then flags and corrects. What the reviewer catches: incorrect citations, untraceable numbers, code-to-figure mismatches.]
Figure 2. The actor-critic loop. A generalist agent creates the result; a separate reviewer agent reads it and flags bad citations, untraceable numbers, and code-to-figure mismatches, correcting before a human ever sees it. Splitting creation from criticism into two agents is the mechanism, not a slogan.
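
The three checks can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation, which has not been published in this detail: the `Result` structure, the `review` function, and every field name are hypothetical, and a real reviewer would be a model reading the work rather than a set of lookups.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Result:
    """What the creating agent hands over (hypothetical structure)."""
    citations: list[str]                # reference keys cited in the text
    reported_numbers: dict[str, float]  # numbers quoted in the prose
    figure_code_hash: str               # hash of the code the figure claims to come from


def review(result: Result,
           known_references: set[str],
           computed_values: dict[str, float],
           actual_code_hash: str,
           tolerance: float = 1e-9) -> list[str]:
    """Critic pass: return a list of flags; an empty list means nothing was caught."""
    flags = []
    # 1. Incorrect citations: a cited key that is not in the reference library.
    for key in result.citations:
        if key not in known_references:
            flags.append(f"incorrect citation: {key}")
    # 2. Untraceable numbers: a quoted value the analysis does not reproduce.
    for name, value in result.reported_numbers.items():
        if name not in computed_values or abs(computed_values[name] - value) > tolerance:
            flags.append(f"untraceable number: {name}")
    # 3. Code-to-figure mismatch: the figure was not made by the code on record.
    if result.figure_code_hash != actual_code_hash:
        flags.append("figure does not match code")
    return flags
```

The point of the split is visible in the signature: `review` never sees how the result was made, only the result and the ground truth it should agree with.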

This is the actor-critic pattern, one agent to create and a separate one to judge, and it is not a demo trick. Jerome Lecoq, a neuroscientist at the Allen Institute, told Anthropic he built roughly twenty custom skills with sub-agents into a computational review template and paired every content agent with a reviewer for accuracy and citation fidelity. By his account he now produces more than ten reviews, many over a hundred pages with checked citations, where a single such review used to take up to two years. Reproducibility follows from the same design: every artifact keeps its exact code, its environment, and the message history that produced it, and a figure can be edited in plain language, with the annotation flowing back into the code rather than being painted over the image.
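
One way to picture an artifact that carries its own provenance is a record bundling the three things named above. The sketch below is an assumption about shape only: `Provenance`, `capture`, and `fingerprint` are hypothetical names, and the environment captured here is deliberately tiny (Python version and operating system) where a real one would pin packages as well.

```python
from __future__ import annotations

import hashlib
import json
import platform
import sys
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Provenance:
    """Everything needed to say how a figure was made (hypothetical record)."""
    code: str          # the exact code that produced the figure
    environment: dict  # interpreter and platform details
    messages: tuple    # the conversation that led to the code

    def fingerprint(self) -> str:
        # Stable hash of the whole record: same inputs give the same digest.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def capture(code: str, messages: list[str]) -> Provenance:
    """Snapshot the code, the current environment, and the message history."""
    env = {"python": sys.version.split()[0], "platform": platform.system()}
    return Provenance(code=code, environment=env, messages=tuple(messages))


original = capture("plot(x)", ["make a plot"])
# A plain-language edit changes the code and extends the history,
# so the edited figure gets a new record instead of a painted-over image.
edited = capture("plot(x, color='red')", ["make a plot", "make it red"])
```

Because the edit lands in the code, `original.fingerprint()` and `edited.fingerprint()` differ, and each figure can be regenerated from its own record.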

Compute that follows the work

The third piece is reach. Real analyses do not fit on a laptop, and the tax on a scientist has always been the plumbing: writing the batch script, submitting the job, provisioning the GPU. Claude Science takes over that placement, deciding where each job runs.

[Diagram: compute, placed to fit the job. Three tiers: laptop (light work, local data); HPC cluster (batch jobs over SSH); on-demand GPUs (via Modal, 1 to hundreds). The dataset is loaded once and held in memory across the whole analysis. Claude Science picks the tier; the scientist does not wire each one by hand.]
Figure 3. Compute placed to fit the job. The agent runs light work locally, submits batch jobs to an HPC cluster over SSH, and scales on-demand GPUs via Modal from one to hundreds, loading the dataset once and holding it in memory across the whole analysis. The scientist states the work; the environment finds the tier.
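
As a rough illustration of "the environment finds the tier", the routing decision could look like the sketch below. Anthropic has not described how the choice is made, so the `Job` fields, the `place` function, and the 16 GB laptop threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A unit of analysis to be placed (hypothetical description)."""
    memory_gb: float    # estimated working-set size
    needs_gpu: bool     # whether the job requires GPU acceleration
    gpu_count: int = 0  # how many GPUs were requested, if any


def place(job: Job, laptop_memory_gb: float = 16.0) -> str:
    """Pick one of the three tiers from the figure for a given job."""
    if job.needs_gpu:
        # On-demand GPUs scale from one upward; never request fewer than one.
        return f"on-demand GPUs x{max(1, job.gpu_count)}"
    if job.memory_gb <= laptop_memory_gb:
        # Light work on local data stays on the laptop.
        return "laptop"
    # Anything too large for the laptop goes out as a batch job.
    return "HPC cluster (batch job over SSH)"
```

For example, `place(Job(memory_gb=2, needs_gpu=False))` stays on the laptop, while `place(Job(memory_gb=200, needs_gpu=False))` is sent to the cluster.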

Crucially, the data does not have to leave. Anthropic says Claude Science operates on lab infrastructure, keeping sensitive datasets local and sending only the necessary context to the model, which is the arrangement that makes an agent usable in a setting where the data is the asset and cannot be handed out.
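
What "only the necessary context" means in practice is not specified. One plausible reading, sketched here under that assumption, is that raw rows stay on lab infrastructure and the model sees only a description of them. The function name `summarize_for_model` and the choice of statistics are hypothetical, and summary statistics alone are not a privacy guarantee.

```python
from __future__ import annotations

import statistics


def summarize_for_model(rows: list[dict], numeric_columns: list[str]) -> dict:
    """Build the context that leaves the lab: shape and summary stats, no raw rows."""
    if not rows:
        return {"n_rows": 0, "columns": []}
    summary = {"n_rows": len(rows), "columns": sorted(rows[0].keys())}
    for col in numeric_columns:
        values = [row[col] for row in rows]
        summary[col] = {
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Raw records, including identifiers, never appear in the returned summary.
local_rows = [
    {"patient_id": "A1", "age": 40},
    {"patient_id": "B2", "age": 60},
]
context = summarize_for_model(local_rows, ["age"])
```

The agent can plan an analysis from `context` (column names, row count, value ranges) and run the resulting code locally, where the full dataset lives.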

Configured for the domain, and honest about the edge

Out of the box, the workbench ships with more than sixty skills and connectors spanning genomics, single-cell analysis, proteomics, structural biology, and cheminformatics. It also integrates with NVIDIA’s BioNeMo toolkit and models (Evo 2, Boltz-2, OpenFold3) and, through a partnership with LatchBio, offers agent-native access to verified bioinformatics pipelines. Skills are reusable: a pipeline written once is inherited across sessions, and sessions can be forked to compare approaches without losing the original. The early users Anthropic names are drug-discovery and biology groups (Manifold Bio, the Whitehead Institute, UCSF, Every Cure, and Xaira among them), with Stephen Francis of UCSF’s Brain Tumor Center reporting germline workups that were completed in roughly a tenth of the previous time and independently validated.

That naming is also the honest edge. This is a beta, its skills lead heavily with biology and biomedical research, and Anthropic is funding the next wave directly: an AI for Science program offering up to fifty projects as much as thirty thousand dollars in credits each, plus up to two thousand dollars of Modal compute, for work running from September to December 2026, with applications due 15 July. A workbench that arrives with a grant program attached is a workbench whose domain coverage is still being filled in, deliberately, by the people who apply.

The forward-deployed reading

For anyone who has read these notes, the architecture is familiar. Skills, connectors, a generalist agent coordinating specialists, and an actor-critic reviewer keeping the output honest together make up the org-level harness we keep describing, engineered and pointed at a vertical. Claude Science is the clearest proof yet that the durable product is not the model but the harness around it: the same model becomes a coding tool, a coworker, or a research instrument depending entirely on the environment built to hold it. That is the work we do.

It also lands somewhere specific. Research economies deciding how to build their next decade of science, Oman’s national program among them, now have an off-the-shelf answer to what an agentic lab looks like, free to download and start with, and backed by credits for the groups that move first. The gap between a lab that runs this way and one that does not is the same widening gap we have measured everywhere else, only now it runs through the bench. If your organization is standing up research or analytical capacity and wants it built to this standard from the start, start a conversation with us about a Discovery Phase.

References

  1. Anthropic. Claude Science: an AI workbench for scientists. 30 Jun 2026. anthropic.com/news/claude-science-ai-workbench
  2. Anthropic. Claude Science. 2026. claude.com/product/claude-science