9.3 AI-agent workflow

Explanation

An AI agent can write code, tests, explanations, benchmark scripts, and logs, but this does not remove the need for human review. The core workflow is a loop: brainstorm, plan, generate code, and review, with feedback from each review returning to an earlier step.

Figure: AI-agent scientific coding loop (brainstorm, plan, code generation, review). Forward arrows are solid; feedback arrows return to earlier steps.

Do not start from code generation. First clarify the scientific problem, assumptions, inputs, outputs, edge cases, data representation, tests, saved artifacts, and acceptance criteria. After the agent changes code, review the tests first (what they check and whether they pass), then review the diff.
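One way to make acceptance criteria concrete before asking the agent for code is to write them as tests. The sketch below is illustrative only: the function name trapezoid, its Option return type, and the chosen edge cases are assumptions made for this example, not part of any particular project.

```rust
/// Hypothetical example: integrate uniformly spaced samples with the
/// composite trapezoidal rule. Returns None for inputs the specification
/// rejects (fewer than two samples, or a non-positive or non-finite step).
pub fn trapezoid(samples: &[f64], dx: f64) -> Option<f64> {
    if samples.len() < 2 || !dx.is_finite() || dx <= 0.0 {
        return None;
    }
    let interior: f64 = samples[1..samples.len() - 1].iter().sum();
    let ends = 0.5 * (samples[0] + samples[samples.len() - 1]);
    Some(dx * (interior + ends))
}

#[cfg(test)]
mod acceptance_tests {
    use super::trapezoid;

    // Acceptance criterion: the rule is exact for linear integrands.
    #[test]
    fn exact_for_linear_function() {
        // f(x) = x sampled at 5 points on [0, 1]; the exact integral is 0.5.
        let samples = [0.0, 0.25, 0.5, 0.75, 1.0];
        let result = trapezoid(&samples, 0.25).unwrap();
        assert!((result - 0.5).abs() < 1e-12);
    }

    // Edge cases agreed on before any code was generated.
    #[test]
    fn rejects_invalid_input() {
        assert!(trapezoid(&[1.0], 0.1).is_none());
        assert!(trapezoid(&[1.0, 2.0], 0.0).is_none());
        assert!(trapezoid(&[1.0, 2.0], f64::NAN).is_none());
    }
}
```

With tests like these written first, "review tests first" has a concrete meaning: check that the agent did not weaken or delete an agreed criterion before looking at how the implementation changed.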

Useful prompts are specific. They ask the agent to inspect, measure, or propose before editing code.

Review this Rust/Cargo project. Check function inputs and outputs, ownership
and borrowing at boundaries, tests, saved metadata, and result files. List
missing validation checks before changing code.

Check whether 1D numerical functions use &[f64] or &mut [f64], and whether
2D or higher numerical data use tenferro typed tensors such as
tenferro_tensor::TypedTensor instead of nested vectors.
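The 1D part of the prompt above can be illustrated with a minimal sketch of the signatures a reviewer would hope to see. The function names are hypothetical, and the tenferro side is deliberately not shown because its API is not described in this section.

```rust
/// Read-only access: the caller keeps ownership and nothing is copied.
pub fn l2_norm(values: &[f64]) -> f64 {
    values.iter().map(|v| v * v).sum::<f64>().sqrt()
}

/// In-place update: the `&mut` in the signature makes the mutation visible
/// at the call site. Returns false and leaves the data untouched if the
/// norm is zero or not finite.
pub fn normalize_in_place(values: &mut [f64]) -> bool {
    let norm = l2_norm(values);
    if norm == 0.0 || !norm.is_finite() {
        return false;
    }
    for v in values.iter_mut() {
        *v /= norm;
    }
    true
}
```

Both functions accept a Vec<f64>, an array, or a sub-slice through a borrow, so the choice of container stays with the caller.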

Run cargo test, then inspect whether the compute step writes results and
metadata to a file before plotting. Check whether small outputs use JSON or
text and large multidimensional arrays use .npy, .npz, HDF5, or a similar
array format.
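As a rough picture of what "writes results and metadata to a file before plotting" can look like for a small output, here is a sketch that uses only the standard library. The struct RunMetadata, its fields, and the function write_result are hypothetical; a real project would more likely use a JSON crate, and large arrays would go to an array format instead.

```rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::path::Path;

/// Hypothetical metadata record; the field names are illustrative only.
pub struct RunMetadata {
    pub command: String,
    pub n_points: usize,
    pub step: f64,
}

/// Writes a small result and its metadata as one JSON object.
/// Assumes all numbers are finite (NaN and infinity are not valid JSON) and
/// that `command` is plain ASCII without control characters, so that Rust's
/// `{:?}` escaping yields a valid JSON string.
pub fn write_result(path: &Path, meta: &RunMetadata, values: &[f64]) -> std::io::Result<()> {
    let joined = values
        .iter()
        .map(|v| format!("{:e}", v))
        .collect::<Vec<_>>()
        .join(", ");
    let mut out = BufWriter::new(File::create(path)?);
    writeln!(out, "{{")?;
    writeln!(out, "  \"command\": {:?},", meta.command)?;
    writeln!(out, "  \"n_points\": {},", meta.n_points)?;
    writeln!(out, "  \"step\": {:e},", meta.step)?;
    writeln!(out, "  \"values\": [{}]", joined)?;
    writeln!(out, "}}")?;
    out.flush()
}
```

The compute step calls write_result and exits; a separate plotting script reads the file, so a figure can be regenerated without rerunning the calculation.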

Benchmark this code for the following case: ...
Report the command, input size, number of threads, output validation, and
bottleneck hypothesis before modifying the code.
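The items the benchmark prompt asks for can be collected in one small report. The sketch below is an assumption-laden illustration: the kernel sum_of_squares, the struct BenchReport, and the tolerance are hypothetical, and the timing uses std::time::Instant rather than a dedicated benchmarking crate.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical report; the fields mirror what the prompt asks for.
pub struct BenchReport {
    pub input_size: usize,
    /// Threads the machine offers, not threads the kernel uses
    /// (the kernel below is single-threaded).
    pub threads_available: usize,
    pub best: Duration,
    pub output_valid: bool,
}

/// Hypothetical kernel under test.
pub fn sum_of_squares(values: &[f64]) -> f64 {
    values.iter().map(|v| v * v).sum()
}

pub fn bench_sum_of_squares(n: usize, repeats: usize) -> BenchReport {
    let values: Vec<f64> = (1..=n).map(|i| i as f64).collect();
    // Known result for validation: 1^2 + ... + n^2 = n(n+1)(2n+1)/6.
    let nf = n as f64;
    let expected = nf * (nf + 1.0) * (2.0 * nf + 1.0) / 6.0;

    let mut best = Duration::MAX;
    let mut result = 0.0;
    for _ in 0..repeats.max(1) {
        let start = Instant::now();
        // black_box keeps the optimizer from removing the measured work.
        result = black_box(sum_of_squares(black_box(&values)));
        best = best.min(start.elapsed());
    }

    let threads_available = std::thread::available_parallelism()
        .map(|p| p.get())
        .unwrap_or(1);
    // Relative tolerance chosen for illustration only.
    let output_valid = (result - expected).abs() <= 1e-9 * expected.abs().max(1.0);

    BenchReport { input_size: n, threads_available, best, output_valid }
}
```

A timing is only worth reporting when output_valid is true; a fast run that produces a wrong answer is not a performance result.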

Things to look up

Agentic coding
Rust code review
Cargo test
Benchmark
Performance optimization
Work log
Test report
Validation report
Result metadata

Exercise

Write a checklist of ten items to inspect after an AI agent changes code for a scientific calculation. Then write three prompts that are more specific than “Is this correct?” Include one prompt about review, one about performance, and one about reproducibility.

Notes for the exercise

Include diff review.
Include cargo test.
Include validation against known results or sanity checks.
Include data representation: slices for 1D data and tenferro typed tensors for 2D or higher, unless the exercise says otherwise.
Include saved result files and metadata.
Include the separation of compute scripts and plotting scripts.
Include performance only after correctness is specified.
Ask the agent to report findings before modifying code when you need review or diagnosis.
Do not accept “it runs” as sufficient evidence.
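As an example of a sanity check that goes beyond "it runs", the sketch below inspects the output of a hypothetical calculation that should produce a discrete probability distribution. The function name, the error messages, and the choice of checks are assumptions for illustration; the right checks depend on the scientific problem.

```rust
/// Hypothetical sanity check: the output must be non-empty, finite,
/// non-negative, and sum to 1 within `tol`.
pub fn check_distribution(p: &[f64], tol: f64) -> Result<(), String> {
    if p.is_empty() {
        return Err("empty output".to_string());
    }
    if let Some(i) = p.iter().position(|v| !v.is_finite()) {
        return Err(format!("non-finite value at index {}", i));
    }
    if let Some(i) = p.iter().position(|v| *v < 0.0) {
        return Err(format!("negative probability at index {}", i));
    }
    let total: f64 = p.iter().sum();
    if (total - 1.0).abs() > tol {
        return Err(format!("probabilities sum to {}, expected 1", total));
    }
    Ok(())
}
```

A check like this can run at the end of the compute step, before the result file is written, so that an invalid output never reaches the plotting script.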