Memory and Evaluation - Daniel Wahnich

Position

Memory is useful only when evaluation keeps it honest.

A system that remembers everything without checking anything becomes dangerous. It can carry stale context, private context, wrong assumptions, or old decisions into new work with too much confidence.

The serious version of memory is paired with evaluation. The system should remember enough to improve, but it should also ask whether that memory is relevant, permitted, accurate, and useful for the current mission.
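As a rough illustration, the questions above can be written as a gate that a remembered item must pass before it influences new work. This is a minimal sketch, not Aweb's implementation: `MemoryItem`, its fields, the `gate` function, and the 30-day freshness window are all hypothetical, and staleness stands in only as a crude proxy for accuracy. Usefulness is left to the caller, since it depends on the mission.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MemoryItem:
    text: str
    topic: str             # hypothetical tag used for the relevance check
    private: bool          # hypothetical flag used for the permission check
    verified_at: datetime  # last time the content was confirmed


def gate(item: MemoryItem, mission_topic: str, now: datetime,
         max_age: timedelta = timedelta(days=30)) -> list[str]:
    """Return the reasons this item should not influence the mission.

    An empty list means the item passed every check in this sketch.
    """
    reasons = []
    if item.topic != mission_topic:
        reasons.append("not relevant")
    if item.private:
        reasons.append("not permitted")
    if now - item.verified_at > max_age:
        # Age is only a proxy: old memory may still be right, but it
        # should be re-checked before it is trusted.
        reasons.append("possibly stale")
    return reasons
```

An item is used only when `gate` returns an empty list. Anything else comes back with named reasons rather than a silent drop.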

Layers

Different memory layers serve different kinds of work.

Working memory helps the current step. Episodic memory carries events and conversations. Semantic memory carries knowledge and concepts. Procedural memory carries workflows and skill patterns. None of these should become an excuse to stop checking the result.

Evaluation is the discipline that decides what memory can influence. It asks whether the output matches the mission, whether the boundary held, whether the evidence is sufficient, and whether the next step should continue or stop.
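One way to picture that decision is as a small record of named checks that resolves to continue or stop. The sketch below is an assumption made for illustration: the `Evaluation` class, its field names, and the rule that any failed check stops the loop are hypothetical, not a description of how Aweb evaluates work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    matches_mission: bool
    boundary_held: bool
    evidence_sufficient: bool


def decide(evaluation: Evaluation) -> tuple[str, list[str]]:
    """Return ("continue" or "stop", names of the failed checks)."""
    checks = (
        ("matches_mission", evaluation.matches_mission),
        ("boundary_held", evaluation.boundary_held),
        ("evidence_sufficient", evaluation.evidence_sufficient),
    )
    failures = [name for name, passed in checks if not passed]
    # In this sketch, a single failed check is enough to stop the loop.
    verdict = "continue" if not failures else "stop"
    return verdict, failures
```

Returning the failed check names alongside the verdict lets the operator see why the loop stopped, not only that it did.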

Working memory: the live context window around the mission.
Episodic memory: prior events, conversations, and outcomes.
Semantic memory: stable concepts, definitions, and relationships.
Procedural memory: workflows, preferences, and operating patterns.
Evaluation: checks against mission, boundary, artifact quality, and evidence.
Correction: feedback that updates the next loop without pretending certainty.
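The list above can be sketched as a small data model. Everything here is hypothetical and chosen for illustration: the `Layer` enum, the `MemoryRecord` fields, and the `correct` rule, which nudges confidence toward a cap below 1.0 so that repeated confirmation never turns into certainty.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Layer(Enum):
    WORKING = "working"        # live context around the mission
    EPISODIC = "episodic"      # prior events, conversations, outcomes
    SEMANTIC = "semantic"      # stable concepts and definitions
    PROCEDURAL = "procedural"  # workflows and operating patterns


@dataclass(frozen=True)
class MemoryRecord:
    layer: Layer
    text: str
    confidence: float  # assumed to lie between 0.0 and the cap


def correct(record: MemoryRecord, confirmed: bool,
            step: float = 0.2, cap: float = 0.95) -> MemoryRecord:
    """Move confidence part of the way toward the cap or toward zero."""
    target = cap if confirmed else 0.0
    new_confidence = record.confidence + step * (target - record.confidence)
    # Return a new record so the previous state stays available for review.
    return replace(record, confidence=new_confidence)
```

With these defaults, a record at 0.5 confidence moves to about 0.59 after one confirmation and to about 0.4 after one contradiction.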

Aweb

Aweb needs memory because no single operator can keep reloading every context by hand.

If Aweb is going to support serious work across systems, it needs to carry context from one loop to the next. GEX should remember research boundaries. Veritas should remember calibration posture. Nina should remember commerce workflow state. Leony should remember creative approvals and campaign constraints.

But that memory has to be controlled. The operator needs a way to inspect what the system remembered, what it ignored, what it updated, and what it should never use again.
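The four things the operator needs to inspect suggest a simple ledger. The sketch below assumes a hypothetical `MemoryLedger` class with made-up action names and keys. It shows the shape of the idea and does not describe Aweb's internals.

```python
ACTIONS = ("remembered", "ignored", "updated", "blocked")


class MemoryLedger:
    """Records what happened to each memory key so an operator can review it."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str, str]] = []
        self._blocked: set[str] = set()

    def record(self, action: str, key: str, note: str = "") -> None:
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self._entries.append((action, key, note))
        if action == "blocked":
            # "blocked" stands for memory the system should never use again.
            self._blocked.add(key)

    def may_use(self, key: str) -> bool:
        return key not in self._blocked

    def report(self) -> dict[str, list[str]]:
        """Group recorded keys by action, in the order they were logged."""
        grouped: dict[str, list[str]] = {action: [] for action in ACTIONS}
        for action, key, _note in self._entries:
            grouped[action].append(key)
        return grouped
```

For example, after `ledger.record("blocked", "old-campaign-brief")`, a call to `ledger.may_use("old-campaign-brief")` returns `False`, and `ledger.report()` lists that key under `"blocked"`.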

Boundary

Talking about memory in public must not imply that private memory is exposed.

This page does not expose private Aweb memory, user memory, customer memory, API keys, emails, databases, or internal logs. It explains the operating model and the discipline required around it.

The public claim is simple: memory makes agentic work more continuous, and evaluation keeps continuity from becoming blind confidence.

Memory without evaluation is confidence drift. Evaluation without memory is repetition. The operating layer needs both.
