Prototype Project Sanshar

How Sanshar explores AI assistance as measurable human capability

A product/research prototype for interfaces where Claude-like systems observe real surfaces, help a person act, admit uncertainty, and leave proof.

Thesis

The interface is part of the intelligence

A helpful AI partner is not only a model response. It is a system that understands what the user is trying to do, which surface produced the signal, what evidence exists, what tool should be used, and whether the result actually reached the user. Sanshar is my independent prototype lab for testing those product questions.

User Need

People need AI that can help across messy real work: chats, terminals, files, voice notes, screenshots, and changing priorities.

Design Risk

Assistants can sound confident while missing context, skipping proof, overstepping privacy, or failing to surface the answer.

Prototype

Sanshar models each prototype interaction as source, decision, action, verifier, and read-back instead of an unstructured chat turn.

Capability Goal

Help the user think and act better without requiring them to micromanage every step.

Prototype Map

One interaction, traced end to end

The prototype is designed so an assistant does not jump from input to answer. Each meaningful event passes through evidence, decision, action, verification, and read-back.

01 Surface Signal: Message, terminal output, voice note, screenshot, file, or peer update.
02 Source Packet: Timestamp, source ref, content hash, modality, freshness, privacy, and confidence.
03 Decision Record: Language, tool, verifier, risk, response surface, and autonomy level selected at runtime.
04 Action: Small reversible work first: summarize, inspect, write, test, or prepare a bounded ask.
05 Verifier: Expected vs observed metrics, contradiction check, and reason-coded failure handling.
06 Read-Back: Response delivered, proof attached, or reticket opened if the loop did not close.
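
As an illustration of step 02, a source packet could be sketched roughly as below. This is a minimal sketch, not Sanshar's actual schema: the class name SourcePacket, the helper make_packet, and the field types and defaults are all assumptions; only the field list comes from the step description above.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SourcePacket:
    """Hypothetical record for step 02; fields mirror the list above."""
    timestamp: datetime   # when the packet was created
    source_ref: str       # e.g. a message ID or file path
    content_hash: str     # SHA-256 hex digest of the raw content
    modality: str         # "text", "audio", "image", ...
    freshness_s: float    # age of the signal, in seconds, at packaging time
    privacy: str          # "public" or "private"
    confidence: float     # 0.0 to 1.0


def make_packet(content: bytes, source_ref: str, modality: str,
                observed_at: datetime, privacy: str = "private",
                confidence: float = 0.5,
                now: Optional[datetime] = None) -> SourcePacket:
    """Package a raw signal; defaults are illustrative and err toward caution."""
    now = now or datetime.now(timezone.utc)
    return SourcePacket(
        timestamp=now,
        source_ref=source_ref,
        content_hash=hashlib.sha256(content).hexdigest(),
        modality=modality,
        freshness_s=(now - observed_at).total_seconds(),
        privacy=privacy,
        confidence=confidence,
    )
```

Hashing the content at capture time lets later steps check that a claim such as "I read this file" refers to the same bytes that were observed.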

What This Tests

Whether an AI assistant can stay useful across messy surfaces without overclaiming what it saw or did.

What It Measures

Latency, missed events, false confidence, tool choice, surface delivery, user correction, and recovery quality.

What Stays Bounded

Private surfaces, secrets, external mutations, durable learning, and broad capture stay behind explicit gates.

Interaction Loops

Four loops that make the assistant useful

The core design pattern is to replace vague autonomy with small loops that can be observed, tested, corrected, and improved.

1. Awareness Loop

Capture the event, classify modality and urgency, create a source packet, and decide if the signal deserves attention.

  • Example surfaces: Discord message, terminal output, local file, voice note, screenshot.
  • Product value: the assistant notices what matters without flooding the user.
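
The awareness step can be sketched as a small triage function. This is only a sketch under stated assumptions: the keyword-based urgency score, the surface-to-modality mapping, and the 0.5 threshold are hypothetical stand-ins, not part of Sanshar.

```python
from dataclasses import dataclass

# Hypothetical mapping; the surfaces come from the examples above,
# the modality labels are assumptions.
SURFACE_MODALITY = {
    "discord": "text",
    "terminal": "text",
    "file": "file",
    "voice_note": "audio",
    "screenshot": "image",
}

# Hypothetical urgency cues; a real system would likely use richer signals.
URGENT_WORDS = ("error", "failed", "urgent", "down")


@dataclass(frozen=True)
class Triage:
    modality: str
    urgency: float         # 0.0 (ignore) to 1.0 (interrupt)
    needs_attention: bool


def triage(surface: str, text: str, threshold: float = 0.5) -> Triage:
    """Classify modality and urgency, then decide whether to surface the event."""
    modality = SURFACE_MODALITY.get(surface, "unknown")
    hits = sum(word in text.lower() for word in URGENT_WORDS)
    urgency = min(1.0, 0.3 * hits)
    # Unknown surfaces are flagged rather than silently dropped.
    needs_attention = urgency >= threshold or modality == "unknown"
    return Triage(modality, urgency, needs_attention)
```

For example, triage("terminal", "Build FAILED with error 2") matches two cues and is flagged, while triage("discord", "lunch?") is not.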

2. Decision Loop

Choose response language, tool, model, verifier, risk gate, and output surface using runtime dimensions instead of static modes.

  • Example dimensions: language, privacy, confidence, resource pressure, freshness, and autonomy level.
  • Product value: behavior adapts to the actual moment.
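
One way to picture "runtime dimensions instead of static modes" is a function that derives the decision record from the current dimensions on every event. The sketch below assumes hypothetical names (Dimensions, DecisionRecord, decide) and illustrative thresholds; it does not describe Sanshar's real routing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimensions:
    language: str             # detected language of the signal
    privacy: str              # "public" or "private"
    confidence: float         # 0.0 to 1.0
    resource_pressure: float  # 0.0 (idle) to 1.0 (saturated)
    freshness_s: float        # age of the signal in seconds


@dataclass(frozen=True)
class DecisionRecord:
    language: str
    tool: str
    verifier: str
    risk: str
    response_surface: str
    autonomy: str


def decide(d: Dimensions, origin_surface: str) -> DecisionRecord:
    """Derive a decision from current dimensions; all thresholds are illustrative."""
    stale = d.freshness_s > 300
    risk = "high" if d.privacy == "private" else "low"
    if risk == "high" or d.confidence < 0.6 or stale:
        autonomy = "ask_first"
    else:
        autonomy = "act_then_report"
    # Under resource pressure, fall back to the cheaper (hypothetical) tool.
    tool = "local_summary" if d.resource_pressure > 0.8 else "full_inspect"
    verifier = "refetch_source" if stale else "read_back"
    # Reply on the surface the signal came from.
    return DecisionRecord(d.language, tool, verifier, risk, origin_surface, autonomy)
```

Because nothing is stored as a mode, the same user can get "act_then_report" on a fresh public message and "ask_first" a minute later on a private or stale one.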

3. Action Loop

Prefer local, reversible, bounded action; ask before private, costly, external, broad, or irreversible action.

  • Example actions: write a handoff, summarize a thread, run a local verifier, generate a report.
  • Product value: autonomy feels helpful rather than risky.
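
The action rule above reduces to a simple gate: proceed when no risk flag is set, otherwise ask and say why. The following is a minimal sketch; ProposedAction and gate are hypothetical names, and the flags are taken directly from the rule's wording.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ProposedAction:
    name: str
    private: bool = False
    costly: bool = False
    external: bool = False
    broad: bool = False
    irreversible: bool = False


# Any one of these flags is enough to require consent.
GATED_FLAGS = ("private", "costly", "external", "broad", "irreversible")


def gate(action: ProposedAction) -> Tuple[str, List[str]]:
    """Return ("proceed", []) for safe actions, else ("ask", reasons)."""
    reasons = [flag for flag in GATED_FLAGS if getattr(action, flag)]
    return ("ask" if reasons else "proceed", reasons)
```

Returning the reasons alongside the verdict means the question put to the user can name exactly what makes the step risky.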

4. Proof Loop

Compare expected vs observed, verify delivery, record reason codes, and create a reticket when the system misses.

  • Example proof: cursor advance, returned message ID, PDF hash, source ref, test result.
  • Product value: users can trust the assistant's claims.
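
A proof check of this kind might look like the sketch below, which compares expected and observed values key by key. The reason-code format ("missing:..." and "mismatch:..."), the reticket fields, and the names ProofResult and check_proof are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ProofResult:
    closed: bool                         # True when the loop closed cleanly
    reason_codes: List[str]
    reticket: Optional[Dict[str, Any]]   # follow-up work item, if any


def check_proof(task: str, expected: Dict[str, Any],
                observed: Dict[str, Any]) -> ProofResult:
    """Compare expected vs observed and open a reticket on any miss."""
    reason_codes = []
    for key, want in expected.items():
        if key not in observed:
            reason_codes.append("missing:" + key)
        elif observed[key] != want:
            reason_codes.append("mismatch:" + key)
    if not reason_codes:
        return ProofResult(True, [], None)
    reticket = {
        "task": task,
        "reason_codes": reason_codes,
        "next_step": "retry_or_ask_user",  # hypothetical default
    }
    return ProofResult(False, reason_codes, reticket)
```

For instance, if a post is expected to return a message ID and a matching PDF hash but only the hash comes back, the result carries "missing:message_id" and a reticket instead of a success claim.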

Design Decisions

What I learned building with frontier agents

The design work is not only visual. It is deciding what the system should do when context is stale, the user mixes languages, a voice path is only partly wired, or a surface says "seen" but not "done."

  1. Dynamic Over Static: Language, modality, tool, risk, and autonomy are runtime choices with confidence and TTL.
  2. Proof Before Claim: A system should not say it posted, read, transcribed, or fixed something without read-back.
  3. Human Consent Boundary: Secrets, private channels, AWS mutations, camera/mic/screen capture, and durable learning need gates.
  4. Small Useful Autonomy: Local reports, source packets, bounded summaries, and verifiers can happen without waiting.
  5. Failures Become Design Data: Misses, latency, rate limits, and confusion become reason-coded improvement inputs.
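
To make the first decision concrete, a runtime choice "with confidence and TTL" can be modeled as a value that expires and must be re-decided. This is a sketch only: RuntimeChoice, its fields, and the 0.6 confidence floor are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeChoice:
    dimension: str      # e.g. "language" or "autonomy"
    value: str
    confidence: float   # 0.0 to 1.0
    decided_at: float   # seconds on any consistent clock
    ttl_s: float        # how long the choice may be reused

    def usable(self, now: float, min_confidence: float = 0.6) -> bool:
        """A choice is reusable only while it is fresh and confident enough."""
        fresh = (now - self.decided_at) < self.ttl_s
        return fresh and self.confidence >= min_confidence
```

A language choice made at t=100 with a 60-second TTL is usable at t=130 and stale at t=160, at which point the system re-detects rather than assuming.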

Product Design Lesson

The most important design decision is restraint. A better assistant is not the one that does the most automatically; it is the one that knows which step is safe, which step needs proof, and which step should become a clear question for the person.

Evals

Evaluation is the product behavior

For human-AI tools, evals should test the interface loop, not only the final answer. Sanshar uses scenarios that expose practical agent failures: missed events, stale context, false confidence, noisy channels, permission ambiguity, and output delivery gaps.

Surface Delivery

Did the answer reach the intended place, and can the system prove it with returned metadata?

Mixed Modality

Can the assistant route text, voice, screenshots, and files without pretending an unwired path works?

Autonomy Boundary

Did the system act locally when safe and ask when the next step required user consent?

Recovery

When something failed, did it create a useful reticket with reason code and next safe step?
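
The four questions above can be expressed as checks over a recorded interaction trace. The sketch below is one possible shape, not Sanshar's eval harness: the Trace fields, the CHECKS table, and evaluate are hypothetical names chosen to mirror the four headings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Trace:
    delivered_message_id: Optional[str]  # metadata returned by the surface
    modality: str
    modality_wired: bool                 # is this input path actually implemented?
    claimed_success: bool                # did the assistant say it succeeded?
    action_was_gated: bool               # did the action require consent?
    asked_user: bool
    failed: bool
    reticket_reason: Optional[str]


CHECKS: Dict[str, Callable[[Trace], bool]] = {
    # A success claim needs returned delivery metadata.
    "surface_delivery": lambda t: not t.claimed_success or t.delivered_message_id is not None,
    # No success claims on an unwired path.
    "mixed_modality": lambda t: t.modality_wired or not t.claimed_success,
    # Ask exactly when consent was required.
    "autonomy_boundary": lambda t: t.asked_user == t.action_was_gated,
    # Every failure leaves a reason-coded reticket.
    "recovery": lambda t: not t.failed or t.reticket_reason is not None,
}


def evaluate(trace: Trace) -> List[str]:
    """Return the names of the checks this trace fails."""
    return [name for name, check in CHECKS.items() if not check(trace)]
```

A clean text interaction returns an empty list; a voice interaction that claims success on an unwired path, skips a required ask, and fails without a reticket fails all four checks.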

Anthropic Fit

Why this maps to AI Capability Development

I want to build and study Claude experiences that make people more capable: better at learning, debugging, deciding, creating, and coordinating. Sanshar is my proof that I already think in product loops, not only in backend architecture.