AVM Instruction Set - Instr₀

The following is a version of the Instruction set from my notes, typechecked with Agda v2.7.0.1.

Forgive the length of the content; it’s a compilation of the last few days’ work.
Thanks to @graphomath, @jan, and @Lukasz for early feedback and discussions.

Last updates:

  • Change Agda@X to X references.

Motivation and Context

As mentioned in this plan, this post only addresses the AVM instruction set syntax.

Disclaimer

This post is a work in progress. It is not yet complete and may contain errors.
Everything below is a proposal for defining the final instruction set of the Anoma
Virtual Machine and how to define AVM programs.

Below, you’ll see Agda code snippets and possibly references to defined terms.
If you want to read with code highlighted and clickable, I invite you to visit my notes.

What is the AVM?

AVM stands for Anoma Virtual Machine. It is a deterministic,
transaction-oriented virtual machine for message-driven objects. We
will unfold all these concepts here.

The concept of object is the deterministic sequential object explored in the
Objects module. Briefly, an
object is a function from input sequences to output sequences. In this view, an
object “reacts” to messages, input sequences, and produces output sequences,
possibly empty.

What is a message in the AVM?

Messages in the AVM are represented as S-expressions. The Val type defines
the structure of these S-expressions. It is a sum type of integers, booleans,
strings, pairs, and lists.

data Val : Set where
  VInt : ℕ → Val
  VBool : Bool → Val
  VString : String → Val
  VPair : Val → Val → Val
  VNil : Val
  VList : List Val → Val

This S-expression format allows objects to exchange structured data while
maintaining a simple, uniform representation. We define semantic aliases
to clarify the message-passing nature of object interaction:

Message : Set
Message = Val
  • Input is the type of inputs to an object.
  • Output is the type of outputs from an object.
Input : Set
Input = Message
Output : Set
Output = Message

What is an object in the AVM?

The object model in the AVM is minimal. An object is a record with a history of inputs and a
partial function from input sequences to output sequences.

record Object : Set where
  constructor mkObj
  field
    history : List Input
    object-type : List Input → Maybe (List Output)

Given an Object, one can query its output, transition the object to a new state by
processing new input, or determine whether it can process an input, among other
operations.

The AVM as a state machine

As a virtual machine, the AVM is a state machine that executes
programs—sequences of instructions or events—within an ambient execution frame.
Programs interact with this external environment through the execution of
instructions. The environment provides the necessary context for program
execution: the objects being operated upon, their inputs, and the outputs
they produce.

Central to the AVM is the notion of an instruction and event. We model the AVM’s
instruction set as an algebraic datatype whose constructors precisely define
the admissible state transitions.

With an instruction set type in place, the semantics of AVMProgram can be
defined in terms of interaction
trees
. From these, we derive an
operational semantics as a small-step transition system, enabling formal
verification of properties such as determinism.

State of the AVM: The Execution Environment

Programs run on objects, and objects live in a node/controller. Think of
a node/controller as a server or computer that hosts the objects, and the AVM
instance as another program running on the node/controller where we can interact
with the objects.

Node identifiers are aliased to NodeId.

NodeId : Set
NodeId = String

Objects are identified by ObjectId. The specifics of the ObjectId type
are not important for the AVM. It is just a unique identifier for an object. Say
for example that the ObjectId is a pair of a NodeId and a String.

ObjectId : Set
ObjectId = NodeId × String

We will also assume that:

  • all objects live in the same node (forever),
  • the ambient execution frame is the node itself, and
  • no other node is considered.

Future work could investigate how these assumptions could be relaxed.

Store type

As part of the execution, the AVM provides a view of the store of all objects in the
AVM. However, this view is partial because an object may not yet exist in the
store or may not be available. This is represented by the Store function type, where
a store is a partial function from ObjectId to Object.

Store : Set
Store = ObjectId -> Maybe Object

Transactional semantics

The AVM is a transactional machine. Programs can induce discrete state changes
as atomic blocks, ensuring that state changes are either fully applied or have
no effect at all. Therefore, some “memory” is needed in the current instance of
the AVM to store these tentative state changes—precisely the inputs sent to each
object. This is represented by the TxLog function type, where a TxLog is a
partial function from ObjectId to Input.

TxLog : Set
TxLog = ObjectId -> Maybe Input

As in TxLog, we use the Tx prefix to denote transaction-related terms,
following standard literature conventions. Note that the term “transaction” in
the Anoma ecosystem has a different meaning than here: it can refer to Resource Machine
transactions (the lowest level of atomicity that AVM programs ultimately compile
to).

Each AVM transaction is uniquely identified by a TxId.

postulate
  Hash : Set

TxId : Set
TxId = Hash

Ctx datatype

At any given point in the execution of the AVM, the current object is the one
that is being executed. The current input is the input that is being processed
by the current object.

record Ctx : Set where
  constructor ctx
  field
    self : ObjectId
    input : Input
    traceMode : Bool -- Whether to trace the execution of the AVM. handy?

State datatype

The AVM maintains an execution state that captures all information needed for
transactional computation. The state machine processes instructions by taking
the current state and an input sequence as input, producing a new state, output,
and execution trace.

The State record encapsulates the transactional context:

record State : Set where
  constructor st
  field
    store : Store      -- Persistent store of committed objects
    L : TxLog          -- Tentative write-set for the current transaction
    tx : Maybe TxId    -- Optional identifier of the open transaction
    y : Ctx            -- Execution context, it includes the current object and input

Each field serves a specific role in transaction processing:

  • store holds objects that have been successfully committed.
  • L accumulates writes within the current transaction (uncommitted changes).
  • tx tracks whether a transaction is currently active (some id) or not (nothing).
  • y provides additional execution context, it includes the current object and input.

Note that the input to the program is present in the execution context y—this
is the input being processed by the current object.

Then, a first approximation of the signature of the state transition function is:

  AVM : State -> Instruction R -> (State × R × Trace)

Here, Instruction is a type-level function from the instruction set of the AVM
to the result type of the instruction. Trace is a list of instruction-specific
AVMEvent that occurred during the execution of the instruction.

record AVMEvent : Set where
  constructor ev
  field
    event-name : String
    event-data : List Val
    event-result : Maybe Val
    event-error : Maybe Val
    -- you name it
Trace : Set
Trace = List AVMEvent

Building on this view, the signature of the AVM state transition function is:

AVM : State -> Program R -> (State × R × Trace)

Here, Program is a set of interaction trees over the instruction set.
Now it is time to define our first candidate for the instruction set of the AVM.

Baseline ISA

The AVM instruction set architecture (ISA) defines the primitive operations
available to programs. Instructions are modeled as an inductive datatype where
each constructor specifies its return type. ISA is the type-level function
signature of the instruction set.

ISA : Set
ISA = Set -> Set

The AVM requires additional machinery to handle introspection and debugging.
We need to track the safety of operations—that is, whether they are safe
to use in a safe context.

Safety level for operations

Two safety levels are defined: Safe and Unsafe.

data Safety : Set where
  Safe   : Safety 
  Unsafe : Safety 
  • Filter type: a predicate on ObjectId.
Filter : Set
Filter = ObjectId -> Bool
  • Dynamic type: a dynamic metadata representation.
postulate
  Dynamic : Set  -- Dynamic metadata representation

The baseline instruction set Instr₀ provides ten primitive operations:

data Instr₀ : Safety -> ISA where
  -- Transaction management
  beginTx   : Instr₀ Safe TxId               -- Start new transaction, returns its ID
  commitTx  : TxId → Instr₀ Safe Bool        -- Commit transaction, returns success
  abortTx   : TxId → Instr₀ Safe ⊤           -- Abort transaction, always succeeds

  -- Object lifecycle
  createObj : Input → Instr₀ Safe ObjectId   -- Create object from input
  destroyObj : ObjectId → Instr₀ Safe ⊤      -- Remove object from store

  -- Object interaction (pure message-passing)
  call      : ObjectId → Input → Instr₀ Safe Output   -- Send input, await output

  -- Safe introspection
  self      : Instr₀ Safe ObjectId                    -- Get current object ID
  input     : Instr₀ Safe Input                       -- Receive external input

  scry    : Filter → Instr₀ Unsafe (List ObjectId)  -- Query ALL objects
  reflect : ObjectId → Instr₀ Unsafe Dynamic        -- Inspect ANY object

Operations are marked as unsafe when they violate core principles of the object
model or pose systemic risks. Unsafe operations break encapsulation by accessing
object internals without going through the object’s interface, as is the case
with reflect. They bypass access control by allowing unrestricted inspection
or discovery of objects, potentially exposing sensitive information or
implementation details. They also risk unbounded computation by triggering
expensive global operations that scale with the total number of objects in the
system.

Consider scry: it applies a filter predicate across all objects in the
store. In a large system with millions of millions of objects, even a simple filter could
cause significant performance degradation. A malicious or poorly written filter
(e.g., one that always returns true) could force enumeration of the entire object store.

Similarly, reflect allows external inspection of any object’s metadata
without that object’s consent, breaking the principle that objects should
control what information they expose by processing a message—which
is not the case with reflect.

Safe programs cannot use unsafe operations. This design enforces at compile time
that safe programs remain isolated from potentially dangerous system-level operations.
These safe programs constitute what we formally define as “AVM programs”—the
standard execution model for user-level code in the AVM.

AVMProgram : Set → Set
AVMProgram = ITree (Instr₀ Safe)

And, knowing the risks of unsafe operations, we can define a SystemProgram
as any program that can use unsafe operations.

SystemProgram : Set → Set
SystemProgram A = ITree (λ B → Σ Safety (λ s → Instr₀ s B)) A

Instruction Set Semantics

The Instr₀ instruction set comprises primitive operations organized into four
categories: transaction control, object lifecycle, object interaction, and
introspection. Each instruction includes its safety level in the type signature.
With Instr₀, we have the syntax, but not the semantics,
what happens when we execute them.

We can define a small-step transition system for the instruction set in Agda
to provide formal semantics for each operation. This would establish precise
operational behavior through state transitions. We leave this formalization
for future work. Below, we describe the intended semantics of each instruction
in natural language for now.

Transaction Control

Transaction control instructions manage atomic execution contexts. All state
modifications within a transaction are tentative until committed.

beginTx

beginTx : Instr₀ Safe TxId

Initiates a new transactional context and returns a fresh transaction
identifier. All subsequent state modifications are logged to the transaction’s
write-set until the transaction is either committed or aborted. Transactions
provide atomicity: either all changes succeed or none do.

commitTx

commitTx : TxId → Instr₀ Safe Bool

Attempts to commit the transaction identified by the given TxId. It is
a synchronous operation that returns a boolean. Returns
true if the commit succeeds, persisting all logged changes to the store.
Returns false if the transaction cannot be committed (e.g., conflicts with
concurrent transactions or the transaction was already finalized).

abortTx

abortTx : TxId → Instr₀ Safe ⊤

Aborts the transaction identified by the given TxId, discarding all tentative
state changes in its write-set. The store reverts to its state before beginTx
was called. This operation always succeeds and returns unit.

Object Lifecycle

Object lifecycle instructions manage the creation and destruction of objects in
the store.

createObj

createObj : Input → Instr₀ Safe ObjectId

Creates a new object in the store from an Input. The Input
may be used to specify the object’s kind, initial program code, and
metadata. Returns a fresh ObjectId that uniquely identifies the new object.
Object creation is transactional: if the enclosing transaction aborts, the
object is not created.

destroyObj

destroyObj : ObjectId → Instr₀ Safe ⊤

Marks the object identified by the given ObjectId for destruction. The object
is removed from the store, and subsequent references to this ObjectId will
fail. Destruction is transactional: the object remains accessible within the
current transaction until committed.

Object Interaction

Object interaction is achieved through pure message-passing, preserving object
encapsulation. Message passing is the only way to interact with objects in the AVM.

call

call : ObjectId → Input → Instr₀ Safe Output

Performs synchronous message passing to the object identified by the given
ObjectId. Sends the input and blocks until the target object produces
an output. The output is returned to the caller.

Introspection and Context

Introspection instructions query the execution environment and object metadata.

self

self : Instr₀ Safe ObjectId

Returns the ObjectId of the currently executing object. This instruction is
essential for recursion and self-reference, allowing an object to pass its own
identifier to other objects or invoke itself. See also the use of self in
defining purely functional resources in the AVM.

input

input : Instr₀ Safe Input

This instruction blocks until input is available from the caller or external context.
This is the input being processed by the current object from Ctx.

scry (Unsafe)

scry : Filter → Instr₀ Unsafe (List ObjectId)

Queries the store for objects matching the given predicate Filter. Returns a
list of ObjectId for all objects satisfying the filter criteria. Marked as
unsafe because it can enumerate all objects in the system, potentially causing
performance degradation.

reflect (Unsafe)

reflect : ObjectId → Instr₀ Unsafe Dynamic

Retrieves metadata about the object identified by the given ObjectId. Returns
structured metadata. Marked as unsafe because it bypasses object encapsulation.

Bytecode Encoding

The typed Instr₀ representation provides type safety and enables formal reasoning, but practical AVM implementations require a serializable bytecode format for storage, transmission, and execution. The bytecode encoding and decoding mechanisms are defined in the Bytecode module, which provides:

  • Untyped bytecode representation with opcodes
  • Encoding functions from typed instructions to bytecode
  • Decoding with runtime type checking
  • Program serialization

See AVM Bytecode Encoding for the complete specification.

Examples

Example: Transfer

This example demonstrates an atomic transfer operation between two account objects. We assume the following context:

  • An Account object type is defined.
  • Account objects support withdraw and deposit methods that modify a balance field
  • Two Account objects are identified by fromAcc and toAcc

We define helper functions for creating messages:

withdraw : ℕ → Message
withdraw n = VList (VString "withdraw" ∷ VInt n ∷ [])
deposit : ℕ → Message
deposit n = VList (VString "deposit" ∷ VInt n ∷ [])
ok? : Val → Bool
ok? (VString "ok") = true
ok? (VString _) = false  -- Any other string
ok? (VInt _) = false
ok? (VBool _) = false
ok? (VPair _ _) = false
ok? VNil = false
ok? (VList _) = false

Given two object identifiers and an amount, we define a transfer function with type signature:

kudosTransfer : ObjectId → ObjectId → ℕ → AVMProgram Val

The implementation ensures atomicity through explicit transaction management:

kudosTransfer fromAcc toAcc amount =
  trigger beginTx >>= λ txId →
  trigger (call fromAcc (withdraw amount)) >>= λ r₁ →
  (if ok? r₁ then
    (trigger (call toAcc (deposit amount)) >>= λ r₂ →
     if ok? r₂ then
       (trigger (commitTx txId) >>= λ success →
        if success then
          ret (VString "transfer-complete")
        else
          ret (VString "commit-failed"))
     else
       (trigger (abortTx txId) >>= λ _ →
        ret (VString "deposit-failed")))
  else
    (trigger (abortTx txId) >>= λ _ →
     ret (VString "insufficient-funds")))

The transfer operation is atomic: either both the withdrawal and deposit succeed
and are committed, or the transaction is aborted and no state changes occur.

Example: Object Creation and Initialization

Creating a new counter object and performing initial operations:

createCounter : Val → AVMProgram ObjectId
createCounter initialValue =
  trigger (createObj (VList (VString "counter" ∷ initialValue ∷ []))) >>= λ counterId →
  trigger (call counterId (VString "increment")) >>= λ _ →
  ret counterId

-- Using the counter
counterExample : AVMProgram Val
counterExample =
  createCounter (VInt 0) >>= λ ctr →
  trigger (call ctr (VString "increment")) >>= λ _ →
  trigger (call ctr (VString "increment")) >>= λ _ →
  trigger (call ctr (VString "get")) >>= λ v₃ →
  ret v₃  -- Returns VInt 3

The counter maintains its state across calls, demonstrating the stateful nature
of objects created through createObj.

Example: Balance Update with Validation

Updating an account balance through message-passing with validation:

updateBalance : ObjectId → Val → AVMProgram Val
updateBalance account v = helper v
  where
    invalidMsg : Val
    invalidMsg = VString "invalid-value-type"

    helper : Val → AVMProgram Val
    helper (VInt n) =
      -- Validate non-negative
      (if 0 ≤? n then
        -- Get current balance via message
        (trigger (call account (VString "get-balance")) >>= λ oldValue →
         -- Update via message
         trigger (call account (VList (VString "set-balance" ∷ VInt n ∷ []))) >>= λ result →
         if ok? result then
           ret (VString "balance-updated")
         else
           ret (VString "update-failed"))
      else
        ret (VString "invalid-balance"))
    helper (VBool _) = ret invalidMsg
    helper (VString _) = ret invalidMsg
    helper (VPair _ _) = ret invalidMsg
    helper VNil = ret invalidMsg
    helper (VList _) = ret invalidMsg

Balance updates are validated before being applied through the object’s message interface.

Batch transfer

Calling a multi-method that operates on multiple objects simultaneously:

transferBatch : List (ObjectId × ObjectId × ℕ) → AVMProgram Val
transferBatch transfers =
  processTransfers transfers
  where
    processTransfers : List (ObjectId × ObjectId × ℕ) → AVMProgram Val
    processTransfers [] = ret (VString "batch-complete")
    processTransfers ((from , to , amount) ∷ rest) =
      kudosTransfer from to amount >>= λ result →
      (if ok? result then
        processTransfers rest
      else
        ret (VString "batch-failed"))

Batch operations can be made atomic by wrapping them in an appropriate transaction
boundaries.

Example: Object Destruction with Cleanup

Destroying an object after transferring its remaining balance:

closeAccount : ObjectId → ObjectId → AVMProgram Val
closeAccount accountToClose beneficiary =
  -- Get final balance
  trigger (call accountToClose (VString "get-balance")) >>= λ balance →
  closeWithBalance balance
  where
    closeWithBalance : Val → AVMProgram Val
    closeWithBalance (VInt n) =
      if 0 <? n then
        -- Transfer remaining funds
        (trigger (call accountToClose (withdraw n)) >>= λ _ →
         trigger (call beneficiary (deposit n)) >>= λ _ →
         -- Destroy the account
         trigger (destroyObj accountToClose) >>= λ _ →
         ret (VString "account-closed"))
      else
        -- No balance, just destroy
        (trigger (destroyObj accountToClose) >>= λ _ →
         ret (VString "account-closed-empty"))
    closeWithBalance (VString _) = ret (VString "invalid-balance")
    closeWithBalance (VBool _) = ret (VString "invalid-balance")
    closeWithBalance (VPair _ _) = ret (VString "invalid-balance")
    closeWithBalance VNil = ret (VString "invalid-balance")
    closeWithBalance (VList _) = ret (VString "invalid-balance")

This ensures no funds are lost when an account is closed - the remaining balance is
transferred before destruction using destroyObj

What’s next?

We just defined the instruction set syntax. This is the first iteration.
However, we need to define an interpreter for the instruction set.
Otherwise, how exactly does the execution of a program actually work?

The interpreter will be used to execute the instructions and produce
the output.

interpreter : State -> Instr₀ Safe -> (State × Output × Trace)

We already discussed how these interpreters should be defined, see
Interpreting ITrees.

Also, from here, it should be relatively easy to prove:

  • atomicity (all-or-nothing per tx)
  • determinism
3 Likes

Thank you for the detailed write-up! I will leave some miscellaneous questions for now, and attempt a more comprehensive response / directional write-up later:

Do we really need typed data at the VM level? Why not just S-expressions of natural numbers? I personally favor an untyped VM where we add types “on top” (in languages which compile to it, behavioral analysis, or other methods), because it’s simpler and doesn’t need to make any decisions about data representation. Is there a particular advantage to having types at this level?

We should be careful to disambiguate what is meant by “live” here:

  • Objects – at any point in their history – have exactly one machine which determines what their next state will be (determines which addition to history is considered canonical), which we call the object’s controller.
  • Object data may be replicated across many machines (nodes), and it may be possible to read and write the object from many machines, but any writes must be confirmed by the controller in order to be canonicalized across the system.

Filter should be a predicate over objects (and potentially associated metadata), not just their identifiers, the ability to scry by identifier alone is not particularly useful.

No, we don’t. We can actually abstract over Val the whole development here.

I fixed the representation because it is more readable and easier to write calls and messages in the example section. For example,

Noted it.

I see. Read/“Download” the latest version of an object translates into a (internal) call to the controller to provide its view/the canonicalised version of the object. If so, objects then behave quite similarly to replicated data structures à la Tango. The controller is the object’s log owner, the source of truth.

We can refactor this. But why? Why not work with object IDs? I assumed that the interface of any object is known, so just send the messages you want. Why do you need to actually fetch the object? The object identity (ID) works as a value.

However, as Lukask pointed out, if the RM apps let you fetch objects/resources, a developer would simply prefer to use RM low-level to fetch the object and avoid using Instr₀. That’s probably a point in favour of either fetchObject or self, as proposed below.

However, I’m not sure about letting fetch the metadata. BTW, what values does the metadata of an object actually consist of?

Anyway, we could enhance self instruction:

self : Instr₀ Safe (ObjectId x Object)

At least for now, I don’t include the object ID as part of the object def, it feels wrong.

Having identifiers is useful. It provides names for program terms. You code with those names, not necessarily with the “actual” object.

2 Likes

Makes sense. I think if we abstract over it (Val) that’s fine for examples.

We should have a way to control this. Programs may want to read the latest version of an object (by contacting the controller), but they may also want to just read the cached version. I proposed using some sort of “history identifier” in the desiderata topic, which is a possible approach to this problem (all reads would be relative to the history set in the current VM context).

It’s just that filters can’t typically be written in terms of only object IDs – object IDs themselves don’t contain any information about the actual objects, so the program would always need to fetch all the objects and then iterate over them (sending some message, say), which doesn’t seem helpful.

In the abstract model perhaps we could start just with a filter specified in terms of a program which takes an object ID as input and returns true or false. In practice we won’t be able to make arbitrary-program filters efficient but I think it’d be fine for now in the model – we should at least get the model correct first, then we could simplify to a DSL or such.

Hmm, this feels wrong to me in terms of abstraction boundaries – we want to be able to compile to RM transactions, transaction functions, etc., but a big reason for designing and formalizing a VM in the first place is for the developer to only need to reason about the VM semantics (and not care what we compile it to or how).

Hmm I could be wrong here, I just don’t see yet how filtering by identifier would work (per above). Do you have an example?

1 Like