Desiderata for the AVM

Preface

Designing a VM is no small task. A wide variety of different examples and design choices can be found in the literature, but none of them are likely to suit our context exactly – we need to pick and choose the pieces that make sense and assemble them together into a coherent whole. In order to know how to select the correct pieces and connect them together, we must have a clear idea of our boundary constraints: design choices, desired features and properties, and decisions we do and do not want to make. This post is an attempt to assemble a clear list thereof.

Basic context

  • The basic units of state and history in the AVM are objects. Objects can send and receive messages, and each object has a history.
  • Objects do not have mutable state. An object is either a code object or a data object. Data objects can be created exactly once, read any number of times, and destroyed exactly once. Code objects run a program when they are called.
  • At any point in time, execution is taking place within one object. Execution can move between objects with calls.

Distributed execution

The AVM is executed by many machines. At any point in time, execution is taking place on one machine. The AVM must support teleportation, where programs can move the current execution context to another machine.

Objects are controlled, each by a unique machine at any point in the history of the object. Each machine tracks the histories of other machines and synchronizes state, and each AVM execution is tied to a particular historical snapshot. Programs interacting with an object which is locally controlled (controlled by the machine on which the program is executing) are guaranteed to see the latest version of the object, which programs interacting with an object which is not locally controlled may see an out-of-date version.

AVM programs must be able to:

  • read the current (latest) history identifier from the executing machine
  • switch to a target history identifier

Do code objects need to be moved between controllers? Or are data objects (which exist on exactly one controller, since they’re immutable) sufficient?

Pure computation

The AVM must not make decisions about how pure computations (functions) are represented. Instead, the AVM should support an arbitrary set of known functions which can be appended to ver time.

Transactionality

The AVM must support transactional semantics, where programs can:

  • begin a transaction (scoped to a specific controller), starting a transaction context which collects pending state changes
  • commit a transaction context (attempting to atomically commit all pending state changes)
  • revert a transaction context (resulting in no state changes)

Preference-directed non-determinism (aka intents)

The AVM must support preference-directed non-determinism, where programs can:

  • request an arbitrary input, given a specified preference distribution over the range
  • constrain it in arbitrary ways (such that the transaction will fail if the constraints are not met)

This should be a sufficient substrate on top of which to implement Prolog-like logic programming.

Multi-party transactions

The AVM must support multi-party transaction constraints, where programs can:

  • create linear constraints which must be satisfied by message sends in the same transaction in order for the transaction to be committed, where a given message send can only be used to satisfy one linear constraint

The combination of preference-directed non-determinism and multi-party transactions must be sufficient to allow programs to implement arbitrary intents.

3 Likes

Thanks for writing this because it really makes me evaluate where AVM Instruction Set - Instr₀ sits in the actual AVM discussion.

Done, see What’s an object.

The word “mutable” here brings some tension.

In my understanding, objects can “mutate” via messages, resources, no. Objects compile to resources; however, the correspondence is not one-to-one. Every new message the object receives creates a new resource, altering the object’s history.

There are two abstraction levels to consider here:

  • The resource machine level: Each object is an immutable resource. When we “send a message” to an object, it doesn’t mutate. Instead, we create a new object with the updated history.

  • The AVM/logical level: An object is conceptually a function from input sequences to outputs, plus a history. This function is the immutable part—it specifies how the object responds to incoming messages, with no dependency on the history. However, objects mutate/evolve.

At the AVM level, the history is simply the sequence of messages the object has processed so far (in order matters). I prefer to call this sequence the object’s “view”, same ideas as in Tango. These views are subject to synchronisation in a distributed environment. That’s one reason we keep it along with the object’s definition.

The views of an object determine its “state,” which results from applying its function to its history. Note that different histories may produce the same view.

“The behavioural state of an object ⟨ī, φ⟩ is the equivalence class [ī]_φ of all input sequences that produce the same future behaviour.”

Why bring this distinction? Code is just data, in my opinion. An object may behave as a constant function—like a program that doesn’t care about inputs, just returning some value—in which case we can call it “data”. Or it may behave differently, actually doing something, in which case we call it “code”.

I don’t follow the “created exactly once” constraint. I work assuming that the object lifecycle is dictated by the following instructions:

createObj : Input → Instr₀ Safe ObjectId   -- Create object from input
destroyObj : ObjectId → Instr₀ Safe ⊤      -- Remove object from store

The semantics for createObj states that a new object is created with the given input (which defines its behavior). There is no comparison point to say “there is an object with this input that was already created”. How do we compare such a thing?

Regarding destroyObj, we could modify it to enforce that “an object is destroyed exactly once”. However, again, I need to introduce knowledge about what’s being destroyed, an additional index (as part of the State).

If os, instead of using in the return type of destroyObj, we can return a result of two possible natures: either an error (the object was already destroyed/doesn’t exist/isn’t available) or a successful deletion. That is, we use type Either AVMError ⊤, where

data AVMError  : Set where
    ObjectAlreadyDestroyed : AVMError
    ...

Any attempt to destroy an already-destroyed object returns an ObjectAlreadyDestroyed term.

This is compatible with Ctx, which puts in the AVM instance’s state the object id of the current object in execution. You can also find the input used.

record Ctx : Set where
  constructor ctx
  field
    self : ObjectId
    input : Input
    traceMode : Bool

in State:

record State : Set where
  constructor st
  field
    store : Store      -- Persistent store of committed objects
    ...
    y : Ctx            -- Execution context, it includes the current object and input

What do you really mean here? What do you move with the execution?

This is a fundamental requirement for the design of the AVM, and I believe adding this “later” would create difficulties, such as object history synchronisation, object availability, and different semantics for the call instruction.

So, be aware, this is something completely abstracted way, or better put, ignored, in Goose and also in State of the AVM: The Execution Environment. Both works are developed under the assumption there is only one node, which you call controller.

That’s for the basic context section of yours.

2 Likes

At a higher level of abstraction, I agree with you – data objects are an attempt to include the immutable state level (resources) in the VM directly. Perhaps we don’t need them, I’m not sure yet, but there could be advantages to enforcing this kind of strict separation: code objects could always be “compiled away” or flattened (de-nested) in a straightforward manner since they don’t themselves have any state (only message calls). Putting data objects (resources) in the VM allows programmers to have control over the low-level state representation when they want, while still allowing it to be abstracted away most of the time (as Goose does).

I just mean the flow of program execution (if you think about it operationally).

Okay, I can work with this. But, do you have something in mind on how to control the low-level state representation?

Refinement of desiderata for constraint programming, non-determinism, and multi-party transactions:

Simple constraint programming and non-determinism

Programs should be able to:

  • Request an arbitrary input from a specified domain (which may be very large). This corresponds to choose from the current draft.
  • Check constraints on arbitrary values (such that the current transaction will fail if the constraints are not met). This corresponds to require from the current draft.

The AVM runtime should be able to symbolically execute a program in order to extract free variables (from choose) and constraints (from require), and extract the resulting constraint satisfaction program to a suitable intermediate representation such that existing SAT solvers or similar could be used.

Intents and multi-party transactions

Multi-party transactions at the most basic level entail composing multiple program executions (or transactional parts of those executions scoped by beginTx / commitTx) into a single transaction.

Program/transaction composition requirements:

  • It should be possible to compose the transactional parts of multiple non-conflicting program executions into a single transaction, where that transaction executes the combined state changes from all of the individual transactions atomically.
  • It should be possible to compose the transactional parts of multiple sequentially dependent program executions (e.g. A and B where reads performed by B should reflect writes performed by A) into a single transaction, where that transaction executes the combined state changes from all of the individual transactions atomically, as if they were executed in order (A, then B, in this example).

These operations correspond to the transaction composition in the resource machine, just at the AVM level of abstraction.

In order to allow multi-party intents we need to additionally allow programs to check constraints on the state changes performed by other programs which we compose them with. There are many different ways to represent this – I know @graphomath is working on a clearer formalism in the AVM green paper, which will hopefully provide some better guidance here, but in the meantime I suggest something like:

  • The AVM runtime should keep a record of messages sent during the course of program execution (effectively, a record of the message sequence chart of the program execution).
  • AVM programs should be able to check constraints on other parts of the message sequence chart generated by other programs whose execution became part of the same transaction (as described above).
  • These constraints should be able to be both non-linear (leaving the MSC record untouched) and linear (once part of the stored MSC record has been used to satisfy a linear constraint, that part of the MSC record is removed, so that it can satisfy no further constraints). The latter is necessary to implement e.g. simple token swaps – cases where the intent author wants to guarantee that they’re offering some state change A in exchange for some other state change B, and that this exchange is separable from anything else that might be happening in the transaction.

We should also be able to perform some kind of speculative composition and symbolic execution of multiple programs in order to extract the resulting CSP to an intermediate representation suitable for use with SAT solvers, but there are a lot of details here which I haven’t thought about yet.