Variable compliance units & comparisons

Continuing with the optimization efforts on the ARM prover, this time I will explain an idea from @xuyang.

A major bottleneck that we are facing is the time to prove compliance of the ARM rules. Currently, the size of the compliance units (CUs) is fixed to 2: one consumed resource and one created resource. The main sources of inefficiency are:

  • Many #CUs per action. Put differently, many compliance proofs must be generated per action.
  • Alignment overhead. Whether resources are consumed or created matters: an action with 4/4 resources is not the same as an action with 7/1 resources. Both have 8 resources in total, but the former needs 4 CUs while the latter needs 7 CUs (see the sketch below).
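
To make the alignment overhead concrete, here is a minimal sketch (illustrative helpers, not part of the arm-risc0 API): each fixed 2-size CU pairs one consumed with one created resource, so an action needs max(consumed, created) CUs and the shorter side is filled with padding resources.

fn fixed_size_cus(consumed: usize, created: usize) -> usize {
    // Each 2-size CU holds exactly one consumed and one created resource.
    consumed.max(created)
}

fn padding_resources(consumed: usize, created: usize) -> usize {
    // Resources added only to fill otherwise half-empty CUs.
    consumed.abs_diff(created)
}

fn main() {
    assert_eq!(fixed_size_cus(4, 4), 4);     // 4/4 action: 4 CUs, no padding
    assert_eq!(fixed_size_cus(7, 1), 7);     // 7/1 action: 7 CUs ...
    assert_eq!(padding_resources(7, 1), 6);  // ... and 6 padding resources
}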

Variable-size compliance units

The CU holds all the resources of the action:

pub struct ComplianceVarWitness {
    /// The consumed resources.
    pub consumed_resources: Vec<Resource>,
    /// The created resources.
    pub created_resources: Vec<Resource>,
    /// The paths from the consumed commitments to the roots in the commitment trees
    pub merkle_paths: Vec<MerklePath>,
    /// The existing root for ephemeral resources
    pub ephemeral_root: Digest,
    /// Nullifier keys of the consumed resources
    pub nf_keys: Vec<NullifierKey>,
    /// Random scalar for delta commitment
    pub rcv: Vec<u8>,
}

pub struct ComplianceVarInstance {
    /// Nullifiers of the consumed resources.
    pub consumed_nullifiers: Vec<Digest>,
    /// Commitments of the created resources.
    pub created_commitments: Vec<Digest>,
    /// Logic references of the consumed resources.
    pub consumed_logic_refs: Vec<Digest>,
    /// Roots of the commitment trees the consumed resources belong to.
    pub consumed_commitments_tree_roots: Vec<Digest>,
    /// Logic references of the created resources.
    pub created_logic_refs: Vec<Digest>,
    /// x-coordinate of the delta commitment.
    pub delta_x: [u32; 8],
    /// y-coordinate of the delta commitment.
    pub delta_y: [u32; 8],
}

pub struct ComplianceVarUnit {
    // vk is a constant in the compliance unit, so we don't place it here.
    pub proof: Option<Vec<u8>>,
    pub instance: Vec<u8>,
}

There will be just one ComplianceVarUnit per action, which means only one proof needs to be generated. This makes a difference, as the proving cost is amortized across all resources of the action. It also comes with the added benefit of removing padding resources and their overhead: proving compliance of n resources costs (roughly) the same regardless of how many of them are consumed or created.
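
Since the single unit covers every resource of the action, the instance vectors are expected to line up with the witness: one nullifier, logic reference, and root per consumed resource, and one commitment and logic reference per created resource. A minimal sanity-check sketch over the ComplianceVarInstance above (a hypothetical helper, not part of arm-risc0):

fn lengths_consistent(inst: &ComplianceVarInstance) -> bool {
    // One entry per consumed resource ...
    inst.consumed_nullifiers.len() == inst.consumed_logic_refs.len()
        && inst.consumed_nullifiers.len() == inst.consumed_commitments_tree_roots.len()
        // ... and one entry per created resource.
        && inst.created_commitments.len() == inst.created_logic_refs.len()
}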

Nonces

The nonce should be unique. This is guaranteed by making it dependent on the context of the action in which the resources are created. The suggestion is to derive the nonce of the i-th created resource from all the consumed nullifiers:

created_resources[i].nonce = hash(i, consumed_nullifiers)

where hash is a collision-resistant hash function, and the index i is the position the created resource occupies in the CU (i.e. in the action).
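
A minimal sketch of this derivation, with SHA-256 (via the sha2 crate) standing in for the collision-resistant hash; the hash actually used by the ARM may differ:

use sha2::{Digest, Sha256};

/// Derive the nonce of the i-th created resource from all consumed nullifiers.
fn derive_nonce(i: u32, consumed_nullifiers: &[[u8; 32]]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(i.to_le_bytes());
    for nf in consumed_nullifiers {
        hasher.update(nf);
    }
    hasher.finalize().into()
}

Two created resources in the same action hash the same nullifier set but different indices, so their nonces differ.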

Comparison

A proof-of-concept implementation can be found in this branch of the arm-risc0 repository. As a bonus, it also comes with an implementation of the sigmabus technique described here, which optimizes proving by removing EC arithmetic from the compliance circuit.

Key takeaways (TL;DR):

  • There is a significant drop in proving times when using variable-size CUs.
  • For large actions (with many resources), the sigmabus approach outperforms the vanilla variable-size approach.

Disclaimer: The strange spikes in the timings are because the benchmarks are not averaged; they correspond to a single run (it would be hard to average using Bonsai anyway). They should serve only as intuition. You can try to reproduce the benchmarks by running:

BONSAI_API_URL=https://api.bonsai.xyz/ BONSAI_API_KEY=XXX cargo test --release --features 'bonsai test' bench_compliance_2_var_sigmabus -- --no-capture

I have benchmarked using the Bonsai prover. I have not used a GPU.

2-size vs variable-size

For actions with just 2 resources, both approaches perform essentially the same, which makes sense as in both cases a single proof is generated. With 3 resources, the variable-size approach is ~2x faster, and with as few as 8 resources it is ~5x faster. I stopped here due to the large proving times for the 2-size CUs, but the scaling factor should continue to increase with the number of resources per action. These timings confirm that the variable-size proving cost is amortized across all resources, and suggest that the more resources, the larger the amortization.

circuit | resources (consumed, created) | proving time  | #CUs | alignment overhead
2-size  | 2 (1, 1)                      | 10.457182298s | 1    |
var     | 2 (1, 1)                      | 11.430059181s | 1    |
2-size  | 3 (2, 1)                      | 26.866941109s | 2    | 6.716735277s
var     | 3 (2, 1)                      | 15.357052552s | 1    |
2-size  | 4 (2, 2)                      | 25.885114226s | 2    |
var     | 4 (2, 2)                      | 15.587812528s | 1    |
2-size  | 8 (4, 4)                      | 43.110295019s | 4    |
var     | 8 (4, 4)                      | 15.82804433s  | 1    |
2-size  | 8 (7, 1)                      | 73.768590046s | 7    | 31.615110018s
var     | 8 (7, 1)                      | 15.560026139s | 1    |

variable-size vs sigmabus

For actions with up to ~64 resources, the difference between them is not that large. In the most extreme tested scenario (128 resources), variable-size takes ~40 seconds, whereas sigmabus takes ~17 seconds. In general, it seems that the proving overhead increases faster for variable-size CUs – in the diagram below, the blue line grows faster than the yellow line. It looks like sigmabus-128 stabilises at around 20 seconds.

Cap on resources. It’s worth noting that a sigmabus CU imposes an upper bound on the number of resources that can appear in a final (balanced) transaction. The benchmarks use upper bounds of 64 and 128 resources per transaction. There doesn’t seem to be a significant difference between the two, which is good: it means we can settle for large upper bounds. (Yes, it is bad to have a cap on resources, but note there are other factors to consider, e.g. the current Ethereum controller seems to cap at ~100 resources per tx due to gas limit constraints.)


Thank you for the write-up!

Do I understand correctly that this relies on there being at least as many consumed resources as there are created ones, for the purposes of nonce generation?

So we would still need padding resources in case we are splitting:

E.g. I have one resource of 10 USDC and I want to send you 5.

I create a transaction: I consume one resource, but create two USDC resources of quantity 5, one for you and one for me. Then I need to pad with 1 additional consumed resource.

There is nothing wrong with that, and it is still far more optimized than what we currently use; I just want to make sure I understand correctly.

It means we can settle for large upper bounds.

I also agree that it may be OK to settle on some common-sense upper bound. There will always be some clear bound after which the costs - for either proving or verifying - will outweigh any benefit of a transaction. Even if not uniform, at the very least this bound should be evident for each given chain.

There is no need for padding. In this case, the nonce of the first created resource depends on the consumed nullifier and the index (i=0) of the created resource. The second nonce depends on index i=1, so both nonces are different.

What are the three nullifiers here?

Yeah, that was a typo, sorry.

Ah, I see, I apologize. I misread and assumed that we still use the i-th consumed nullifier as the nonce, similar to the current implementation.

But here we apparently hash over all consumed nullifiers alongside an index.

That makes sense, thank you.


Thank you for the post!

Consider avoiding parallel indices to represent information about the same resource: for example, consumed_resources[i], merkle_paths[i], and nf_keys[i] all describe one resource. A packed structure would be clearer.

pub struct ComplianceWitness {
    pub consumed_witnesses: Vec<ConsumedWitness>,
    pub created_resources: Vec<Resource>,
    pub ephemeral_root: Digest,
    pub rcv: Vec<u8>,
}

pub struct ConsumedWitness {
    resource: Resource,
    merkle_path: MerklePath,
    nf_key: NullifierKey,
}

The same applies to the instance:

pub struct ComplianceInstance {
    pub consumed_instances: Vec<ConsumedInstance>,
    pub created_instances: Vec<CreatedInstance>,
    pub delta_x: [u32; 8],
    pub delta_y: [u32; 8],
}

pub struct ConsumedInstance {
    pub nullifier: Digest,
    pub logic_ref: Digest,
    pub root: Digest,
}

pub struct CreatedInstance {
    pub commitment: Digest,
    pub logic_ref: Digest,
}

For nonce generation, hashing all consumed_nullifiers for each nonce is kind of costly, especially with a large nullifier set. Would it be better to use:

nullifier_hash = hash(consumed_nullifiers)
created_resources[i].nonce = hash(i, nullifier_hash)
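
For illustration, a sketch of this two-step derivation, again with SHA-256 from the sha2 crate standing in for the actual hash: the nullifier set is hashed once, and each per-resource nonce then hashes only a fixed-size input.

use sha2::{Digest, Sha256};

/// Hash the whole set of consumed nullifiers once.
fn nullifier_set_hash(consumed_nullifiers: &[[u8; 32]]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    for nf in consumed_nullifiers {
        hasher.update(nf);
    }
    hasher.finalize().into()
}

/// Derive the i-th nonce from the precomputed set hash.
fn derive_nonce(i: u32, nullifier_set_hash: &[u8; 32]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(i.to_le_bytes());
    hasher.update(nullifier_set_hash);
    hasher.finalize().into()
}

With m created resources, the nullifier set is hashed once instead of m times, so the per-nonce cost no longer grows with the number of consumed resources.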