Local Domain Storage Upgrade

We are looking to take advantage of the move from ETS to Mnesia as much as possible.

There are two things this can buy us that should inform the design of the API:

Transactions

Before, all reads and writes were ‘dirty’; in raw ETS there are no user-visible locks, i.e., :mnesia.dirty_read and :ets.lookup are functionally identical.

A transaction simply runs some code (while Mnesia handles the various locks for the user). All operations either succeed or fail together, guaranteeing atomicity; within the transaction, no other transactions will interfere, which also guarantees isolation.

The user may not always want these guarantees, so I would suggest we expose both dirty and transactional versions of reads/writes in our API and rely on the application writer to reason about when to use transactions.
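As a rough sketch of what that dual surface could look like (the module name and table name here are hypothetical, not existing code):

```elixir
defmodule LocalDomain.Storage do
  # Hypothetical storage facade; :ld_store is a placeholder table name.
  @table :ld_store

  # Dirty variants: no locks, no atomicity — equivalent in spirit to raw ETS.
  def dirty_get(key), do: :mnesia.dirty_read(@table, key)
  def dirty_put(key, value), do: :mnesia.dirty_write({@table, key, value})

  # Safe variants: wrapped in a transaction, so a group of operations
  # either all succeed or all abort, isolated from other transactions.
  def get(key) do
    :mnesia.transaction(fn -> :mnesia.read(@table, key) end)
  end

  def put(key, value) do
    :mnesia.transaction(fn -> :mnesia.write({@table, key, value}) end)
  end
end
```

The dirty variants map directly onto :mnesia.dirty_read/dirty_write, while the safe variants simply wrap :mnesia.read/write in a transaction, so the application writer pays for the guarantees only when they ask for them.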

We also need a better DSL for select statements. Currently, retrieving rows from a K-V table whose values are maps, keeping only the rows where map key :a == "test", might look something like:

iex(b@archlinux)52> :mnesia.transaction(fn ->
...> :mnesia.select(Table, [
...>   {{Table, :"$1", :"$2"},
...>    [{:is_map, :"$2"}, {:==, {:map_get, :a, :"$2"}, "test"}],
...>    [:"$$"]}
...> ])
...> end)
{:atomic, [[1, %{a: "test"}]]}

Whereas I would probably prefer something similar to:

filter(_, %{a: "test"}, [])

Where argument one specifies a pattern for the key, argument two a pattern for the value, and argument three any conditional clauses. E.g., for something a bit more complex, like requiring that the value of :a is a number greater than a certain value, we might want:

filter(_, %{a: :"$N"}, [[:>, :"$N", 3]])

I think this is relatively clean.
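For illustration, here is one hypothetical way filter could lower onto :mnesia.select/2. The module name, the extra explicit table argument, and the restriction to literal map values are all assumptions of the sketch; rewriting :"$N"-style placeholders into numbered match-spec variables is left out.

```elixir
defmodule LocalDomain.Filter do
  # Sketch only: handles literal map values in value_pattern, ignores the
  # key pattern beyond binding it, and does not rewrite :"$N" placeholders.
  def filter(table, _key_pattern, value_pattern, conditions) do
    # Bind key to :"$1" and value to :"$2", as in the verbose select above.
    head = {table, :"$1", :"$2"}

    # Each key/value in the map pattern becomes a map_get equality guard.
    value_guards =
      for {k, v} <- value_pattern, do: {:==, {:map_get, k, :"$2"}, v}

    # Extra conditions arrive as [op, lhs, rhs] lists and become guard tuples.
    extra_guards = for [op | args] <- conditions, do: List.to_tuple([op | args])

    guards = [{:is_map, :"$2"} | value_guards] ++ extra_guards

    :mnesia.transaction(fn ->
      :mnesia.select(table, [{head, guards, [:"$$"]}])
    end)
  end
end
```

Under those assumptions, filter(Table, :_, %{a: "test"}, []) expands to exactly the verbose match spec shown earlier.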

Persistence & Clustering

Whereas ETS is in-memory only, Mnesia provides us with disc_copies, allowing us to store both the table schema and the tables themselves on disc. Tables and schemas can also be replicated across Erlang nodes, i.e., Mnesia makes the storage more durable.
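As a minimal sketch (the table name is a placeholder), durable table creation could look like:

```elixir
# The schema must be written to disc, before Mnesia is started, for any
# table to be able to use disc_copies.
:mnesia.create_schema([node()])   # returns an error tuple if one already exists
:mnesia.start()

:mnesia.create_table(:ld_store,
  attributes: [:key, :value],
  # Keep a copy both in RAM and on disc on this node, so the table
  # contents survive a restart.
  disc_copies: [node()]
)
```

Dropping the disc_copies option (the default is ram_copies) recovers the ETS-like in-memory behaviour.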

This raises some infrastructure provisioning questions, which can be hard to hold in one’s head. For some background, we allow multihoming on the local domain, i.e., there is a concept of a ‘node’ on the local domain distinct from the concept of a node in Erlang (which is also an Mnesia node). A local domain node is just a mirror of the processes on the same VM.

Since we want to allow the user great flexibility over their system, there are different scenarios to enumerate:

  1. There are two Erlang nodes, each running its own local domain node, using different storage tables (i.e., they are completely distinct local domains that happen to be running on the same machine)
  2. On the same Erlang node, there are two local domain nodes, each using the same storage table (i.e., they are two local domain nodes that are part of the same local domain, on the same VM, on the same machine)

And the more complicated ones:

  3. On the same Erlang node, there are two local domain nodes, each using different storage tables (i.e., they are two local domain nodes belonging to distinct conceptual local domains, existing on the same VM)
  4. There are two Erlang nodes, each running its own local domain node, using the same storage tables (i.e., they are two distributed local domain nodes that are part of the same conceptual local domain)

I’m not sure we actually need to care about, e.g., case 3 or case 4, but either way, how can a local domain node know about these things on startup? I think this calls for better LD configuration. There should be a configuration option for whether a node should use its own storage tables or copy a table across from another Mnesia node. However, two local domains could be running separate table schemas, for whatever reason, so in that case there also has to be a higher-level declaration, on the local domain in general, that it shares a schema with another local domain. Both of these are capabilities that Mnesia exposes.

There could also be a case where the local domain does not even want to use disc copies, which is another motivation for such configuration.
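To make this concrete, here is one hypothetical shape for such configuration. Every key and value below is an assumption about what the options could look like, not an existing schema:

```elixir
# Hypothetical local-domain configuration sketch; :local_domain, the option
# names, and the node/domain atoms are all placeholders.
config :local_domain,
  storage: [
    # :own         -> create and own this node's storage tables
    # {:copy, n}   -> replicate tables across from Mnesia node n on startup
    tables: {:copy, :"b@archlinux"},
    # Schema sharing has to be declared at the local-domain level, since
    # two local domains may intentionally run separate schemas.
    schema: {:shared_with, :other_local_domain},
    # Opt out of disc_copies entirely for purely in-memory domains.
    persistence: :ram_copies
  ]
```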

EDIT: I didn’t really talk about these things here, but since nodes can be added to and removed from an Mnesia system, this suggests the possibility of local domain node migration. Also, Mnesia exposes its event info, meaning it’s possible to subscribe to Mnesia events, though I’m not sure we have an immediate use case.
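For reference, subscribing to table events is a thin wrapper over the Mnesia API (the table name below is a placeholder):

```elixir
:mnesia.start()
{:atomic, :ok} = :mnesia.create_table(:ld_events, attributes: [:key, :value])

# Subscribe the calling process to row-level events on this table.
{:ok, _node} = :mnesia.subscribe({:table, :ld_events, :simple})

:ok = :mnesia.dirty_write({:ld_events, :k, 1})

# Simple events arrive as {:mnesia_table_event, {:write, record, activity_id}}
# (and :delete / :delete_object variants).
receive do
  {:mnesia_table_event, event} -> IO.inspect(event)
after
  1_000 -> :no_events
end
```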
