Oct 14, 2013

Agentum: moving from ORM to path-based object selectors

Early in the development of Agentum, the need for client-server separation became apparent, mostly because of how sucky GUI programming is in Python. I came up with a simple generic protocol to expose the state of the simulation over a web/network socket. The intent was to play around with a browser-based UI and to open it up to anyone who wants to write their own GUI client.

A few specific design challenges arose from that:

  1. We need a generic way to describe simulation elements to the client so it knows how to draw them;
  2. We need a generic way to wire control input from the client to simulation elements;
  3. We need to watch simulation objects so we can automatically notify the client when things change;
  4. In the not-so-distant future, when Agentum is grown up and distributed, we'll need a formal way of packaging and shipping object instances across computation nodes.

Also, this allows API access to a running simulation. You can hook up your own whatever and interact with Agentum!

The ORM way

Since I was already familiar with Django ORM and that pattern can satisfy the design goals easily, replicating it seemed like a good idea.

Starting with a simple example, this is how you might represent a grid square in the Heatbugs simulation:

class Square(object):
    heat = 0

The heat parameter loosely represents the square's temperature. It changes as the square loses heat to dissipation and gains heat radiated by the bugs.

We don't want to ship the entire state of the model to a remote GUI client every time something needs redrawing. Instead, we want to tell the client what changed, which means we need to know that ourselves. There are two ways of achieving that:

  1. Passive: compare the entire state of the model with a previous snapshot and detect changes.
  2. Active: hook into attribute access and handle changes as they happen.
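To make the trade-off concrete, here is a minimal sketch of the passive approach. It assumes the model keeps its state in instance attributes (so `vars()` can see it); `snapshot` and `diff` are hypothetical helpers, not Agentum code.

```python
import copy


def snapshot(obj):
    """Deep-copy an object's instance attributes so later mutation can't alter the copy."""
    return copy.deepcopy(vars(obj))


def diff(old, new):
    """Return {name: value} for every attribute that was added or changed since `old`."""
    return {name: value for name, value in new.items()
            if name not in old or old[name] != value}


class Square(object):
    def __init__(self):
        self.heat = 0.0  # instance attribute, so vars() sees it


square = Square()
before = snapshot(square)
square.heat = 1.5                      # a simulation step changes something
print(diff(before, snapshot(square)))  # {'heat': 1.5}
```

Every tick pays for a full copy and a full comparison whether or not anything changed, which is the cost the active approach avoids.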

Of the two, the active approach looks more attractive. Active Record fits the bill perfectly, and Django-style ORM is a great AR implementation.

Add some metaclass magic, and now we have something like this:

from model import Cell
from fields import Float

class Square(Cell):
    heat = Float(0)

The user code looks almost the same, but internally things are different:

  • Cell subclasses Model; Model overrides __setattr__() and has a metaclass that inspects field-type attributes.
  • Now we know Square has a field heat and it's a float. We can send that information to the client.
  • Now we know whenever a square's heat changes, because the user does square.heat = new_value. Again, we can send that to the client.
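A minimal sketch of that machinery, assuming a field is just a type plus a default and that change notification is a plain callback. `Field`, `Float`, `ModelMeta`, `Model` and `listener` here are hypothetical stand-ins, not Agentum's actual classes.

```python
class Field(object):
    """Hypothetical field descriptor: a declared type plus a default value."""
    type = object

    def __init__(self, default=None):
        self.default = default


class Float(Field):
    type = float


class ModelMeta(type):
    """At class creation, collect Field-valued class attributes into cls._fields."""

    def __new__(mcs, name, bases, attrs):
        fields = {}
        for base in bases:
            fields.update(getattr(base, '_fields', {}))  # inherit parents' fields
        fields.update((k, v) for k, v in attrs.items() if isinstance(v, Field))
        attrs['_fields'] = fields
        return super().__new__(mcs, name, bases, attrs)


class Model(metaclass=ModelMeta):
    def __init__(self, listener=None):
        # listener(obj, name, value) is a hypothetical change callback,
        # e.g. something that forwards the change to the GUI client.
        object.__setattr__(self, '_listener', listener)
        for name, field in self._fields.items():
            object.__setattr__(self, name, field.default)  # defaults don't notify

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name in self._fields and self._listener is not None:
            self._listener(self, name, value)


class Cell(Model):
    pass


class Square(Cell):
    heat = Float(0)


changes = []
square = Square(listener=lambda obj, name, value: changes.append((name, value)))
square.heat = 2.5
print(sorted(Square._fields))  # ['heat']
print(changes)                 # [('heat', 2.5)]
```

The metaclass runs once per class, so the field inventory costs nothing at runtime; only the `__setattr__` check runs on every assignment.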

Immediate problem solved. But wait, there are more goodies!

A field class is a great place for any value processing code (serializing, validating, thinning, etc.), and we could cram a lot of metadata into the field declaration. We could describe things like:

  • Mapping values to colors
  • Quantization thresholds, to avoid wasting bandwidth on insignificant changes
  • Allowable ranges (Django's min/max)
  • Allowable values for enums (Django's choices)
  • Exactly which fields need to propagate upstream
  • etc.
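As an illustration of two items from that list, here is a minimal sketch of a field that carries an allowable range and a quantization threshold. The `Float` class, its `quantum` parameter and its `validate`/`is_significant` methods are hypothetical; the `min`/`max` names mirror the Django-style declaration used in this post.

```python
class Float(object):
    """Hypothetical field carrying metadata: allowable range and quantization threshold."""

    def __init__(self, default=0.0, min=None, max=None, quantum=0.0):
        self.default = default
        self.min = min
        self.max = max
        self.quantum = quantum  # changes smaller than this are not sent to the client

    def validate(self, value):
        """Coerce to float and enforce the allowable range."""
        value = float(value)
        if self.min is not None and value < self.min:
            raise ValueError('%r is below minimum %r' % (value, self.min))
        if self.max is not None and value > self.max:
            raise ValueError('%r is above maximum %r' % (value, self.max))
        return value

    def is_significant(self, old, new):
        """True if the change from old to new is big enough to propagate."""
        return abs(new - old) >= self.quantum


heat = Float(0.0, min=0.0, max=100.0, quantum=0.5)
print(heat.validate('42'))              # 42.0
print(heat.is_significant(10.0, 10.2))  # False
print(heat.is_significant(10.0, 11.0))  # True
```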

Marshalling values between different formats can be done with some elegance, even with complex types:

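from fields import Field

class Integer(Field):
    default = 0
    from_string = int

class List(Field):
    default = []

    def __init__(self, field, *args, **kw):
        Field.__init__(self, *args, **kw)
        # Split on whitespace and convert each item with the element field.
        # A list comprehension gives a real list on both Python 2 and 3
        # (map() returns a lazy iterator on Python 3).
        self.from_string = lambda x: [field.from_string(item) for item in x.split()]

...
dimensions = List(Integer, (0, 0))
...
dimensions = '10 10'  # a human-typed string, parsed via List.from_string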

Useful for accepting human input and not having to type JSON by hand, for example.
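The assignment `dimensions = '10 10'` only works if something intercepts it. Here is a minimal sketch of that hook, assuming the model's `__setattr__` looks up the field declared on the class and coerces string input through its `from_string`. `Model`, `Space` and the instance-based `Integer()` are hypothetical simplifications, not Agentum's API.

```python
class Field(object):
    def from_string(self, text):
        raise NotImplementedError


class Integer(Field):
    def from_string(self, text):
        return int(text)


class List(Field):
    def __init__(self, field):
        self.field = field  # element field, e.g. Integer()

    def from_string(self, text):
        return [self.field.from_string(item) for item in text.split()]


class Model(object):
    def __setattr__(self, name, value):
        field = getattr(type(self), name, None)
        # Coerce human-typed strings through the declared field, if there is one.
        if isinstance(field, Field) and isinstance(value, str):
            value = field.from_string(value)
        object.__setattr__(self, name, value)


class Space(Model):
    dimensions = List(Integer())


space = Space()
space.dimensions = '10 10'
print(space.dimensions)  # [10, 10]
```

Non-string values pass through untouched, so code that already has a list can assign it directly.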

The dilemma

This is a good pattern; what could go wrong? Not much, if you only ever deal with top-level class attributes of basic types. That is not always the case.

Implementing the Schelling Segregation Model requires agents of the same class but with different attributes, like red and blue turtles. Suppose you want global attributes that control their behavior at runtime. For example, all red turtles would share one tolerance level and all blue ones another, and you want to be able to set the population density of each color separately.

With top-level attributes you would be limited to something like:

class Schelling(Simulation):
    red_tolerance = 0.5
    red_density = 0.3
    blue_tolerance = 0.1
    blue_density = 0.6

But that's silly. What if you wanted to add green turtles? And how do you extract the color as an independent attribute?

class Schelling(Simulation):
    agent_params = {'red': {'tolerance': 0.5, 'density': 0.3},
                    'blue': {'tolerance': 0.1, 'density': 0.6}}

That's how. But now we have a problem: writing field descriptors for arbitrarily nested container objects is a hairy proposition. There is no good way to track internal changes to an object like this, not without Too Much Magic™. Nor is there a good way to declare fields for it that does not begin to resemble Java generics. That would be awful.
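To see why tracking breaks down, here is a minimal sketch in which a hypothetical `Model` simply logs every top-level assignment. A nested write goes through the dict's `__setitem__`, not the model's `__setattr__`, so the hook never sees it.

```python
class Model(object):
    """Hypothetical model that logs every top-level attribute assignment."""

    def __init__(self):
        object.__setattr__(self, 'log', [])

    def __setattr__(self, name, value):
        self.log.append(name)
        object.__setattr__(self, name, value)


class Schelling(Model):
    def __init__(self):
        super(Schelling, self).__init__()
        self.agent_params = {'red': {'tolerance': 0.5, 'density': 0.3},
                             'blue': {'tolerance': 0.1, 'density': 0.6}}


sim = Schelling()
sim.agent_params['red']['tolerance'] = 0.9  # nested write: __setattr__ never runs
print(sim.log)  # ['agent_params'], only the initial assignment was seen
```

Catching the nested write would mean wrapping every dict and list in a change-tracking proxy, which is exactly the Too Much Magic™ in question.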

Thinking about this led me to reexamine the fitness of ORM for Agentum.

ORM is for schemas and databases

Even though the mechanics of Django-style ORM give us usable results, the purposes are quite different. An ORM builds representations of database objects and ensures strict adherence to a schema. By contrast, Agentum objects live in the simulation runtime, the model implementation is entirely up to the user, and there is no database backend with which to enforce integrity.

GUI is optional

Agentum makes very few assumptions.

Assumption 1: The primary goal of modeling is writing a model that actually works and computes something interesting.

Our job is to provide an environment where this can be done quickly and cleanly. Where you go from there can vary: some may be content with tweaking parameters in the source and rerunning the simulation while others will want to generate pictures and interact with the simulation in various ways, including programmatically.

In other words, the communication with the GUI client, and the client itself, are secondary. Without the client the model will still compute.

We're smart, let's guess

class Square(Cell):
    heat = 0.0

OK, we can tell that Square has an attribute heat, it's a float and its default value is 0. That alone is enough to construct a field descriptor.
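A minimal sketch of that guessing step, assuming a field descriptor is just a type name plus a default. `infer_fields` is a hypothetical helper; private names and anything that isn't a plain value (methods, for instance) are skipped.

```python
def infer_fields(cls):
    """Guess a field descriptor from each public, plain-valued class attribute."""
    plain_types = (bool, int, float, str, list, dict)
    fields = {}
    # Walk base classes first so subclasses can override inherited attributes.
    for klass in reversed(cls.__mro__):
        for name, value in vars(klass).items():
            if name.startswith('_') or not isinstance(value, plain_types):
                continue
            fields[name] = {'type': type(value).__name__, 'default': value}
    return fields


class Cell(object):
    pass


class Square(Cell):
    heat = 0.0


print(infer_fields(Square))  # {'heat': {'type': 'float', 'default': 0.0}}
```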

Of course, there are still things like metadata and field exclusions, but the base case is covered by simple introspection and guessing. When we do need more metadata, we can provide it on top instead of forcing field declarations everywhere.

In effect, this "additional" metadata will be an optional layer in the model declaration.

Path- and rule-based object selectors

Assumption 2: To the outside world an Agentum object is a data structure composed of plain values.

In other words, something isomorphic to a JSON object. In fact, our entire simulation can be represented this way. Every object has an ID and every attribute, however deeply nested, has a path to it:

agent_params.blue.tolerance
space.cells.12345.heat
agents.209.happiness
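A minimal sketch of resolving such a path, assuming the simulation is made of nested dicts, lists and plain objects, and that IDs may be stored as integers even though the path spells them as text. `resolve` is a hypothetical helper, and `SimpleNamespace` merely stands in for model objects.

```python
from types import SimpleNamespace


def resolve(root, path):
    """Follow a dotted path through nested dicts, lists and object attributes."""
    node = root
    for key in path.split('.'):
        if isinstance(node, dict):
            # Try the key as written, then as an integer ID.
            node = node[key] if key in node else node[int(key)]
        elif isinstance(node, (list, tuple)):
            node = node[int(key)]
        else:
            node = getattr(node, key)
    return node


sim = SimpleNamespace(
    agent_params={'red': {'tolerance': 0.5}, 'blue': {'tolerance': 0.1}},
    space=SimpleNamespace(cells=[SimpleNamespace(heat=0.0), SimpleNamespace(heat=3.0)]),
    agents={209: SimpleNamespace(happiness=0.75)},
)
print(resolve(sim, 'agent_params.blue.tolerance'))  # 0.1
print(resolve(sim, 'space.cells.1.heat'))           # 3.0
print(resolve(sim, 'agents.209.happiness'))         # 0.75
```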

Using this, we can describe many things at once as well. Where there is a path, there is a wildcard:

agent_params.*.tolerance = 0.1
space.cells.*.friend_ratios_by_color.* = 0.1
# haha, misery
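A minimal sketch of wildcard selection, assuming the state is plain nested dicts and that `*` matches exactly one path segment. `iter_paths`, `select` and `assign` are hypothetical names; real Agentum objects would also need the attribute and list handling.

```python
def iter_paths(node, prefix=()):
    """Yield (segments, value) for every leaf of a structure of nested dicts."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_paths(value, prefix + (key,))
    else:
        yield prefix, node


def matches(selector, segments):
    """'*' matches any single segment; other segments must match exactly."""
    parts = selector.split('.')
    return len(parts) == len(segments) and all(
        part == '*' or part == str(key) for part, key in zip(parts, segments))


def select(root, selector):
    """Return {dotted_path: value} for every leaf the selector matches."""
    return {'.'.join(map(str, segments)): value
            for segments, value in iter_paths(root) if matches(selector, segments)}


def assign(root, selector, value):
    """Set every leaf the selector matches to `value`."""
    for segments, _ in list(iter_paths(root)):  # materialize before mutating
        if matches(selector, segments):
            node = root
            for key in segments[:-1]:
                node = node[key]
            node[segments[-1]] = value


sim = {'agent_params': {'red': {'tolerance': 0.5, 'density': 0.3},
                        'blue': {'tolerance': 0.1, 'density': 0.6}}}
assign(sim, 'agent_params.*.tolerance', 0.1)
print(select(sim, 'agent_params.*.tolerance'))
# {'agent_params.red.tolerance': 0.1, 'agent_params.blue.tolerance': 0.1}
```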

This is powerful stuff. Using selectors, our model objects could look like this:

class Schelling(Simulation):
    agent_params = {'red': {'tolerance': 0.5, 'density': 0.3},
                    'blue': {'tolerance': 0.1, 'density': 0.6}}

    fields = {'agent_params.*.density': Float(min=0, max=1)}

Or even:

    fields['agent_params.*.density'].range = (0,1)
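A minimal sketch of how such a `fields` table could be consulted, assuming `*` matches exactly one path segment and the first matching selector wins. This `Float` with a `range` attribute and the `field_for` helper are hypothetical, chosen to mirror the two declarations above.

```python
class Float(object):
    """Hypothetical field: a float with an optional allowable range."""

    def __init__(self, min=None, max=None):
        self.range = (min, max)

    def clean(self, value):
        low, high = self.range
        value = float(value)
        if (low is not None and value < low) or (high is not None and value > high):
            raise ValueError('%r outside %r' % (value, self.range))
        return value


def segment_match(selector, path):
    """'*' matches any single segment; other segments must match exactly."""
    parts, keys = selector.split('.'), path.split('.')
    return len(parts) == len(keys) and all(p in ('*', k) for p, k in zip(parts, keys))


def field_for(fields, path):
    """Return the first field whose selector matches the concrete path, or None."""
    for selector, field in fields.items():
        if segment_match(selector, path):
            return field
    return None


fields = {'agent_params.*.density': Float(min=0, max=1)}
fields['agent_params.*.density'].range = (0, 1)  # the attribute-style tweak

density = field_for(fields, 'agent_params.red.density')
print(density.clean('0.3'))                             # 0.3
print(field_for(fields, 'agent_params.red.tolerance'))  # None
```

Paths with no matching selector fall back to the guessed defaults, so the metadata stays an optional layer.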

Of course, there is still a question of hooking into value assignments. Stay tuned.