Jan 13, 2014

Adventurous metaclassing: automatic class splitting

tl;dr: I had a problem that I could only solve by splitting a class into two separate superclasses and inheriting from both. This article describes a metaclass that achieves that effect automatically. Contrived, but fun.

Let's write a metaclass that splits a superclass of a class in two, while demonstrating a rare legitimate use case for mixins.

Why would you ever want to? My use case arose from an uncooperative dynamic class.

Atom (the problem)

I was building a UI using Enaml, which internally relies on Atom (formerly Traits). Among other things, Atom is an observer framework that provides bindings for Enaml, allowing attribute value changes to automatically update the UI.

To use it, you subclass Atom and declare class attributes initialized to field instances, much like an ORM model.

from atom.api import Atom, Int

class Model(Atom):
    counter = Int()

The problem is that Atom expects all attributes to be declared fields. If the attribute is not a known field type, Atom will turn it into a read-only field. This means you cannot have a "normal" attribute alongside an "atomic" field, and vice versa.

class MixedModel(Atom):
    counter = Int()
    x = 0  # x is read-only

# subclassing won't help
class Extended(Model):
    x = 0  # ditto

    def __init__(self):
        self.x = 1  # error: object attribute 'x' is read-only
        self.y = 1  # error: object has no attribute 'y'

Well, that's annoying. But what if an atom were a mixin? Would that work?

class ModelAtom(Atom):
    counter = Int()

class ModelBase(object):
    x = 0

class Model(ModelBase, ModelAtom):
    pass

# fingers crossed...
m = Model()
m.x = 5  # works!

Solution: mixins

So, a pure atom mixin can contain our atomic fields and we can use them in our impure object. We still need a "normal" class to co-inherit from (where the other attributes will live), and it cannot be object.

This frees us from having to make the whole class atomic, but writing new mixins to contain atom fields can get tedious. Assuming that modifying Atom itself is out of the question, what can we do to automate the generation of the necessary classes?

The Metaclass

Naturally, any time we need to mess with class creation it's time for metaclasses. In this case, however, we are not just creating a class: we need to create up to two new classes and make our new class inherit from them instead. In other words, we need to rewrite the inheritance tree.

We do this by dynamically creating new classes and injecting them into the class bases. For consistency and reference, we will also inject them into the module as if they were defined there. One of the new classes will get all the atom fields, and subclass Atom. The other class will be optional, in case our target class is a direct subclass of object. The target class will keep all the other attributes.

def split_dict(d, test):
    a, b = {}, {}
    for k, v in d.iteritems():
        if test(k, v):
            a[k] = v
        else:
            b[k] = v
    return a, b

from atom.catom import Member
from atom.api import AtomMeta, Atom
import inspect
class AtomizerMeta(type):
    def __new__(meta, class_name, bases, attrs):
        module = inspect.getmodule(inspect.stack()[1][0])

This will be the metaclass of our new mixin base, Atomizer. The metaclass will run on Atomizer itself, but we only need the magic to happen in its subclasses, so skip it:

        if module.__name__ == meta.__module__ and class_name == 'Atomizer':
            return super(AtomizerMeta, meta).__new__(meta, class_name, bases, attrs)

        # pluck out atom fields
        atom_attrs, basic_attrs = split_dict(attrs, lambda k,v: isinstance(v, Member))

atom_attrs contains only the atomic fields (instances of Member) and basic_attrs all the rest.

One of the bases will be Atomizer itself. Because we are replacing it with two new classes, let's remove it from bases:

        new_bases = tuple(b for b in bases if b != Atomizer)
        new_classes = ()

Create the two new classes:

        # allow use as both mixin and a sole superclass
        if not new_bases:
            basic_name = class_name + '_deatomized'
            basic_class = type(basic_name, new_bases, {})
            new_classes += ((basic_name, basic_class), )
            new_bases += (basic_class, )

        if atom_attrs:
            atom_name = class_name + '_atom'
            atom_class = type(atom_name, (Atom,), atom_attrs)
            new_classes += ((atom_name, atom_class), )
            new_bases += (atom_class, )

Later new_bases will become the bases for our target class. Note that we are covering two different cases: inheriting directly from Atomizer or having it as a mixin. In the latter case we already have a "normal" class to contain our "normal" attributes.

Finally, let's populate the module space and finish creating the target class.

        # inject into the containing module
        for n, c in new_classes:
            setattr(module, n, c)
            setattr(c, '__module__', module.__name__)  # just in case

        # plain type(), so AtomizerMeta does not process the class a second time
        return type(class_name, new_bases, basic_attrs)

class Atomizer(object):
    '''Smart atomizer'''
    __metaclass__ = AtomizerMeta

we haz a win?

from atom.api import Atom, Int

class A(Atomizer):
    a = Int()
    b = 0

Yep. No need to bother with separate mixin classes.
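For the curious, the same base-rewriting trick can be distilled into an Atom-free version you can run anywhere. This is a sketch, not the code above: Python 3 metaclass syntax, with a hypothetical Marker type standing in for Member and SplitterMeta for AtomizerMeta (module injection omitted).

```python
class Marker(object):
    'Stand-in for an atom field type such as Int().'

class SplitterMeta(type):
    def __new__(meta, name, bases, attrs):
        # the base class itself needs no magic
        if name == 'Splitter':
            return super().__new__(meta, name, bases, attrs)

        marked = {k: v for k, v in attrs.items() if isinstance(v, Marker)}
        rest = {k: v for k, v in attrs.items() if k not in marked}

        new_bases = tuple(b for b in bases if b is not Splitter)
        if marked:
            # generate a base class to hold the marked attributes
            new_bases += (type(name + '_marked', (), marked),)

        # plain type(), so the result does not re-trigger this metaclass
        return type(name, new_bases or (object,), rest)

class Splitter(metaclass=SplitterMeta):
    pass

class A(Splitter):
    a = Marker()
    b = 0

# `a` now lives on the generated A_marked base; `b` stays on A itself
```

The same skip-the-base, split-the-attributes, rewrite-the-bases shape as above, minus the Atom dependency.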

Oct 14, 2013

Agentum: moving from ORM to path-based object selectors

Early in the development of Agentum the need for client-server separation became apparent, mostly because of how sucky GUI programming is in Python. I came up with a simple generic protocol to expose the state of the simulation over a web/network socket. The intent was to play around with a browser based UI and to open it up to anyone who wants to write their own GUI client.

A few specific design challenges arose from that:

  1. We need a generic way to describe simulation elements to the client so it knows how to draw them;
  2. We need a generic way to wire control input from the client to simulation elements;
  3. We need to watch the simulation object so we can automatically notify the client when things change;
  4. In the not-so-distant future, when Agentum is grown up and distributed, we'll need a formal way of packaging and shipping object instances across computation nodes.

Also, this allows API access to a running simulation. You can hook up your own whatever and interact with Agentum!

The ORM way

Since I was already familiar with Django ORM and that pattern can satisfy the design goals easily, replicating it seemed like a good idea.

Starting with a simple example, this is how you might represent a grid square in the Heatbugs simulation:

class Square(object):
    heat = 0

The heat parameter loosely represents the square's temperature. It changes as the square is subject to dissipation forces and the heat radiated by the bugs.

We don't want to ship the entire state of the model to a remote GUI client every time something needs redrawing. Instead, we want to tell the client what changed — and we need to know that ourselves. Two ways of achieving that:

  1. Passive: compare the entire state of the model with a previous snapshot and detect changes.
  2. Active: hook into attribute access and handle changes as they happen.

Of the two, the "active" approach looks more attractive. The Active Record pattern fits the bill perfectly, and Django-style ORM is a great AR implementation.

Add some metaclass magic, and now we have something like this:

from model import Cell
from fields import Float

class Square(Cell):
    heat = Float(0)

The user code looks almost the same, but internally things are different:

  • Cell subclasses Model, Model overrides __setattr__() and has a metaclass that inspects field-type attributes.
  • Now we know Square has a field heat and it's a float. We can send that information to the client.
  • Now we know when a square's heat changes, because the user goes through square.heat = new_value. Again, we can send that to the client.
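That machinery can be sketched in a few lines. This is illustrative only (Python 3 syntax; Field, Model, Cell and the changed list are stand-ins, not Agentum's actual implementation):

```python
class Field(object):
    'Field descriptor carrying a default value.'
    def __init__(self, default=None):
        self.default = default

class Float(Field):
    pass

class ModelMeta(type):
    def __new__(meta, name, bases, attrs):
        fields = {k: v for k, v in attrs.items() if isinstance(v, Field)}
        for k, f in fields.items():
            attrs[k] = f.default  # the plain default replaces the declaration
        cls = super().__new__(meta, name, bases, attrs)
        cls._fields = fields  # "Square has a field heat and it's a Float"
        return cls

class Model(metaclass=ModelMeta):
    def __init__(self):
        self.changed = []  # stand-in for "send it to the client"

    def __setattr__(self, name, value):
        super().__setattr__(name, value)
        if name in self._fields:
            self.changed.append((name, value))

class Cell(Model):
    pass

class Square(Cell):
    heat = Float(0.0)

s = Square()
s.heat = 36.6
# s._fields describes the schema; s.changed recorded the assignment
```

The metaclass learns the schema at class-creation time; __setattr__ reports changes at assignment time.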

Immediate problem solved. But wait, there are more goodies!

A field class is a great place for any value processing code (serializing, validating, thinning, etc.), and we could cram a lot of metadata into the field declaration. We could describe things like:

  • Mapping values to colors
  • Quantization threshold for avoiding taking up bandwidth with insignificant changes
  • Allowable ranges (Django's min/max)
  • Allowable values for enums (Django's choices)
  • Exactly which fields need to propagate upstream
  • etc.

Marshalling values between different formats can be done with some elegance, even with complex types:

class Integer(Field):
    default = 0
    from_string = int

class List(Field):
    default = []

    def __init__(self, field, *args, **kw):
        Field.__init__(self, *args, **kw)
        self.from_string = lambda x: map(field.from_string, x.split())

# declared as a list of integers...
dimensions = List(Integer, (0, 0))
# ...so assigning a string gets parsed through the field into [10, 10]
dimensions = '10 10'

Useful for accepting human input and not having to type JSON by hand, for example.

The dilemma

This is a good pattern; what could go wrong? Not much, as long as you only ever deal with top-level class attributes of basic types. This is not always the case.

Implementing the Schelling Segregation Model requires agents of the same class but with different attributes, like red and blue turtles. Suppose you want global attributes that control their behavior at runtime: for example, all red turtles would share one tolerance level and the blue ones another, and you want to be able to set the population densities separately as well.

With top level attributes you would be limited to something like:

class Schelling(Simulation):
    red_tolerance = 0.5
    red_density = 0.3
    blue_tolerance = 0.1
    blue_density = 0.6

But that's silly. What if you wanted to add green turtles? And how do you extract the color as an independent attribute?

class Schelling(Simulation):
    agent_params = {'red': {'tolerance': 0.5, 'density': 0.3},
                    'blue': {'tolerance': 0.1, 'density': 0.6}}

That's how. But now we have a problem: writing field descriptors for arbitrarily nested container objects is a hairy proposition. There is no good way to track internal changes to an object like this, not without Too Much Magic™. Nor is there a good way to use such descriptors that does not begin to resemble Java generics. That would be awful.

Thinking about this led me to reexamine the fitness of ORM for Agentum.

ORM is for schemas and databases

Even though the mechanics of Django-style ORM give us usable results, the purposes are quite different. An ORM builds representations of database objects and ensures strict adherence to a schema. Agentum objects, by contrast, live in the simulation runtime; the model implementation is entirely up to the user and there is no database backend with which to enforce integrity.

GUI is optional

Agentum makes very few assumptions.

Assumption 1: The primary goal of modeling is writing a model that actually works and computes something interesting.

Our job is to provide an environment where this can be done quickly and cleanly. Where you go from there can vary: some may be content with tweaking parameters in the source and rerunning the simulation while others will want to generate pictures and interact with the simulation in various ways, including programmatically.

Rephrasing that, the communication with the GUI client and the client itself are secondary. Without the client the model will still compute.

We're smart, let's guess

class Square(Cell):
    heat = 0.0

OK, we can tell that Square has an attribute heat, that it's a float, and that its default value is 0.0. That alone is enough to construct a field descriptor.

Of course, there are still things like metadata and field exclusions, but the base case is covered by simple introspection and guessing. When we do need more metadata we can provide it in addition rather than forcing field declarations everywhere.

In effect, this "additional" metadata will be an optional layer in the model declaration.
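A sketch of what that guessing might look like; guess_fields and the set of recognized basic types are assumptions for illustration, not Agentum's actual introspection code:

```python
def guess_fields(cls):
    'Infer {name: (type, default)} from plain class attributes.'
    return {k: (type(v), v)
            for k, v in vars(cls).items()
            if not k.startswith('_') and isinstance(v, (bool, int, float, str))}

class Square(object):
    heat = 0.0

guess_fields(Square)
# enough information to build a Float field descriptor with default 0.0
```

Any extra metadata (ranges, colors, exclusions) can then be layered on top of what introspection already found.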

Path- and rule-based object selectors

Assumption 2: To the outside world an Agentum object is a data structure composed of plain values.

In other words, something isomorphic to a JSON object. In fact, our entire simulation can be represented this way. Every object has an ID and every attribute, however deeply nested, has a path to it:

agent_params.red.tolerance
Using this, we can describe all the things as well. Where there is a path, there is a wildcard:

agent_params.*.tolerance = 0.1
space.cells.*.friend_ratios_by_color.* = 0.1
# haha, misery
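To make the idea concrete, here is a rough sketch of resolving such selectors against nested dicts. select() is a hypothetical helper for illustration, not Agentum's implementation:

```python
def select(obj, selector):
    'Return [(path, value)] pairs matching a dotted selector with * wildcards.'
    def walk(node, parts, trail):
        if not parts:
            yield '.'.join(trail), node
            return
        head, rest = parts[0], parts[1:]
        if isinstance(node, dict):
            # '*' fans out over every key; a literal segment matches one key
            keys = node.keys() if head == '*' else ([head] if head in node else [])
            for k in keys:
                yield from walk(node[k], rest, trail + [str(k)])
    return list(walk(obj, selector.split('.'), []))

sim = {'agent_params': {'red':  {'tolerance': 0.5, 'density': 0.3},
                        'blue': {'tolerance': 0.1, 'density': 0.6}}}

select(sim, 'agent_params.*.tolerance')
# one (path, value) pair per turtle color
```

Setting through a selector would be the same walk with an assignment at the leaf.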

This is powerful stuff. Using selectors, our model objects could look like this:

class Schelling(Simulation):
    agent_params = {'red': {'tolerance': 0.5, 'density': 0.3},
                    'blue': {'tolerance': 0.1, 'density': 0.6}}

    fields = {'agent_params.*.density': Float(min=0, max=1)}

Or even:

    fields['agent_params.*.density'].range = (0,1)

Of course, there is still a question of hooking into value assignments. Stay tuned.

Oct 05, 2013

JavaScript gotchas for Pythonistas

Someone has to build the UI and today it's you.

In many ways JS is similar to Python. So similar that when it's not, it might surprise you. Below are some things I found interesting in two solid months of javascripting.

Object declarations and attribute access

In JS the closest thing to Python's dicts are Objects.

// this is the same in both JS and Python
mything = {'foo': 1, 'bar': {'x': 5}}

// this is not
mything = {foo: 1, bar: {x: 5}}

// but both are the same in JS!

Python will take your key literals and treat them as variable names. JavaScript will turn them into strings and use them as key names! What's more fun, it only does that for keys, not the values.

            foo = 1
python:     {foo: foo} => {1: 1}
javascript: {foo: foo} => {'foo': 1}

There is an upside to that — you don't have to use quotes so much:

// instead of this
x = {'foo': 1};
x['bar'] = 2;

// just do this
x = {foo: 1};
x.bar = 2;

And a downside — using variables as key names requires a little dance:

mykey = 'foo';

x = {};
x[mykey] = 1;

Attribute access

JS is very relaxed here: there is the famed dotted notation, for which Python has no built-in equivalent. The upshot is that any nested attribute can be accessed by a single dotted selector:

// instead of
mything['foo']['bar'][0]['x'] = 5;

// simply do
mything.foo.bar[0].x = 5;

Two exceptions remain: numeric array indexes still need brackets (mything.foo.bar.0.x is a syntax error), and so do variable key names.

Unfortunately, this has to be a literal. There is no built-in way to do mything['foo.bar.0.x'].

Missing attributes

(updated 10.13)

Unlike Python, JS happily returns a value for any attribute, even missing ones. The missing attributes miraculously have a value, and that value is undefined. This can be handy:

// naively in Python:
value = False
if 'maybe_there' in obj:
    value = obj['maybe_there']
if value == 2: ...

// skillfully in Python:
if obj.get('maybe_there', None) == 2: ...

// in JS:
if (obj.maybe_there == 2) { ... };

Of course, you can only take this one level deep:

> obj = {}
> obj.foo
undefined
> obj.foo.bar
TypeError: Cannot read property 'bar' of undefined

Object comparisons

A colleague pointed this out to me: objects are compared by reference, so structurally identical objects are never equal:

> {'a': 1} == {'a': 1}
false
> {'a': 1} === {'a': 1}
false


for (var i in things) {

Despite how straightforward and intuitive that looks, it is not.

If things is an Object, i will iterate over key names. Ok, fine.

If things is an Array, i should iterate over elements, right? Wrong. i will iterate over array indexes.

If i iterates over array indexes, those should be integers, right? Wrong again: they are strings. for..in enumerates property names, and property names are always strings, even for arrays.

Handle this however you like, just beware.


JavaScript can't handle the truth

Oh yeah. Your if statements are not doing what you expect and you begin to wonder why.

> Boolean('')
false

Good, we were expecting that. That means our Python habits carry over, right?

> Boolean({})
true
> Boolean([])
true

Nope: empty objects and empty arrays are truthy in JS.

Does Underscore have anything handy? isEmpty() looks useful:

> _.isEmpty('')
true
> _.isEmpty([])
true
> _.isEmpty({})
true
> _.isEmpty(false)
true

Hey, this is good! Something consistent we can use. Or is it?

> _.isEmpty(true)
true

Sad panda.


And of course: https://www.destroyallsoftware.com/talks/wat

Oct 01, 2013

The setup, part 2

Life has changed a little since the last report.

  • A maxed out MacBook Air has replaced the Pro and I love it. The screen hinge is weak, the screen is reflective — those are annoying, yes. But overall, it weighs nothing, runs a VM effortlessly and I can work unplugged the whole day.
  • sshfs and 10.8 are no longer friends. Sublime cannot reliably detect file changes anymore, so instead the guest OS now mounts a shared folder.
  • Having to use mercurial sucks and may deserve a separate post.

May 30, 2013

The setup

Over the past couple of years I have had the fortune to pick up some great habits from great people. It is time to share an updated setup. These are the things my work-life currently depends on.

  • Python

  • Life environment: OSX, iMac, MacBook Pro, and recently back to iPhone.

  • Dev environment: VirtualBox. I still like free, and VBox works just fine.

  • Dev OS: Ubuntu Server. No more esoteric nonsense like Arch. The skinnier Server Edition is preferred as I need very little graphics capability from my Linux. Cannot stand Linux GUIs.

    Now, how it all works together:

    • Linux runs the code and manages packages. All the dev stuff stays there. Linux runs in headless mode.
    • OSX runs the editor (Sublime Text), terminal (iTerm), browsers, etc. and keeps my senses happy with beautiful fonts.
  • sshfs: the sanest way I know to share files between host and guest OSes. I keep a virtual directory mounted locally on OSX and point the editor there.

  • X11: always ready for when I do need to bring up a GUI tool over a tunnel (I always ssh -X into my virtual machine). The tools in question:

  • git gui, gitk: if they are not part of your git-fu already, you are missing out
  • tig: very useful for a quick history browse
  • IPython+Notebook: I often run my code as I'm testing it, and IPython is the way to interact with the interpreter. As the experiment grows or whenever working with data plots, firing up Notebook is worth the hassle.
  • nose+mock: I finally learned to stop worrying and love the tests. Yes, tests are always worth having.
  • Sublime Text: this probably deserves a separate post. In short, breaking away from the IDE land has been a happy change.
  • pianobar

May 21, 2013

Support your local caching

Suppose you had an expensive function xyz() whose output does not change often. Caching sounds appropriate. You may be inclined to memoize the function at the source, maybe by adding a decorator.

@memoize  # some memoizing decorator
def xyz(a, b):
    ...

result = xyz('foo', 'bar')

You have just added behavior that downstream users are likely to end up depending on, and imposed a design decision on them. This is fine in most cases.

But perhaps not all users need xyz() cached. Perhaps the results are only good for some time but only the user knows exactly how long. Perhaps the caching behavior will depend on factors you do not even know ahead of time. Now you have a problem: making a simple decision that works for everyone is not trivial any more.

To avoid a contrived dependency configuration, consider caching at the usage point — the function call itself — and use it where you need it.

import time
from collections import namedtuple

Stamped = namedtuple('Stamped', 'stamp obj')
cache = {}

def cached(ttl, func, *args, **kw):
    'Simple inline cache for function calls'

    key = (func.__name__,) + args
    entry = cache.get(key)
    now = time.time()

    if entry is None or (now - entry.stamp) > ttl:
        result = func(*args, **kw)
        # direct assignment: setdefault() would keep returning a stale entry
        cache[key] = entry = Stamped(now, result)
    return entry.obj

Then the calling code might become:

# ttl is the number of seconds until a cached value expires
ttl = 60 * some_number_of_minutes_only_i_know

result = cached(ttl, xyz, 'foo', 'bar')

Now your caching can be as dynamic as needed, the other users need not know the details and the library function can stay simple. The caching behavior stays with the rest of the user logic.

We use a function name and argument tuple as the key for illustration purposes; note that keyword arguments are not part of it in this simple version. You may need to index your functions differently, and again, with minimal local inline caching you can be maximally lazy about it.
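As a quick sanity check, here is the cache restated in a self-contained form (with a direct assignment so expired entries are overwritten), counting how many times the wrapped function actually runs:

```python
import time
from collections import namedtuple

Stamped = namedtuple('Stamped', 'stamp obj')
cache = {}

def cached(ttl, func, *args, **kw):
    'Simple inline cache for function calls'
    key = (func.__name__,) + args
    entry = cache.get(key)
    now = time.time()
    if entry is None or (now - entry.stamp) > ttl:
        result = func(*args, **kw)
        cache[key] = entry = Stamped(now, result)  # overwrite stale entries
    return entry.obj

calls = []

def xyz(a, b):
    calls.append(1)  # count real invocations
    return a + b

cached(60, xyz, 1, 2)   # computed
cached(60, xyz, 1, 2)   # served from the cache; xyz is not called again
cached(-1, xyz, 1, 2)   # negative ttl: always expired, so recomputed
```

After these three calls, xyz has actually run twice: once to populate the cache and once because the negative ttl forced a refresh.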

Jan 19, 2010

Debugging with PyDev

This is surprisingly elusive, so here it is. After creating a debug/run target with Run/Debug as... -> Python run, add this variable in the Environment tab:


If you're debugging Django, add another:


You will also want to add --noreload to your runserver arguments.