Feb 23, 2014

Euclidean rhythm generator in Haskell

So, Haskell! After a few days of astonished dabbling, the Euclidean Rhythm exercise had to be repeated. Again, this is beginner level code and we rely entirely on type inference.

The function breakdown is a little different this time. Two helper functions:

  • ezip: a hybrid of zip and outer join, takes two lists of different lengths and concatenates their elements pairwise;
  • efold: repeatedly applies the tail end of the sequence under construction to the head, using ezip, until either there are 3 or fewer subpatterns or the pattern is cyclic.
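For readers who do not speak Haskell yet, ezip can be approximated in Python. This is only a sketch under my own assumptions: the name is borrowed from the post, and itertools.zip_longest with an empty-list fill value stands in for the "outer join" part.

```python
from itertools import zip_longest


def ezip(xs, ys):
    """Concatenate two lists of lists pairwise, keeping the unmatched leftovers."""
    # zip_longest pads the shorter side with [], so leftovers pass through unchanged.
    return [x + y for x, y in zip_longest(xs, ys, fillvalue=[])]


print(ezip([[1], [1], [1]], [[0], [0]]))  # [[1, 0], [1, 0], [1]]
```

The fill value is never mutated, because x + y always builds a new list.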

The entire solution:

import Data.List (partition)

euclidean k n = concat . efold $
                (replicate k [1]) ++ (replicate (n - k) [0])

ezip x [] = x
ezip [] x = x
ezip (x:xs) (y:ys) = (x ++ y) : ezip xs ys

efold xs
  | length xs <= 3 = xs
  | a == [] = xs
  | otherwise = efold $ ezip a b
  where (a, b) = partition (/= last xs) xs

And that's it! Notice how concise yet readable this is compared to an already terse Clojure version.

Similar to the Clojure example, it is not immediately obvious why a == [] is a terminal case. An empty a means that partition found no elements different from the last one: all subpatterns are identical, so the pattern is cyclic. A less efficient but more explicit way to write this would have been with an allSame function:

euclidean' k n = concat . efold' $
            (replicate k [1]) ++ (replicate (n - k) [0])

ezip' (x, []) = x
ezip' ([], x) = x
ezip' ((x:xs), (y:ys)) = (x ++ y) : ezip' (xs, ys)

allSame [] = True
allSame [x] = True
allSame (x:y:xs) = (x == y) && allSame (y:xs)

efold' xs
  | length xs <= 3 = xs
  | allSame xs = xs
  | otherwise = efold' . ezip' $ partition (/= last xs) xs

And, a test:

*Play> euclidean 3 5
[1,0,1,0,1]
*Play> euclidean 3 8
[1,0,0,1,0,0,1,0]

Jan 26, 2014

Euclidean rhythm generator in Clojure

My first Clojure function is a Euclidean Rhythm generator, and I felt like sharing it. I found a post doing a similar thing, albeit in LISP. They called it "exploratory programming", and so will I.

Why Euclidean rhythms and why Clojure? Well, Overtone! Having played with SuperCollider before and having developed a curiosity towards Clojure, Overtone seemed like a great segue into the latter.

For a while I have been wanting to experiment with quantifying, expressing and generating musically interesting rhythms. After a bit of research a recent paper turned up: The Euclidean Algorithm Generates Traditional Musical Rhythms. How cool is that? So I wanted to have my own implementation, as well as make a Clojure exercise out of it.

Euclidean rhythms, basically

A Euclidean rhythm is loosely defined as a sequence of beats and silences, where the beats are distributed as equidistantly as possible. With discrete and relatively short sequences this produces interesting-sounding results.

The rhythm is described as a function E(k, n) where k is the number of beats and n is the number of divisions (k < n). For example: E(3, 5) := [1 0 1 0 1].

The solution to ER works as follows, with examples:

  1. Start with a sequence of k 1's and (n - k) 0's:

    [1 1 1 0 0 0 0 0]
  2. Remove as many identical elements from the tail as we can, and append each of them to one of the other elements at the front:

    [[1 0] [1 0] [1 0] 0 0]
  3. Repeat as many times as is sensible:

    [[1 0 0] [1 0 0] [1 0]]
    [[1 0 0 1 0] [1 0 0]]
  4. When repetitions no longer produce a new flat sequence, stop:

    [1 0 0 1 0 1 0 0]
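The steps above can be sketched in a few lines of Python. This is my own illustrative translation, not the code from this post: the function and variable names are made up, and it uses the same stopping rule as the code later on (three or fewer groups, or all groups identical). Because of that rule E(3, 8) comes out as a rotation of the flat sequence shown in the last step.

```python
def euclidean(k, n):
    """Return E(k, n) as a flat list of 1s (beats) and 0s (silences)."""
    # Step 1: k one-element groups of 1 followed by n - k groups of 0.
    groups = [[1] for _ in range(k)] + [[0] for _ in range(n - k)]

    # Steps 2-3: keep folding the tail into the head.
    while len(groups) > 3:
        last = groups[-1]
        # The tail starts at the first group equal to the last one.
        split = next(i for i, g in enumerate(groups) if g == last)
        head, tail = groups[:split], groups[split:]
        if not head:
            break  # every group is identical: the pattern is periodic
        merged = [h + t for h, t in zip(head, tail)]  # zip stops at the shorter side
        r = len(merged)
        # Merged groups first, then whatever was not consumed from either end.
        groups = merged + groups[r:len(groups) - r]

    # Step 4: flatten the groups into one sequence.
    return [beat for group in groups for beat in group]


print(euclidean(3, 5))  # [1, 0, 1, 0, 1]
print(euclidean(3, 8))  # [1, 0, 0, 1, 0, 0, 1, 0]
```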


Using the powers of Clojure as we discover them, we can be a tiny bit more concise than Ruin & Wesen. The entire code is available as a Gist.

It feels like this should be a one-liner in Clojure, and maybe it can be. But for now we start modest (and maintain some early debuggability). This solution was discovered experimentally; better ones must exist out there, so please do share.

The starting sequence is easy. This will generate a sequence like [[1] [1] [0] [0] ...].

(concat (repeat k [1]) (repeat (- n k) [0]))

Vectors or lists? So far I don't know! For the exercise it does not matter. Vectors are simpler to type, lists are what the core functions return. Seeing a mix of them seems normal.

We could start by popping elements off the end and so on, but let's not. A better way is to "zip" part of the "tail" with the head of the sequence. Then we need a way to get that "tail":

(defn split-seq
  "Extract a tail of same elements: [1 1 0 0 0] -> [[1 1] [0 0 0]]"
  [s]
  (let [l (last s)]
    (split-with #(not= l %) s)))

This says: walk the sequence as long as the element is not equal to the last one, then split. Now, the fun part: repeatedly zipping the sequence until we're nice and Euclidean.
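For comparison, here is roughly the same split in Python. It is a sketch only: split_seq is named after the Clojure function above, and itertools.takewhile plays the role of split-with.

```python
from itertools import takewhile


def split_seq(seq):
    """Split seq at the first element that equals its last element."""
    if not seq:
        return [], []
    last = seq[-1]
    # Walk the sequence for as long as the element differs from the last one.
    head = list(takewhile(lambda item: item != last, seq))
    return head, list(seq[len(head):])


print(split_seq([1, 1, 0, 0, 0]))  # ([1, 1], [0, 0, 0])
```

A purely periodic input such as [[1, 0], [1, 0]] yields an empty head, which matters for the recursion later.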

I can has zip()?

Clojure has no direct equivalent of Python's zip, but map will accept multiple sequences. If the elements are themselves sequences and the function is concat, we have our zip:

tutorial.core> (map concat [[1] [2] [3]] [[4] [5] [6]])
((1 4) (2 5) (3 6))

If the sequences are of different length, map will stop at the end of the shortest one and ignore the rest (Python 2's map, on the other hand, will pad the shorter sequences with Nones). Lucky for us, this is exactly the behavior we need to satisfy step 2 of the solution. The length of the merged sequence will equal the number of elements that are being removed from the end.

Let's recurse!

Obviously, once our recombined list is down to 2 or 3 elements no new flat sequences will appear (if they do they will be shifted copies of each other -- prove that if you like). Using Clojure's multiple signatures let's define these as base cases:

(defn recombine
  ([a b] [a b])
  ([a b c] [a b c])

I.e., with only 2 or 3 arguments just return the sequence.

Note that we pass the sequence as unpacked arguments instead of the sequence object itself. The only reason is that we can use the multiple signature mechanism for base cases instead of a far less elegant if statement.

Now, the real work:

  ([a b c & more]
    (let [s (concat [a b c] more)
          [head tail] (split-seq s)
          recombined (map concat head tail)
          r-len (count recombined)]

Let's get a little procedural for a moment:

  1. s is the reconstructed sequence in question
  2. tail is the "same element" tail, head is the rest of the sequence
  3. recombined is head and tail zipped together
  4. r-len is the length of the recombined part

There is a third base case: if the sequence becomes purely periodic (i.e. [1 0] [1 0] [1 0] [1 0]). According to our definition of split-seq, split-with will return all tail and no head. Let's account for that:

       (if (empty? head)
         s

Empty head! That does happen. Finally, the recursive step.

         (apply recombine (concat
                           recombined
                           (drop r-len (drop-last r-len s))))))))

In other words, if a sequence is periodic just return it, otherwise construct a single pass of the recombined sequence (recombined plus the middle portion of the original sequence, r-len elements chopped off both ends) and recombine that.

Does it work?

tutorial.core> (recombine [1] [1] [0] [0])
[(1 0) (1 0)]
tutorial.core> (recombine [1] [1] [1] [0] [0])
[(1 0) (1 0) [1]]

Yes, it works.

Wrap up

(defn split-seq
  "Extract a tail of same elements: [1 1 0 0 0] -> [[1 1] [0 0 0]]"
  [s]
  (let [l (last s)]
    (split-with #(not= l %) s)))

(defn recombine
  ([a b] [a b])
  ([a b c] [a b c])
  ([a b c & more]
     (let [s (concat [a b c] more)
           [head tail] (split-seq s)
           recombined (map concat head tail)
           r-len (count recombined)]
       (if (empty? head)  ;; even pattern
         s
         (apply recombine (concat
                           recombined
                           (drop r-len (drop-last r-len s))))))))

(defn E [k n]
  (let [seed (concat (repeat k [1]) (repeat (- n k) [0]))]
    (flatten (apply recombine seed))))

tutorial.core> (E 3 5)
(1 0 1 0 1)
tutorial.core> (E 3 8)
(1 0 0 1 0 0 1 0)


Jan 13, 2014

Adventurous metaclassing: automatic class splitting

tl;dr: I had a problem that could only be solved by splitting a class into two separate superclasses and inheriting from both. This article describes a metaclass that achieves the same effect. Contrive, it's fun.

Let's write a metaclass that splits a superclass of a class in two, while demonstrating a rare legitimate use case for mixins.

Why would you ever want to? My use case arose from an uncooperative dynamic class.

Atom (the problem)

I was building a UI using Enaml, which internally relies on Atom (formerly Traits). Among other things, Atom is an observer framework that provides bindings for Enaml, allowing attribute value changes to automatically update the UI.

To use, you subclass Atom and declare class attributes initialized to field instances, just like an ORM.

from atom.api import Atom, Int

class Model(Atom):
    counter = Int()

The problem is that Atom expects all attributes to be declared fields. If the attribute is not a known field type, Atom will turn it into a read-only field. This means you cannot have a "normal" attribute alongside an "atomic" field, and vice versa.

class MixedModel(Atom):
    counter = Int()
    x = 0  # x is read-only

# subclassing won't help
class Extended(Model):
    x = 0  # ditto

    def __init__(self):
        self.x = 1  # error: object attribute 'x' is read-only
        self.y = 1  # error: object has no attribute 'y'

Well, that's annoying. But what if an atom was a mixin, would that work?

class ModelAtom(Atom):
    counter = Int()

class ModelBase(object):
    x = 0

class Model(ModelBase, ModelAtom):
    pass

# fingers crossed...
m = Model()
m.x = 5  # works!

Solution: mixins

So, a pure atom mixin can contain our atomic fields and we can use them in our impure object. We still need a "normal" class to co-inherit from (where the other attributes will live), and it cannot be object.

This frees us from having to make the whole class atomic, but writing new mixins to contain atom fields can get tedious. Assuming that modifying Atom itself is out of the question, what can we do to automate the generation of the necessary classes?

The Metaclass

Naturally, anytime we need to mess with the class creation it's time for metaclasses. In this case, however, we are not just creating a class. We need to create up to two new classes and make our new class inherit from them instead. In other words, we need to rewrite the inheritance tree.

We do this by dynamically creating new classes and injecting them into the class bases. For consistency and reference, we will also inject them into the module as if they were defined there. One of the new classes will get all the atom fields, and subclass Atom. The other class will be optional, in case our target class is a direct subclass of object. The target class will keep all the other attributes.

def split_dict(d, test):
    # a gets the items that pass the test, b gets everything else
    a, b = {}, {}
    for k, v in d.items():
        if test(k, v):
            a[k] = v
        else:
            b[k] = v
    return a, b

from atom.catom import Member
from atom.api import AtomMeta, Atom
import inspect
class AtomizerMeta(type):
    def __new__(meta, class_name, bases, attrs):
        module = inspect.getmodule(inspect.stack()[1][0])

This will be the metaclass of our new mixin base, Atomizer. The metaclass will run on Atomizer itself but we only need the magic to happen in its subclasses, so skip it:

        if module.__name__ == meta.__module__ and class_name == 'Atomizer':
            return super(AtomizerMeta, meta).__new__(meta, class_name, bases, attrs)

        # pluck out atom fields
        atom_attrs, basic_attrs = split_dict(attrs, lambda k,v: isinstance(v, Member))

atom_attrs contains only the atomic fields (subclasses of Member) and basic_attrs all the rest.

One of the bases will be Atomizer itself. Because we are replacing it with two new classes, let's remove it from bases:

        new_bases = tuple(b for b in bases if b != Atomizer)
        new_classes = ()

Create the two new classes:

        # allow use as both mixin and a sole superclass
        if not new_bases:
            basic_name = class_name + '_deatomized'
            basic_class = type(basic_name, new_bases, {})
            new_classes += ((basic_name, basic_class), )
            new_bases += (basic_class, )

        if atom_attrs:
            atom_name = class_name + '_atom'
            atom_class = type(atom_name, (Atom,), atom_attrs)
            new_classes += ((atom_name, atom_class), )
            new_bases += (atom_class, )

Later new_bases will become the bases for our target class. Note that we are covering two different cases: inheriting directly from Atomizer or having it as a mixin. In the latter case we already have a "normal" class to contain our "normal" attributes.

Finally, let's populate the module space and finish creating the target class.

        # inject into the containing module
        for n, c in new_classes:
            setattr(module, n, c)
            setattr(c, '__module__', module.__name__)  # just in case

        return type(class_name, new_bases, basic_attrs)

class Atomizer(object):
    '''Smart atomizer'''
    __metaclass__ = AtomizerMeta

we haz a win?

from atom.api import Atom, Int

class A(Atomizer):
    a = Int()
    b = 0

Yep. No need to bother with separate mixin classes.
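If you want to poke at the idea without installing Atom, the class-splitting trick can be sketched in plain Python 3. Everything here is a stand-in of my own: Field replaces Atom's Member, FieldOwner replaces Atom, and Splitter and SplitterMeta replace Atomizer and AtomizerMeta. It also skips the module injection step.

```python
class Field:
    """Hypothetical stand-in for an Atom member."""
    def __init__(self, default=None):
        self.default = default


class FieldOwner:
    """Hypothetical stand-in for Atom: base of every generated field class."""


class SplitterMeta(type):
    def __new__(meta, name, bases, attrs):
        # The marker class itself has no SplitterMeta bases: build it normally.
        if not any(isinstance(b, SplitterMeta) for b in bases):
            return super().__new__(meta, name, bases, attrs)

        fields = {k: v for k, v in attrs.items() if isinstance(v, Field)}
        plain = {k: v for k, v in attrs.items() if not isinstance(v, Field)}

        # Drop the marker and append a generated base that holds the fields.
        new_bases = tuple(b for b in bases if b is not Splitter)
        if fields:
            new_bases += (type(name + "_fields", (FieldOwner,), fields),)

        # With an empty tuple of bases, type() falls back to object.
        return type(name, new_bases, plain)


class Splitter(metaclass=SplitterMeta):
    """Marker base: subclasses get their Field attributes split off."""


class A(Splitter):
    a = Field(0)
    b = 0


print([c.__name__ for c in A.__mro__])  # ['A', 'A_fields', 'FieldOwner', 'object']
```

The target class keeps b, while a lives on the generated A_fields base, which mirrors what the metaclass above does with Atom.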

Jan 05, 2014

Emacs tabbar tuning

I like tabbar-mode and the buffer cycling options it provides. I can cycle between buffers in a single group (especially useful when cycling scope is limited to tabs) and between different groups. This makes it much easier to manage many open buffers within different contexts (such as projects).

(require 'tabbar)
(tabbar-mode t)
(setq tabbar-cycle-scope 'tabs)

(global-set-key (kbd "s-{") 'tabbar-backward-group)
(global-set-key (kbd "s-}") 'tabbar-forward-group)
(global-set-key (kbd "s-[") 'tabbar-backward)
(global-set-key (kbd "s-]") 'tabbar-forward)


The grouping defaults are reasonable (by major mode name), but could be better. We'll use the default tabbar-buffer-groups-function from tabbar.el as a basis. This section contains the entire function body.

(defun my-tabbar-buffer-groups ()
"Return the list of group names the current buffer belongs to.
Return a list of one element based on major mode."

*Process* group

The default grouping separates "process" buffers into their own group. This is a problem with flycheck, for example. When flycheck does its thing the buffer briefly becomes a "process" buffer, causing tabbar to yank it between groups. This makes for a very jittery tab bar. Let's fix that by removing the "Process" group altogether:

   (setq my-group-by-project nil)
    ;; ((or (get-buffer-process (current-buffer))
    ;;      ;; Check if the major mode derives from `comint-mode' or
    ;;      ;; `compilation-mode'.
    ;;      (tabbar-buffer-mode-derived-p
    ;;       major-mode '(comint-mode compilation-mode)))
    ;;  "Process"
    ;;  )

OK, better. Now our flychecked files stay grouped by major mode.

    ((member (buffer-name)
             '("*scratch*" "*Messages*"))
     "Common"
     )
    ((eq major-mode 'dired-mode)
     "Dired"
     )
    ((memq major-mode
           '(help-mode apropos-mode Info-mode Man-mode))
     "Help"
     )
    ((memq major-mode
           '(rmail-mode
             rmail-edit-mode vm-summary-mode vm-mode mail-mode
             mh-letter-mode mh-show-mode mh-folder-mode
             gnus-summary-mode message-mode gnus-group-mode
             gnus-article-mode score-mode gnus-browse-killed-mode))
     "Mail"
     )

Grouping by project

While working on multiple projects the tab bar quickly becomes unwieldy. I prefer to group mine by "project name:mode name" instead. I use projectile for getting the name of the project.

Also, for small projects I like to be able to group by project name alone. For this I defined a simple toggle variable my-group-by-project.

     ;; Return `mode-name' if not blank, `major-mode' otherwise.
     (let ((group
            (if (and (stringp mode-name)
                     ;; Take care of preserving the match-data because this
                     ;; function is called when updating the header line.
                     (save-match-data (string-match "[^ ]" mode-name)))
                mode-name
              (symbol-name major-mode))))
       (if (projectile-project-p)
           (if my-group-by-project
               (projectile-project-name)
             (format "%s:%s" (projectile-project-name) group))
         group))

Performance tuning

tabbar-mode calls tabbar-buffer-groups-function A LOT -- for each open buffer for every single keystroke. Since projectile-project-name is not a super fast function this will slow Emacs down. Assuming we never want buffers to change groups, we could quasi-memoize this function and cache the group names per buffer name. We can do this with a pseudo-closure.

(require 'cl)  ; provides `lexical-let'

(defun my-cached (func)
  "Turn a function into a cache dict."
  (lexical-let ((table (make-hash-table :test 'equal))
                (f func))
    (lambda (key)
      (let ((value (gethash key table)))
        (if value
            value
          ;; Cache miss: compute, store and return the value.
          (puthash key (funcall f) table))))))

;; evaluate again to clear cache
(setq cached-ppn (my-cached 'my-tabbar-buffer-groups))

(defun my-tabbar-groups-by-project ()
  (funcall cached-ppn (buffer-name)))

(setq tabbar-buffer-groups-function 'my-tabbar-groups-by-project)

Now, tabbar will be fetching a cached group name for existing buffers. To wipe the cache we recreate the closure by reevaluating.
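The caching closure is easier to see outside of Emacs. Here is a minimal Python sketch of the same idea. The names cached, slow_group_name and group_for are made up for illustration, and unlike the Emacs Lisp version it also caches falsy results.

```python
def cached(func):
    """Wrap a zero-argument function in a per-key cache held by a closure."""
    table = {}

    def lookup(key):
        # Only call the slow function the first time a key is seen.
        if key not in table:
            table[key] = func()
        return table[key]

    return lookup


calls = []


def slow_group_name():
    calls.append(1)  # count how often the expensive part runs
    return "myproject:Python"


group_for = cached(slow_group_name)
group_for("a.py")
group_for("a.py")
group_for("b.py")
print(len(calls))  # 2

# "Clearing" the cache means building a fresh closure.
group_for = cached(slow_group_name)
```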

Finally, wire the toggle (we must clear the cache for regrouping to take effect):

(defun my-toggle-group-by-project ()
  (interactive)
  (setq my-group-by-project (not my-group-by-project))
  (message "Grouping by project alone: %s"
           (if my-group-by-project "enabled" "disabled"))
  (setq cached-ppn (my-cached 'my-tabbar-buffer-groups)))

Oct 14, 2013

Agentum: moving from ORM to path-based object selectors

Early in the development of Agentum the need for client-server separation became apparent, mostly because of how sucky GUI programming is in Python. I came up with a simple generic protocol to expose the state of the simulation over a web/network socket. The intent was to play around with a browser based UI and to open it up to anyone who wants to write their own GUI client.

A few specific design challenges arose from that:

  1. We need a generic way to describe simulation elements to the client so it knows how to draw them;
  2. We need a generic way to wire control input from the client to simulation elements;
  3. We need to watch the simulation object so we can notify the client when things change, automatically.
  4. In the not-so-distant future, when Agentum is grown up and distributed, we'll need a formal way of packaging and shipping object instances across computation nodes.

Also, this allows API access to a running simulation. You can hook up your own whatever and interact with Agentum!

The ORM way

Since I was already familiar with Django ORM and that pattern can satisfy the design goals easily, replicating it seemed like a good idea.

Starting with a simple example, this is how you might represent a grid square in the Heatbugs simulation:

class Square(object):
    heat = 0

The heat parameter loosely represents the square's temperature. It changes as the square is subject to dissipation forces and the heat radiated by the bugs.

We don't want to ship the entire state of the model to a remote GUI client every time something needs redrawing. Instead, we want to tell the client what changed — and we need to know that ourselves. Two ways of achieving that:

  1. Passive: compare the entire state of the model with a previous snapshot and detect changes.
  2. Active: hook into attribute access and handle changes as they happen.
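To make the passive option concrete, here is a small sketch. It assumes the model state can be flattened into a plain dict, and changed_attrs is a hypothetical helper, not part of Agentum.

```python
def changed_attrs(previous, current):
    """Compare two state snapshots and return the entries that are new or different."""
    return {key: value for key, value in current.items()
            if key not in previous or previous[key] != value}


before = {"heat": 0.0, "x": 3, "y": 4}
after = {"heat": 0.5, "x": 3, "y": 4}
print(changed_attrs(before, after))  # {'heat': 0.5}
```

The cost is obvious: every comparison touches the whole state, even when nothing changed.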

Of the two, the "active" looks more attractive. Active Record fits the bill perfectly, and Django-style ORM is a great AR implementation.

Add some metaclass magic, and now we have something like this:

from model import Cell
from fields import Float

class Square(Cell):
    heat = Float(0)

The user code looks almost the same, but internally things are different:

  • Cell subclasses Model, Model overrides __setattr__() and has a metaclass that inspects field-type attributes.
  • Now we know Square has a field heat and it's a float. We can send that information to the client.
  • Now we know that square's heat has changed whenever the user does square.heat = new_value. Again, we can send that to the client.
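Roughly, the machinery behind those bullets could look like the sketch below (Python 3 syntax). It is not Agentum's actual code: ModelMeta, the _fields registry and the on_change callback are assumptions made for the example.

```python
class Field:
    def __init__(self, default=None):
        self.default = default


class Float(Field):
    pass


class ModelMeta(type):
    def __new__(meta, name, bases, attrs):
        cls = super().__new__(meta, name, bases, attrs)
        # Start from the fields inherited from base classes, if any.
        fields = dict(getattr(cls, "_fields", {}))
        for key, value in attrs.items():
            if isinstance(value, Field):
                fields[key] = value
                # Expose the default value as a normal class attribute.
                setattr(cls, key, value.default)
        cls._fields = fields
        return cls


class Model(metaclass=ModelMeta):
    def __init__(self, on_change):
        # Bypass our own __setattr__ for internal bookkeeping.
        object.__setattr__(self, "_on_change", on_change)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name in self._fields:
            self._on_change(name, value)  # e.g. push the change to the client


class Cell(Model):
    pass


class Square(Cell):
    heat = Float(0.0)


sent = []
square = Square(lambda name, value: sent.append((name, value)))
square.heat = 1.5
print(sent)  # [('heat', 1.5)]
```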

Immediate problem solved. But wait, there are more goodies!

A field class is a great place for any value processing code (serializing, validating, thinning, etc.), and we could cram a lot of metadata into the field declaration. We could describe things like:

  • Mapping values to colors
  • Quantization threshold for avoiding taking up bandwidth with insignificant changes
  • Allowable ranges (Django's min/max)
  • Allowable values for enums (Django's choices)
  • Exactly which fields need to propagate upstream
  • etc.

Marshalling values between different formats can be done with some elegance, even with complex types:

class Integer(Field):
    default = 0
    from_string = int

class List(Field):
    default = []

    def __init__(self, field, *args, **kw):
        Field.__init__(self, *args, **kw)
        self.from_string = lambda x: map(field.from_string, x.split())

dimensions = List(Integer, (0, 0))
dimensions = '10 10'

Useful for accepting human input and not having to type JSON by hand, for example.
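A self-contained take on the same idea, in case you want to run it. The Field base class here is a bare-bones assumption of mine, and the result is a list because the sketch targets Python 3.

```python
class Field:
    default = None
    from_string = staticmethod(str)

    def __init__(self, default=None):
        if default is not None:
            self.default = default


class Integer(Field):
    default = 0
    from_string = staticmethod(int)


class List(Field):
    def __init__(self, field, default=()):
        super().__init__(list(default))
        # Parse whitespace-separated items with the element field's parser.
        self.from_string = lambda text: [field.from_string(s) for s in text.split()]


dimensions = List(Integer, (0, 0))
print(dimensions.from_string("10 10"))  # [10, 10]
```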

The dilemma

This is a good pattern, what could go wrong? Not too much, if you only ever deal with top level class attributes of basic types. This is not always the case.

Implementing the Schelling Segregation Model requires agents of the same class but of different attributes, like red and blue turtles. Suppose you want global attributes that control their behavior at runtime. For example, all red turtles would share one tolerance level, and the blue ones another; and you want to be able to set different population densities separately.

With top level attributes you would be limited to something like:

class Schelling(Simulation):
    red_tolerance = 0.5
    red_density = 0.3
    blue_tolerance = 0.1
    blue_density = 0.6

But that's silly. What if you wanted to add green turtles? And how do you extract the color as an independent attribute?

class Schelling(Simulation):
    agent_params = {'red': {'tolerance': 0.5, 'density': 0.3},
                    'blue': {'tolerance': 0.1, 'density': 0.6}}

That's how. But now we have a problem: writing field descriptors for arbitrarily nested container objects is a hairy proposition. There is no good way to track internal changes to an object like this, not without Too Much Magic™. Nor is there a good way to use this that does not begin to resemble Java generics. That would be awful.

Thinking about this led me to reexamine the fitness of ORM for Agentum.

ORM is for schemas and databases

Even though the mechanics of Django-style ORM give us usable results, the purposes are quite different. ORM builds representations of database objects and ensures strict adherence to a schema. In contrast, Agentum objects live in the simulation runtime, the model implementation is entirely up to the user and there is no database backend with which to enforce integrity.

GUI is optional

Agentum makes very few assumptions.

Assumption 1: The primary goal of modeling is writing a model that actually works and computes something interesting.

Our job is to provide an environment where this can be done quickly and cleanly. Where you go from there can vary: some may be content with tweaking parameters in the source and rerunning the simulation while others will want to generate pictures and interact with the simulation in various ways, including programmatically.

Rephrasing that, the communication with the GUI client and the client itself are secondary. Without the client the model will still compute.

We're smart, let's guess

class Square(Cell):
    heat = 0.0

OK, we can tell that Square has an attribute heat, it's a float and its default value is 0. That alone is enough to construct a field descriptor.
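The guessing itself needs very little code. A sketch, with guess_fields being a made-up helper name and the Cell base class left out for brevity:

```python
def guess_fields(cls):
    """Map public, non-callable class attributes to (type name, default value)."""
    guessed = {}
    for name, value in vars(cls).items():
        if name.startswith("_") or callable(value):
            continue  # skip dunder entries and methods
        guessed[name] = (type(value).__name__, value)
    return guessed


class Square:
    heat = 0.0

    def step(self):
        pass


print(guess_fields(Square))  # {'heat': ('float', 0.0)}
```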

Of course, there are still things like metadata and field exclusions, but the base case is covered by simple introspection and guessing. When we do need more metadata we can provide it in addition rather than forcing field declarations everywhere.

In effect, this "additional" metadata will be an optional layer in the model declaration.

Path- and rule-based object selectors

Assumption 2: To the outside world an Agentum object is a data structure composed of plain values.

In other words, something isomorphic to a JSON object. In fact, our entire simulation can be represented this way. Every object has an ID and every attribute, however deeply nested, has a path to it:


Using this, we can describe all the things as well. Where there is a path, there is a wildcard:

agent_params.*.tolerance = 0.1
space.cells.*.friend_ratios_by_color.* = 0.1
# haha, misery
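Resolving such selectors against a JSON-like structure is not much work. The sketch below is one possible approach rather than the Agentum implementation: iter_paths and select are hypothetical helpers, and each path segment is matched with fnmatch-style wildcards.

```python
from fnmatch import fnmatchcase


def iter_paths(obj, prefix=()):
    """Yield (path, value) for every leaf in nested dicts and lists."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, (list, tuple)):
        items = enumerate(obj)
    else:
        yield prefix, obj
        return
    for key, value in items:
        yield from iter_paths(value, prefix + (str(key),))


def select(obj, pattern):
    """Return {dotted path: value} for leaves whose path matches the pattern."""
    wanted = pattern.split(".")
    found = {}
    for path, value in iter_paths(obj):
        # Paths match segment by segment; "*" matches any single segment.
        if len(path) == len(wanted) and all(
                fnmatchcase(part, pat) for part, pat in zip(path, wanted)):
            found[".".join(path)] = value
    return found


simulation = {
    "agent_params": {
        "red": {"tolerance": 0.5, "density": 0.3},
        "blue": {"tolerance": 0.1, "density": 0.6},
    }
}
print(select(simulation, "agent_params.*.tolerance"))
# {'agent_params.red.tolerance': 0.5, 'agent_params.blue.tolerance': 0.1}
```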

This is powerful stuff. Using selectors, our model objects could look like this:

class Schelling(Simulation):
    agent_params = {'red': {'tolerance': 0.5, 'density': 0.3},
                    'blue': {'tolerance': 0.1, 'density': 0.6}}

    fields = {'agent_params.*.density': Float(min=0, max=1)}

Or even:

    fields['agent_params.*.density'].range = (0,1)

Of course, there is still a question of hooking into value assignments. Stay tuned.

Oct 05, 2013

JavaScript gotchas for Pythonistas

Someone has to build the UI and today it's you.

In many ways JS is similar to Python. So similar that when it's not, it might surprise you. Below are some things I found interesting in two solid months of javascripting.

Object declarations and attribute access

In JS the closest thing to Python's dicts are Objects.

// this is the same in both JS and Python
mything = {'foo': 1, 'bar': {'x': 5}}

// this is not
mything = {foo: 1, bar: {x: 5}}

// but both are the same in JS!

Python will take your key literals and treat them as variable names. JavaScript will turn them into strings and use them as key names! What's more fun, it only does that for keys, not the values.

            foo = 1
python:     {foo: foo} => {1: 1}
javascript: {foo: foo} => {'foo': 1}

There is an upside to that — you don't have to use quotes so much:

// instead of this
x = {'foo': 1};
x['bar'] = 2;

// just do this
x = {foo: 1};
x.bar = 2;

And a downside — using variables as key names requires a little dance:

mykey = 'foo';

x = {};
x[mykey] = 1;

Attribute access

JS is very relaxed about that, there is the famed dotted notation. Python has no built-in equivalent. The upshot is that any nested attribute can be accessed by a single dotted selector:

// instead of
mything['foo']['bar'][0]['x'] = 5;

// simply do
mything.foo.bar[0].x = 5

Array indexes still need brackets (mything.foo.bar.0.x is a syntax error), and so do variable key names.

Unfortunately, this has to be a literal. There is no built-in way to do mything['foo.bar.0.x'].

Missing attributes

(updated 10.13)

Unlike Python, JS always returns attribute values, even the missing ones. The missing attributes miraculously have values, and their value is undefined. This can be handy:

// naively in Python:
value = False
if 'maybe_there' in obj:
    value = obj['maybe_there']
if value == 2: ...

// skillfully in Python:
if obj.get('maybe_there', None) == 2: ...

// in JS:
if (obj.maybe_there == 2) { ... };

Of course, you can only take this one level deep:

> obj = {}
> obj.foo
undefined
> obj.foo.bar
TypeError: Cannot read property 'bar' of undefined

Object comparisons

A colleague pointed this out to me, objects do not compare:

> {'a': 1} == {'a': 1}
false
> {'a': 1} === {'a': 1}
false


for (var i in things) {

Despite how straightforward and intuitive that looks, it is not.

If things is an Object, i will iterate over key names. Ok, fine.

If things is an Array, i should iterate over elements, right? Wrong. i will iterate over array indexes.

If i iterates over array indexes those should be integers, right? Wrong again: for...in yields property names, so each index arrives as a string representing an integer!

Handle this however you like, just beware.


JavaScript can't handle the truth

Oh yeah. Your if statements are not doing what you expect and you begin to wonder why.

> Boolean('')
false

Good, we were expecting that. That means our habit carries over, right?

> Boolean({})
true
> Boolean([])
true



Does Underscore have anything handy? isEmpty() looks useful:

> _.isEmpty('')
true
> _.isEmpty([])
true
> _.isEmpty({})
true
> _.isEmpty(false)
true

Hey, this is good! Something consistent we can use. Or is it?

> _.isEmpty(true)
true

Sad panda.


And of course: https://www.destroyallsoftware.com/talks/wat

Example git forkflow: branch hopping

I wanted to show my mercurian friends around git, and what better way to do that than by real examples.

This morning I wanted to persist the changes I've been making to pelican-mockingbird theme for use in my own blog. Increased verbosity for illustration purposes.

$ git status
# Not currently on any branch.
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#   modified:   static/css/pygments.css
#   modified:   static/css/screen.css
#   modified:   templates/article_stub.html
#   modified:   templates/base.html
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#   templates/disqus.html
no changes added to commit (use "git add" and/or "git commit -a")

You may notice I'm "not on any branch" (also called detached HEAD state). This is because I am working inside a submodule of pelican-themes and the submodule is pinned to a particular commit.

First, I fork pelican-mockingbird on Github and add my new forked repo as a remote.

$ git remote add mine git@github.com:unthingable/pelican-mockingbird.git

Since I'm not on a branch I need to get on one. I want it to be the master from my new fork.

First, fetch the state of the new remote:

$ git fetch mine
From github.com:unthingable/pelican-mockingbird
 * [new branch]      master     -> mine/master

It found one remote branch, master, and fetched it as the remote-tracking branch mine/master. Now I attempt to check out that remote branch. This is not the branch I'll be working in, but it is a quick way to bring the working dir into the most current state.

$ git checkout mine/master
error: Your local changes to the following files would be overwritten by checkout:
Please, commit your changes or stash them before you can switch branches.

OK, apparently I was missing some recent commits and now the incoming changes conflict with my local changes. Before I risk losing my work, it is time for salvation:

$ git stash
Saved working directory and index state WIP on (no branch): 2f4c1b6 update screenshot
HEAD is now at 2f4c1b6 update screenshot

Now I can get current:

$ git checkout mine/master
Previous HEAD position was 2f4c1b6... update screenshot
HEAD is now at fa46a80... Merge pull request #5 from kevmo314/master

So far so good. Let's bring my changes back in:

$ git stash pop
Auto-merging templates/base.html
Auto-merging templates/article_stub.html
CONFLICT (content): Merge conflict in templates/article_stub.html
Auto-merging static/css/screen.css
Auto-merging static/css/pygments.css
CONFLICT (content): Merge conflict in static/css/pygments.css

Even though I popped the stash, the stashed changes are still there: git keeps the stash entry when a pop ends in conflicts. At this point (or any point in the future) I can decide to scrap what I'm doing, go back to the original commit I started working from (2f4c1b6) and reapply my stashed changes conflict-free. This would fully restore the state I had in the beginning of the post.

I decide to keep the new changes and fix the conflicts. Now my changes are in a happy state and I am ready to make a new commit atop fa46a80 (the last commit on remote master).

$ git status
# Not currently on any branch.
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#   modified:   static/css/screen.css
#   modified:   templates/base.html
# Unmerged paths:
#   (use "git reset HEAD <file>..." to unstage)
#   (use "git add/rm <file>..." as appropriate to mark resolution)
#   both modified:      static/css/pygments.css
#   both modified:      templates/article_stub.html
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#   templates/disqus.html

Let's tell git my conflicts are resolved, and also add the new template:

$ git add templates/ static/

(channeling Gallagher) Can it be that easy? Yes, it's that easy!

$ git status
# Not currently on any branch.
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#   modified:   static/css/pygments.css
#   modified:   static/css/screen.css
#   modified:   templates/article_stub.html
#   modified:   templates/base.html
#   new file:   templates/disqus.html

Now we're ready to commit and push. But wait, we're still not on any branch. Let's take our current state and make a new branch out of it. I don't want to touch my master just yet, just in case I bork something — after all, I have been doing mercurial for a few months now...

$ git branch
* (no branch)
$ git checkout -b new-master
M   static/css/pygments.css
M   static/css/screen.css
M   templates/article_stub.html
M   templates/base.html
A   templates/disqus.html
Switched to a new branch 'new-master'
$ git commit -am 'fix.'
[new-master e542d31] fix.
 5 files changed, 125 insertions(+), 82 deletions(-)
 rewrite static/css/pygments.css (98%)
 create mode 100644 templates/disqus.html
$ git log
commit e542d311cf1a6ed318c1807a118a7f4a5ce93f80
Author: unthingable <me@example.com>
Date:   Tue Nov 5 10:08:43 2013 -0800

    fix.

commit fa46a8024ceb821a8717ed36f7b9d2a6a06d2060
Merge: decd2d4 7ea3637
Author: william light <wrl@illest.net>
Date:   Sun Oct 13 06:02:45 2013 -0700

    Merge pull request #5 from kevmo314/master

    use more standardized strftime formats

Looking good! Now I want to take new-master and push to my github master without renaming. Can it be that easy?

$ git push mine new-master master
Counting objects: 18, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (9/9), done.
Writing objects: 100% (10/10), 2.28 KiB, done.
Total 10 (delta 5), reused 0 (delta 0)
To git@github.com:unthingable/pelican-mockingbird.git
 * [new branch]      new-master -> new-master

Oops, it's been a while and I forgot the proper syntax. Now there is a branch called new-master on github. Let's fix and clean up:

$ git push mine new-master:master
Total 0 (delta 0), reused 0 (delta 0)
To git@github.com:unthingable/pelican-mockingbird.git
   fa46a80..e542d31  new-master -> master
$ git push mine :new-master
To git@github.com:unthingable/pelican-mockingbird.git
 - [deleted]         new-master

All done. I still have a local master that I don't need anymore, so one more thing to tidy up:

$ git branch -av
  master              fa46a80 Merge pull request #5 from kevmo314/master
* new-master          51e3737 readme
  remotes/mine/master 51e3737 readme
$ git branch -d master
Deleted branch master (was fa46a80).
$ git branch --move master
$ git branch -v
* master 51e3737 readme

Oct 01, 2013

The setup, part 2

Life has changed a little since the last report.

  • A maxed-out MacBook Air has replaced the Pro and I love it. The screen hinge is weak and the screen is reflective; those are annoying, yes. But overall, it weighs nothing, runs a VM effortlessly and I can work unplugged the whole day.
  • sshfs and 10.8 are no longer friends. Sublime cannot reliably detect file changes anymore, so instead the guest OS now mounts a shared folder.
  • Having to use mercurial sucks and may deserve a separate post.

May 30, 2013

The setup

Over the past couple of years I have had the fortune to pick up some great habits from great people. It is time to share an updated setup. These are the things my work-life currently depends on.

  • Python

  • Life environment: OSX, iMac, MacBook Pro, and recently back to iPhone.

  • Dev environment: VirtualBox. I still like free, and VBox works just fine.

  • Dev OS: Ubuntu Server. No more esoteric nonsense like Arch. The skinnier Server Edition is preferred as I need very little graphics capability from my Linux. Cannot stand Linux GUIs.

    Now, how it all works together:

    • Linux runs the code and manages packages. All the dev stuff stays there. Linux runs in headless mode.
    • OSX runs the editor (Sublime Text), terminal (iTerm), browsers, etc. and keeps my senses happy with beautiful fonts.
  • sshfs: the sanest way I know to share files between host and guest OSes. I keep a virtual directory mounted locally on OSX and point the editor there.

  • X11: always ready when I do need to bring up GUI tools over a tunnel (I always ssh -X into my virtual machine), which are:

  • git gui, gitk: if they are not part of your git-fu already, you are missing out
  • tig: very useful for a quick history browse
  • IPython+Notebook: I often run my code as I'm testing it, and IPython is the way to interact with the interpreter. As the experiment grows or whenever working with data plots, firing up Notebook is worth the hassle.
  • nose+mock: I finally learned to stop worrying and love the tests. Yes, tests are always worth having.
  • Sublime Text: this probably deserves a separate post. In short, breaking away from the IDE land has been a happy change.
  • pianobar

May 21, 2013

Support your local caching

Suppose you had an expensive function xyz() whose output does not change often. Caching sounds appropriate. You may be inclined to memoize the function at the source, maybe by adding a decorator.

@memoize  # hypothetical caching decorator
def xyz(a, b):
    ...  # expensive computation

result = xyz('foo', 'bar')

You have just added behavior that downstream users are likely to end up depending on, and imposed a design decision on them. This is fine in most cases.

But perhaps not all users need xyz() cached. Perhaps the results are only good for some time but only the user knows exactly how long. Perhaps the caching behavior will depend on factors you do not even know ahead of time. Now you have a problem: making a simple decision that works for everyone is not trivial any more.

To avoid a contrived dependency configuration, consider caching at the usage point — the function call itself — and use it where you need it.

import time
from collections import namedtuple

Stamped = namedtuple('Stamped', 'stamp obj')
cache = {}

def cached(ttl, func, *args, **kw):
    'Simple inline cache for function calls'

    # Keyed on function name and positional arguments only
    key = (func.__name__,) + args
    entry = cache.get(key)
    now = time.time()

    # Miss or expired: call through and overwrite the stale entry
    if entry is None or (now - entry.stamp) > ttl:
        entry = cache[key] = Stamped(now, func(*args, **kw))
    return entry.obj

Then the calling code might become:

# ttl is the number of seconds until a cached value expires
ttl = 60 * some_number_of_minutes_only_i_know

result = cached(ttl, xyz, 'foo', 'bar')

Now your caching can be as dynamic as needed, the other users need not know the details and the library function can stay simple. The caching behavior stays with the rest of the user logic.
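To see the behavior end to end, here is a minimal sketch. It repeats a small version of cached so it runs on its own. The calls list and the body of xyz are stand-ins I made up to count real invocations. The negative ttl is just a convenient way to force a refresh without sleeping.

```python
import time
from collections import namedtuple

Stamped = namedtuple('Stamped', 'stamp obj')
cache = {}

def cached(ttl, func, *args, **kw):
    """Inline cache, repeated here so the example is self-contained."""
    key = (func.__name__,) + args
    entry = cache.get(key)
    now = time.time()
    # Miss or expired: call through and replace whatever was stored
    if entry is None or (now - entry.stamp) > ttl:
        entry = cache[key] = Stamped(now, func(*args, **kw))
    return entry.obj

calls = []  # hypothetical bookkeeping: one item per real call to xyz

def xyz(a, b):
    # Stand-in for the expensive function
    calls.append((a, b))
    return a + b

first = cached(60, xyz, 'foo', 'bar')    # miss: xyz runs
second = cached(60, xyz, 'foo', 'bar')   # hit: served from the cache
third = cached(-1, xyz, 'foo', 'bar')    # negative ttl: entry counts as expired, xyz runs again
```

Two callers can share the same xyz and still disagree about how fresh the result must be. The ttl travels with each call, not with the function.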

We use a function name and argument tuple as a key for illustration purposes. You may need to index your functions differently — and again, with minimal local inline caching you can be maximally lazy about it.
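One possible way to index differently, as a sketch only: call_key below is a hypothetical helper, not part of the code above. It assumes all arguments are hashable, and it adds the module name and the keyword arguments to the key so that same-named functions and calls with different keywords do not collide.

```python
def call_key(func, args, kw):
    """Hypothetical key builder: module, name, positional and keyword arguments."""
    # Sorting the keyword items makes the key independent of keyword order
    return (func.__module__, func.__name__, args, tuple(sorted(kw.items())))

def xyz(a, b, scale=1):
    # Stand-in for the expensive function
    return (a + b) * scale

k1 = call_key(xyz, (1, 2), {'scale': 3})
k2 = call_key(xyz, (1, 2), {'scale': 4})

# The keys are plain tuples, so they can index a dict-based cache directly
store = {k1: 'first', k2: 'second'}
```

Unhashable arguments such as lists would need converting (to tuples, say) before they can go into a key like this.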
