This is a copy of this article, kept for posterity on my blog.
I gave a talk about Wrek at Code BEAM SF last month. You can view it here.
Wrek is an Erlang library I wrote for concurrently executing task dependency graphs. It’s intended purpose is to run a set of pre-defined tasks that have a partial ordering between them. In this post, I explain why I wrote wrek, what it can be used for, and how to use it.
Wrek emerged as the result of two distinct forces. First, my amateurish enjoyment of graph theory made me try to see graphs wherever I could, laying the conceptual foundation for this library. Second, I realized that a project I was working on at Adgear would benefit from such a library, causing me to finally start writing Wrek.
The graph is a pervasive data structure in computing. When I learned about graph algorithms in school, I was amazed by their broad applications. Graphs play essential roles in compilers and build systems. Various graph algorithms form the backbone of how we communicate with each other over the internet.
Our everyday lives tend to be filled with lists. We have to-do lists, shopping lists, recipes, checklists, instructions, and much, much more. One day, I realized that some of these lists are deceiving. Some of these lists are actually graphs hiding in list’s clothing; these lists are topological orderings of directed acyclic graphs. Two of the more obvious cases of this are to-do lists and recipes. You can’t send a letter that you haven’t written yet, nor can you cook spaghetti if you haven’t boiled a pot of water.
Once the sneaky graphs were discovered, I spent some time thinking about how to benefit from treating various lists like the DAGs that they secretly are. The clearest analogies between these lists and DAGs were to-do lists and recipes. These very closely resembled dependency graphs. Unlike in their list representation, these dependency graphs show which vertices (individual tasks) can be executed concurrently. Sometimes this is obvious (e.g: I can chop vegetables while I wait for the pot of water to boil! I can begin preparing a second baking sheet of cookies while the first batch is in the oven!), but I had my hopes that the act of reformulating lists as dependency graphs could expose more opportunities for concurrency.
Boil water -- Add pasta -- Cook pasta --.
\
Purée tomatoes --. \
\ \
Chop vegetables -- Combine in saucepan -- Simmer sauce -- Combine pasta and sauce -- Serve
/
Add spices --'
Edges between vertices in the above graph express a partial ordering. Vertices with no path between them can be executed concurrently. The relationship between vertices in the graph reflects the truth in the kitchen. We can cook the pasta while our sauce is simmering. It doesn’t matter whether we chop vegetables or purée tomatoes first; they both need to be done if we’re going to make any sauce.
Despite my best efforts, I am still a disaster in the kitchen. As a means of changing this inconvenient fact, I thought it would be fun to try representing recipes as directed graphs, with extra data like how long each step can be expected to take. This sat in my list of ‘someday projects’ for a long time, and I was only reminded of it when I begin to think about a project I was working on at $JOB.
One of the projects I accomplished last year at Adgear was removing an expensive computation that was taking place on each of our already maxed-out edge servers, and replaced it with a system that performed the expensive computation on a single machine, then distributed the result. The system worked when nobody touched it, but it was the programming equivalent of twigs and duct tape; cron jobs and shell scripts. This system continued to be painful to use, and more examples of expensive computations being done on CPU-starved machines were cropping up. At this point, it made sense start writing a more robust system.
These expensive computations decomposed nicely into lists of steps to be
performed. Fetch this data, transform it, transform it some more, send some of
the data to this group of servers, send this other data to some other group of
servers. Shell scripts are pretty good at encoding these pipelines, so it was an
acceptable choice at the time of implementation. After a while, it dawned on me
that these lists of steps were not really lists; they were dependency graphs. I
tried to expose the latent concurrency in the shell scripts by spicing them up
with some background jobs and waitpid
, but it was decided that it would make
more sense to switch to Erlang and benefit from all that OTP has to offer.
wrek
ed(I’m really sorry. I couldn’t resist.)
Wrek accepts dependency graphs like the above spaghetti recipe as input, and
executes each vertex. The nature of concurrently executing actions as soon as
they are able to execute is generic to any problem that can be represented by
a dependency graph. The structure of a dependency graph, as well as the tasks
involved in each of the graph’s vertices are specific to the user’s
wishes. Following the same generic/specific split as OTP; the generic portion of
this class of problems is solved by wrek
and wrek_vert
modules. The specific
portion is provided by the user in the form of an Erlang map describing each
vertex in the graph, and a set of callback modules that implement the
wrek_vert
behaviour.
The wrek_vert
behaviour consists of a single callback run/2
, where the first
argument is a list of arguments to be sent to the callback function, and the
second argument is the ID of a process which can provide information generated
by other vertices. The expected result of this callback function is either {ok, Any :: any()}
or {error, Reason :: any()}
. If the callback function succeeds,
Any
will be taken by wrek and made available to other wrek_vert
processes. If the callback function crashes or returns an error, the entire
graph will be shut down.
Following the contrived example above, let’s set about using wrek to make pasta. Of course, our program won’t really make pasta, but it’s output ought to fool somebody.
Looking at the above graph, each step seems to do one of three things:
Let’s go ahead write some wrek_vert
s for each of those actions. If you don’t
feel like following along in a text editor, the full code can be found
here.
-module(cook_add).
-behaviour(wrek_vert).
-export([run/2]).
run([Ingredient, Quantity], _Pid) ->
io:format("adding ~s. amount: ~s.~n", [Ingredient, Quantity]),
{ok, #{added => [{Ingredient, Quantity}]}}.
That’s all for cook_add
. It prints a message, then produces a
map with a key added
whose value is a proplist with a single pair.
-module(cook_heat).
-behaviour(wrek_vert).
-export([run/2]).
run([Verb, Noun], _Pid) ->
io:format("~ping ~p.~n", [Verb, Noun]),
{ok, #{}}.
cook_heat
is also pretty short. It’s also very abstract. It could be used to
print a message about Verb
ing any Noun
, not just cooking ingredients!
Our final callback module is a little longer, because it does a little bit more than printing a message.
-module(cook_combine).
-behaviour(wrek_vert).
-export([run/2]).
run([Ingredients, Vessel], Pid) ->
Fun = fun(Step, Acc) ->
Stuff = wrek_vert:get(Pid, Step, added),
io:format("combining ~p with ~p in ~p.~n", [Stuff, Acc, Vessel]),
Stuff ++ Acc
end,
Stuff = lists:foldl(Fun, [], Ingredients),
io:format("~p now contains: ~p.~n", [Vessel, Stuff]),
{ok, #{added => Stuff}}.
Ingredients
is expected to be a list of vertex names. We finally use our
parent process’s Pid
as an argument to wrek_vert:get/3
. This lets us consume
the data produced by the cook_add
callback module. After combining everything,
we returns a new collection of ingredients.
Alright! We’re nearly done describing the specific portion of our problem! The last step is to represent our dependency graph in terms of these callback modules and the arguments we want to pass to them.
-module(wrek_example).
-export([make_pasta/0]).
make_pasta() ->
application:ensure_all_started(wrek),
Graph = #{
tomatoes => #{
module => cook_add,
args => ["pureed tomatoes", "1 can"],
deps => []
},
vegetables => #{
module => cook_add,
args => ["chopped vegetables", "lots"],
deps => []
},
spices => #{
module => cook_add,
args => ["spices", "to taste"],
deps => []
},
saucepan => #{
module => cook_combine,
args => [[tomatoes, vegetables, spices], saucepan],
deps => [tomatoes, vegetables, spices]
},
simmer_sauce => #{
module => cook_heat,
args => [simmer, sauce],
deps => [saucepan]
},
boil_water => #{
module => cook_heat,
args => [boil, water],
deps => []
},
add_pasta => #{
module => cook_add,
args => ["pasta", "1 handful"],
deps => [boil_water]
},
cook_pasta => #{
module => cook_heat,
args => [cook, pasta],
deps => [add_pasta]
},
mix_pasta_with_sauce => #{
module => cook_combine,
args => [[saucepan, add_pasta], saucepan],
deps => [simmer_sauce, cook_pasta]
}
},
wrek:start(Graph).
That’s an eyeful! What we’ve done here is created an Erlang map whose keys represent the names of vertices in our original dependency graph, and whose values are maps that specify a callback module, arguments to pass to the callback module, as well as any dependencies the vertex may have. I congratulate those of you who noticed that we are never straining the pasta; I did confess to being a disaster in the kitchen. I promise I learned my lesson.
Alright, we’re done coding! Let’s start up a shell and make some pasta!
$ rebar3 shell
[...]
1> wrek_example:make_pasta().
adding spices. amount: to taste.
adding puréed tomatoes. amount: 1 can.
adding chopped vegetables. amount: lots.
adding pasta. amount: 1 handful.
{ok,<0.133.0>}
2> boiling water.
cooking pasta.
combining [{"puréed tomatoes","1 can"}] with [] in saucepan.
combining [{"chopped vegetables","lots"}] with [{"puréed tomatoes","1 can"}] in saucepan.
combining [{"spices","to taste"}] with [{"chopped vegetables","lots"},
{"puréed tomatoes","1 can"}] in saucepan.
saucepan now contains: [{"spices","to taste"},
{"chopped vegetables","lots"},
{"puréed tomatoes","1 can"}].
simmering sauce.
combining [{"spices","to taste"},
{"chopped vegetables","lots"},
{"puréed tomatoes","1 can"}] with [] in saucepan.
combining [{"pasta","1 handful"}] with [{"spices","to taste"},
{"chopped vegetables","lots"},
{"puréed tomatoes","1 can"}] in saucepan.
saucepan now contains: [{"pasta","1 handful"},
{"spices","to taste"},
{"chopped vegetables","lots"},
{"puréed tomatoes","1 can"}].
It isn’t pretty, but neither is most of my cooking. However, we’ve just used
wrek
to teach Erlang how to concurrently prepare pasta! If you run
wrek_example:make_pasta()
over and over, you may notice that the order of the
output changes, but never enough that the steps followed in order don’t produce
valid pasta.
I believe that Wrek is useful for more than pretending to make pasta. I used Wrek at $JOB to concurrently execute various small shell scripts, collect their output, and pass up to Erlang through and down to other small shell scripts.
Wrek also contains features that I haven’t covered in this article. For
instance, you can pass a gen_event
process to wrek:start/2
to get Wrek to
call gen_event:notify/2
when various events take place. This gives users a way
to monitor the execution of their graphs. Combined with Sergey Aleynikov’s
erlexec
library, you can collect stdout
and stderr
from any shell command
you run in a vertex using wrek_exec:exec/4
. I invite interested readers to
take a peek at the source code for Wrek.
Wrek is still a very young project, and I am still inexperienced at writing libraries. I am virtually certain that the API is going to go through significant changes in the near future. I believe this is for the best, because the whole process of specifying callback modules and passing data between vertices can be improved dramatically. Here are some of the ideas I have for improving Wrek:
I had a really nice time writing Wrek, and I’m excited to be using it in production at Adgear. I am also eagerly looking forward to presenting Wrek at Code BEAM SF in March 2018.
Thank you very much for reading!
P.S: As I mentioned above, this is a copy of a post that was published earlier in the year. As a result, the above paragraph doesn’t make much sense.