An Introduction to Asynchronous Programming and Twisted

Mar 11th, 2010 | Filed under Blather, Programming, Python, Software

Part 13: Deferred All The Way Down

This continues the introduction started here. You can find an index to the entire series here.

Introduction

Recall poetry client 5.1 from Part 10. The client used a Deferred to manage a callback chain that included a call to a poetry transformation engine. In client 5.1, the engine was a synchronous function implemented in the client itself.

Now we want to make a new client that uses the networked poetry transformation service we wrote in Part 12. But here’s the wrinkle: since the transformation service is accessed over the network, we’ll need to use asynchronous I/O. And that means our API for requesting a transformation will have to be asynchronous, too. In other words, the try_to_cummingsify callback is going to return a Deferred in our new client.

So what happens when a callback in a deferred’s chain returns another deferred? Let’s call the first deferred the ‘outer’ deferred and the second the ‘inner’ one. Suppose callback N in the outer deferred returns the inner deferred. That callback is saying “I’m asynchronous, my result isn’t here yet”. Since the outer deferred needs to call the next callback or errback in the chain with the result, the outer deferred needs to wait until the inner deferred is fired. Of course, the outer deferred can’t block either, so instead the outer deferred suspends the execution of the callback chain and returns control to the reactor (or whatever fired the outer deferred).

And how does the outer deferred know when to resume? Simple — by adding a callback/errback pair to the inner deferred. Thus, when the inner deferred is fired the outer deferred will resume executing its chain. If the inner deferred succeeds (i.e., it calls the callback added by the outer deferred), then the outer deferred calls its N+1 callback with the result. And if the inner deferred fails (calls the errback added by the outer deferred), the outer deferred calls the N+1 errback with the failure.

That’s a lot to digest, so let’s illustrate the idea in Figure 28:

Figure 28: outer and inner deferred processing

In this figure the outer deferred has 4 layers of callback/errback pairs. When the outer deferred fires, the first callback in the chain returns a deferred (the inner deferred). At that point, the outer deferred will stop firing its chain and return control to the reactor (after adding a callback/errback pair to the inner deferred). Then, some time later, the inner deferred fires and the outer deferred resumes processing its callback chain. Note the outer deferred does not fire the inner deferred itself. That would be impossible, since the outer deferred cannot know when the inner deferred’s result is available, or what that result might be. Rather, the outer deferred simply waits (asynchronously) for the inner deferred to fire.

Notice how the line connecting the callback to the inner deferred in Figure 28 is black instead of green or red. That’s because we don’t know whether the callback succeeded or failed until the inner deferred is fired. Only then can the outer deferred decide whether to call the next callback or the next errback in its own chain.

Figure 29 shows the same outer/inner deferred firing sequence as Figure 28, but from the point of view of the reactor:

Figure 29: the thread of control in Figure 28

This is probably the most complicated feature of the Deferred class, so don’t worry if you need some time to absorb it. We’ll illustrate it one more way using the example code in twisted-deferred/defer-10.py. That example creates two outer deferreds, one with plain callbacks, and one where a single callback returns an inner deferred. By studying the code and the output you can see how the second outer deferred stops running its chain when the inner deferred is returned, and then starts up again when the inner deferred is fired.
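If you’d like to see the pause-and-resume mechanics without running the reactor at all, here is a toy model written in plain Python 3. To be clear, ToyDeferred is a made-up class for illustration, not Twisted’s Deferred and not the code in defer-10.py; its method names (add_callbacks, _resume) are assumptions chosen for this sketch. It only models the behavior described above: when a callback returns an inner deferred, the outer one stops, hooks a callback/errback pair onto the inner one, and carries on when the inner one fires.

```python
class ToyDeferred:
    """Toy model of a deferred. NOT Twisted's implementation."""

    def __init__(self):
        self.pairs = []       # pending (callback, errback) pairs
        self.fired = False
        self.paused = False   # True while waiting on an inner deferred
        self.result = None
        self.failed = False

    def add_callbacks(self, callback, errback=None):
        self.pairs.append((callback, errback))
        if self.fired:
            self._run()
        return self

    def callback(self, result):
        self._fire(result, False)

    def errback(self, error):
        self._fire(error, True)

    def _fire(self, result, failed):
        assert not self.fired, "a deferred may only be fired once"
        self.fired, self.result, self.failed = True, result, failed
        self._run()

    def _run(self):
        while self.pairs and not self.paused:
            callback, errback = self.pairs.pop(0)
            handler = errback if self.failed else callback
            if handler is None:
                continue  # no handler on this side: pass the result along
            try:
                self.result, self.failed = handler(self.result), False
            except Exception as exc:
                self.result, self.failed = exc, True
            if isinstance(self.result, ToyDeferred):
                # The handler returned an inner deferred: stop here and ask
                # the inner deferred to wake us up when it fires.
                self.paused = True
                self.result.add_callbacks(
                    lambda r: self._resume(r, False),
                    lambda e: self._resume(e, True))

    def _resume(self, result, failed):
        self.paused = False
        self.result, self.failed = result, failed
        self._run()


log = []

def request_transform(poem):
    log.append("callback 1 got %r" % poem)
    return inner              # "my result isn't here yet"

def show(poem):
    log.append("callback 2 got %r" % poem)

inner = ToyDeferred()
outer = ToyDeferred()
outer.add_callbacks(request_transform)
outer.add_callbacks(show)

outer.callback("POEM")        # runs callback 1, then pauses
paused_log = list(log)        # snapshot taken while the outer is waiting
inner.callback("poem")        # the outer deferred resumes with this result

print(paused_log)
print(log)
```

Notice that nothing in the demo fires the inner deferred on the outer deferred’s behalf. The second callback only runs once something else calls inner.callback().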

Client 6.0

Let’s use our new knowledge of nested deferreds and re-implement our poetry client to use the network transformation service from Part 12. You can find the code in twisted-client-6/get-poetry.py. The poetry Protocol and Factory are unchanged from the previous version. But now we have a Protocol and Factory for making transformation requests. Here’s the transform client Protocol:

class TransformClientProtocol(NetstringReceiver):

    def connectionMade(self):
        self.sendRequest(self.factory.xform_name, self.factory.poem)

    def sendRequest(self, xform_name, poem):
        self.sendString(xform_name + '.' + poem)

    def stringReceived(self, s):
        self.transport.loseConnection()
        self.poemReceived(s)

    def poemReceived(self, poem):
        self.factory.handlePoem(poem)

Using the NetstringReceiver as a base class makes this implementation pretty simple. As soon as the connection is established we send the transform request to the server, retrieving the name of the transform and the poem from our factory. And when we get the poem back, we pass it on to the factory for processing. Here’s the code for the Factory:

class TransformClientFactory(ClientFactory):

    protocol = TransformClientProtocol

    def __init__(self, xform_name, poem):
        self.xform_name = xform_name
        self.poem = poem
        self.deferred = defer.Deferred()

    def handlePoem(self, poem):
        d, self.deferred = self.deferred, None
        d.callback(poem)

    def clientConnectionLost(self, _, reason):
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.errback(reason)

    clientConnectionFailed = clientConnectionLost

This factory is designed for clients and handles a single transformation request, storing both the transform name and the poem for use by the Protocol. The Factory creates a single Deferred which represents the result of the transformation request. Notice how the Factory handles two error cases: a failure to connect and a connection that is closed before the poem is received. Also note the clientConnectionLost method is called even if we receive the poem, but in that case self.deferred will be None, thanks to the handlePoem method.
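The swap-to-None idiom is easy to skim past, so here is a small sketch of just the firing logic, in plain Python 3 with no Twisted involved. RecordingDeferred and OneShotFactory are hypothetical stand-ins invented for this example (the real factory uses defer.Deferred and receives a connector argument as well); the point is only to show that each deferred gets fired exactly once in both scenarios.

```python
class RecordingDeferred:
    """Hypothetical stand-in that just records how it was fired."""

    def __init__(self):
        self.events = []

    def callback(self, result):
        self.events.append(("callback", result))

    def errback(self, reason):
        self.events.append(("errback", reason))


class OneShotFactory:
    """Mirrors only the firing logic of TransformClientFactory."""

    def __init__(self):
        self.deferred = RecordingDeferred()

    def handlePoem(self, poem):
        # Take the deferred and leave None behind in a single step.
        d, self.deferred = self.deferred, None
        d.callback(poem)

    def clientConnectionLost(self, reason):
        # Only report an error if the deferred has not been used yet.
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.errback(reason)


# Case 1: the poem arrives, then the connection closes as usual.
ok = OneShotFactory()
ok_deferred = ok.deferred
ok.handlePoem("a poem")
ok.clientConnectionLost("connection closed cleanly")

# Case 2: the connection drops before any poem arrives.
bad = OneShotFactory()
bad_deferred = bad.deferred
bad.clientConnectionLost("connection lost")

print(ok_deferred.events)
print(bad_deferred.events)
```

In the first case the later clientConnectionLost call finds None and does nothing, which is exactly the behavior described above.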

This Factory class creates the Deferred that it also fires. That’s a good rule to follow in Twisted programming, so let’s highlight it:

In general, an object that makes a Deferred should also be in charge of firing that Deferred.

This “you make it, you fire it” rule helps ensure a given deferred is only fired once and makes it easier to follow the flow of control in a Twisted program.

In addition to the transform Factory, there is also a Proxy class which hides the details of making the TCP connection to a particular transform server:

class TransformProxy(object):
    """
    I proxy requests to a transformation service.
    """

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def xform(self, xform_name, poem):
        factory = TransformClientFactory(xform_name, poem)
        from twisted.internet import reactor
        reactor.connectTCP(self.host, self.port, factory)
        return factory.deferred

This class presents a single xform() interface that other code can use to request transformations. That way, other code can just request a transform and get a deferred back without mucking around with hostnames and port numbers.

The rest of the program is unchanged except for the try_to_cummingsify callback:

    def try_to_cummingsify(poem):
        d = proxy.xform('cummingsify', poem)

        def fail(err):
            print >>sys.stderr, 'Cummingsify failed!'
            return poem

        return d.addErrback(fail)

This callback now returns a deferred, but we didn’t have to change the rest of the main function at all, other than to create the Proxy instance. Since try_to_cummingsify was part of a deferred chain (the deferred returned by get_poetry), it was already being used asynchronously and nothing else needed to change.

You’ll note we are returning the result of d.addErrback(fail). That’s just a little bit of syntactic sugar. The addCallback and addErrback methods return the original deferred. We might just as well have written:

        d.addErrback(fail)
        return d

The first version is the same thing, just shorter.

Testing out the Client

The new client has a slightly different syntax than the others. If you have a transformation service running on port 10001 and two poetry servers running on ports 10002 and 10003, you would run:

python twisted-client-6/get-poetry.py 10001 10002 10003

That command downloads two poems and transforms them both. You can start the transform server like this:

python twisted-server-1/tranformedpoetry.py --port 10001

And the poetry servers like this:

python twisted-server-1/fastpoetry.py --port 10002 poetry/fascination.txt
python twisted-server-1/fastpoetry.py --port 10003 poetry/science.txt

Then you can run the poetry client as above. After that, try crashing the transform server and re-running the client with the same command.

Wrapping Up

In this Part we learned how deferreds can transparently handle other deferreds in a callback chain, and thus we can safely add asynchronous callbacks to an ‘outer’ deferred without worrying about the details. That’s pretty handy since lots of our functions are going to end up being asynchronous.

Do we know everything there is to know about deferreds yet? Not quite! There’s one more important feature to talk about, but we’ll save it for Part 14.

Suggested Exercises

  1. Modify the client so we can ask for a specific kind of transformation by name.
  2. Modify the client so the transformation server address is an optional argument. If it’s not provided, skip the transformation step.
  3. The PoetryClientFactory currently violates the “you make it, you fire it” rule for deferreds. Refactor get_poetry and PoetryClientFactory to remedy that.
  4. Although we didn’t demonstrate it, the case where an errback returns a deferred is symmetrical. Modify the twisted-deferred/defer-10.py example to verify it.
  5. Find the place in the Deferred implementation that handles the case where a callback/errback returns another Deferred.

Over and Out Rag

Mar 7th, 2010 | Filed under Music, Recordings

Here’s a rough cut of the only song on the Hanson book I really like.

Over and Out Rag

Book: The Art of Contemporary Travis Picking

Mar 7th, 2010 | Filed under Books

This is mainly geared towards playing guitar as an accompaniment to singing. Only a few of the songs are fun to play by themselves.

Book: Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions

Mar 1st, 2010 | Filed under Books

The Big Book of Enterprise Messaging.

Book: Beat the Reaper: A Novel

Feb 19th, 2010 | Filed under Books

Terrifically fun novel about a mob hit man turned doctor.

An Introduction to Asynchronous Programming and Twisted

Feb 7th, 2010 | Filed under Blather, Programming, Python, Software

Part 12: A Poetry Transformation Server

This continues the introduction started here. You can find an index to the entire series here.

One More Server

Alright, we’ve written one Twisted server so let’s write another, and then we’ll get back to learning some more about Deferreds.

In Parts 9 and 10 we introduced the idea of a poetry transformation engine. The one we eventually implemented, the cummingsifier, was so simple we had to add random exceptions to simulate a failure. But if the transformation engine was located on another server, providing a network “poetry transformation service”, then there is a much more realistic failure mode: the transformation server is down.

So in Part 12 we’re going to implement a poetry transformation server and then, in the next Part, we’ll update our poetry client to use the external transformation service and learn a few new things about Deferreds in the process.

Designing the Protocol

Up till now the interactions between client and server have been strictly one-way. The server sends a poem to the client while the client never sends anything at all to the server. But a transformation service is two-way — the client sends a poem to the server and then the server sends a transformed poem back. So we’ll need to use, or invent, a protocol to handle that interaction.

While we’re at it, let’s allow the server to support multiple kinds of transformations and allow the client to select which one to use. So the client will send two pieces of information: the name of the transformation and the complete text of the poem. And the server will return a single piece of information, namely the text of the transformed poem. So we’ve got a very simple sort of Remote Procedure Call.

Twisted includes support for several protocols we could use to solve this problem, including XML-RPC, Perspective Broker, and AMP.

But introducing any of these full-featured protocols would require us to go too far afield, so we’ll roll our own humble protocol instead. Let’s have the client send a string of the form (without the angle brackets):

<transform-name>.<text of the poem>

That’s just the name of the transform, followed by a period, followed by the complete text of the poem itself. And we’ll encode the whole thing in the form of a netstring. And the server will send back the text of the transformed poem, also in a netstring. Since netstrings use length-encoding, the client will be able to detect the case where the server fails to send back a complete result (maybe it crashed in the middle of the operation). If you recall, our original poetry protocol has trouble detecting aborted poetry deliveries.
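To make the wire format concrete, here is a rough sketch of netstring encoding and decoding in plain Python 3. The helper names encode_netstring and decode_netstring are made up for this example; in the real server and client, NetstringReceiver does this work for us (and handles buffering, which this sketch does not). It assumes a single, complete-or-truncated netstring per call.

```python
def encode_netstring(payload: bytes) -> bytes:
    """Wrap payload as <length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(data: bytes) -> bytes:
    """Decode one netstring, raising ValueError if it is malformed or cut short."""
    length_text, sep, rest = data.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(length_text)
    if len(rest) < length + 1:
        # The length prefix promised more bytes than actually arrived.
        raise ValueError("incomplete netstring: expected %d bytes" % length)
    if rest[length:length + 1] != b",":
        raise ValueError("missing trailing comma")
    return rest[:length]


# A request is the transform name, a period, and the poem, all in one netstring.
request = encode_netstring(b"cummingsify." + b"HERE IS MY POEM")
reply = encode_netstring(b"here is my poem")

print(request)
print(reply)
print(decode_netstring(reply))

# Simulate a server that died part-way through sending its reply.
try:
    decode_netstring(reply[:10])
    truncated_detected = False
except ValueError:
    truncated_detected = True
print(truncated_detected)
```

Because the length comes first, a receiver that has fewer bytes than promised knows the message was cut off, which is the property we want for detecting a crashed transform server.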

So much for the protocol design. It’s not going to win any awards, but it’s good enough for our purposes.

The Code

Let’s look at the code of our transformation server, located in twisted-server-1/tranformedpoetry.py. First, we define a TransformService class:

class TransformService(object):

    def cummingsify(self, poem):
        return poem.lower()

The transform service currently implements one transformation, cummingsify, via a method of the same name. We could add additional algorithms by adding additional methods. Here’s something important to notice: the transformation service is entirely independent of the particular details of the protocol we settled on earlier. Separating the protocol logic from the service logic is a common pattern in Twisted programming. Doing so makes it easy to provide the same service via multiple protocols without duplicating code.

Now let’s look at the protocol factory (we’ll look at the protocol right after):

class TransformFactory(ServerFactory):

    protocol = TransformProtocol

    def __init__(self, service):
        self.service = service

    def transform(self, xform_name, poem):
        thunk = getattr(self, 'xform_%s' % (xform_name,), None)

        if thunk is None: # no such transform
            return None

        try:
            return thunk(poem)
        except Exception:
            return None # transform failed

    def xform_cummingsify(self, poem):
        return self.service.cummingsify(poem)

This factory provides a transform method which a protocol instance can use to request a poetry transformation on behalf of a connected client. The method returns None if there is no such transformation or if the transformation fails. And like the TransformService, the protocol factory is independent of the wire-level protocol, the details of which are delegated to the protocol class itself.

One thing to notice is the way we guard access to the service through the xform_-prefixed methods. This is a pattern you will find in the Twisted sources, although the prefixes vary and they are usually on an object separate from the factory. It’s one way of preventing client code from executing an arbitrary method on the service object, since the client can send any transform name they want. It also provides a place to perform protocol-specific adaptation to the API provided by the service object.
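Here is a small Python 3 sketch of why the prefix matters. Everything in it is hypothetical and Twisted-free: the delete_all_poems method, the GuardedDispatcher class, and the unguarded_transform function are invented to contrast a naive lookup on the service object with the prefixed lookup used by the factory.

```python
class Service:
    def __init__(self):
        self.deleted = False

    def cummingsify(self, poem):
        return poem.lower()

    def delete_all_poems(self, *args):
        # Hypothetical method that no network client should be able to reach.
        self.deleted = True
        return "deleted"


def unguarded_transform(service, xform_name, poem):
    # Naive lookup: the client-supplied name selects ANY attribute.
    return getattr(service, xform_name)(poem)


class GuardedDispatcher:
    def __init__(self, service):
        self.service = service

    def transform(self, xform_name, poem):
        # Only methods we explicitly named xform_* can be selected.
        thunk = getattr(self, 'xform_%s' % (xform_name,), None)
        if thunk is None:
            return None   # no such transform
        return thunk(poem)

    def xform_cummingsify(self, poem):
        return self.service.cummingsify(poem)


guarded_service = Service()
dispatcher = GuardedDispatcher(guarded_service)
good = dispatcher.transform("cummingsify", "HELLO")
blocked = dispatcher.transform("delete_all_poems", "HELLO")

naive_service = Service()
leaked = unguarded_transform(naive_service, "delete_all_poems", "HELLO")

print(good, blocked, guarded_service.deleted)
print(leaked, naive_service.deleted)
```

With the guarded version, exposing a new transform is a deliberate act: you have to write an xform_ method for it.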

Now we’ll take a look at the protocol implementation:

class TransformProtocol(NetstringReceiver):

    def stringReceived(self, request):
        if '.' not in request: # bad request
            self.transport.loseConnection()
            return

        xform_name, poem = request.split('.', 1)

        self.xformRequestReceived(xform_name, poem)

    def xformRequestReceived(self, xform_name, poem):
        new_poem = self.factory.transform(xform_name, poem)

        if new_poem is not None:
            self.sendString(new_poem)

        self.transport.loseConnection()

In the protocol implementation we take advantage of the fact that Twisted supports netstrings via the NetstringReceiver protocol. That base class takes care of decoding (and encoding) the netstrings and all we have to do is implement the stringReceived method. In other words, stringReceived is called with the content of a netstring sent by the client, without the extra bytes added by the netstring encoding. The base class also takes care of buffering the incoming bytes until we have enough to decode a complete string.

If everything goes ok (and if it doesn’t we just close the connection) we send the transformed poem back to the client using the sendString method provided by NetstringReceiver (and which ultimately calls transport.write()). And that’s all there is to it. We won’t bother listing the main function since it’s similar to the ones we’ve seen before.

Notice how we continue the Twisted pattern of translating the incoming byte stream to higher and higher levels of abstraction by defining the xformRequestReceived method, which is passed the name of the transform and the poem as two separate arguments.
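One small detail in stringReceived is worth a closer look: the second argument to split. Poems tend to contain periods of their own, so we only want to split on the first one. The following stand-alone Python 3 sketch (parse_request is a hypothetical helper, not part of the server) shows the difference.

```python
def parse_request(request):
    """Return (xform_name, poem), or None for a request with no period."""
    if '.' not in request:
        return None   # bad request
    # maxsplit=1: everything after the first period belongs to the poem.
    xform_name, poem = request.split('.', 1)
    return xform_name, poem


parsed = parse_request("cummingsify.Roses are red. Violets are blue.")
bad = parse_request("no separator here")
pieces_without_limit = "cummingsify.Roses are red. Violets are blue.".split('.')

print(parsed)
print(bad)
print(len(pieces_without_limit))
```

Without the limit, the poem would be chopped into pieces at every period and the two-name unpacking would fail.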

A Simple Client

We’ll implement a Twisted client for the transformation service in the next Part. For now we’ll just make do with a simple script located in twisted-server-1/transform-test. It uses the netcat program to send a poem to the server and then prints out the response (which will be encoded as a netstring). Let’s say you run the transformation server on port 11000 like this:

python twisted-server-1/tranformedpoetry.py --port 11000

Then you could run the test script against that server like this:

./twisted-server-1/transform-test 11000

And you should see some output like this:

15:here is my poem,

That’s the netstring-encoded transformed poem (the original is in all upper case).

Discussion

We introduced a few new ideas in this Part:

  1. Two-way communication.
  2. Building on an existing protocol implementation provided by Twisted.
  3. Using a service object to separate functional logic from protocol logic.

The basic mechanics of two-way communication are simple. We used the same techniques for reading and writing data in previous clients and servers; the only difference is we used them both together. Of course, a more complex protocol will require more complex code to process the byte stream and format outgoing messages. And that’s a great reason to use an existing protocol implementation like we did today.

Once you start getting comfortable writing basic protocols, it’s a good idea to take a look at the different protocol implementations provided by Twisted. You might start by perusing the twisted.protocols.basic module and going from there. Writing simple protocols is a great way to familiarize yourself with the Twisted style of programming, but in a “real” program it’s probably a lot more common to use a ready-made implementation, assuming there is one available for the protocol you want to use.

The last new idea we introduced, the use of a Service object to separate functional and protocol logic, is a really important design pattern in Twisted programming. Although the service object we made today is trivial, you can imagine a more realistic network service could be quite complex. And by making the Service independent of protocol-level details, we can quickly provide the same service on a new protocol without duplicating code.

Figure 27 shows a transformation server that is providing poetry transformations via two different protocols (the version of the server we presented above only has one protocol):

Figure 27: a transformation server with two protocols

Although we need two separate protocol factories in Figure 27, they might differ only in their protocol class attribute and would be otherwise identical. The factories would share the same Service object and only the Protocols themselves would require separate implementations. Now that’s code re-use!

Looking Ahead

So much for our transformation server. In Part 13, we’ll update our poetry client to use the transform server instead of implementing transformations in the client itself.

Suggested Exercises

  1. Read the source code for the NetstringReceiver class. What happens if the client sends a malformed netstring? What happens if the client tries to send a huge netstring?
  2. Invent another transformation algorithm and add it to the transformation service and the protocol factory. Test it out by modifying the netcat client.
  3. Invent another protocol for requesting poetry transformations and modify the server to handle both protocols (on two different ports). Use the same instance of the TransformService for both.
  4. How would the code need to change if the methods on the TransformService were asynchronous (i.e., they returned Deferreds)?
  5. Write a synchronous client for the transformation server.
  6. Update the original client and server to use netstrings when sending poetry.

Book: Why Most Things Fail: Evolution, Extinction and Economics

Jan 28th, 2010 | Filed under Books

A tremendously stimulating investigation of failure in biological and economic systems.

Book: The Fifth Elephant

Jan 20th, 2010 | Filed under Books

Commander Vimes of the Watch is forced into diplomatic duty.

An Introduction to Asynchronous Programming and Twisted

Jan 17th, 2010 | Filed under Blather, Programming, Python, Software

Part 11: Your Poetry is Served

This continues the introduction started here. You can find an index to the entire series here.

A Twisted Poetry Server

Now that we’ve learned so much about writing clients with Twisted, let’s turn around and re-implement our poetry server with Twisted too. And thanks to the generality of Twisted’s abstractions, it turns out we’ve already learned almost everything we need to know. Take a look at our Twisted poetry server located in twisted-server-1/fastpoetry.py. It’s called fastpoetry because this server sends the poetry as fast as possible, without any delays at all. Note there’s significantly less code than in the client!

Let’s take the pieces of the server one at a time. First, the PoetryProtocol:

class PoetryProtocol(Protocol):

    def connectionMade(self):
        self.transport.write(self.factory.poem)
        self.transport.loseConnection()

Like the client, the server uses a Protocol instance to manage connections (in this case, connections that clients make to the server). Here the Protocol is implementing the server-side portion of our poetry protocol. Since our wire protocol is strictly one-way, the server’s Protocol instance only needs to be concerned with sending data. If you recall, our wire protocol requires the server to start sending the poem immediately after the connection is made, so we implement the connectionMade method, a callback that is invoked after a Protocol instance is connected to a Transport.

Our method tells the Transport to do two things: send the entire text of the poem (self.transport.write) and close the connection (self.transport.loseConnection). Of course, both of those operations are asynchronous. So the call to write() really means “eventually send all this data to the client” and the call to loseConnection() really means “close this connection once all the data I’ve asked you to write has been written”.

As you can see, the Protocol retrieves the text of the poem from the Factory, so let’s look at that next:

class PoetryFactory(ServerFactory):

    protocol = PoetryProtocol

    def __init__(self, poem):
        self.poem = poem

Now that’s pretty darn simple. Our factory’s only real job, besides making PoetryProtocol instances on demand, is storing the poem that each PoetryProtocol sends to a client.

Notice that we are sub-classing ServerFactory instead of ClientFactory. Since our server is passively listening for connections instead of actively making them, we don’t need the extra methods ClientFactory provides. How can we be sure of that? Because we are using the listenTCP reactor method and the documentation for that method explains that the factory argument should be an instance of ServerFactory.

Here’s the main function where we call listenTCP:

def main():
    options, poetry_file = parse_args()

    poem = open(poetry_file).read()

    factory = PoetryFactory(poem)

    from twisted.internet import reactor

    port = reactor.listenTCP(options.port or 0, factory,
                             interface=options.iface)

    print 'Serving %s on %s.' % (poetry_file, port.getHost())

    reactor.run()

It basically does three things:

  1. Read the text of the poem we are going to serve.
  2. Create a PoetryFactory with that poem.
  3. Use listenTCP to tell Twisted to listen for connections on a port, and use our factory to make the protocol instances for each new connection.

After that, the only thing left to do is tell the reactor to start running the loop. You can use any of our previous poetry clients (or just netcat) to test out the server.

Discussion

Recall Figure 8 and Figure 9 from Part 5. Those figures illustrated how a new Protocol instance is created and initialized after Twisted makes a new connection on our behalf. It turns out the same mechanism is used when Twisted accepts a new incoming connection on a port we are listening on. That’s why both connectTCP and listenTCP require factory arguments.

One thing we didn’t show in Figure 9 is that the connectionMade callback is also called as part of Protocol initialization. This happens no matter what, but we didn’t need to use it in the client code. And the Protocol methods that we did use in the client aren’t used in the server’s implementation. So if we wanted to, we could make a shared library with a single PoetryProtocol that works for both clients and servers. That’s actually the way things are typically done in Twisted itself. For example, the NetstringReceiver Protocol can both read and write netstrings from and to a Transport.

We skipped writing a low-level version of our server, but let’s think about what sort of things are going on under the hood. First, calling listenTCP tells Twisted to create a listening socket and add it to the event loop. An “event” on a listening socket doesn’t mean there is data to read; instead it means there is a client waiting to connect to us.

Twisted will automatically accept incoming connection requests, thus creating a new client socket that links the server directly to an individual client. That client socket is also added to the event loop, and Twisted creates a new Transport and (via the PoetryFactory) a new PoetryProtocol instance to service that specific client. So the Protocol instances are always connected to client sockets, never to the listening socket.

We can visualize all of this in Figure 26:

Figure 26: the poetry server in action

In the figure there are three clients currently connected to the poetry server. Each Transport represents a single client socket, and the listening socket makes a total of four file descriptors for the select loop to monitor. When a client is disconnected the associated Transport and PoetryProtocol will be dereferenced and garbage-collected (assuming we haven’t stashed a reference to one of them somewhere, a practice we should avoid to prevent memory leaks). The PoetryFactory, meanwhile, will stick around as long as we keep listening for new connections which, in our poetry server, is forever. Like the beauty of poetry. Or something. At any rate, Figure 26 certainly cuts a fine figure of a Figure, doesn’t it?

The client sockets and their associated Python objects won’t live very long if the poem we are serving is relatively short. But with a large poem and a really busy poetry server we could end up with hundreds or thousands of simultaneous clients. And that’s OK — Twisted has no built-in limits on the number of connections it can handle. Of course, as you increase the load on any server, at some point you will find it cannot keep up or some internal OS limit is reached. For highly-loaded servers, careful measurement and testing is the order of the day.

Twisted also imposes no limit on the number of ports we can listen on. In fact, a single Twisted process could listen on dozens of ports and provide a different service on each one (by using a different factory class for each listenTCP call). And with careful design, whether you provide multiple services with a single Twisted process or several is a decision you could potentially even postpone to the deployment phase.

There are a couple of things our server is missing. First of all, it doesn’t generate any logs that might help us debug problems or analyze our network traffic. Furthermore, the server doesn’t run as a daemon, making it vulnerable to death by accidental Ctrl-C (or just logging out). We’ll fix both those problems in a future Part but first, in Part 12, we’ll write another server to perform poetry transformation.

Suggested Exercises

  1. Write an asynchronous poetry server without using Twisted, like we did for the client in Part 2. Note that listening sockets need to be monitored for reading and a “readable” listening socket means we can accept a new client socket.
  2. Write a low-level asynchronous poetry server using Twisted, but without using listenTCP or protocols, transports, and factories, like we did for the client in Part 4. So you’ll still be making your own sockets, but you can use the Twisted reactor instead of your own select loop.
  3. Make the high-level version of the Twisted poetry server a “slow server” by using callLater or LoopingCall to make multiple calls to transport.write(). Add the --num-bytes and --delay command line options supported by the blocking server. Don’t forget to handle the case where the client disconnects before receiving the whole poem.
  4. Extend the high-level Twisted server so it can serve multiple poems (on different ports).
  5. What are some reasons to serve multiple services from the same Twisted process? What are some reasons not to?

Book: Real World Haskell

Jan 16th, 2010 | Filed under Books

A thorough, and thoroughly excellent, introduction to the programming language Haskell.