About the Post

Author Information

Chris is a Software Engineering Master's student at Tartu University, with an Enterprise specialization. He has industry experience developing and operating massively scaled web applications.

EC2 Deployment and Lifecycle API

I’ve just finished my first working draft of a python API to describe and execute an EC2 deployment with a controlled lifecycle. This is part of the larger Desktop-to-Cloud (D2C) project, aimed at assisting researchers with deploying scientific, job-based applications to the cloud.

An Example

[sourcecode language="python" wraplines="true"]
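# Note: ec2ConnFactory, credStore, and testDir are assumed to be defined earlier in the script.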
deployment = Deployment(
    name="testDeployment",
    ec2ConnFactory=ec2ConnFactory,
    roles=[
        Role(
            name="testRole",
            ami=AMI(amiId="ami-47cefa33"),
            count=1,
            credStore=credStore,
            startActions=[Action(command="echo howdy > /tmp/howdy.txt",
                                 credStore=credStore)],
            finishedChecks=[FileExistsFinishedCheck(fileName="/tmp/howdy.txt",
                                                    credStore=credStore)],
            dataCollectors=[DataCollector(source="/tmp/howdy.txt",
                                          destination=testDir + "howdy.txt",
                                          credStore=credStore)])])

class Listener:

    def notify(self, event):
        print "Deployment state changed to: " + event.newState

deployment.addAnyStateChangeListener(Listener())

deployment.run()
[/sourcecode]

What does this code do?

It launches an EC2 instance using AMI ami-47cefa33. After the remote instance has been detected as running, the command “echo howdy > /tmp/howdy.txt” is executed on it. Next, the instance is polled for the presence of “/tmp/howdy.txt” (which will of course be present immediately). Once “howdy.txt” is spotted, the program copies the remote file to the local host. Finally, the instance is terminated and the program completes execution.

All lifecycle changes are communicated to the console through the Listener.

How is this accomplished?

The top-level Deployment object contains all of the definitions and logic needed to:

  • instantiate EC2 hosts
  • execute programs on the hosts
  • copy any output data to the local host
  • shut down the instances when the program(s) complete

A Deployment is composed of one or more Roles. A Role is defined by a single AMI and a count. Each Role must additionally specify at least one Checker object, declared in the finishedChecks constructor parameter.
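
For example, a deployment combining two roles might be declared like this (a skeleton only; the role names, counts, marker files, and the second AMI id are made up, and startActions/dataCollectors are omitted):

[sourcecode language="python" wraplines="true"]
# Skeleton of a two-Role deployment; names, counts, and the second AMI id are placeholders.
deployment = Deployment(
    name="twoRoleDeployment",
    ec2ConnFactory=ec2ConnFactory,
    roles=[
        Role(name="master",
             ami=AMI(amiId="ami-47cefa33"),
             count=1,
             credStore=credStore,
             finishedChecks=[FileExistsFinishedCheck(fileName="/tmp/master.done",
                                                     credStore=credStore)]),
        Role(name="worker",
             ami=AMI(amiId="ami-00000000"),   # placeholder AMI id
             count=4,
             credStore=credStore,
             finishedChecks=[FileExistsFinishedCheck(fileName="/tmp/worker.done",
                                                     credStore=credStore)])])
[/sourcecode]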

When a Deployment is executed via its run method, it goes through the following lifecycle:

  • Launch Instances: EC2 instances are provisioned, as specified by the Role AMIs and counts.
  • Start Roles: Optional start actions, declared in the Role startActions constructor parameter, are executed.
  • Completion Monitoring: All instances are polled for process completion, using the associated finishedChecks.
  • Data Collection: Any dataCollectors declared for a Role are executed, fetching remote data files.
  • Termination: All instances are terminated.

Under the hood, this makes use of boto, a python wrapper around Amazon’s AWS API.
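
For context, the boto primitives involved look roughly like this (a minimal sketch, not the actual Deployment internals; it assumes AWS credentials are available to boto):

[sourcecode language="python" wraplines="true"]
# Minimal boto sketch: launch an instance and poll until it is running.
import time
import boto

conn = boto.connect_ec2()   # credentials come from the boto config/environment

reservation = conn.run_instances("ami-47cefa33", min_count=1, max_count=1)
instance = reservation.instances[0]

while instance.state != "running":
    time.sleep(15)        # stay well under any per-IP request limit
    instance.update()     # one DescribeInstances call per iteration

print "Instance running at " + instance.public_dns_name
[/sourcecode]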

The Deployment class also supports Listeners, which can register to receive state change events.

Considerations

What if the client process dies during the deployment?

The current implementation supports re-attaching to a “live” deployment, assuming the deployment’s state has been persisted. For example, if the Deployment’s run method is called and the initial state is “ROLES_STARTED”, the lifecycle continues from the completion-monitoring step. Persistence of the deployment is not handled by the object itself, but can trivially be added with a Listener that in turn handles storage (the route I plan to take).
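
As a rough illustration of that route, here is a minimal sketch of such a listener (the class name and file-based storage are my own assumptions; only addAnyStateChangeListener, run, and the event's newState attribute come from the API shown above):

[sourcecode language="python" wraplines="true"]
# Hypothetical sketch: persist each state change so a later process can re-attach.
class PersistingListener:

    def __init__(self, stateFile):
        self.stateFile = stateFile

    def notify(self, event):
        f = open(self.stateFile, "w")
        f.write(event.newState)
        f.close()

deployment.addAnyStateChangeListener(PersistingListener("/tmp/testDeployment.state"))
deployment.run()
[/sourcecode]

On restart, the stored state would be read back and handed to a reconstructed Deployment before calling run again; how that state is fed back in is not shown here.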

Polling Rate

After a quick Google scan, I have not found any definitive document on AWS API rate limiting, though I have seen multiple mentions of 1 request per second per client IP. Currently, the default poll rate is 4 requests per minute.
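
For what it's worth, 4 requests per minute works out to one poll every 15 seconds; a generic rate-limited polling helper of that shape might look like the following (illustrative only, not the Deployment's actual polling code):

[sourcecode language="python" wraplines="true"]
# Illustrative rate-limited polling helper (not part of the D2C API).
import time

def poll_until(check, interval=15.0, timeout=600.0):
    """Call check() every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if check():
            return True
        time.sleep(interval)   # a 15 s interval keeps us at ~4 requests per minute
    return False
[/sourcecode]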

Performance Metrics

A missing step in the current implementation is the collection of instance performance metrics (I/O, CPU, memory, etc.). This information (minus memory) will be gathered from the CloudWatch API. Memory usage, because of how VM metrics are collected, must be monitored via another mechanism (TBD).
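
As a sketch of what the CloudWatch part might look like with boto (the instance id is a placeholder, and the exact get_metric_statistics signature varies a bit between boto versions):

[sourcecode language="python" wraplines="true"]
# Sketch: pull average CPU utilization for one instance from CloudWatch via boto.
import datetime
import boto

cw = boto.connect_cloudwatch()   # uses AWS credentials from the boto config/environment
end = datetime.datetime.utcnow()
start = end - datetime.timedelta(minutes=30)

datapoints = cw.get_metric_statistics(
    300,                         # period, in seconds
    start, end,
    "CPUUtilization",            # DiskReadBytes, NetworkIn, etc. also exist; memory does not
    "AWS/EC2",
    ["Average"],
    dimensions={"InstanceId": "i-00000000"})   # placeholder instance id

for point in datapoints:
    print point["Timestamp"], point["Average"]
[/sourcecode]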



8 Responses to “EC2 Deployment and Lifecycle API”

  1. Ilja #

    Hi, Chris

    A comment on the polling part – it might make more sense to have a push mechanism there. AWS recently announced the SNS service (http://aws.amazon.com/sns/), which could be used for pushing notifications about state changes.

    April 14, 2011 at 11:53
    • SNS would make sense if the lifecycle management were being controlled by a long-lived daemon. However, in the current dev phase I’m creating a desktop controller, so there is no guaranteed IP for SNS to send the events to. Down the line, if the deployment management becomes a centralized service, SNS fits quite well.

      April 14, 2011 at 12:57
      • Ilja #

        What about SQS then? Push model notifications I mean.

        April 15, 2011 at 23:37
      • >>What about SQS then? Push model notifications I mean.

        I don’t think this is a replacement for the polling “problem” either:

        1. SQS get_message API calls will return 0 messages if there are none on the queue (there isn’t a persistent connection from producer to consumer), so you end up polling the queue. I believe this is an issue with SNS as well.

        2. There is no built-in AWS mechanism to push instance state change alerts. CloudWatch only provides metric alarms (although perhaps these could be hacked). So you would have to create a poller to inject the push notifications.

        April 20, 2011 at 09:33
  2. Ilja #

    I guess you’ll end up implementing webhooks?
    http://wiki.webhooks.org/w/page/13385124/FrontPage

    April 25, 2011 at 11:48
    • I did miss the fact that you can use SNS to push events to an HTTP listener (i.e. a webhook). Utilizing this would require the client to have a registered listener on a public IP, which would probably be a problem for most desktop clients, as they would be behind a NAT. An additional aspect of SNS to account for is that there is no guaranteed delivery if the HTTP listener is momentarily unavailable [http://aws.amazon.com/sns/faqs/#48]. For that reason, any SNS events should be broadcast to a durable SQS queue as well. Any listeners that are temporarily down can check the queue for possible missed messages, cross-referencing with ones already processed.
      Still missing from the whole solution, though, is a way to get lifecycle events broadcast by Amazon itself. Any ideas for that?

      April 25, 2011 at 13:07

Trackbacks/Pingbacks

  1. Research group digest – ulno.net - April 14, 2011

