I’ve just finished my first working draft of a python API to describe and execute an EC2 deployment with a controlled lifecycle. This is part of the larger Desktop-to-Cloud (D2C) project, aimed at assisting researchers with deploying scientific, job-based applications to the cloud.
[sourcecode language="python" wraplines="true"]
# Reconstruction of the truncated snippet; constructor names not visible
# in the original (roles, dataCollectors, listeners) are best guesses.
class ConsoleListener(Listener):
    def notify(self, event):
        print "Deployment state changed to: " + event.newState

deployment = Deployment(
    roles=[Role(ami="ami-47cefa33", count=1,
                startActions=[Action(command="echo howdy > /tmp/howdy.txt")],
                finishedChecks=[FileChecker("/tmp/howdy.txt")],
                dataCollectors=[Collector(source="/tmp/howdy.txt",
                                          destination=testDir + "howdy.txt")])],
    listeners=[ConsoleListener()])
deployment.run()
[/sourcecode]
What does this code do?
It launches an EC2 instance using AMI ami-47cefa33. After the remote instance has been detected as running, the script “echo howdy > /tmp/howdy.txt” is executed on it. Next, the instance is polled for the presence of “/tmp/howdy.txt” (which will of course be present immediately). Once “howdy.txt” is spotted, the program copies the remote file to the local host. Finally, the instance is terminated and the program completes execution.
All lifecycle changes are communicated to the console through the Listener.
How is this accomplished?
The top-level object, Deployment, contains all the definitions and logic needed to:
- instantiate EC2 hosts
- execute programs on the hosts
- copy any output data to the local host
- shutdown the instances when the program(s) complete
A Deployment is composed of one or more Roles. A Role is defined by a single AMI and a count. Each Role must additionally specify at least one Checker object, declared in the finishedChecks constructor parameter.
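As a sketch of how the Role/Checker pairing might hang together (the Checker base class's method name and the Role internals here are my guesses, not the actual D2C signatures):

```python
# Hypothetical sketch of the Role/Checker relationship; the check()
# method and FileExistsChecker are illustrative, not the published API.
class Checker(object):
    def check(self, instance):
        raise NotImplementedError

class FileExistsChecker(Checker):
    """Reports finished when a marker file exists on the instance."""
    def __init__(self, path):
        self.path = path

    def check(self, instance):
        # A real implementation would run 'test -f <path>' over SSH;
        # here 'instance' is just a dict standing in for a remote host.
        return self.path in instance.get("files", [])

class Role(object):
    def __init__(self, ami, count, finishedChecks):
        if not finishedChecks:
            raise ValueError("a Role requires at least one Checker")
        self.ami = ami
        self.count = count
        self.finishedChecks = finishedChecks

    def finished(self, instance):
        # A Role is done only when every declared Checker passes.
        return all(c.check(instance) for c in self.finishedChecks)
```

With this shape, `Role("ami-47cefa33", 1, [FileExistsChecker("/tmp/howdy.txt")])` reports finished once the marker file appears.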
When a Deployment is executed via its run method, it goes through the following lifecycle:
- Launch Instances: EC2 instances are provisioned, as specified by the Role AMIs and counts.
- Start Roles: Optional start scripts, declared in the Role startActions constructor parameter, are executed.
- Completion Monitoring: All instances are polled for process completion, using the associated finishedChecks.
- Data Collection: Any data collectors declared for a Role are executed, fetching remote data files.
- Termination: All instances are terminated.
The Deployment class also supports Listeners, which can register to receive state change events.
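The registration and notification flow could look like this (only notify appears in the example above; the addListener and setState names are mine):

```python
# Hypothetical listener plumbing; the event shape and method names
# other than notify() are assumptions.
class StateChangeEvent(object):
    def __init__(self, oldState, newState):
        self.oldState = oldState
        self.newState = newState

class ObservableDeployment(object):
    def __init__(self):
        self.state = "PENDING"
        self.listeners = []

    def addListener(self, listener):
        self.listeners.append(listener)

    def setState(self, newState):
        # Fire an event to every registered listener on each transition.
        event = StateChangeEvent(self.state, newState)
        self.state = newState
        for listener in self.listeners:
            listener.notify(event)

class ConsoleListener(object):
    def notify(self, event):
        print("Deployment state changed to: " + event.newState)
```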
What if the client process dies during the deployment?
The current implementation supports re-attaching to a “live” deployment, assuming the deployment’s state has been persisted. For example, if the Deployment’s run method is called with an initial state of “ROLES_STARTED”, the lifecycle resumes at completion monitoring. Persistence of the deployment is not handled by the object itself, but can trivially be added with a Listener that in turn handles storage (the route I plan to take).
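That persistence route could be as small as a Listener that writes the latest state to disk, plus a helper to read it back on restart; the file format and function names below are my own sketch, not D2C code:

```python
import json
import os

# Sketch of a state-persisting Listener; a restarted client reads the
# saved state back and passes it to run() to skip completed phases.
class FilePersistenceListener(object):
    def __init__(self, path):
        self.path = path

    def notify(self, event):
        # Overwrite the store with the most recent state on every change.
        with open(self.path, "w") as f:
            json.dump({"state": event.newState}, f)

def load_saved_state(path, default="PENDING"):
    """Return the last persisted state, or the default for a fresh run."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f)["state"]
```

On restart, `load_saved_state(path)` would seed the run method's initial state.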
After a quick Google scan, I have not found any definitive document on AWS API rate limiting, though I have seen multiple mentions of 1 request per second per client IP. Currently, the default poll rate is 4 requests per minute.
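At 4 requests per minute the poll loop simply sleeps 15 seconds between checks; a sketch of that throttling (the function and parameter names are mine, and sleep is injectable for testing):

```python
import time

def poll_until(check, polls_per_minute=4, max_polls=None, sleep=time.sleep):
    """Call check() at the given rate until it returns True.

    The default of 4 polls per minute (one every 15 s) stays well under
    the oft-mentioned 1 request/second limit.
    """
    interval = 60.0 / polls_per_minute
    attempts = 0
    while True:
        if check():
            return True
        attempts += 1
        if max_polls is not None and attempts >= max_polls:
            return False
        sleep(interval)
```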
A missing step in the current implementation is collection of instance performance metrics (IO, CPU, memory, etc.). This information (sans memory) will be gathered from the CloudWatch API. Memory monitoring, because of how VM metrics are collected, must be handled via another mechanism (TBD).