PyMW: Summer-y of Code

Today is the “suggested pencils down” date for Summer of Code and I’m very happy to say that my proposal is complete and I feel the summer was a great success!

For those of you who don’t know, PyMW is a Master-Worker computing framework in Python. It wraps several other Master-Worker frameworks such as MPI, Condor, BOINC or even just using multi-core processors, and exposes them as a simple and elegant API.

The way it works is, you create tasks and submit them to a Master and the master uses an interface/wrapper (BOINC, Condor, MPI, etc) to process those tasks. The master distributes the tasks out to compute nodes (or processor cores), called Workers, and the results get sent back to the Master. This allows you to debug using the multi-core interface (a single machine) and then, by changing one switch on the command line, you can run the same code on thousands of machines using BOINC or any other supported interface.
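
To make the "one switch" idea concrete, here is a minimal sketch using only the Python standard library. It is not PyMW's actual API: `Master`, `SerialInterface`, `ThreadPoolInterface`, `INTERFACES` and `run_job` are hypothetical names chosen for illustration, and a local thread pool stands in for a real backend such as BOINC or Condor.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def estimate_pi(args):
    """Worker task: Monte Carlo estimate of pi from `samples` random points."""
    seed, samples = args
    rng = random.Random(seed)  # seeded so every run gives the same answer
    hits = sum(1 for _ in range(samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / samples


class SerialInterface:
    """Hypothetical interface: runs every task in the calling thread."""

    def run_all(self, func, inputs):
        return [func(item) for item in inputs]


class ThreadPoolInterface:
    """Hypothetical interface: fans tasks out to a pool of local workers."""

    def __init__(self, workers=4):
        self.workers = workers

    def run_all(self, func, inputs):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # map() returns results in the same order as the inputs
            return list(pool.map(func, inputs))


class Master:
    """Hands tasks to whichever interface it was constructed with."""

    def __init__(self, interface):
        self.interface = interface

    def run(self, func, inputs):
        return self.interface.run_all(func, inputs)


# The "switch": the application code below never changes, only this key does.
INTERFACES = {"serial": SerialInterface, "threads": ThreadPoolInterface}


def run_job(interface_name, num_tasks=8, samples=20000):
    master = Master(INTERFACES[interface_name]())
    inputs = [(seed, samples) for seed in range(num_tasks)]
    estimates = master.run(estimate_pi, inputs)
    return sum(estimates) / len(estimates)
```

Calling `run_job("serial")` while debugging and `run_job("threads")` afterwards exercises the same task code through two different backends, which is the property described above.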

My proposal was to improve BOINC integration with PyMW by 1) eliminating the startup script and the need to compile C code; 2) adding pure-Python support for BOINC Assimilators and Validators; and 3) adding a new, optional checkpointing mechanism for long-running jobs.

I completed (2) very quickly by virtue of some old Python code I found in BOINC, but (1) took a lot of sweat and tears: the existing BOINC interface was functional, but in need of some serious work. It was running under Mac and Linux, but the Windows client was crashing on every task. To get it working under Windows, I ended up creating a C++ launcher application to avoid using batch scripts.

In addition to my proposal, I also added a few other tasks. When working with BOINC, the existing interface assumed that Python was already installed and on the system PATH environment variable. This is not a very safe assumption, so I also created a portable Python interpreter integrated with the BOINC API. PyMW will now install this interpreter as your BOINC application so that clients no longer need to have Python installed to run PyMW-BOINC compute jobs. Along the way I also created a new logo and graphic design for the PyMW web site and set up WordPress; check it out.

Sadly, goal (3) was not completed; however, it was originally proposed as an optional part of the project. I actively chose to pursue the BOINC-Python interpreter over (3) because I felt it was more important to the BOINC interface. In the end, checkpointing can always be done manually, but sending the Python interpreter is a considerably harder task.

I am still wrapping up a few odds and ends, but for the most part, I feel very happy with the state of the BOINC interface. If you are looking for a distributed or parallel processing framework for a future project, please consider PyMW.

If you have any questions or comments, I would love to hear them!

Posted on August 10, 2009 at 12:45 pm

PyBOINC Work Continues

I am still working on PyBOINC (the embedded Python interpreter with support for BOINC). The actual integration (exposing the BOINC API to Python) was easy; it’s the cross-platform build that’s most difficult.

The problem is that when running an application on BOINC, you have no guarantee of what libraries will be available, so you must either distribute the libraries you need or compile them statically. I chose the latter, which also caused lots of issues with the Python standard library. It’s mostly working now, with the exception of the sqlite module.

I moved the repo over to my Bitbucket account since Nicolás is busy with other projects. The latest code can be found here.

Posted on July 31, 2009 at 4:11 pm

Integrating Python & BOINC

I started working on an embedded Python interpreter for BOINC with Nicolás Alvarez. The interpreter will be the main executable for Python-based workunits and provides interop with the BOINC client API. It allows, for example, reporting percentage complete per workunit, which isn’t currently possible with PyMW alone.

The project will likely be incorporated into the BOINC trunk, but the current code is available on Bitbucket as PyBOINC if you are interested in seeing how it works.

The Python developers made embedding the interpreter incredibly easy from C/C++ and the BOINC API is readily available from C/C++ as well, so there really isn’t much code. However, packaging the Python standard library and compiling it so it runs on multiple platforms are still going to be challenging.

Posted on July 12, 2009 at 10:29 pm

Pacman Server Running!

I got the Pacman server running today! It’s processing matches in a round-robin style, not the double elimination tournament yet, but it’s running :)

This has again exposed the BOINC file immutability issue and I can see now that there is no hacking my way around it — I am going to have to change the interface so that all executables and WorkUnits are uniquely named.

Posted on July 6, 2009 at 10:14 pm

BOINC: Bundling Files

I’ve added support to the BOINC interface for bundled data files; however, adding this new feature has exposed a new issue in the BOINC interface. I’ve known for a while that BOINC holds an odd assumption of immutable files (any file ever seen by BOINC is expected to *never* change its contents for all time). However, when running a PyMW application, the executable (for example, “monte_pi”) is reused over and over despite any changes that may have occurred in the code.

This hasn’t been an issue up until now, mainly because I was running the PyMW example applications and not modifying them between executions. However, with the introduction of PyMW data bundles, this problem has become painfully obvious. Since data bundles are given a temporary file name and this file name is dynamically embedded into the body of the executable, the executable is now changing its contents on every run.

The fact that the file is changing and the file name remains the same means that BOINC keeps only one copy of the file (because of file name immutability/versioning). The end result is that when a work unit executes on a worker machine, it tries to open the first data bundle file name that was ever created because that first file was cached and never updated.

To fix this, I’ve added some code into the BOINC interface that deletes all work unit related files from the BOINC “download” directory on every execution. This has fixed the problem for now, but the interface should rename all files to a unique name before execution. This is one of my goals for the next iteration of the BOINC interface.
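
One way the renaming could work is to embed a hash of the file's contents in its name, so that changed contents always produce a new name. This is only a sketch of the idea, not the code in the BOINC interface; `unique_name` and its `digest_chars` parameter are hypothetical.

```python
import hashlib
import os


def unique_name(name, contents, digest_chars=12):
    """Return `name` with a hash of `contents` (bytes) inserted before the extension.

    Identical contents always map to the same name; different contents map to
    a different name, so a cached copy is never mistaken for a newer file.
    """
    digest = hashlib.sha1(contents).hexdigest()[:digest_chars]
    stem, ext = os.path.splitext(name)
    return f"{stem}_{digest}{ext}"
```

With this scheme, two versions of `monte_pi.py` would be sent to BOINC under two different names, which is compatible with the assumption that a file never changes once it has been seen.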

Posted on July 4, 2009 at 10:19 am

BOINC: Failed WorkUnits

When work units fail in BOINC, it poses a question of how to handle the remaining work units still being processed. I’ve added code to minimally handle failures so that manual user intervention isn’t required; however, there is still a burden on the developer to understand this situation and decide how to recover from it.

This is really no different from handling exceptions in non-distributed code, but I know all too well how exceptions are normally handled (hint: they aren’t). If you are interested, you can read my full post on the PyMW blog.
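
As a rough illustration of the exception analogy, the sketch below keeps going when one task fails and hands the failures back to the caller. It uses a local thread pool from the standard library rather than BOINC, and `run_tasks` and `reciprocal` are hypothetical names, so treat it as a picture of the pattern rather than the interface's real behavior.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_tasks(func, inputs, workers=4):
    """Run func over inputs; return (results, failures), both keyed by input index."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(func, item): index
                   for index, item in enumerate(inputs)}
        for future in as_completed(futures):
            index = futures[future]
            error = future.exception()
            if error is None:
                results[index] = future.result()
            else:
                # Record the failure and keep collecting the other tasks.
                failures[index] = error
    return results, failures


def reciprocal(x):
    """Example task that fails for x == 0."""
    return 1.0 / x
```

Running `run_tasks(reciprocal, [1, 2, 0, 4])` finishes the three good tasks and reports the third input as a failure; what to do with that failure (retry, skip, abort) is still the developer's decision.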

Posted on July 2, 2009 at 10:34 am

PyMW Site Finished, for Now

I’ve finished converting the PyMW website over to WordPress and implemented the new design and logo. It still needs more work, but I’m going to switch back into core BOINC interface mode again for a while.

The interface is working well now, but BOINC work unit failures still require manual user intervention. I would like to automate as much failure recovery as possible, so I will be focusing on this for the next few days.

Posted on June 29, 2009 at 4:21 pm

PyMW Distributed Pacman Server

Pacman CTF
To create a real application for PyMW, I think I am going to create a distributed Pacman server.

Last semester I took an artificial intelligence class that used Pacman as a teaching tool. At the end of the semester, there was a tournament where teams could pit their Pacman AI clients against each other in a game of Pacman-style capture the flag. We submitted our clients to a server and then waited 24 hours or so for the results to appear. If your client crashed, you had to wait another 24 hours to see your standings.

My idea is this:

  • Create a PyMW application that runs Pacman tournaments
  • Each job will be 3 matches between two clients
  • An animated GIF will be created for one of the 3 matches (one that agrees with the outcome)
  • The BOINC interface will be used so students can contribute compute time
  • The output of the PyMW application will be records in a MySQL database
  • Create a website for statistics
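
A single job from the list above might look something like the following sketch. Everything here is an assumption for illustration: `play_job` is a hypothetical name, and `play_match` is a stand-in callable that returns the name of the winning client for one game.

```python
from collections import Counter


def play_job(play_match, client_a, client_b, matches=3):
    """Play `matches` games between two clients.

    Returns (winner, index), where `index` is the first match whose result
    agrees with the overall outcome and is therefore the one to animate.
    """
    winners = [play_match(client_a, client_b, game) for game in range(matches)]
    # With two clients and three decisive matches there is always a majority.
    # If draws are possible, most_common() falls back to first-seen order.
    winner, _ = Counter(winners).most_common(1)[0]
    return winner, winners.index(winner)
```

For example, if "blue" wins the first match and "red" wins the next two, the job reports "red" as the winner and picks the second match (index 1) for the GIF.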

To test the tournament server, I am going to get the AI client code from last semester’s teams and then run it on the BOINC Alpha group. This should provide a solid test of the PyMW BOINC interface and my tournament server.

Posted on June 29, 2009 at 12:17 pm

PyMW: Logo and Layout

PyMW: Python + BOINC

I created a new logo and web layout/design for PyMW. This isn’t really part of the Summer of Code gig, but I was feeling inspired so I threw this together. It’s still a work in progress, but you can get a feel for the design.

I’m not much of a graphic artist when it comes to identity, but I tried to represent the idea of one framework joining many disparate models of computation as well as the general idea of master-worker computing.

Posted on June 10, 2009 at 12:08 am

PyMW: Week one

Today is the official end of my 7th day of working on the PyMW interface for BOINC for Google Summer of Code.

It took me three days to get my first PyMW app to run (monte_pi.py), which ran on 4 virtual nodes (4 tasks). More than four tasks caused problems; it turned out that there was a bug in the PyMW BOINC interface. Now that that’s fixed, I ran with 200 nodes yesterday and 800 today.

This morning, I tried running with 2 physical nodes (my laptop and my Ubuntu VM), which failed at the end of computation. Somehow the canonical results are not being recognized, which causes the BOINC interface to get lost in limbo and hang forever.

This week, I created a pure-Python assimilator for PyMW, which works pretty well, but is perhaps causing the error above.

I also rewrote a big swath of the BOINC interface to stop it from using a new thread for each task during task reclamation (getting data back from BOINC). Since it was using one thread per task, it was reaching the maximum number of scheduler threads. This in turn caused the execution thread to hang until some of the tasks completed. Now it reclaims tasks in a single thread and is able to queue all tasks in a single shot, greatly improving the throughput from PyMW -> BOINC.
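
The shape of that change can be sketched with a queue and one collector thread. This is not the PyMW code itself: `start_reclaimer` and `run` are hypothetical names, and the "backend" is simulated by putting squared task ids straight onto the queue.

```python
import queue
import threading


def start_reclaimer(finished, expected):
    """Start ONE thread that drains `finished` until `expected` results arrive."""
    results = {}

    def reclaimer():
        while len(results) < expected:
            task_id, value = finished.get()  # blocks until a result is ready
            results[task_id] = value

    collector = threading.Thread(target=reclaimer)
    collector.start()
    return collector, results


def run(num_tasks):
    finished = queue.Queue()
    collector, results = start_reclaimer(finished, num_tasks)
    # Queue every task in a single shot; a real backend would compute these
    # remotely and report back, here each "result" is just the id squared.
    for task_id in range(num_tasks):
        finished.put((task_id, task_id * task_id))
    collector.join()
    return results
```

However many tasks are submitted, only one extra thread exists, so `run(800)` never gets near a thread limit the way one thread per task would.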

Overall, it’s been really fun so far. The first few days were trying, since there was little documentation of how to get PyMW to play nice with BOINC. But seeing the first application run was great :)

Posted on June 5, 2009 at 12:01 pm
