April 04, 2007

Non-blocking I/O With PHP-MIO

A couple of weeks ago I was thinking about non-blocking I/O in PHP, specifically about how clunky PHP's select implementation is. I say clunky because it's not bad, it's just not as easy to use as it could be. It's not as easy, for example, as the implementation found in Java's NIO package, which is beautifully simple to use. The main issue I have with PHP's implementation is that I am responsible for keeping track of everything: which streams I'm interested in writing to, which streams I'm interested in reading from and, when I get to accepting connections, which streams are server sockets I want to accept connections on. I'm lazy; I don't want to have to do that, I want a library to handle it all for me. So I decided to implement something similar to Java's non-blocking I/O in PHP5. It's now finished and up on SourceForge under the name phpmio. In this article I hope to give you enough information to get up and running with the package.


But What Is Multiplexed I/O?

Before I go any further I suppose I should explain exactly what multiplexed (or non-blocking) I/O actually is. When reading from or writing to a stream, PHP usually blocks until the operation is complete; however, a stream's blocking mode can be changed so that operations on it return immediately instead of blocking. Used correctly, this technique can vastly improve performance in networked applications, at the price of increased complexity and, some would argue, a more confusing program flow. For that reason I wouldn't suggest it for trivial applications. Let's take a look at it in action. In the example below we open a stream to Amazon, try to read some data from it, then display how long each operation took and how much data was read. If this is run with the stream's blocking mode set to 0 (non-blocking) you will notice that the read takes very little time and not all of the bytes are read. If, on the other hand, the blocking mode is set to 1 (blocking), the read takes much longer and all 2048 bytes are read.


<?php
$start = microtime( true );

$fp = fopen( 'http://www.amazon.co.uk/', 'r' );
$open = microtime( true );

stream_set_blocking( $fp, 1 );
$block = microtime( true );

$data = fread( $fp, 2048 );
$read = microtime( true );

// the time taken to open the url
echo "Open: " . ($open-$start) . "\n";
// the time taken to set the stream to blocking
echo "Block: " . ($block-$open) . "\n";
// the time taken to read from the stream
echo "Read: " . ($read-$block) . "\n";
// the amount of data read
echo "Data Read: " . strlen( $data ) . "\n";

OK, So What Is PHP MIO?

So, how does multiplexed I/O work with the PHP MIO package? Within the package there are three key classes: MioStream, MioSelector and MioSelectionKey. There is also a factory class, MioStreamFactory, which provides a convenient way of creating different types of streams, plus a few exception classes. Before diving into the core of the package, let's take a quick look at MioStreamFactory to get it out of the way. Below are three examples of how it can be used. Each method creates an instance of the MioStream class, which wraps a PHP stream. The first method, createSocketStream, creates (as you might expect) a socket stream (as would be created with fsockopen); the second a server socket (stream_socket_server); and the third a file stream (fopen). One thing to note is that creating an MioStream with MioStreamFactory implicitly sets its blocking flag to 0.


<?php
$factory = new MioStreamFactory();

// Create a client socket stream
$socket = $factory->createSocketStream( '127.0.0.1', 8888 );

// Create a server socket stream
$server = $factory->createServerStream( '127.0.0.1:8888' );

// Create a local file stream
$file = $factory->createFileStream( '/etc/hosts' );

An MioStream object allows us to do the basic things we would want to do with a stream, such as reading and writing. Although MioStream only handles basic functions itself, it does give you access to its underlying stream resource in case you need to do anything more advanced. In the code below we write some data to the socket stream we made in the last example and then read some data from it. We then accept a new stream on the server socket stream (note that the accept method returns an MioStream object). On this new stream we check that it's open, close it and then check that it's no longer open. Finally, we get hold of the socket stream's internal resource and append a stream filter to it.


<?php
$socket->write( 'Put some data' );

$socket->read( 1024 );

$stream = $server->accept();

if( !$stream->isOpen() ) {
    trigger_error( "The new stream should be open", E_USER_ERROR );
}

$stream->close();

if( $stream->isOpen() ) {
    trigger_error( "The new stream should now be closed", E_USER_ERROR );
}

stream_filter_append( $socket->getStream(), 'string.toupper' );

Downloading The AMP Stack

Now that we know how to create MioStream objects and interact with them, let's take a look at how they are used with the MioSelector. A selector is an object used for managing and selecting streams which are available for different types of work (reading, writing or accepting connections). This is done by registering MioStream objects with the selector; the relationship between the selector and each stream is encapsulated in an MioSelectionKey object. When we register a stream with a selector we also say what we're interested in for that stream (reading, writing or accepting connections) and optionally pass an object we want associated with it (so we know what to do with it later). Once we have registered our streams we can call the select method to get all registered streams which are ready for any of the operations we are interested in. To get an idea of how this works, let's take a look at a simple example: downloading three files.
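
Pulling those pieces together, the basic shape of the API (as used in the examples below) looks roughly like this; the ArrayObject attachment is just a stand-in for whatever context object you want to carry around with the stream.


<?php
$selector = new MioSelector();
$factory = new MioStreamFactory();

// register a stream, stating the operation we are interested in
// (OP_READ, OP_WRITE or OP_ACCEPT) and optionally attaching an
// object so we know what to do with the stream later
$selector->register(
    $factory->createSocketStream( '127.0.0.1', 8888 ),
    MioSelectionKey::OP_READ,
    new ArrayObject()
);

// select() returns the number of registered streams which are ready
// for their operation of interest (or false if nothing is registered
// at all); the ready keys appear in $selector->selected_keys
$count = $selector->select();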

In this example we open a stream to each remote file we want to download and one for each local file we want to write to. We then register each remote stream with the selector, attaching its respective local stream for writing to. Once the streams are registered we loop over the selector's select method, which returns the number of ready streams (streams which are available for one of the actions we have registered an interest in) or false if there are no streams registered with the selector. An important point to note here is that streams are automatically unregistered from the selector when they are closed, so in this case we don't have to unregister them explicitly. Now we can loop over all the streams which have been selected and perform our action on them. In this case we read from the remote stream and write the data to its associated local stream.


<?php
$selector = new MioSelector();
$factory = new MioStreamFactory();

// Create and register streams to download the PHP 5.2.1 source
$reader = $factory->createFileStream( 'http://uk.php.net/get/php-5.2.1.tar.bz2/from/this/mirror', 'r' );
$writer = $factory->createFileStream( 'php-5.2.1.tar.bz2', 'w+' );
$selector->register( $reader, MioSelectionKey::OP_READ, $writer );

// Create and register streams to download the MySQL 5.11.15 binary
$reader = $factory->createFileStream( 'http://dev.mysql.com/get/Downloads/MySQL-5.1/mysql-5.11.15-beta-linux-i686-glibc23.tar.gz/from/http://mirrors.dedipower.com/www.mysql.com/', 'r' );
$writer = $factory->createFileStream( 'mysql-5.11.15-beta-linux-i686-glibc23.tar.gz', 'w+' );
$selector->register( $reader, MioSelectionKey::OP_READ, $writer );

// Create and register streams to download the Apache 2.2.4 source
$reader = $factory->createFileStream( 'http://www.mirrorservice.org/sites/ftp.apache.org/httpd/httpd-2.2.4.tar.bz2', 'r' );
$writer = $factory->createFileStream( 'httpd-2.2.4.tar.bz2', 'w+' );
$selector->register( $reader, MioSelectionKey::OP_READ, $writer );

while( true ) {
    // Loop over select until we have some streams to act on;
    // select() returns false once nothing is registered any more
    while( !$count = $selector->select() ) {
        if( $count === false ) {
            $selector->close();
            break 2;
        }
    }

    // Loop over all streams which are available for
    // something we're interested in
    foreach( $selector->selected_keys as $key ) {
        if( $key->isReadable() ) {
            // read a chunk from the remote stream and write it
            // to the local stream attached to the key
            $key->attachment->write(
                $key->stream->read( 16384 )
            );
        }
    }
}

Serving Up Echoes

I think we have a good understanding of how PHP MIO works now, so let's take a look at a server example. To keep it simple I'm going to do an echo server. This example will accept connections on port 7, read data in and then send it straight back. First off, we're going to need a class to encapsulate the echoing.


<?php
/**
 * A class to echo data.
 * This is essentially just a FIFO queue. Data can be
 * added onto the end of the buffer and at a later date
 * it can be read (and implicitly removed) from the
 * beginning of the buffer.
 */
class Echoer
{
    /**
     * Holds the data until it needs to
     * be echoed back
     */
    private $buffer = '';

    /**
     * Add some data to the buffer
     *
     * @param string $data
     *
     * @return void
     */
    public function put( $data )
    {
        $this->buffer .= $data;
    }

    /**
     * Read and remove a chunk of data from
     * the start of the buffer
     *
     * @param int $size The amount of data to read
     *
     * @return string
     */
    public function get( $size = 4096 )
    {
        $data = substr( $this->buffer, 0, $size );
        $this->buffer = substr( $this->buffer, $size );
        return $data;
    }
}
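
Before wiring the Echoer into the server, here is a quick sanity check of how the buffer behaves on its own.


<?php
$echoer = new Echoer();
$echoer->put( 'Hello, ' );
$echoer->put( 'world' );

echo $echoer->get( 5 );  // prints "Hello" and removes it from the buffer
echo $echoer->get();     // prints ", world", emptying the buffer
echo $echoer->get();     // prints nothing, the buffer is now empty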

Now we need to set up our server and get it working. What we're going to do is accept connections and register them with the selector with an interest in reading. These streams will then show up in later selects, where we can read a chunk of data off each one, put it in the echoer and switch the selection key's interest to writing so that we can echo the data back down the line.


<?php
// Create our base objects
$selector = new MioSelector();
$factory = new MioStreamFactory();

// Register a server stream with the selector
$selector->register(
    // the server stream is listening on 127.0.0.1 port 7
    $factory->createServerStream( '127.0.0.1:7' ),
    // we are interested in accepting connections
    MioSelectionKey::OP_ACCEPT
);

// loop forever, this is going to be a server
while( true ) {
    // keep selecting until there's something to do
    while( !$count = $selector->select() ) { }

    // when there's something to do, loop over the ready set
    foreach( $selector->selected_keys as $key ) {
        // do different actions for different ready ops
        if( $key->isAcceptable() ) {
            // if the stream has connections ready to accept
            // then accept them until there are no more
            while( $stream = $key->stream->accept() ) {
                // register the newly accepted connection with the
                // selector so that it is handled in subsequent selects
                $selector->register(
                    $stream,
                    // we are interested in reading from the stream
                    MioSelectionKey::OP_READ,
                    // attach an instance of the echoer to manage echoing
                    new Echoer()
                );
            }
        } elseif( $key->isReadable() ) {
            // if the stream is ready for reading then read a chunk
            // of data off it and add it to the echoer
            $key->attachment->put(
                $key->stream->read( 4096 )
            );
            // now we're interested in writing back down the pipe
            $key->setInterestOps( MioSelectionKey::OP_WRITE );
        } elseif( $key->isWritable() ) {
            // if the stream is ready for writing then
            // get some data from the echoer
            $data = $key->attachment->get();
            if( $data ) {
                // if there's data there then send it back
                $key->stream->write( $data );
            } else {
                // if there's none then remove the key
                $selector->removeKey( $key );
            }
        }
    }
}
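
To try the server out you can point any client at port 7. A throwaway test client in plain, blocking PHP might look like the following; it assumes the echo server above is running on the same machine.


<?php
// connect to the echo server started above
$fp = fsockopen( '127.0.0.1', 7, $errno, $errstr, 5 );
if( !$fp ) {
    die( "Could not connect: $errstr ($errno)\n" );
}

// send a line of data and read the echo back
fwrite( $fp, "Hello echo server\n" );
echo fread( $fp, 4096 );

fclose( $fp );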

So, now we've built a multiplexed downloader and a multiplexed server, processing PHP streams in a high-performance, efficient manner. PHP may not be the first choice for writing high-performance networking applications, but for knocking up something in a matter of minutes that performs pretty damned well, I think this could do the trick.

March 07, 2007

XMI 2 SQL in No Time

I've just discovered how easy XMI (as used by Umbrello) is to parse. I spent most of yesterday putting together an entity relationship diagram of the DB structure for integrating a data warehouse with our catalogue. Today, faced with the prospect of having to hand-craft about 30 tables, I decided to take a look at the XMI format. It's really simple. I managed to knock up a quick parser to build SQL from the XMI file and, hey presto, all my tables are built.


I don't have any private hosting at the moment but I'll get it up when I do.
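
In the meantime, here's a very rough sketch of the sort of thing involved. The element names assume Umbrello's XMI 1.x layout (UML:Class elements containing UML:Attribute elements, each with a name attribute), the file name is just a placeholder, and datatype resolution is skipped completely, so treat it as an illustration rather than the actual parser.


<?php
// rough illustration only: turn each class in the XMI file into a
// CREATE TABLE statement, with TEXT standing in for real column types
$xmi = simplexml_load_file( 'diagram.xmi' );

foreach( $xmi->xpath( '//*[local-name()="Class"]' ) as $class ) {
    $columns = array();
    foreach( $class->xpath( './/*[local-name()="Attribute"]' ) as $attribute ) {
        // a real parser would resolve the attribute's type
        // reference to a proper column type here
        $columns[] = "    " . $attribute['name'] . " TEXT";
    }
    echo "CREATE TABLE " . $class['name'] . " (\n"
       . implode( ",\n", $columns )
       . "\n);\n\n";
}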

February 10, 2007

Discovering Kontact

This morning I made an amazing discovery: Kontact. I don't really know how I managed to miss this for so long; it's even installed by default on my version of Kubuntu, so I really have no excuse. It is basically a hub for a number of standard KDE applications: KMail (mail), KAddressBook (addresses), KOrganizer (calendar, to-do and journal), KNotes (notes) and Akregator (feeds), some of which I have been using for quite a while but some of which are completely new to me, namely KOrganizer. I came across Kontact while I was looking for a calendar program to put in some upcoming foody events around the South of England. Now, I've used a few calendar programs before, but I've always become annoyed with them for one reason or another, usually related to how difficult they are to integrate with everything else. I don't think that's going to be a problem any more. Just one example of how nicely it all fits together is that birthdays from my address book can be automatically inserted into my calendar and kept up to date. Brilliant!

February 08, 2007

Signing Up To Technorati

It's 8am and I can't be bothered to think up anything interesting to say, so I'll just post the Technorati blog claiming link: Technorati Profile

February 07, 2007

Selectable Streams In PHP

I've been toying with the idea of implementing selectable streams (like those in Java's NIO package) in PHP recently. I've had a good look around and there doesn't seem to be a simple, flexible non-blocking I/O implementation out there. I have come across one very nice socket daemon library that lets you build a server in minutes, which is very cool. However, it only addresses building servers, and it forces the developer to structure their server in a particular way. Hopefully selectable streams will be a useful solution.

January 25, 2007

A Night Out On The Town

Last night I went to the theatre for the first time in far too long. Thank you, Alex, for suggesting and organising the idea, it was a great night out. We went to see Love Song at the New Ambassadors Theatre just off Charing Cross Road and it was brilliant. I'll let those more qualified give you the breakdown and just say that I had a thoroughly enjoyable time.

January 22, 2007

Accessing Sharedance With PHPDance

Well, I've come a little late to this blogging thing, partly because I haven't had a huge amount to say which I think would be of much use to others. I still may not, but let's see.

This post, apart from starting with a short introduction, is about PHPDance, a PHP interface I wrote a couple of months ago for the Sharedance cache server. Sharedance is a distributed object cache much like Danga's Memcached, except that while Memcached only keeps data in memory, Sharedance also writes it to the hard disk.


Just what is a distributed object cache?

Well, an object cache is a tool which allows you to store arbitrary data referenced by a key, and then retrieve it by that key at a later date. The distributed bit refers to how and where that data is stored. In a local object cache such as APC the data is cached directly on the local machine, whereas a distributed cache spreads its data across multiple machines. There are two main reasons for wanting to do this.


  1. Performance and making efficient use of the available resources. Imagine you have four webservers, each with 4GB of memory (2GB of which is reserved for the cache), and that you are using an object cache. Unless you have divided your application across the four servers (e.g. users.example.com, shop.example.com, review.example.com, payment.example.com), which is usually not possible, all four of your servers will be serving up roughly the same data. The effect on the cache is that much the same data is cached on each of the four servers, giving you effectively 2GB of cache for your application. However, if you could spread the cache across the four servers you would quadruple the available cache space, making 8GB available. This side of things is covered very well by Memcached, with its incredible speed.

  2. Avoiding the database altogether, which requires resilience. Most of the time you'll be caching data which can also be found in another source, such as the database. However, there are situations where you don't want to have to store data in the database at all: when there are likely to be a large number of writes involved and little benefit gained from persisting them. Session data is a very good example of this. It would be dangerous to store session data only in Memcached, because if the cache fills up, old session data will fall off the end. In this situation something like Sharedance is more appropriate: data is also saved to the hard disk, so old session data is not lost (unless it is given an expiry or is explicitly deleted).

Enter PHPDance

PHPDance provides a clean, object-oriented, PHP5 interface to Sharedance. Let's take a look at an example for a setup similar to the one mentioned earlier: four webservers, each running an instance of the cache server (in this case Sharedance).


<?php
require 'sharedance.class.php';
$cache = new Sharedance();
$cache->addServer( new SharedanceServer( 'web1.example.com' ) );
$cache->addServer( new SharedanceServer( 'web2.example.com' ) );
$cache->addServer( new SharedanceServer( 'web3.example.com' ) );
$cache->addServer( new SharedanceServer( 'web4.example.com' ) );

$key = 'mykey';
$data_in = 'some data which needs to be cached';
$cache->set( $key, $data_in );

$data_out = $cache->get( $key );

Here we create a new instance of the Sharedance object, which acts as our gateway to the cache as a whole, and then add an instance of SharedanceServer for each of the machines we want involved in the cache. Note that this server list must be exactly the same everywhere the cache is used. We then write some data to the cache and read it back with the set and get methods. Under the hood, the Sharedance object determines which server the data should be cached on and caches it there (this is why even the order of the servers must be the same everywhere).
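
To see why the server list has to match everywhere, here is the general idea behind a key-to-server mapping. This is just an illustration of the technique, not necessarily the exact algorithm PHPDance uses internally.


<?php
// illustrative only: hash the key and use it to pick a server, so the
// same key always maps to the same server as long as every client
// configures the same servers in the same order
function pick_server( $key, array $servers )
{
    $index = abs( crc32( $key ) ) % count( $servers );
    return $servers[$index];
}

$servers = array(
    'web1.example.com',
    'web2.example.com',
    'web3.example.com',
    'web4.example.com'
);

echo pick_server( 'mykey', $servers ) . "\n";
// reordering or removing a server changes where keys land, which is
// why the configuration must be identical everywhere the cache is used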


Building In Redundancy

This is all well and good, and the fact that the cache is written to disk is great because it means we don't have to worry about overflowing it; however, it's still not very resilient. What happens when one of the machines goes down? We effectively lose all the cached data on it for the period that it's down. That's fine when the data is also stored in the database, but when the cache is the only place it lives, it's a bit more of a problem. PHPDance addresses this issue with redundant writes. If you ask it to be redundant it will write to two servers every time you set, and then, if the first one is down when you get, it will try the second. It's painfully easy to switch this on: just set the first (and only) parameter of the Sharedance constructor to true.


$cache = new Sharedance( true );

You should note that redundancy can only work if you have enough servers. If, for example, you only have one server, you obviously cannot enable redundancy. Also, if you have more than one server but one of them has a weighting greater than the total number of servers, it will not work either.


Extend And Improve

Earlier I mentioned that a typical example of where something like this is useful is session management. PHP provides a function for setting custom session handlers, session_set_save_handler(), and this has been used in SharedanceSession to create a distributed, redundant, Sharedance-backed session handler for PHP5.
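
I won't reproduce SharedanceSession here, but the general pattern for plugging a custom handler into PHP's session machinery looks like the following. The stub class and its method names are placeholders to keep the example self-contained, not SharedanceSession's actual API; in the real thing the read and write methods would be backed by the cache.


<?php
// illustrative only: the standard PHP5 way of installing a
// custom session handler via session_set_save_handler()
class StubSessionHandler
{
    public function open( $path, $name ) { return true; }
    public function close() { return true; }
    public function read( $id ) { return ''; }            // fetch from the cache here
    public function write( $id, $data ) { return true; }  // write to the cache here
    public function destroy( $id ) { return true; }
    public function gc( $max_lifetime ) { return true; }
}

$handler = new StubSessionHandler();
session_set_save_handler(
    array( $handler, 'open' ),
    array( $handler, 'close' ),
    array( $handler, 'read' ),
    array( $handler, 'write' ),
    array( $handler, 'destroy' ),
    array( $handler, 'gc' )
);

session_start();
$_SESSION['user'] = 'example';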

Check it out at http://sourceforge.net/projects/phpdance/