January 22, 2007

Accessing Sharedance With PHPDance

Well, I've come a little late to this blogging thing, partly because I haven't had a huge amount to say which I think would be of much use to others. I still may not, but let's see.

This post, apart from starting with a short introduction, is about PHPDance, a PHP interface to the Sharedance cache server which I wrote a couple of months ago. Sharedance is a distributed object cache much like Danga's Memcached, except that while Memcached only saves data to memory, Sharedance also writes it to the hard disk.


Just what is a distributed object cache?

Well, an object cache is a tool which allows you to store arbitrary data, referenced by a key, and then retrieve it by that key at a later date. The distributed bit refers to how and where that data is stored. In a local object cache such as APC the data is cached directly on the local machine, whereas a distributed cache spreads its data across multiple machines. There are two main reasons for wanting to do this.


  1. Performance and making efficient use of the available resources. Imagine you have four webservers, each with 4GB of memory (2GB of which is reserved for the cache), and that you are using a local object cache. Unless you have divided your application across the four servers (e.g. users.example.com, shop.example.com, review.example.com, payment.example.com), which is usually not possible, all four of your servers will be serving up roughly the same data. The effect is that pretty much the same data is cached on each of the four servers, giving you effectively 2GB of cache for your application. However, if you could spread the cache across the four servers you would have four times the cache space: 4 x 2GB = 8GB. This side of things is covered very well by Memcached, with its incredible speed.

  2. Avoiding the database altogether, which requires resilience. Most of the time you'll be caching data which can be found in another source such as the database. However, there are situations where you don't want to have to store data in the database at all, typically when there are likely to be a large number of writes and not much benefit to be gained from putting them there. Session data is a very good example of this. It would be dangerous to store session data only in Memcached because, if the cache gets full, old session data will fall out of the cache. In this situation something like Sharedance is more appropriate, as data is also saved to the hard disk, so old session data is not lost (unless it is given an expiry or is explicitly deleted).

Enter PHPDance

PHPDance provides a clean, object-oriented PHP5 interface to Sharedance. Let's take a look at an example for a setup similar to the one mentioned earlier: four webservers, each running an instance of the cache server (in this case Sharedance).


require 'sharedance.class.php';

// The gateway object for the cache as a whole.
$cache = new Sharedance();

// Register every machine in the cache, in the same order everywhere.
$cache->addServer( new SharedanceServer( 'web1.example.com' ) );
$cache->addServer( new SharedanceServer( 'web2.example.com' ) );
$cache->addServer( new SharedanceServer( 'web3.example.com' ) );
$cache->addServer( new SharedanceServer( 'web4.example.com' ) );

// Store a value under a key...
$key = 'mykey';
$data_in = 'some data which needs to be cached';
$cache->set( $key, $data_in );

// ...and read it back by the same key.
$data_out = $cache->get( $key );

Here we create a new instance of the Sharedance object, which acts as our gateway to the cache as a whole, and then add an instance of SharedanceServer for each of the machines we want involved in this cache. Note that this server list must be exactly the same everywhere the cache is being used. We then write some data to the cache and read it back again with the set and get methods. Under the hood, the Sharedance object determines which server the data should be cached on and then caches it there (this is why even the order of the servers must be the same everywhere this is used).
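To make the point about server order concrete, here is a minimal sketch of one common way a client can map a key to a server: hash the key and take the remainder against the number of servers. This is an assumption for illustration only, not PHPDance's actual selection code, and pickServerIndex() is a hypothetical helper. It also assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper, NOT PHPDance's real selection logic: hash the key and
// take the remainder against the number of servers to get a list position.
function pickServerIndex(string $key, int $serverCount): int
{
    if ($serverCount < 1) {
        throw new InvalidArgumentException('At least one server is required');
    }
    // Seven hex digits (28 bits) always fit in a PHP integer.
    return hexdec(substr(md5($key), 0, 7)) % $serverCount;
}

$serversA = ['web1.example.com', 'web2.example.com', 'web3.example.com', 'web4.example.com'];
$serversB = array_reverse($serversA); // same hosts, different order

$index = pickServerIndex('mykey', count($serversA));

// The same key yields the same index on both clients, but the index points
// at a different host when the lists are ordered differently.
echo $serversA[$index], "\n";
echo $serversB[$index], "\n";
```

With a scheme like this, two clients that list the same hosts in a different order would look for the same key on different machines, which is why the list has to match everywhere.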


Building In Redundancy

This is all well and good, and the fact that the cache is written to the disk as well is great because it means we don't have to worry about overflowing the cache. However, it's still not very resilient. What happens when one of the machines goes down? We effectively lose all the cached data on it for the period that it's down. This is OK when the data is actually stored in the database, but when the cache is the only place it's stored it's a bit more difficult. PHPDance addresses this issue with redundant writes. If you ask it to be redundant it will write to two servers every time you set and then, if the first one is down when you get, it will try the second. It's painfully easy to switch this on: just set the first and only parameter of the Sharedance constructor to true.


$cache = new Sharedance( true );

You should note that redundancy can only work if you have enough servers. If, for example, you only have one server, you obviously cannot enable redundancy. Also, if you have more than one server but one has a weighting greater than the total number of servers, it will not work either.
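The idea of redundant writes with a fallback read can be sketched as follows. Everything here is a stand-in for illustration: RedundantPoolSketch and its methods are hypothetical, the "servers" are plain arrays rather than network connections, and choosing the next server in the list as the second copy is an assumption rather than a description of PHPDance's internals.

```php
<?php
// Illustrative in-memory model of redundant writes. Class and method names
// are hypothetical; this is not PHPDance's API or implementation.
class RedundantPoolSketch
{
    private $servers = []; // host => array of cached key/value pairs
    private $down = [];    // host => true while simulated as unreachable

    public function __construct(array $hosts)
    {
        // Validate before "connecting": two copies need two servers.
        if (count($hosts) < 2) {
            throw new LogicException('Redundancy needs at least two servers');
        }
        foreach ($hosts as $host) {
            $this->servers[$host] = [];
        }
    }

    // Returns the primary host for a key, followed by the next host in the list.
    private function hostsFor($key)
    {
        $hosts = array_keys($this->servers);
        $first = hexdec(substr(md5($key), 0, 7)) % count($hosts);
        $second = ($first + 1) % count($hosts);
        return [$hosts[$first], $hosts[$second]];
    }

    public function primaryFor($key)
    {
        return $this->hostsFor($key)[0];
    }

    public function markDown($host)
    {
        $this->down[$host] = true;
    }

    // Write the value to both hosts, skipping any that are down.
    public function set($key, $value)
    {
        foreach ($this->hostsFor($key) as $host) {
            if (empty($this->down[$host])) {
                $this->servers[$host][$key] = $value;
            }
        }
    }

    // Try the primary first, then fall back to the second host.
    public function get($key)
    {
        foreach ($this->hostsFor($key) as $host) {
            if (empty($this->down[$host]) && array_key_exists($key, $this->servers[$host])) {
                return $this->servers[$host][$key];
            }
        }
        return null;
    }
}

$pool = new RedundantPoolSketch(['web1.example.com', 'web2.example.com', 'web3.example.com']);
$pool->set('session:abc', 'user=42');

// Simulate losing the primary server for this key; the read still succeeds.
$pool->markDown($pool->primaryFor('session:abc'));
echo $pool->get('session:abc'), "\n"; // prints "user=42"
```

Note that the sketch rejects a single-server list up front, before any read or write is attempted, which mirrors the point that redundancy only makes sense with enough servers.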


Extend And Improve

Earlier I mentioned that a typical example of when something like this is useful is session management. PHP provides a function for setting custom session handlers (session_set_save_handler()), which has been implemented in SharedanceSession to create a distributed, redundant, Sharedance-backed session handler for PHP5.
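For readers who have not written a custom session handler before, the general shape looks roughly like this. It is a sketch under several assumptions and is not the SharedanceSession code: it uses the SessionHandlerInterface form of the API (PHP 5.4 and later, with return type declarations that need PHP 7), CacheSessionHandlerSketch and ArrayStoreSketch are hypothetical names, and the in-memory store stands in for a cache object exposing get and set. The delete method in particular is an assumed name.

```php
<?php
// Hypothetical in-memory stand-in for a cache client, so the sketch runs
// without a cache server. The delete() method name is an assumption.
class ArrayStoreSketch
{
    private $data = [];

    public function set($key, $value) { $this->data[$key] = $value; }
    public function get($key) { return isset($this->data[$key]) ? $this->data[$key] : null; }
    public function delete($key) { unset($this->data[$key]); }
}

// Sketch of a session handler that delegates storage to a cache object.
class CacheSessionHandlerSketch implements SessionHandlerInterface
{
    private $store;

    public function __construct($store) { $this->store = $store; }

    // Nothing to open or close: the cache object manages its own connections.
    public function open($path, $name): bool { return true; }
    public function close(): bool { return true; }

    // Must return a string; an empty string means "no session data yet".
    #[\ReturnTypeWillChange]
    public function read($id)
    {
        $data = $this->store->get('sess_' . $id);
        return $data === null ? '' : $data;
    }

    public function write($id, $data): bool
    {
        $this->store->set('sess_' . $id, $data);
        return true;
    }

    public function destroy($id): bool
    {
        $this->store->delete('sess_' . $id);
        return true;
    }

    // Expiry is left to the cache server in this sketch, so nothing is collected.
    #[\ReturnTypeWillChange]
    public function gc($max_lifetime) { return 0; }
}

$handler = new CacheSessionHandlerSketch(new ArrayStoreSketch());

// Exercise the handler directly, as PHP would during a request.
$handler->write('abc123', 'user|i:42;');
echo $handler->read('abc123'), "\n";
```

In a real request the handler would be registered with session_set_save_handler($handler, true) before calling session_start(), and the store would be a configured cache object rather than the in-memory stand-in.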

Check it out at http://sourceforge.net/projects/phpdance/

18 comments:

Dinh said...

Thanks, this post is really useful for me.

Unknown said...

This is great information. One question, though: you mention SharedanceSession, but I don't see that implemented anywhere in the PHPDance code. Is code for a SharedanceSession class available?

Thanks!

Ted Dunning ... apparently Bayesian said...

Take a look at Apache zookeeper for a reliable version of this same sort of idea.

See http://hadoop.apache.org/zookeeper/

Steven Roussey said...

Do you still use this?

Rob Young said...

I don't actually use it myself at the moment but I know of at least two current installations using it for session caching.

Anonymous said...

Have you any idea of performance compared with one single remote Sharedance caching server?

I see the code has to determine, for each key, which server to use when writing/reading data, so this part probably makes up most of the extra 'overhead' compared with using one remote server with Sharedance alone.

Rob Young said...

Hi Alistair,
Thanks for the comment. I don't have any concrete stats but writes will be slower as they have to be sent to multiple servers. However reads will be quicker because they'll be spread across multiple servers. It will really depend on how your load is spread between read and write.

Anonymous said...

Thanks for the quick reply. It may all even out then performance-wise, but it still has the added benefit of redundancy. One more thing: assuming I had 3 servers caching, with the redundancy option set, and one of them failed, it appears that everything would carry on regardless at that point, and I could leave the 'server list' intact. If I then rebuilt the failed server, putting it back in production appears to be the problem if there had been a 'write' to its alternative (next in list) server in the meantime. Still, it's something I can think about more perhaps. Overall still much better than one single caching server which fails!

Anonymous said...

I've just realised that your code could be a basis for offering simple redundancy for *anything* related to remote read/write. So I could extend or rewrite it to handle remote MySQL server redundancy (no hideous MySQL slave issues, or master-master issues), or something adserver related, which is more in my mind. Brilliant!!

Rob Young said...

Glad you think so Alistair, let me know if you do anything with it. I'd be very interested.

Anonymous said...

found one bug:
If I set up sharedance on two servers, and use these as the two 'redundant' servers, then the data is written on both servers, as expected.
If I stop either process, to simulate sharedanced failing on one server, loading my php script gives error:
PHP Fatal error: Uncaught exception 'SharedanceException' with message 'Failed to connect on etc..

so if the server is up but sharedanced is down, it's a no-go.

now, starting again, if I shut down one server completely, which is another failover scenario (a more likely one), I now get when loading the script:

PHP Fatal error: Uncaught exception 'SharedanceException' with message 'Failed to connect on ..

so currently I'd have redundant writes, if nothing has failed, but an error when either sharedanced dies, or server dies.

Anonymous said...

Quick follow up to say I've now managed to get distributed sessions working perfectly. I made a couple of changes to get it working correctly for me, maybe simply related to my bad coding! I've simulated session server failing and everything continued to work as normal, with session reads from the other server (the 'other' being the one remaining which had stored that session of course).

Muhammad Rashid said...

hi,
I am trying it currently on 1 server only with following code

$session = new Sharedance();
$session->addServer( new SharedanceServer( 'myserver.com', 8989) );
$session->set( 'id', $user->getId() );

But I get Failed to connect on XXXXXXX "Connection Refused"


However, if I set redundancy to true for the same piece of code, it throws an exception

"Bucket rotation not allowed"


Can anybody tell me what can be the issue?

Thanks

Rob Young said...

Hi,
The failing to connect problem could be any number of things. Are you listening on that port on your server? Are there any firewall issues?
The second problem is that you can't have redundancy if only one server is involved, that would just be silly.

Muhammad Rashid said...

Thanks for the quick reply.

I know redundancy with only one server is no good at all, but just to give you an idea: if I set redundancy to true, it does connect but then throws a different exception, which is "Bucket rotation not allowed"

Surely if I am getting this exception, then I am connected to the server?

What do you think

Rob Young said...

No, the exception is thrown before connecting. It's a bit of basic logic validation that happens before anything else.

Muhammad Rashid said...

Is there any tutorial that can help me understand it thoroughly?

I am kind of stuck with this, so I want to study it in detail.

Anonymous said...

Muhammad

Looks like you're trying to use sessions. I got this working some time ago with a few code changes.
Anyway, if you've actually got
sharedance installed and running
you would be able to telnet, i.e.

telnet myserver.com 8989

and you'd get a telnet prompt.
If not, then you can't connect
at all, so isn't a phpdance issue.