3/16/2012

Manually "resetting" Mnesia to bring up RabbitMQ again

Recently, a colleague of mine faced a pretty weird problem with RabbitMQ. He doesn't do Erlang - he just needs to run a RabbitMQ cluster (though I really think it isn't possible to reliably run systems like RabbitMQ or Riak without digging deeper into Erlang/OTP).

After he added another node to the cluster and removed it again, Rabbit suddenly wouldn't start on the cluster nodes. It kept dumping crash dumps and filling up the error log. So he asked me for help. I hadn't done much RabbitMQ myself - I had just played with it a little. But I have done quite a bit of Erlang (http://www.amazon.de/dp/3941841459).

The crash dump didn't help much. Google didn't either. A brief glance at the error log revealed a message like:


** FATAL ** Failed to merge schema: Bad cookie in table definition


So to me, it looked like the Mnesia database backing Rabbit had become inconsistent at some point, apparently through adding and removing the other node. Whatever the cookie problem was, all nodes in the cluster and the questionable node shared the same Erlang cookie.

So my idea was to "connect" directly to Mnesia and clean up the schema. Once Rabbit came up again, the schema would surely be recreated. Otherwise, Mnesia would keep running into its inconsistency and crash the VM every time. Of course, this only works if you don't care about the messages in the queues. Otherwise you would also need to back up some of the Mnesia tables Rabbit uses. I'm sure the RabbitMQ documentation mentions them somewhere.

So, back to how it worked. You need to find out where Mnesia stores its data for your Rabbit user. In my case, that was /var/lib/rabbit/mnesia. Then you bring up an Erlang node configured basically like the Rabbit node, and delete the Mnesia schema. After that, Rabbit is able to start and recreate the Mnesia schema, and you can use rabbitmqctl stop_app etc. to reconfigure your cluster.

To "connect" to the Mnesia store, do something like this on every node you run Rabbit on. First, fire up a suitably configured Erlang shell:


$ erl -sname "rabbit@node01" -mnesia dir '"/var/lib/rabbit/mnesia/rabbitmq"'
Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.5 (abort with ^G)
> mnesia:info().
===> System info in version ".....", debug level = none <===
opt_disc. Directory "/var/lib/rabbit/mnesia/rabbitmq/Mnesia.rabbit@node01" is used.
...


That's all you need to know - the configuration you provided is correct. If it says "NOT used", something went wrong and you need to check the erl parameters. Then, still in the shell, you do:


> mnesia:delete_schema(['rabbit@node01']).
ok
>


Here you go - the Rabbit schema is gone. Now you can bring Rabbit up and use stop_app etc. to set up your cluster. As easy as that. You could try listing all of your cluster nodes in the mnesia:delete_schema/1 call, but I didn't, and it worked anyway.
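For completeness, the cluster re-setup afterwards looks roughly like this. This is a sketch for the RabbitMQ 2.x series current at the time of writing; the node names are placeholders, and the `cluster` subcommand was later replaced by `join_cluster` in 3.0, so check the docs for your version:

```shell
# On the node that should (re)join the cluster (node names are examples):
rabbitmqctl stop_app
rabbitmqctl reset                   # wipe this node's own state
rabbitmqctl cluster rabbit@node01   # 2.x syntax; 3.x uses join_cluster
rabbitmqctl start_app
rabbitmqctl status                  # verify the node is up again
```

Run it node by node, keeping at least one node of the cluster up at all times.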

You're welcome. Feedback is very much appreciated. What I also didn't try is simply deleting the Mnesia files - this might work as well, but it was too hardcore for me. You might need to start Mnesia on the node and back up some tables first, so I wouldn't suggest just deleting the files.

And I'm pretty sure there is a magic Rabbit switch that lets you do all of that in one single go. But I couldn't find one :)

3/12/2012

Why it's impossible to reliably count all records/documents/keys in a distributed data store

If you are not familiar with the theoretical aspects of distributed systems but are using (or going to use) a distributed data store of whatever kind (sharded MongoDB, Riak etc.), you still need to understand some of those aspects. Then you won't be surprised when things you're used to from building on a single-instance or central data store without replicas suddenly don't work reliably - or are said to be unreliable - in the distributed world.

One of those aspects is the global record/document/key count (however your store names it). Let's agree on "item count" as the term. In a distributed data store with active replication, it's simply impossible to always have reliable information about how many items your store holds.

To better understand why, imagine a global population census. Let's start with a single country. It delegates the counting to individual towns, then districts and so on. This is nothing that can be done within a millisecond; it takes time. They collect the local numbers during the counting days, send them upwards, and so on. After some weeks, they are done.

Now consider the worldwide scenario. It takes even longer to collect all the country-level counts in one central place. But anyway, one day the information is there.

Is the count of people on the planet reliable then? While they were counting, people were born, moved from one country to another and, I'm sorry to say, died. So the best you can get is a global snapshot of the world population, where each part of that snapshot is itself a snapshot - and it isn't worth the paper it's written on by the time it's done, because so much has changed in the meantime. People have been counted twice or just didn't show up for whatever reason.

But for the global scenario it's still fine to have an estimate with some tolerance. That's how your distributed data store would count the items it stores.

Still, it's usually a bad idea to count them at all. Remember the population census: counting people worldwide is a very expensive and time-consuming process. Your distributed data store faces exactly the same problem when you start counting all of its items.

And even if you decide to do so, the result is still an estimate - you can only collect it asynchronously and unreliably, as a bunch of snapshots of varying reliability and quality. Some nodes in the distributed system can go down or become unavailable (which normally doesn't happen to countries or cities) at the very moment you try to count the items.
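To make this concrete, here is a minimal sketch (plain Python with made-up shard names, not any real store's API) that polls per-shard counts one after another while writes keep arriving. The summed "total" is a patchwork of stale snapshots and lags behind the real count:

```python
import threading
import time

# Hypothetical three-shard store: each shard keeps a local item count.
shards = {"shard_a": 0, "shard_b": 0, "shard_c": 0}
lock = threading.Lock()
running = True

def writer():
    """Simulates clients inserting items while the count is being taken."""
    i = 0
    while running:
        name = ["shard_a", "shard_b", "shard_c"][i % 3]
        with lock:
            shards[name] += 1
        i += 1
        time.sleep(0.001)

t = threading.Thread(target=writer)
t.start()

# "Global count": visit the shards one by one, like the census visits countries.
snapshot_sum = 0
for name in shards:
    with lock:
        snapshot_sum += shards[name]  # a per-shard snapshot, stale immediately
    time.sleep(0.01)                  # network round-trip to the next shard

running = False
t.join()

with lock:
    actual = sum(shards.values())

print(snapshot_sum, actual)  # the summed "total" trails the real count
```

Each per-shard read is consistent on its own, but the sum never corresponds to any state the store as a whole was actually in.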

So the only way to get consistent snapshots (and only from the available nodes) would be to collect them synchronously. But that is an even worse idea than counting the items in the first place. It's as if, to count the people on Earth synchronously, you had to implant a GPS transmitter into every single body. And even that doesn't solve the problem of people having no GPS signal, or newborns not yet being registered - or, I'm sorry again, people dying.

Anyway, I hope this helps to understand the problem. Any feedback is welcome.