8/10/2015

I moved on, here's where and why

I'm writing maybe one post every 3 years, mostly because, after having written quite a lot, I'm simply tired of writing. But this is a personal post, and this is my personal blog.

For several months now I've been working at Instana. We exited "stealth" last week. Besides the official messaging, here are my personal reasons why I moved there:

  • co-founding an international company
  • starting up essentially at zero
  • forming and building a disruptive product
  • facing big technological challenges
  • forming and working in a great engineering team
  • drawing broadly on existing experience
  • learning a lot

There are more, smaller reasons, but these pretty much nail it.

I'm happy, excited and keen all at once. There's a lot of work and there are challenges ahead, but this is exactly what I'm into.

4/27/2012

riak_mongo - making Riak act like MongoDB

Being in SF for the Erlang Factory this year, I listened to Andy Gross speaking about the riak_core abstraction and how different sorts of distributed applications can be built upon it. During his talk I asked him whether, having such a flexible architecture, Basho had ever considered implementing a MongoDB clone on top of Riak. They hadn't, so I started hacking on riak_mongo that same night in the conference hotel. Very early on, Kresten threw himself into the project and gave it a crucial push.

The code is on GitHub.

So, what's riak_mongo? The idea is simple:

  • create a layer on top of Riak which speaks Mongo's TCP-based protocol (Wire)
  • translate commands and queries into Riak's equivalents and run them, where necessary or applicable, against the K/V store
  • try to store objects in the K/V store in a way that both Mongo and Riak can benefit from

Existing Mongo clients that speak the Wire protocol, including even the Mongo shell, will connect and talk to a Riak node without knowing it's Riak behind the protocol.

Why implement something like that? There are a bunch of possible reasons, driven by possible users' needs:

  • MongoDB users can seamlessly switch to a Riak backend. They get a Dynamo-style distributed store with different storage strategies that is very reliable and scales predictably
  • MongoDB users who need to migrate to Riak get a migration aid. There would be no need to modify the existing client code to get started
  • MongoDB users can start experimenting with reliability, eventual consistency and the C/A knobs while still having P as a given, and thus learn about the Dynamo approach before actually migrating to Riak
  • One side effect for Riak users is the ability to comfortably run ad-hoc queries against the Riak store
  • ... you might find further reasons to use it ...

How does this technically work? Well, you can find all the details in the code. For a basic overview:

  • riak_mongo speaks the Mongo Wire protocol. That's the one Mongo drivers use to talk to the process they connect to
  • The Wire protocol carries BSON, binary JSON if you like. Internally, riak_mongo decodes this into Erlang terms, which are easier to work with
  • Since the K/V store has no equivalent for a database, Mongo's "db.collection" pair is used as the bucket name for objects in the K/V store
  • What actually gets stored in the K/V store is also BSON, encoded as "application/bson"
  • Querying is done using pipelined map/reduce in the K/V store. The incoming query gets decoded first for easier processing in Erlang
  • Cursors are implemented through separate Erlang processes that keep the query results for their lifetime
  • riak_mongo is an OTP application which should be loaded onto the same nodes you run Riak on
  • riak_mongo accesses the K/V store through the local low-level Erlang API, so there is no overhead for translation or explicit networking at this point
  • Objects encoded as "application/json" can also be queried from the K/V store through the same path
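
To make the namespace mapping concrete, here's a rough Python sketch - the project itself is Erlang, plain JSON stands in for BSON, and a dict stands in for the K/V store - of how a Mongo "db.collection" pair could be flattened into a Riak bucket name:

```python
# Rough sketch of the namespace mapping (the project itself is Erlang;
# plain JSON stands in for BSON, and a dict stands in for the K/V store).
import json

def namespace_to_bucket(db, collection):
    """The K/V store has no databases, so "db.collection" becomes the bucket."""
    return f"{db}.{collection}"

def store(kv, db, collection, key, doc):
    # riak_mongo would store BSON here, with content type "application/bson"
    kv[(namespace_to_bucket(db, collection), key)] = json.dumps(doc)

def fetch(kv, db, collection, key):
    return json.loads(kv[(namespace_to_bucket(db, collection), key)])

kv = {}
store(kv, "test", "users", "u1", {"_id": "u1", "name": "pavlo"})
print(fetch(kv, "test", "users", "u1"))  # {'_id': 'u1', 'name': 'pavlo'}
```

The real code of course does much more (protocol framing, BSON encoding, map/reduce queries), but the bucket naming idea is this simple.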

What could also be done later: a MongoDB K/V backend. This would have some interesting aspects as well:

  • People who need to migrate to Riak could access their existing data in MongoDB without an explicit data migration
  • It can even be used to migrate data, for example when only one node in the group is backed by MongoDB. During the background data replication, data can be seamlessly and asynchronously migrated to "real" Riak nodes. Auto-sharding might be a challenge, though
  • One can think of combining the protocol adapter with the MongoDB backend. That way, the user would use only Riak's distribution functionality, leaving the client and the backend as they are. Existing auto-sharding stores might be a challenge here as well
  • One can play with the C/A knobs on existing data
  • ... whatever you like ...

The project is still work in progress and doesn't offer all possible features yet. Right now it's possible to connect to the store through a MongoDB client (we tested it with the mongo shell) and fire some basic commands and queries, like insert, update or findOne. And of course, optimizations still have to be done. We will also provide a test suite on different levels. For now, it's a good basis for research, and it might also be an interesting option for users who want to migrate step by step.

Any constructive feedback except ranting/bashing is welcome. We also appreciate any hands-on help.

If you want to contact developers:

Pavlo Baron (pb at pbit dot org)
Kresten Krab Thorup (krab at trifork dot com)

3/16/2012

Manually "resetting" Mnesia to bring up RabbitMQ again

Recently, a colleague of mine faced a pretty weird problem with RabbitMQ. He doesn't do Erlang - he only needs to run a RabbitMQ cluster (though I really think it isn't possible to reliably run systems like RabbitMQ or Riak without digging deeper into Erlang/OTP).

After he took another node into the cluster and then removed it again, Rabbit suddenly didn't start on the cluster nodes. It kept dumping crash dumps and filling up the error log. So he asked me for help. I hadn't done much with RabbitMQ myself - I had just played with it a little. But I have done quite some Erlang (http://www.amazon.de/dp/3941841459).

The crash dump didn't help much. Google didn't either. A brief glance at the error log revealed a message like:


** FATAL ** Failed to merge schema: Bad cookie in table definition


So to me it looked like the Mnesia instance backing Rabbit had become inconsistent at some point, apparently through adding and removing the other node. Whatever the cookie problem was, all nodes in the cluster and the questionable node shared the same Erlang cookie.

So my idea was to "connect" directly to Mnesia and clean up the schema. When Rabbit comes up, the schema would surely get recreated. Otherwise Mnesia would keep dealing with its inconsistency and crash the VM all the time. Of course, this only works when you don't care about the messages in the queues. Otherwise you would also need to back up some of the Mnesia tables used by Rabbit. I'm sure the Rabbit documentation mentions those somewhere.

So, now back to how it worked. You need to find out where Mnesia stores its data for your Rabbit user. In my case, it was /var/lib/rabbit/mnesia. Then you bring up an Erlang node configured basically like the Rabbit node, and you delete the Mnesia schema. After that, your Rabbit will be able to start and create the Mnesia schema, and you can do rabbitmqctl stop_app etc. to reconfigure your cluster.

To "connect" to the Mnesia store, do something like this on every node you run Rabbit on. First, you fire up the prepared Erlang shell:


$ erl -sname "rabbit@node01" -mnesia dir '"/var/lib/rabbit/mnesia/rabbitmq"'
Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.5 (abort with ^G)
> mnesia:info().
===> System info in version ".....", debug level = none <===
opt_disc. Directory "/var/lib/rabbit/mnesia/rabbitmq/Mnesia.rabbit@node01" is used.
...


That's enough to know that the configuration you provided is correct. If it says "NOT used", you did something wrong and need to check the erl parameters. Then, still in the shell, you do:


> mnesia:delete_schema(['rabbit@node01']).
ok
>


Here you go. The Rabbit schema is gone. Now you can bring up your Rabbit and stop_app etc. to set up your cluster. As easy as that. You could try to list all your cluster nodes in the mnesia:delete_schema/1 call, but I didn't, and it worked anyway.

You're welcome. Feedback is very much appreciated. What I didn't try is simply deleting the Mnesia files - this might work as well, but it seemed too risky to me. You might need to start Mnesia on the node and back up some tables first, so I wouldn't suggest just deleting the files.

And I'm pretty sure there is a magic Rabbit switch which allows you to do all that in one single go. But I couldn't find any :)

3/12/2012

Why it's impossible to reliably count all records/documents/keys in a distributed data store

When you are not familiar with the theoretical aspects of distributed systems, but are using or going to use a distributed data store of whatever kind (sharded MongoDB, Riak etc.), you still need to understand some of those aspects. Then you won't wonder why some of the things you're used to when building on a single-instance or central data store without replicas suddenly don't work reliably, or are said to be unreliable, in the distributed world.

One of those aspects is the global record/document/key (however your store calls it) count. Let's agree on item count as the term. In a distributed data store with active replication it's simply impossible to always have reliable information about how many items your store keeps.

In order to better understand why this is so, imagine a global population census. Let's start in one single country. They delegate the counting process to single towns, then districts and so on. This is nothing that can be done within a millisecond. It takes time. They collect the local information during the counting days, send it upwards and so on. After some weeks, they are done.

Now consider the worldwide scenario. It will take even longer to collect all the local country counts in one central place. But anyway, one day this information is there.

Is the count of people on the planet reliable then? Imagine that while they are counting, people get born, move from one country to another and, I'm sorry, die. That means that what was possible was a global snapshot of the world population, where each part of this snapshot is a snapshot itself, and the whole thing isn't worth the paper it's written on once it's done - so much has changed in the meantime. People have been counted twice or just didn't show up for whatever reason.

But for the global scenario it's still ok to have a guess, with some tolerance. That's how your distributed data store would count the items it stores.

But it's still a stupid idea to count them. Remember the population census - counting people worldwide is a very expensive and time-consuming process. Your distributed data store faces exactly the same problem when you start counting all of its items.

And even if you decide to do so, it's still a guess - you can only do this asynchronously and unreliably, as a bunch of snapshots of varying reliability and quality. Some nodes in the distributed system can go down or become unavailable (which normally doesn't happen to countries or cities) at the moment you try to count the items.

So, the only way to get reliable snapshots (only from available nodes) is to do it synchronously. But this is an even more stupid idea than counting the items at all. It's as if, to synchronously count the people on Earth, they had to plug a GPS transmitter into every single body. And even that doesn't solve the problem of people having no GPS connection, or of people getting born but not yet being registered. As well as, I'm sorry again, people dying.
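
The census analogy can be boiled down to a toy simulation - a hedged Python sketch, not tied to any particular store: each "node" is counted at a different moment while writes keep arriving, so the summed total matches no single point in time:

```python
# Toy simulation of the census problem: each "node" is counted at a
# different moment while writes keep arriving, so the summed total
# matches no single point in time.
nodes = {"a": set(), "b": set(), "c": set()}

def write(key):
    # naive placement: hash the key onto one of the three nodes
    nodes[sorted(nodes)[hash(key) % 3]].add(key)

for i in range(100):
    write(f"item-{i}")

total_snapshot = 0
for name in sorted(nodes):
    total_snapshot += len(nodes[name])  # snapshot of this node "now"
    write(f"late-{name}")               # writes continue during the count

true_total = sum(len(s) for s in nodes.values())
# the snapshot total is already stale: at least one late write was missed
print(total_snapshot, true_total)
```

However the late writes land, the summed snapshot never equals the final count - exactly the double-counted and missed people of the census.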

Anyway, I hope this helps to understand the problem. Any feedback is welcome.

12/29/2011

Debugging erl with gdb in Emacs

If you want to debug erl(exec) and use Emacs, you need just a few tricks. This short post is based upon @dizzyd's hint: http://dizzyd.com/blog/post/190

Copy bin/erl into e.g. bin/erldbg and edit this script. Change:

exec $BINDIR/erlexec ${1+"$@"}

to

gdb --annotate=3 $BINDIR/erlexec --args $BINDIR/erlexec ${1+"$@"}

Now fire up Emacs, do "M-x gdb" and use your bin/erldbg as the gdb runner. Now you have erlexec under the debugger and can do whatever you usually do when debugging.

To start the shell, just do:

(gdb) r RETURN

Etc. etc. etc. You're welcome.

5/07/2011

Big Data - Demo For JAX2011

Long time, no blog...

Well, it's not really a new blog post but rather a more detailed description of the demo I showed at the JAX2011 conference in May. It's about big data - here is the preso.

After having explained some theoretical stuff, I briefly introduced some technologies as well as corresponding open source products, using a couple of typical questions/use cases one could face when dealing with big data and searching for a way to tame it.

I had a demo setup with me to show which use case can be solved how, using which technology. The demo itself is extremely overdone, but it should show how a very complex, hypothetical scenario could be implemented using appropriate technologies. Please find the whole demo code here.

"jhtsrv" contains the very basic implementation of the front-end web server as well as the Esper and Riak abstractions for the ongoing data collection. "repl" contains our own Hadoop map-reduce job as well as the corresponding Swift and Riak abstractions. "stat" contains an R script for plotting/statistics. "soapui" contains a SoapUI project to stress the web server.

Now, enough intro. What was the goal? Imagine an attempt to do a global, worldwide vote. Let's say we vote on: "should we escape to Mars real fast?". We want the web-based voting platform to be very fast, so we need minimal latency for the one and only operation this platform provides: based on the IP address of the user, store a YES or a NO. Ok, this is really stupid, since big organizations might be funneled through one single proxy IP and we can't distinguish the single people behind it, but hey - it's a hypothetical demo, ok? So we ignore this issue.

We want to alert a responsible person in a region when a configured number of negative votes appears within some time frame. Alerts must happen as the data comes in, not later.

Regardless of whether the vote is negative or positive, we need to file it as reliably as possible, but we would accept some faults as long as users can vote almost without latency using their smartphones, browsers etc.

Once a day, we want to analyze the votes statistically in our headquarters. For this, it's sufficient to have a snapshot of what has happened until then. But we need the data to be available for analysis really fast, so no considerable latency can be accepted for the data preparation either.

What we do after the statistical analysis is create a world map with green points for positive and red points for negative votes. We translate IP addresses into the corresponding geographic locations for this.

Ok, yes, it would be a useless graphical mess with colors overlapping each other. But as I already wrote - it's an experiment, a demo.

Now. How would we build this? First of all: we need to divide this global platform into regional data centers, each responsible for a piece of the whole work. Will it be too expensive? Well, with a global question like this, money wouldn't play a big role anymore. But we definitely need geographic proximity of users to their point of voting in order to have minimal latency. Thus, we need to go to the internet edges like CDNs do.

The shorter the network distance and the less infrastructure on the way, the lower the latency: less fault handling, fewer lost packets, timeouts, waiting etc.

The more independent the servers, not sharing a bit of data or a critical computing resource, the lower the latency: no load on an overloaded centralized infrastructure, scalability directly at the connection point etc.

The other reason why we would go for a regional data center is to crunch this mega huge rolling stone of data into smaller pieces. We would store smaller data sets locally in the regional data centers and find a way to fit them together later.

And generally: the less danger of global disaster such as the central data center being unavailable for all users, the... Well, this explains itself.

Ok, the tiny web server I use in the demo is a joke. But I hope the reader can abstract from it and imagine a whole group of web servers. There is no state to be managed, no dynamic content or such. Requests can easily be balanced between several web servers, because the technologies we use behind them would be no bottlenecks or single points of failure - we can scale them out as well. It's a distributed data store (call it a NoSQL database if you like, but I won't) and a stream processor which can be centralized or decentralized, a cache or a database backend etc. - depending on the use case.

So far, so good. Now how would we write votes in the regional data centers almost reliably but immediately and extremely fast while checking the same data for negative votes within some period of time?

First, we need a distributed (cluster) data store to store our data redundantly on several nodes. Second, this data store must provide a sloppy write quorum, and we don't want to wait for the confirmation of a durable write on any of the cluster nodes. So yes, we take Riak as a really cool implementation of Dynamo.

But before we throw the data at Riak, we asynchronously push it into Esper's event stream. We do some CEP there in order to alert on N negative votes within X minutes. Ok, we are a little bit inaccurate - we don't consider the global votes in the threshold calculation, only those in one data center. So an alert wouldn't get fired when, in the past 30 seconds, Germany and the USA together had 120 negative votes. But that's the price we pay for decentralization, and it must be an acceptable one. And more than that: whom would you alert when a threshold is passed across two countries? It gets more and more unnecessarily complex - let's stick with data center borders.

So, Esper checks the data as it comes in - ideally without any storage, right from memory, or from a distributed memory cache when we have a scenario with several web servers in a regional data center, which is very likely. Esper has an HA option/package, which I didn't consider for the demo though, and which is able to persist streams and distribute them so that different engines on different nodes can pull events out in a loosely coordinated way. One could still consider a queue for this. But let's move on, assuming that Esper does what it does - CEP, maybe on several machines. On the fly.
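
For illustration, here is a minimal sliding-window check in Python - a hedged sketch of the idea, not Esper's actual EPL - which alerts once `threshold` negative votes arrive within `window` seconds:

```python
# Minimal sliding-window threshold check (a sketch of the CEP idea,
# not Esper's EPL): alert once `threshold` negative votes arrive
# within `window` seconds.
from collections import deque

class NegativeVoteAlert:
    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.negatives = deque()  # arrival times of negative votes

    def on_vote(self, now, vote):
        """Feed one vote; return True when the alert should fire."""
        if vote != "NO":
            return False
        self.negatives.append(now)
        # forget negative votes that fell out of the time window
        while self.negatives and now - self.negatives[0] > self.window:
            self.negatives.popleft()
        return len(self.negatives) >= self.threshold

alert = NegativeVoteAlert(threshold=3, window=30)
print(alert.on_vote(0, "NO"))    # False - one negative vote so far
print(alert.on_vote(10, "NO"))   # False - two within the window
print(alert.on_vote(20, "NO"))   # True - three NOs within 30 seconds
print(alert.on_vote(60, "NO"))   # False - the earlier NOs have expired
```

An engine like Esper keeps windows like this entirely in memory and evaluates them on the fly, which is exactly why no storage round-trip is needed for the alerts.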

Now back to Riak. With 3 nodes storing each vote, the sloppy write quorum could be 1. We only need a confirmation from one node; the rest Riak should gossip behind the scenes while we aren't waiting anymore. And this node only needs to say: "I have the record", not "I have stored the record". Such a setup could be weak, since some records could really get lost, so we could consider a quorum of 2. Or we could think: 1 is ok - when the node crashes, another one takes over the data, so what we lose is minimal. When we want 6 billion people to vote really quickly, a couple of votes can get lost without changing the overall result.
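
The write-quorum idea can be sketched like this - a toy Python model of the semantics, not Riak's implementation: with n=3 replicas and w=1, the client gets its ack after one replica has the value, and the others are filled in asynchronously:

```python
# Toy model of a sloppy write quorum (semantics only, not Riak's code):
# with n=3 replicas and w=1, the client is acked after one replica has
# the value; the remaining replicas are filled in asynchronously.
replicas = [dict() for _ in range(3)]

def write(key, value, w):
    """Store on w replicas synchronously; return the work left for gossip."""
    pending = []
    for i, replica in enumerate(replicas):
        if i < w:
            replica[key] = value                   # client waits for these
        else:
            pending.append((replica, key, value))  # replicated later
    return pending                                 # ack happens here

pending = write("vote:1", "YES", w=1)
print([r.get("vote:1") for r in replicas])  # ['YES', None, None] before gossip
for replica, key, value in pending:
    replica[key] = value                    # asynchronous replication catches up
print([r.get("vote:1") for r in replicas])  # ['YES', 'YES', 'YES']
```

Raising w trades latency for safety: with w=2, two replicas must confirm before the client unblocks, so a single crash right after the ack no longer loses the vote.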

Now we have the data locally, and we have alerts. What about the daily statistics? First of all, we need to get the data somehow from the regions to the headquarters. We need a sort of data warehouse there, right? But how do you collect the data from thousands of small data centers all over the world while it's still being written into the stores? How can you move the data fast enough to one data center?

But why one data center? The requirement was to access the data from the headquarters, not to have it in its data center. So what do we do? We push the votes from the data centers to a cloud-based data store on a daily basis. We can access the cloud data store from everywhere, including the data centers and the headquarters.

So, we have a Riak store in the data center and need to move its current data to the cloud while data is still coming in. For this, we exploit the fact that we have a distributed data store. But the real weak point in the whole setup is that we are reading data while it's being written, and then we write (delete) it while it's still being written. Very careful testing and setup are necessary for this to work properly, if it really works at all.

The first step is to get the snapshot - a "frozen" collection of keys to work with. Everything that comes in after we pull the snapshot will be outside the cloud replication/load run.

Having the snapshot, we map-reduce the replication job. Actually we mostly map, since there is not much to do in the reduce phase, and we use Hadoop as the framework for that. It can use a cluster of nodes to divide such a big job into smaller subtasks running on data splits, so we need a custom split as well as a custom read format, which can be seen in the demo code.

Running on several nodes, our mapper aggregates votes into groups of votes to prepare larger objects for the cloud store (it doesn't pay off to store small objects like single votes in the cloud, and we would never need to access them separately). We read a stored vote from Riak and delete it afterwards, but with a sloppy delete quorum as well, so we don't really have to wait for the whole store to be in a consistent state concerning this one vote. Again, we are dealing with very many records and quite fuzzy statistical accuracy - we are allowed to be a little inconsistent and to miss records or have duplicates.
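
The aggregation step can be sketched like this (a Python toy; the real job is a Hadoop mapper in the demo code): many tiny vote records get rolled up into larger objects before the cloud upload:

```python
# Sketch of the mapper's aggregation step (the real job runs on Hadoop):
# single-vote records are rolled up into larger objects, since storing
# tiny objects in the cloud store doesn't pay off.
def aggregate(votes, batch_size):
    """Group single votes into batches; each batch becomes one cloud object."""
    return [votes[i:i + batch_size] for i in range(0, len(votes), batch_size)]

votes = [{"ip": f"10.0.0.{i}", "vote": "YES" if i % 2 else "NO"}
         for i in range(10)]
batches = aggregate(votes, batch_size=4)
print(len(batches))               # 3 objects instead of 10
print([len(b) for b in batches])  # [4, 4, 2]
```

In the real setup the batch size would be tuned so each cloud object is large enough to amortize the REST round-trip.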

The mapper reads on one side and writes on the other - directly to the cloud using its REST interface. To simulate the cloud, we use Swift (OpenStack Object Storage). Its implementation is similar to Dynamo, and I guess it would also be able to provide eventual consistency and sloppy quorums when storing objects (though I didn't find a way to configure this from the client - maybe it's done per container; I didn't look further). We need such a store as the cloud-based store in order to scale as expected. Though if we lost aggregated vote groups while pushing them sloppily into the cloud, that would hurt much more than with single votes - here, we need to be much more careful.

Now the data is in the cloud, once a day. What's next? We take R and do some statistical stuff to analyze it. For example, in the demo I just have an "algorithm" that calculates big-city coordinates from the user's IP address instead of doing real IP geolocation. Why? Because I gave the demo at a conference and didn't know if the network connection would work. Normally, we would take the IP address, find out where it belongs and try to plot this point on the world map.

That's it. A simple demo. Sure, it could be solved much more easily. Sure, it's hypothetical. Sure, it would cost too much to build it all in the real world. And of course, some of my assumptions and simplifications are completely or partially wrong. But the whole thing runs on one notebook with Ubuntu, so why not give it a try and think further/globally? I hope I could show some use cases for some cool technologies and a way of thinking in big data scenarios. Well, up to a point, of course.

I would appreciate any pointers/comments on where the thoughts or the implementation don't fit common sense or the science/experience.

Thank you for reading.

11/21/2010

Is MySQL the new SPAM of the modern age?

Do you remember the wonderful SPAM sketch from Monty Python's Flying Circus?

They say that's why we call the emails we don't want to receive SPAM. Anyway, here is a nice story which illustrates what I want to say in this blog post.

Recently, a former colleague told me he's in a project as an architecture consultant. There is also a customer and a consulting company who will do all the implementation. A triangle. So, the customer wants a thing implemented where they would have many information nodes linked together based upon different criteria, and you never know in advance how long the path between A and B will be. The primary requirement is to later find the shortest path from A to B.

Sounds simple, doesn't it? Or not? Well, this is a classic use case for graphs, if you haven't had to deal with such scenarios before. And usually it is very difficult or almost impossible to store such paths in a relational database, at least if you want to find the shortest path really quickly. You would need to read all the data into memory to be really fast, and if you have to deal with really big data, you simply have no chance to do so.
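
For the curious, the query shape in question is a plain breadth-first search over linked nodes - trivial in a graph model, painful in SQL when the path length is unknown in advance. A small Python sketch:

```python
# Shortest path over linked nodes is a plain breadth-first search,
# a query shape relational SQL handles poorly for unbounded path lengths.
from collections import deque

def shortest_path(edges, start, goal):
    """BFS over an adjacency dict; returns the node path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

edges = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"]}
print(shortest_path(edges, "A", "F"))  # ['A', 'B', 'D', 'F']
```

A graph database runs exactly this kind of traversal natively over its stored adjacency, instead of joining a links table against itself once per hop.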

That's why some really smart guys have implemented graph databases such as Neo4j. Such a database has exactly one group of use cases it deals with and nothing else. It doesn't try to rule the whole world - it just does its job well.

Now, the triangle I've mentioned earlier comes together and discusses the future solution.

The architecture consultant says: you should do it with a graph database since the data is huge and it's going to grow unpredictably.

The customer has no idea about all that but needs two opinions to pick the cheapest one.

The consulting company says: we take MySQL. It's cheap, it's mainstream, we are experienced in it, we can do it in 5 days. But we would have to limit the path length to 5 nodes per path.

Now think back to the SPAM sketch and compare :)

"Do you have anything without MySQL in it?" - "Hm, we have a solution with just a little MySQL in it" - "But I don't want MySQL!" - "MySQL, MySQL, MySQL, MySQL......"

6/06/2010

"Fragile Agile" Submitted To The Publisher

We submitted the manuscript of "Fragile Agile" to the publisher (Hanser) right on time and are now waiting for the first review. The book is great, I must say ;)

The official release date is Sept. 2nd 2010. The book can already be pre-ordered on Amazon: Fragile Agile @ Amazon

Stay tuned!

5/15/2010

HearItLater - An Audio Helper For A Lazy Dog

I've blogged about what I wanted to do: An Audio Helper For A Lazy Dog

Well, I have implemented this thing. I call it HearItLater, but just for me - I have no motivation to market it. Here are some details, and of course some code to play with.

First of all, what you need:

  • an account at ReadItLater (http://readitlaterlist.com). I am a fan of this little helper, so I have built this other little helper around it, for me
  • some active entries at ReadItLater - I always have some, so I just don't check whether the current list is empty
  • your own appkey at ReadItLater - I got my own for my own purposes
  • a Windows box to run the unchanged script. The script was developed on Windows 7; feel free to migrate it in whatever direction
  • ActivePerl 5.10.1 for Windows. I used the 64-bit version without any extensions except the auto-downloaded ones - out of the box
  • eSpeak (http://espeak.sourceforge.net) - the TTS software I used to produce WAV files out of the text files
  • a lot of motivation and patience to accept a silly robot's voice

Ok, now how it works:

  • it reads and iterates over your current ReadItLater list
  • for every URL in the list - if it's HTML - it downloads the content
  • it strips the content down to plain text
  • it creates a WAV file out of the stripped text

What are the known issues:

  • sometimes, eSpeak crashes for whatever reason
  • the reading quality is bad. Well, you can understand it, but since the eSpeak command line tool doesn't use MS SAPI, it's really bad - for me. But much better than nothing. And you can experiment with the parameters to find the speed, pauses and voices which better fit your ear
  • I didn't invest much time in error checking in the script, so expect some surprises
  • the HTML page gets stripped completely, which means that every single link text and so on gets read without any system or concept. That's the trade-off
  • I don't take over the original file names which could be extracted from the URL. I just count the list items from ReadItLater in a loop and use the counter to name the files
  • some WAV files are corrupted, I don't know why yet

Good, the rest seems to work. And here is the code - replace the placeholders (|...|) with your own corresponding values and enjoy an acoustic channel full of B-movie robot voices hammering unsorted information into your head while you do your normal job:



require HTTP::Request;
require LWP::UserAgent;
use JSON;
use HTML::TreeBuilder;
require HTTP::Headers;

$appkey = '|readitlater appkey|';
$user = '|readitlater user|';
$pass = '|readitlater password|';
$baseurl = "https://readitlaterlist.com/v2/";
$basepars = "?username=$user&password=$pass&apikey=$appkey";
$out = "|output path|";
$espeak = "|espeak install dir|\\command_line\\espeak.exe";
$espars = "-v en-us+f2 -s 180 -g 10mS";

$json = readitlater("get");
@urls = @{processJSON($json)};
$i = 1;
foreach $url (@urls) {
$txt = page2text($url);
text2wav($txt, $out, $i);
$i = $i + 1;
}

# readitlaterlist.com API connector
sub readitlater {
local $fun = $_[0];

return typed_wget("$baseurl$fun$basepars", '?'); #'?' = doesn't matter which content type
}

# wget, which can control the content type
sub typed_wget {
local ($url, $ctype) = ($_[0], $_[1]);

print "wget $url\n\n";

$request = HTTP::Request->new(GET => $url);
$ua = LWP::UserAgent->new;
$response = $ua->request($request);
if (($ctype eq '?') or ($response->headers->content_type eq $ctype)) {
return $response->decoded_content;
}
else {
return ''; # empty content if error of any kind
}
}

# JSON processor
sub processJSON {
local $json = $_[0];

$perl = decode_json $json;
%hash = %{$perl};
$list = $hash{'list'};
%hash = %{$list};
@urls = ();
foreach $key (keys %hash) {
$val = $hash{$key};
%hash_sub = %{$val};
push(@urls, $hash_sub{'url'});
}

return \@urls;
}

# process an online document - wget it, but only if it's HTML and strip it to the plain text
sub page2text {
local $url = $_[0];

$content = typed_wget($url, 'text/html');
$tree = HTML::TreeBuilder->new;
$tree->parse($content);
$stripped = $tree->as_text();

return $stripped;
}

# save the stripped text to a text file and use eSpeak to create a WAV out of it
sub text2wav {
local ($txt, $out, $cnt) = ($_[0], $_[1], $_[2]);

$of = "$out\\f$cnt.txt";
open _F, ">$of";
binmode(_F, ":utf8");
print _F $txt;
close _F;

$cmd = "$espeak -f $of -w $out\\f$cnt.wav $espars";
system $cmd;
unlink("$of");
}


5/14/2010

An Audio Helper For A Lazy Dog

If you, like me, don't have enough time or eye focus to read all the blogs and online articles flying around, but really would like to get this information into your head, you will quickly look for audio helpers. Our ears are "free" most of the time, and the corresponding sense is free, too. Why not use this channel?

For me, it works. It's like background; it's there. Maybe not 100%, but more than nothing. And in most cases it's ok. And if I need more and it's really interesting, I will come back to it later and read it.

So, I wanted to get myself a Nabaztag, but I won't - it's a toy.

So, my new idea is this - and I'm sure it's not new, but I don't care: I will take eSpeak from here: http://sourceforge.net/projects/espeak/. I will write a Perl script which will take my web pages of interest, for example from my ReadItLater list (if I can connect to it), and create an MP3 (via a WAV) out of them. Then I'll throw those MP3s onto my iPhone and I have my acoustic channel - web pages spoken by a robot, just for me. I can even skip the iPhone and stay with the WAVs on my desktop, whatever.

So, that's it. An audio helper for a lazy dog.