Wednesday, May 21, 2008

Hadoop Streaming

Hadoop Streaming is a quick way to write Hadoop jobs. I've recently been playing with it and have finally gotten it to work for me.

The main parameter that I had to add was -jobconf stream.shipped.hadoopstreaming=$HADOOP_HOME/contrib/streaming

Without it, the property was somehow ending up set to /tmp, which caused everything in my /tmp directory to be added to the job jar that Streaming generates.

Another flag worth keeping in mind is -verbose. It can help you figure out what is going on under the hood.
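
To make the two options concrete, here is a rough sketch of how the full command line might be assembled. It only builds the argument list and does not run anything. The helper name build_streaming_command, the install path /opt/hadoop, the jar file name, and the input/output paths are all my own placeholders, not something Hadoop defines; substitute whatever matches your installation. The mapper and reducer are just /bin/cat and /usr/bin/wc to keep the example small.

```python
import posixpath


def build_streaming_command(hadoop_home, streaming_jar, input_path,
                            output_path, mapper, reducer, verbose=True):
    """Build the argument list for a Hadoop Streaming job (hypothetical helper)."""
    # Point stream.shipped.hadoopstreaming at the streaming contrib directory
    # so that it does not end up as /tmp.
    shipped_dir = posixpath.join(hadoop_home, "contrib", "streaming")
    cmd = [
        posixpath.join(hadoop_home, "bin", "hadoop"),
        "jar", streaming_jar,
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
        "-jobconf", "stream.shipped.hadoopstreaming=" + shipped_dir,
    ]
    if verbose:
        # Ask streaming to print more detail about what it is doing.
        cmd.append("-verbose")
    return cmd


if __name__ == "__main__":
    # All paths below are placeholders (assumptions), including the jar name.
    command = build_streaming_command(
        hadoop_home="/opt/hadoop",
        streaming_jar="/opt/hadoop/contrib/streaming/hadoop-streaming.jar",
        input_path="input",
        output_path="output",
        mapper="/bin/cat",
        reducer="/usr/bin/wc",
    )
    print(" ".join(command))
```

Printing the command first makes it easy to check what stream.shipped.hadoopstreaming will be set to before submitting the job. If it looks right, the list could be handed to subprocess.call to actually launch it.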
