Hadoop Streaming appears to be a way to write quick Hadoop jobs. I've been playing with it recently and have finally gotten it working.
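For anyone who hasn't tried it: a streaming job's mapper and reducer are just programs that read lines on stdin and write tab-separated key/value lines on stdout. Here is a minimal word-count sketch in Python to show the shape of it. This is not the job I was actually running; the file name wordcount.py and the "map"/"reduce" argument convention are made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical wordcount.py: a mapper and reducer for Hadoop Streaming."""
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts of consecutive lines that share a key.

    Relies on the input arriving sorted by key, which is what the
    streaming framework hands to a reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Assumed convention: run as "wordcount.py map" or "wordcount.py reduce".
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

You can try it locally without a cluster by piping a text file through the mapper, then sort, then the reducer.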
The main parameter that I had to add was -jobconf stream.shipped.hadoopstreaming=$HADOOP_HOME/contrib/streaming
Without it, that property was somehow getting set to /tmp, which caused everything in my /tmp directory to get added to the job jar that streaming generates.
Another good thing to keep in mind is the -verbose flag. It can help you figure out what is going on under the hood.
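To put the two options together, here is a rough Python sketch that assembles a streaming command line with both of them set. The helper name build_streaming_command is mine, and the jar path, input/output paths, and mapper/reducer commands are placeholders; the streaming jar's real file name depends on your Hadoop version, so I pass it in rather than guess.

```python
import os
import shlex


def build_streaming_command(hadoop_home, streaming_jar, input_path,
                            output_path, mapper, reducer):
    """Return the argv list for a Hadoop Streaming job (hypothetical helper)."""
    # Point the property at contrib/streaming so it does not end up as /tmp.
    shipped = os.path.join(hadoop_home, "contrib", "streaming")
    return [
        os.path.join(hadoop_home, "bin", "hadoop"), "jar", streaming_jar,
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
        "-jobconf", f"stream.shipped.hadoopstreaming={shipped}",
        "-verbose",  # print more detail about what streaming is doing
    ]


if __name__ == "__main__":
    # All of these values are placeholders; substitute your own.
    home = os.environ.get("HADOOP_HOME", "/path/to/hadoop")
    cmd = build_streaming_command(
        home,
        os.path.join(home, "contrib", "streaming", "your-streaming.jar"),
        "input_dir", "output_dir", "cat", "wc",
    )
    print(shlex.join(cmd))
```

This only prints the command so you can inspect it before running it yourself.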