Wednesday, May 21, 2008


Pig is Yahoo!'s data flow language, designed to run atop Hadoop. I spent a few hours today getting it set up and running. One thing worth pointing out: I couldn't use the pig script in the bin directory, or at least not in a way that actually connected to the Hadoop cluster.

I had to manually run:
java -cp pig.jar:$HADOOPSITECONFIG org.apache.pig.Main

Also, if you dump a variable, Pig has to run the map phases needed to compute it. I thought it would just print the schema, but no...
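
For what it's worth, here is a rough sketch of the difference, assuming Pig Latin's DESCRIBE statement is what you want when you only care about the schema. The script name, the input file, and the field names are made up for illustration, and the commented-out run line assumes org.apache.pig.Main accepts a script file as an argument with the same classpath as above.

```shell
# Write a small Pig Latin script (file name, input path and fields are hypothetical).
cat > describe_vs_dump.pig <<'EOF'
A = LOAD 'input.txt' AS (name, age);
-- DESCRIBE only prints the schema of A; it should not launch a job.
DESCRIBE A;
-- DUMP has to actually compute A, so it runs the map phases first.
DUMP A;
EOF

# Assumed invocation, mirroring the manual command above
# (needs pig.jar and $HADOOPSITECONFIG to be set up):
# java -cp pig.jar:$HADOOPSITECONFIG org.apache.pig.Main describe_vs_dump.pig
```

So if all you want is to check what a variable looks like, DESCRIBE is probably the cheaper thing to try before reaching for DUMP.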
