Pig is Yahoo!'s data flow language, designed to run atop Hadoop. I've spent a few hours today getting it set up and running. One thing I would like to point out is that you can't use the pig script in the bin directory, or at least not if you want it to connect to the Hadoop cluster.
Instead, I had to run Pig manually, with the Hadoop site configuration on the classpath:
java -cp pig.jar:$HADOOPSITECONFIG org.apache.pig.Main
Also, if you dump a variable, Pig has to run the map phases needed to compute it. I thought it would just print the schema, but no.
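For anyone else who hits this, here is a rough sketch of the difference, assuming your Pig version supports the DESCRIBE operator and typed schemas. The input file and field names are made up for illustration:

```pig
-- Hypothetical input file and field names, for illustration only.
A = LOAD 'visits.txt' AS (name:chararray, hits:int);
B = FILTER A BY hits > 10;

-- DESCRIBE prints only the schema of the relation, so no job needs to run.
DESCRIBE B;

-- DUMP materializes the relation, so Pig has to execute the job first.
DUMP B;
```

So if all you want is the schema, DESCRIBE is probably the operator to reach for rather than DUMP.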