Sunday, January 18, 2015

Hadoop Pig Latin

Star Expressions:- 

It is similar to the SQL * operator: it selects and displays all the columns in the table (with data).
In the script for this example, we load a file into the relation Athletes, which has a total of 8 columns separated by a comma delimiter.
The statements A = FOREACH Athletes GENERATE *; DUMP A; are equivalent to
SELECT * FROM TABLE;

To display data in the Pig Grunt shell, use the DUMP command.
For the above example it will be DUMP A;
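
As a minimal sketch, the star expression example can be written out in full as follows. The LOAD statement, path, and schema are the ones used in the multi-column ORDER BY example later in this post; the path is assumed to point to a comma-separated file on your own cluster.

```pig
-- Load the comma-separated file into the relation Athletes (8 columns).
Athletes = LOAD 'hdfs://localhost:8020/user/OlympicAthletes.csv' USING PigStorage(',') AS (athlete:chararray, country:chararray, year:int, sport:chararray, gold:int, silver:int, bronze:int, total:int);

-- GENERATE * projects every column, like SELECT * in SQL.
A = FOREACH Athletes GENERATE *;

-- Print the contents of A in the Grunt shell.
DUMP A;
```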

There are some restrictions on the usage of star expressions. Let's see what they are.
1) For GROUP/COGROUP, you can't include a star expression in a GROUP BY column.

2) For ORDER BY, if you have project-star as ORDER BY column, you can’t have any other ORDER BY column in that statement.
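
The sketch below illustrates these restrictions, assuming the relation Athletes has been loaded as in Step 1 of the multi-column example in this post. The relation names sorted_all and bad are hypothetical, and the commented-out statements are the ones that the restrictions above say Pig should reject.

```pig
-- Allowed: the star expression is the only ORDER BY column.
sorted_all = ORDER Athletes BY * DESC;
DUMP sorted_all;

-- Not allowed per restriction 2: star combined with another ORDER BY column.
-- bad = ORDER Athletes BY *, country;

-- Not allowed per restriction 1: star expression as a GROUP BY column.
-- bad = GROUP Athletes BY *;
```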

How to use the ORDER BY clause in Pig Latin.
Let's look at the result of the script above.
We can see that NULL values are returned last (as we used descending order).
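
A descending ORDER BY on a single column might look like the sketch below. The gold column and the relation name by_gold are assumptions chosen for illustration, and Athletes is assumed to be loaded as in Step 1 of the multi-column example that follows.

```pig
-- Sort by gold medals, highest first.
-- Pig treats NULL as smaller than any value, so with DESC the rows
-- whose gold field is NULL come last (with ASC they would come first).
by_gold = ORDER Athletes BY gold DESC;
DUMP by_gold;
```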

How to order by multiple columns in Pig Latin.
Ex:- 
Step 1:- Load the data from HDFS into the relation Athletes.
Athletes = LOAD 'hdfs://localhost:8020/user/OlympicAthletes.csv' USING PigStorage(',') as (athlete:chararray, country:chararray, year:int, sport:chararray, gold:int, silver:int, bronze:int, total:int);
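Step 2:- Order the relation 'Athletes' by country and year. Note that DESC applies only to the column it follows, so here year is sorted in descending order while country uses the default ascending order; write country DESC, year DESC to sort both columns in descending order.
var = ORDER Athletes BY country, year DESC;
Step 3:- Dump the result of step 2, so that we can see the end result in the Grunt shell.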
DUMP var;
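
If the intent is to sort both columns in descending order, a variation along these lines could be used. The relation names var_desc and top10 are hypothetical, and the LIMIT step is an optional addition to keep the Grunt shell output short.

```pig
-- DESC must be repeated after each column that should sort descending.
var_desc = ORDER Athletes BY country DESC, year DESC;

-- Optionally keep only the first 10 rows of the sorted relation before dumping.
top10 = LIMIT var_desc 10;
DUMP top10;
```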


