Apache Pig (Hadoop) Script + Ruby UDF
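# test.rb
require 'pigudf'
require 'java'

class Myudfs < PigUdf
  # Schema of the value returned by the method defined next
  outputSchema "word:chararray"

  # Drop nil fields, then join the remaining strings with String#+
  def concat(*input)
    input.compact.inject(:+)
  end
end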
# test.pig
register ./test.rb using jruby as myfuncs;
t = LOAD 'test.txt' USING PigStorage(',') AS (a:chararray, b:chararray);
v = FOREACH t GENERATE myfuncs.concat(a,b);
STORE v INTO 'output';
# test.txt
my, phone
any, home
$ brew install pig # OS X (Homebrew)
$ pig -x local test.pig
$ cat output/part-m-00000
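To check what the UDF body does without starting Pig, the same expression can be run in plain Ruby. This is only a sketch: `concat_fields` is a made-up name for illustration, and treating a null Pig field as Ruby `nil` is an assumption here, not something the script above verifies.

```ruby
# Plain-Ruby stand-in for the UDF body.
# concat_fields is a hypothetical name, not part of Pig's API.
def concat_fields(*input)
  # compact removes nil entries; inject(:+) folds the remaining
  # strings together with String#+ (nil if nothing is left).
  input.compact.inject(:+)
end

# Fields as PigStorage(',') would split the line "my, phone":
# the second field keeps its leading space.
puts concat_fields("my", " phone")        # prints: my phone

# A nil field (assumed stand-in for a Pig null) is skipped.
puts concat_fields(nil, "home")           # prints: home

# With no non-nil fields, inject on an empty array returns nil.
puts concat_fields(nil, nil).inspect      # prints: nil
```

Note that the space in the result comes from the input file itself, since the delimiter is a bare comma.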
Enjoy!
Written by usiegj00