Apache Mahout: driver.MahoutDriver: Unable to add class: WikipediaXmlSplitter
While running the Mahout Wikipedia examples, I found I had to call the example applications by their fully qualified class name:
$MAHOUT_HOME/bin/mahout org.apache.mahout.text.wikipedia.WikipediaXmlSplitter -d $MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
rather than just WikipediaXmlSplitter. Also watch the case: the docs currently list wikipediaXMLSplitter, which does not match the actual class name.
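If you run these examples often, a small shell wrapper can save retyping the long class name. The following is only a sketch: the function names resolve_mahout_class and run_mahout are hypothetical helpers, not part of Mahout, and the only mapping included is the one class name from the command above.

```shell
# Hypothetical helpers (not shipped with Mahout).
# resolve_mahout_class: expand a known short name to its fully qualified
# class name; anything else is passed through unchanged.
resolve_mahout_class() {
  case "$1" in
    WikipediaXmlSplitter)
      echo "org.apache.mahout.text.wikipedia.WikipediaXmlSplitter" ;;
    *)
      echo "$1" ;;
  esac
}

# run_mahout: resolve the first argument, then hand everything to bin/mahout.
# Assumes MAHOUT_HOME is set, as in the command above.
run_mahout() {
  _mahout_class="$(resolve_mahout_class "$1")"
  shift
  "$MAHOUT_HOME/bin/mahout" "$_mahout_class" "$@"
}
```

With these defined, run_mahout WikipediaXmlSplitter followed by the same -d, -o and -c arguments should expand to the full command shown above.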
Written by Anders Brownworth
2 Responses
THANK YOU!!! That solved one of my hurdles, but it was quickly replaced by another.
By any chance, did you run out of Java heap space? I'm running this on EC2 on an m1.large instance and I've raised MAHOUT_HEAPSPACE to 5G, but I'm still getting a java.lang.OutOfMemoryError: Java heap space error. (Full message here: http://pastebin.com/P5PYuR8U)
Sorry, I'm not sure about that. As long as you set -Xmx (which it seems you have), you should be good. You could also try setting -Xms to the same value. Maybe there is a memory leak in your version of the code? Has it been updated lately?