Last Updated: December 26, 2018
· andersbrownwort

Apache Mahout: driver.MahoutDriver: Unable to add class: WikipediaXmlSplitter

While running the Mahout Wikipedia examples, I found I had to invoke the example applications by their fully qualified class name:

$MAHOUT_HOME/bin/mahout org.apache.mahout.text.wikipedia.WikipediaXmlSplitter -d $MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

rather than just WikipediaXmlSplitter. Also watch the capitalization: the docs currently list wikipediaXMLSplitter.

2 Responses

THANK YOU!!! That solved one of my hurdles, but it was quickly replaced by another.

By any chance did you run out of Java heap space? I'm running this on EC2 on an m1.large instance and I've raised MAHOUT_HEAPSPACE to 5G, but I'm still getting a java.lang.OutOfMemoryError: Java heap space error. (Full message here: http://pastebin.com/P5PYuR8U)

over 1 year ago ·

Sorry, I'm not sure about that. As long as you set -Xmx (which it seems you have), you should be good. You could also try setting -Xms to the same value. Maybe there is a memory leak in your version of the code? Has it been updated lately?

over 1 year ago ·
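
One way to check whether a heap setting such as -Xmx actually reaches a JVM is to print what the runtime reports. The following is a minimal sketch, not part of Mahout: the class name HeapCheck and the helper toMiB are made up for illustration. It only tells you what a standalone JVM sees with the flags you pass it, and the reported maximum can be somewhat lower than the -Xmx value depending on the garbage collector.

```java
public class HeapCheck {
    /** Converts a byte count to whole mebibytes (hypothetical helper). */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap this JVM will attempt to use,
        // which tracks -Xmx (or the JVM default if -Xmx is not set).
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        // totalMemory(): heap currently reserved by the JVM.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

Compile it with javac HeapCheck.java and run it with the same flags you intend to use, for example java -Xmx5g -Xms5g HeapCheck. If the reported maximum is far below what you asked for, the setting is probably not being passed to the JVM.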