This tutorial mirrors the Pythonic multifetch example, but accomplishes the same task using the Hadoop Java API.
Same as for the Pythonic example.
Again, same as the Pythonic example, except in Java.
View the source code for MultiFetch.java.
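The linked file is the authoritative source; as a rough sketch only (not the tutorial's actual code), the per-URL work a MultiFetch-style mapper performs — fetch a page, extract its title, emit a (url, title) pair — reduces to logic like the following, with the Hadoop plumbing and the HTTP fetch omitted:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the core logic behind a MultiFetch-style mapper:
// given a page's HTML, pull out the contents of its <title> tag. The real
// MultiFetch.java (linked above) wraps logic like this in a Hadoop mapper
// that also fetches each URL over HTTP.
public class TitleExtractor {
    // Case-insensitive match for <title>...</title>; DOTALL lets the
    // title span line breaks.
    private static final Pattern TITLE =
        Pattern.compile("<title[^>]*>(.*?)</title>",
                        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Example Domain</title></head></html>";
        // In the real job, this pair would be emitted through the
        // mapper's output collector rather than printed.
        System.out.println("url\t" + extractTitle(page));
    }
}
```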
cat task_${TASKID}_m_*/stderr

where ${TASKID} is the ID of the task, taken from the MapReduce console output or the web interface.
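These stderr files capture anything a task writes to standard error, so plain System.err.println calls in the mapper (a common debugging technique, not necessarily something the tutorial's code does) show up there. A minimal sketch:

```java
public class StderrDebug {
    // Format a debug line for a fetched URL; a fixed prefix makes the
    // lines easy to grep out of the task's stderr file. The method and
    // its format are illustrative, not part of MultiFetch.java.
    public static String debugLine(String url, int bytes) {
        return "DEBUG fetched " + url + " (" + bytes + " bytes)";
    }

    public static void main(String[] args) {
        // Inside a mapper, a line like this ends up in task_*_m_*/stderr.
        System.err.println(debugLine("http://example.com/", 1270));
    }
}
```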
mkdir multifetch_classes
javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar \
    -d multifetch_classes MultiFetch.java
jar -cvf $HOME/proj/hadoop/MultiFetch.jar -C multifetch_classes/ .
Load the input URLs into the DFS in the same way as described in the Pythonic example.
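Assuming, as in the Pythonic example, that the input is plain text files containing one URL per line (an assumption; check that example for the exact format), the local files to upload into the DFS could be generated like this before running the usual dfs -put:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class MakeUrlInputs {
    // Write one URL per line into a local file, ready to be copied into
    // the DFS afterwards (e.g. with: bin/hadoop dfs -put urls urls).
    // The file layout here is illustrative, not mandated by the tutorial.
    public static Path writeUrls(Path file, List<String> urls) throws IOException {
        Files.createDirectories(file.getParent());
        return Files.write(file, urls);
    }

    public static void main(String[] args) throws IOException {
        List<String> urls = Arrays.asList(
            "http://example.com/",   // sample URLs; substitute your own
            "http://example.org/");
        Path out = writeUrls(Paths.get("urls", "urls_1.txt"), urls);
        System.out.println("wrote " + out);
    }
}
```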
bin/hadoop jar $HOME/proj/hadoop/MultiFetch.jar \
    edu.brandeis.cs147a.examples.MultiFetch \
    urls/* \
    titles
Set up a real Hadoop cluster, or go back to the Python version of the example.