[Hadoop] How should I write the program if I want to chain two MapReduce programs so that they run one after the other?

2025-02-23 05:42:03

Recommended answers (1)

Answer 1:

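You set the input and output paths of each job yourself, so it is enough to point the second job's input path at the first job's output directory.
Example: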
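// old (org.apache.hadoop.mapred) API
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

JobConf conf1 = new JobConf(YourClass.class);
// set configurations
...
// set input and output formats
conf1.setInputFormat(SomeInputFormatExtendsFromInputFormat.class);
conf1.setOutputFormat(SomeOutputFormatExtendsFromOutputFormat.class);
// set input and output paths
FileInputFormat.setInputPaths(conf1, "/your_input_dir");
FileOutputFormat.setOutputPath(conf1, new Path("/your_first_output_dir"));
// runJob blocks until the job completes.
// To submit asynchronously instead, use new JobClient(conf1).submitJob(conf1).
JobClient.runJob(conf1);
// at this point, the first job has finished
JobConf conf2 = new JobConf(YourClass.class);
// do the same for conf2, except that its input path is the first job's output directory
FileInputFormat.setInputPaths(conf2, "/your_first_output_dir");
FileOutputFormat.setOutputPath(conf2, new Path("/your_second_output_dir"));
JobClient.runJob(conf2);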
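If needed, subclass InputFormat and OutputFormat yourself to define how files are split, read, and written. MapReduce already ships with some implementations, such as FileInputFormat and SequenceFileInputFormat. Reading their source code when necessary makes the details clear. The most basic documentation for Hadoop MapReduce is at http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html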
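
To make the chaining idea concrete without a cluster, here is a minimal plain-Java sketch of the same pattern. It is not Hadoop code and uses no Hadoop classes. The class `ChainSketch`, its methods, and the directory names are all hypothetical stand-ins. The first stage writes its result into a directory, and the second stage starts only after the first has returned and reads that same directory as its input. The `part-00000` file name and the refusal to write into an existing output directory imitate how Hadoop's file output formats behave.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for two chained jobs; hypothetical names, no Hadoop API involved.
class ChainSketch {

    // Read every line of every file in a directory (a job's "input path").
    static List<String> readDir(Path dir) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) lines.addAll(Files.readAllLines(f));
        }
        return lines;
    }

    // Write a job's result; createDirectory fails if the output directory already exists.
    static void writePart(Path dir, List<String> lines) throws IOException {
        Files.createDirectory(dir);
        Files.write(dir.resolve("part-00000"), lines);
    }

    // "Job 1": count words, emit word<TAB>count.
    static void runWordCount(Path in, Path out) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : readDir(in)) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        counts.forEach((w, c) -> result.add(w + "\t" + c));
        writePart(out, result);
    }

    // "Job 2": read job 1's output, group words by count, emit count<TAB>words.
    static void runGroupByCount(Path in, Path out) throws IOException {
        Map<Integer, List<String>> byCount = new TreeMap<>();
        for (String line : readDir(in)) {
            String[] parts = line.split("\t");
            byCount.computeIfAbsent(Integer.parseInt(parts[1]), k -> new ArrayList<>()).add(parts[0]);
        }
        List<String> result = new ArrayList<>();
        byCount.forEach((c, ws) -> result.add(c + "\t" + String.join(",", ws)));
        writePart(out, result);
    }

    // Driver: the first stage's output directory is the second stage's input directory.
    static List<String> runChain(List<String> inputLines) {
        try {
            Path base = Files.createTempDirectory("chain-sketch");
            Path input = base.resolve("input");
            Path firstOutput = base.resolve("first_output");
            Path secondOutput = base.resolve("second_output");
            Files.createDirectory(input);
            Files.write(input.resolve("data.txt"), inputLines);
            runWordCount(input, firstOutput);          // returns only when stage 1 is done
            runGroupByCount(firstOutput, secondOutput); // stage 2 consumes stage 1's output
            return readDir(secondOutput);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        runChain(List.of("a b a", "c b a")).forEach(System.out::println);
    }
}
```

In a real Hadoop driver the two stage methods would be replaced by two blocking job runs, with the intermediate directory passed as the first job's output path and the second job's input path.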