Hadoop MapReduce: Core Principles and Hands-On Practice

Overview of the Hadoop MapReduce programming model

Hadoop MapReduce is a distributed computing framework for parallel processing of large-scale data sets. Its core idea is to break a computation into two phases: Map and Reduce. The Map phase reads input records and transforms them into intermediate key-value pairs; the Reduce phase aggregates those intermediate results and produces the final output.

Core components of MapReduce

  • JobTracker: schedules and manages jobs and assigns tasks to TaskTrackers (classic MRv1; in Hadoop 2 and later these duties are handled by YARN's ResourceManager and per-job ApplicationMaster).
  • TaskTracker: runs individual Map or Reduce tasks and reports status back to the JobTracker (replaced by the NodeManager under YARN).
  • InputFormat: defines the format of the input data and how it is split.
  • OutputFormat: defines how the output data is written and stored. A minimal configuration sketch follows this list.
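As a quick illustration, both formats are declared on the Job object in the driver. This is a minimal sketch, assuming the job variable built in the driver shown later; TextInputFormat and TextOutputFormat are Hadoop's plain-text defaults, so these two calls only make the wiring explicit.

// Inside the driver's main(), after Job.getInstance(conf, "word count"):
// TextInputFormat splits text files into line records <byte offset, line text>,
// TextOutputFormat writes each result as a "key<TAB>value" line.
job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.TextInputFormat.class);
job.setOutputFormatClass(org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.class);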

The MapReduce workflow

  1. Input splits
    The input data is divided into splits, and each split is processed by one Map task. The split size usually matches the HDFS block size (128 MB by default in Hadoop 2.x and later).

  2. Map phase
    Each Map task processes one input split and emits key-value pairs as intermediate results. In a word-count job, for example, the Map output takes the form <word, 1>.

  3. Shuffle and sort
    The Map output is partitioned, sorted, and optionally combined (local pre-aggregation), which guarantees that all values for the same key are delivered to the same Reduce task.

  4. Reduce phase
    Each Reduce task aggregates the list of values for each key. In word count, for example, the Map output <hello, 1>, <world, 1>, <hello, 1> is shuffled into hello → [1, 1] and world → [1], and Reduce emits <hello, 2> and <world, 1>.

Writing a MapReduce program

Below is a simple word-frequency count (WordCount) program:

Mapper class

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line on whitespace and emit <word, 1> for each token.
        String[] words = value.toString().split("\\s+");
        for (String w : words) {
            word.set(w);
            context.write(word, one);
        }
    }
}

Reducer class

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum all counts for this word and emit <word, total_count>.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

Driver class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        // Wire up the job: the jar to ship to the cluster and the Mapper/Reducer classes.
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // Output types of the Reducer (also used for the Mapper here, since they match).
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input and output paths are taken from the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
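To run the job, package the three classes into a jar and submit it with the standard hadoop jar command. The jar name and HDFS paths below are placeholders for illustration; note that the output directory must not exist before the job starts.

hadoop jar wordcount.jar WordCountDriver /user/hadoop/input /user/hadoop/output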

Optimization strategies for MapReduce

  1. Use a Combiner
    Aggregate data locally on the Map side to cut down network transfer during the shuffle. In WordCount, the Reducer logic can be reused directly as the Combiner (see the configuration sketch after this list).

  2. Set a sensible number of Reduce tasks
    Avoid data skew and idle or overloaded reducers by tuning the Reduce task count with job.setNumReduceTasks(int).

  3. Custom Partitioner
    Ensure data is spread evenly across Reduce tasks so that no single reducer becomes a hotspot.
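A minimal sketch of how these three settings are wired into the driver above. FirstLetterPartitioner is a hypothetical partitioner used only to show the API; reusing WordCountReducer as the Combiner is safe here because summing counts is commutative and associative and the types match.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical custom partitioner: routes each word to a reducer based on its
// first character, purely to illustrate the Partitioner API.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String word = key.toString();
        int bucket = word.isEmpty() ? 0 : Character.toLowerCase(word.charAt(0));
        return bucket % numPartitions;
    }
}

// In WordCountDriver.main(), after the Mapper/Reducer are set:
// job.setCombinerClass(WordCountReducer.class);          // Map-side pre-aggregation
// job.setNumReduceTasks(4);                              // tune Reduce parallelism
// job.setPartitionerClass(FirstLetterPartitioner.class); // control key -> reducer mapping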

Limitations of MapReduce

  • Not well suited to iterative computation or real-time processing.
  • Intermediate results are written to disk, which limits performance.
  • Complex DAG (directed acyclic graph) workloads must be expressed as chains of multiple MapReduce jobs.

Alternatives

For more complex workloads, consider newer engines such as Apache Spark or Apache Flink, which support in-memory computation and more flexible programming models.
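For comparison, here is a minimal word-count sketch in Spark's Java RDD API. The class name and command-line paths are assumptions for illustration; the point is that the map, shuffle, and reduce steps become ordinary chained operations on a distributed dataset, without hand-written driver plumbing.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("word count");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile(args[0]);
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator()) // one record per word
                    .mapToPair(word -> new Tuple2<>(word, 1))                      // <word, 1>
                    .reduceByKey(Integer::sum);                                    // <word, total_count>
            counts.saveAsTextFile(args[1]);                                        // write results to the output directory
        }
    }
}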
