
Python MapReduce example for max/min temperature in Hadoop

I have set up Hadoop on Ubuntu and ran the example code to test it. One of the common examples is https://github.com/tomwhite/hadoop-book/tree/master/ch02/src/main/python

I tested this code with the given sample file (https://github.com/tomwhite/hadoop-book/blob/master/input/ncdc/sample.txt). However, when I modified the mapper code according to my data file, the reducer goes from 0% to 33% and then back to 0%. Can anyone explain why that happens, or how I should modify the code? My data looks like this:

STN---,WBAN , YEARMODA,   TEMP,  ,   DEWP,  ,  SLP  ,  ,  STP  ,  , VISIB,  ,  WDSP,  , MXSPD,  GUST,   MAX  ,  MIN  ,PRCP  ,SNDP , FRSHTT,


690190,13910, 20120101,   42.9,18,   29.4,18, 1033.3,18,  968.7,18,  10.0,18,   8.7,18,  15.0, 999.9,   52.5*,  31.6*, 0.00I,999.9, 000000,

If you check the job tracker, I'm sure you will find that the map task is failing and being rescheduled to run on another node (eventually the job fails). This is probably due to the Python script throwing an error, so I would recommend (if you haven't already done this) piping your sample data through your mapper to see what it yields.

For example, I took your data and ran it through the linked Python mapper (with an additional print statement to show the extracted columns):

#> cat data.csv | python map.py
EARM  MXSP D


0120   15. 0
0120      15.

Obviously your mapper has been amended, as you note in your question, so you need to make sure the Python script processes your sample data without error. If it runs without error, then you need to check the logs for the failed map tasks (and post them in your question).
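
The output above suggests the linked mapper picks fields by fixed character position (the "year" slice lands inside `YEARMODA`), which does not fit a comma-separated file. Below is a minimal sketch of a mapper for the layout shown in the question, not a tested drop-in. It assumes the column order in your header row holds for every line, that a trailing `*` on `MAX` is just a flag to be stripped, and that `9999.9` marks a missing value (please check your dataset's documentation). The names `parse_line` and `map_lines` are made up for illustration.

```python
#!/usr/bin/env python

# Column positions after splitting on commas, read off the header row in the
# question (assumed to be the same for every line of the file).
YEARMODA_COL = 2
MAX_COL = 17            # MIN would be column 18
MISSING = 9999.9        # assumed missing-value marker; verify for your data

HEADER = ("STN---,WBAN , YEARMODA,   TEMP,  ,   DEWP,  ,  SLP  ,  ,  STP  ,  ,"
          " VISIB,  ,  WDSP,  , MXSPD,  GUST,   MAX  ,  MIN  ,PRCP  ,SNDP ,"
          " FRSHTT,")
ROW = ("690190,13910, 20120101,   42.9,18,   29.4,18, 1033.3,18,  968.7,18,"
       "  10.0,18,   8.7,18,  15.0, 999.9,   52.5*,  31.6*, 0.00I,999.9,"
       " 000000,")


def parse_line(line):
    """Return (year, max_temp), or None if the line should be skipped."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) <= MAX_COL:
        return None                    # blank or truncated line
    date = fields[YEARMODA_COL]
    if len(date) != 8 or not date.isdigit():
        return None                    # header row or malformed date
    try:
        temp = float(fields[MAX_COL].rstrip("*"))   # "52.5*" -> 52.5
    except ValueError:
        return None                    # unparseable temperature
    if temp == MISSING:
        return None
    return date[:4], temp


def map_lines(lines):
    """Yield tab-separated 'year<TAB>temp' records, skipping bad lines."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            yield "%s\t%s" % parsed


# Local check on the two lines from the question. In a streaming script you
# would pass sys.stdin instead of this list.
for record in map_lines([HEADER, ROW]):
    print(record)
```

Because bad lines are skipped rather than raising, a stray header or blank line no longer kills the map task. The downside is that real parsing problems become silent, so it may be worth counting the skipped lines while you debug.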
