[英]Calculate average temperature in reducer
我正在嘗試編寫一個代碼來根據 ncdc 天氣計算平均溫度 (reducer.py)。
0057011060999991928010112004+67500+012067FM-12+001199999V0202001N012319999999N0500001N9+00281+99999102171ADDAY181999GF108991999999999999001001MD1710261+9999MW1801
0062011060999991928010206004+67500+012067FM-12+001199999V0201801N00931220001CN0200001N9+00281+99999100901ADDAA199002091AY121999GF101991999999017501999999MD1810461+9999
0108011060999991928010212004+67500+012067FM-12+001199999V0201601N009319999999N0100001N9+00111+99999100062ADDAY171999GF108991999011012501001001MD1810542+9999MW1681EQDQ01+000042SCOTLCQ02+100063APOSLPQ03+000542APC3
0087011060999991928010306004+67500+012067FM-12+001199999V0202001N022619999999N0100001N9+00501+99999098781ADDAA199001091AY161999GF108991999011004501001001MD1310061+9999MW1601EQDQ01+000042SCOTLC
0057011060999991928010312004+67500+012067FM-12+001199999V0202301N01541004501CN0040001N9+00001+99999098951ADDAY161999GF108991081061004501999999MD1210201+9999MW1601
#!/usr/bin/env python
import sys
(last_key, max_val) = (None, -sys.maxint)
for line in sys.stdin:
(key, val) = line.strip().split("\t")
if last_key and last_key != key:
print "%s\t%s" % (last_key, max_val)
(last_key, max_val) = (key, int(val))
else:
(last_key, max_val) = (key, max(max_val, int(val)))
if last_key:
print "%s\t%s" % (last_key, max_val)
首先,您顯示的數據沒有標簽,因此不清楚為什么您顯示了在標簽上拆分行並找到最大值的代碼。 不是平均水平。
要找到平均值,您需要將所有看到的值收集到一個列表中( values.append(int(val))
),然后您可以from statistics import mean
並在循環結束時調用mean(values)
我強烈建議您改用mrjob
或pyspark
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.