Sum two columns, calculate max, min and mean value in MapReduce

I have sample mapper code, shown below. The key is UCO and the value is TaxiTotal, which should be the sum of two columns, TaxiIn and TaxiOut. How do I sum the two columns?

My current solution, TaxiIn + TaxiOut, concatenates the two values instead of adding them: 333 + 444 gives 333444, but I need 777. How should I write the code?

#! /usr/bin/env python

import sys

# -- Airline Data
# Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum,
# TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, Origin, Dest, Distance, TaxiIn,
# TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay

for line in sys.stdin:
    line = line.strip()
    unpacked = line.split(",")
    Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum, TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, Origin, Dest, Distance, TaxiIn,TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay = line.split(",")
    UCO = "-".join([UniqueCarrier, Origin])
    results = [UCO, TaxiIn+TaxiOut]
    print("\t".join(results))

Change TaxiIn + TaxiOut to:

int(TaxiIn) + int(TaxiOut)

See the example below:

In [1612]: TaxiIn = '333'

In [1613]: TaxiOut = '444'

In [1614]: TaxiIn + TaxiOut
Out[1614]: '333444'

In [1615]: int(TaxiIn) + int(TaxiOut)
Out[1615]: 777

You can't add strings numerically: the + operator concatenates them. Every field produced by line.split(",") is a str, so convert the values to int or float before adding.
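A plain int() call raises ValueError on anything that is not a whole number. If the input might contain a header row or placeholder values such as NA (an assumption about the data, not something stated in the question), a small guard keeps the mapper from crashing. The helper name taxi_total is hypothetical; this is only a sketch of one way to do it:

```python
def taxi_total(taxi_in, taxi_out):
    """Return the integer sum of two numeric strings, or None if either is not an integer."""
    try:
        return int(taxi_in) + int(taxi_out)
    except ValueError:
        # Non-numeric field (for example a header name or a placeholder): signal "skip this row".
        return None


print(taxi_total("333", "444"))  # numeric addition, not concatenation
print(taxi_total("NA", "444"))   # None, so the caller can skip the row
```

In the mapper loop, a row whose total is None could simply be skipped with continue instead of being printed.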

Your code should be as follows (str() is needed because "\t".join() only accepts strings):

results = [UCO, str(int(TaxiIn) + int(TaxiOut))]
print("\t".join(results))
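The title also asks for the max, min and mean, which the answers above do not cover. One possible reducer is sketched below, assuming the mapper emits key<TAB>value lines as shown and that the framework delivers them to the reducer sorted by key (as Hadoop Streaming does). The function names parse, reduce_stats and main are hypothetical, and collecting each key's values in a list is a simplification that assumes they fit in memory:

```python
import sys
from itertools import groupby


def parse(lines):
    """Yield (key, int_value) pairs from tab-separated 'key<TAB>value' lines."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        try:
            yield key, int(value)
        except ValueError:
            continue  # skip malformed or non-numeric lines


def reduce_stats(lines):
    """Yield (key, max, min, mean) per key; assumes the input is sorted by key."""
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        values = [v for _, v in group]
        yield key, max(values), min(values), sum(values) / len(values)


def main():
    """Read mapper output from stdin and print one tab-separated stats line per key."""
    for key, hi, lo, mean in reduce_stats(sys.stdin):
        print("\t".join([key, str(hi), str(lo), str(mean)]))
```

To use this as the reducer script, call main() at the bottom of the file. It can be tried locally by piping the mapper output through sort before the reducer, which mimics the shuffle step.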
