Sum two columns, calculate max, min and mean value in MapReduce
I have a sample mapper, shown below. The key is UCO and the value is TaxiTotal, which should be the sum of two columns, TaxiIn and TaxiOut. How do I sum the two columns?
My current solution, TaxiIn + TaxiOut, pastes the two numbers together instead of adding them: 333 + 444 gives 333444, but I need 777. How should I write the code?
#! /usr/bin/env python
import sys

# -- Airline Data
# Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum,
# TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, Origin, Dest, Distance, TaxiIn,
# TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay

for line in sys.stdin:
    line = line.strip()
    unpacked = line.split(",")
    Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum, TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, Origin, Dest, Distance, TaxiIn, TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay = line.split(",")
    UCO = "-".join([UniqueCarrier, Origin])
    results = [UCO, TaxiIn+TaxiOut]
    print("\t".join(results))
The fields returned by line.split(",") are strings, so + concatenates them. Change
TaxiIn + TaxiOut
to:
int(TaxiIn) + int(TaxiOut)
See the example below:
In [1612]: TaxiIn = '333'
In [1613]: TaxiOut = '444'
In [1614]: TaxiIn + TaxiOut
Out[1614]: '333444'
In [1615]: int(TaxiIn) + int(TaxiOut)
Out[1615]: 777
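One caveat with the conversion above: int() raises ValueError when a field is not numeric. The following is a minimal sketch, assuming the input may contain a header row or a missing-value placeholder such as "NA" (an assumption about the data, not something shown in the question). The helper name taxi_total is hypothetical.

```python
def taxi_total(taxi_in, taxi_out):
    """Return int(taxi_in) + int(taxi_out), or None if either field is not an integer."""
    try:
        return int(taxi_in) + int(taxi_out)
    except ValueError:
        # Non-numeric field, for example a header name or a placeholder
        # such as "NA" (assumed; check what your data actually contains).
        return None


print(taxi_total("333", "444"))
print(taxi_total("NA", "444"))
```

In the mapper, a row for which the helper returns None could simply be skipped with continue, so that only valid totals are emitted.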
You can't get a numerical sum from strings. Convert each str to int or float first, then convert the result back to str so that join() still works:
results = [UCO, str(int(TaxiIn) + int(TaxiOut))]
print("\t".join(results))
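For the max, min and mean mentioned in the title, a matching reducer might look like the sketch below. It assumes Hadoop Streaming-style input: lines of the form key<TAB>value that arrive sorted by key, with integer values as emitted by the corrected mapper. The function name reduce_stats and the sample keys are hypothetical.

```python
def reduce_stats(lines):
    """Yield (key, max, min, mean) for each key in sorted 'key<TAB>value' lines."""
    current_key = None
    count = total = 0
    lo = hi = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, value = line.split("\t", 1)
        value = int(value)
        if key != current_key:
            # A new key starts: emit the finished group, then reset the state.
            if current_key is not None:
                yield current_key, hi, lo, float(total) / count
            current_key = key
            count, total, lo, hi = 0, 0, value, value
        count += 1
        total += value
        lo = min(lo, value)
        hi = max(hi, value)
    # Emit the last group, if any input was seen.
    if current_key is not None:
        yield current_key, hi, lo, float(total) / count


# Hypothetical sample input; in a streaming job, pass sys.stdin instead.
sample = ["AA-JFK\t10", "AA-JFK\t20", "UA-ORD\t7"]
for key, hi, lo, mean in reduce_stats(sample):
    print("\t".join([key, str(hi), str(lo), str(mean)]))
```

Because the input is sorted by key, the reducer only needs to keep running totals for one key at a time rather than holding every value in memory.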