How to add the data of one row to another row when both rows have the same first value
I have an array like this:
array = [['page', 'pageviews'],
['page1', '65'],
['page2', '44'],
['page1', '40']]
How can I make the script go through the rows, produce a single row for 'page1', and sum its two values, '65' and '40'?
Using pandas (you say in the comments that you're open to using it), this becomes pretty straightforward:
import pandas as pd
df = pd.DataFrame(array[1:], columns=array[0])
df['pageviews'] = pd.to_numeric(df.pageviews)
summed = df.groupby('page').pageviews.sum()
This produces the following pandas Series:
page
page1 105
page2 44
You can easily index it by page name:
summed['page1']
# 105
At its core, this is a grouping problem. Grouping is easy with a defaultdict:
from collections import defaultdict
sums = defaultdict(int)
for page, views in array[1:]:
    sums[page] += int(views)
# result: defaultdict(<class 'int'>, {'page1': 105, 'page2': 44})
If you want the result in the same format as your input (a list of lists), convert the dict to a list with a list comprehension:
result = [[page, views] for page, views in sums.items()]
# result: [['page1', 105], ['page2', 44]]
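The two steps above can be combined into one helper that also keeps the header row. This is only a minimal sketch: the function name sum_by_first_column is made up for illustration, and it assumes every data row has exactly two items with an integer-like string in the second position.

```python
from collections import defaultdict


def sum_by_first_column(rows):
    """Sum the second column of rows, grouped by the first column.

    rows is a list of lists whose first row is a header.
    (Hypothetical helper name, not from the original answers.)
    """
    header, *data = rows
    sums = defaultdict(int)
    for page, views in data:
        sums[page] += int(views)
    # dicts keep insertion order (Python 3.7+), so pages come out
    # in the order in which they were first seen
    return [header] + [[page, total] for page, total in sums.items()]


array = [['page', 'pageviews'],
         ['page1', '65'],
         ['page2', '44'],
         ['page1', '40']]

result = sum_by_first_column(array)
print(result)
# [['page', 'pageviews'], ['page1', 105], ['page2', 44]]
```

Note that the summed values come back as integers, not as strings like in the input.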
Here's a solution using pandas:
import pandas as pd
# read list of lists into pandas dataframe
df = pd.DataFrame(array[1:], columns=array[0])
# convert views from string to integer
df['pageviews'] = df['pageviews'].astype(int)
# group by page, sum pageviews, create list from results
lst = df.groupby('page')['pageviews'].sum()\
.reset_index().values.tolist()
# add headers
res = [array[0]] + lst
print(res)
[['page', 'pageviews'],
['page1', 105],
['page2', 44]]
You need to sort it first; afterwards you can use itertools.groupby, which only groups consecutive items with the same key:
from itertools import groupby
array = [
['page', 'pageviews'],
['page1', '65'],
['page2', '44'],
['page1', '40']
]
# sort it on the first element of each item
array = sorted(array, key = lambda x: x[0])
# keys of interest
keys = ['page1', 'page2']
for k, v in groupby(array, key = lambda x: x[0]):
    if k in keys:
        s = sum([int(x[1]) for x in v])
        print("Key: {}, Sum: {}".format(k, s))
This would yield:
Key: page1, Sum: 105
Key: page2, Sum: 44
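If you would rather not hard-code the keys of interest, one possible variant is to split off the header row before sorting, so that every remaining group is a real page. This is a sketch under that assumption; the names header, data_sorted and totals are chosen for illustration only.

```python
from itertools import groupby
from operator import itemgetter

array = [
    ['page', 'pageviews'],
    ['page1', '65'],
    ['page2', '44'],
    ['page1', '40']
]

# separate the header so it is not treated as a data row
header, *data = array

# groupby only merges consecutive rows, so sort on the page name first
data_sorted = sorted(data, key=itemgetter(0))

totals = [[page, sum(int(row[1]) for row in rows)]
          for page, rows in groupby(data_sorted, key=itemgetter(0))]

print([header] + totals)
# [['page', 'pageviews'], ['page1', 105], ['page2', 44]]
```

Because of the sort, the pages come out in alphabetical order rather than in their original order.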