How to add one row's data to another row when the first value of both rows is the same

Question

I have an array like this:

array = [['page', 'pageviews'],
         ['page1', '65'],
         ['page2', '44'],
         ['page1', '40']]

How can I make the script go through the rows and produce a single row for 'page1' whose value is the sum of '65' and '40'?

Answer 1

Using pandas (you said in the comments that you're open to it), this becomes pretty straightforward:

import pandas as pd

df = pd.DataFrame(array[1:], columns=array[0])
df['pageviews'] = pd.to_numeric(df.pageviews)
summed = df.groupby('page').pageviews.sum()

This produces the following pandas Series:

page
page1    105
page2     44

You can easily index it by page name:

summed['page1']
# 105

Answer 2

At its core, this is a grouping problem. Grouping is easy with a defaultdict:

from collections import defaultdict

sums = defaultdict(int)
for page, views in array[1:]:
    sums[page] += int(views)

# result: defaultdict(<class 'int'>, {'page1': 105, 'page2': 44})

If you want the result in the same format as your input (a list of lists), convert the dict to a list with a list comprehension:

result = [[page, views] for page, views in sums.items()]
# result: [['page1', 105], ['page2', 44]]
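
If you also want the header row back, the two steps above can be wrapped in one small function. This is only a sketch: the name sum_by_first_column is made up for illustration, and it assumes the first row is always a header and every value parses as an integer.

```python
from collections import defaultdict

def sum_by_first_column(rows):
    """Sum the second column of `rows`, grouped by the first column.

    Assumes `rows` is a list of two-item lists whose first row is a header.
    """
    header, *body = rows
    sums = defaultdict(int)
    for key, value in body:
        sums[key] += int(value)  # values arrive as strings such as '65'
    # dicts keep insertion order (Python 3.7+), so keys come out in first-seen order
    return [header] + [[key, total] for key, total in sums.items()]

array = [['page', 'pageviews'],
         ['page1', '65'],
         ['page2', '44'],
         ['page1', '40']]

result = sum_by_first_column(array)
print(result)
```

Because the dict preserves insertion order, the pages appear in the order they were first seen, so no sorting is needed.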

Answer 3

Here's a solution using pandas:

import pandas as pd

# read list of lists into pandas dataframe
df = pd.DataFrame(array[1:], columns=array[0])

# convert views from string to integer
df['pageviews'] = df['pageviews'].astype(int)

# group by page, sum pageviews, create list from results
lst = df.groupby('page')['pageviews'].sum()\
        .reset_index().values.tolist()

# add headers
res = [array[0]] + lst

print(res)

[['page', 'pageviews'],
 ['page1', 105],
 ['page2', 44]]

Answer 4

You need to sort it first; afterwards you can use itertools.groupby:

from itertools import groupby

array = [ 
    ['page', 'pageviews'],
    ['page1', '65'],
    ['page2', '44'],
    ['page1', '40']
]

# sort it on the first element of each item
array = sorted(array, key = lambda x: x[0])

# keys of interest
keys = ['page1', 'page2']

for k, v in groupby(array, key = lambda x: x[0]):
    if k in keys:
        s = sum([int(x[1]) for x in v])
        print("Key: {}, Sum: {}".format(k, s))

This would yield:

Key: page1, Sum: 105
Key: page2, Sum: 44
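
If you'd rather not hard-code the keys of interest, one possible variant is to split off the header before sorting and let groupby discover the keys itself. This is a sketch under the same assumptions as above (first row is a header, values are integer strings); the names header, rows_sorted and totals are only illustrative.

```python
from itertools import groupby
from operator import itemgetter

array = [
    ['page', 'pageviews'],
    ['page1', '65'],
    ['page2', '44'],
    ['page1', '40'],
]

# separate the header so it is not sorted or grouped with the data rows
header, *rows = array

# groupby only merges adjacent items, so sort on the grouping key first
rows_sorted = sorted(rows, key=itemgetter(0))

totals = {
    page: sum(int(views) for _, views in group)
    for page, group in groupby(rows_sorted, key=itemgetter(0))
}
print(totals)
```

The sort is what makes this work: without it, the two 'page1' rows would not be adjacent and would end up in separate groups.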

4 answers

Answer 1
3 2018-06-02 13:36:04

Answer 2
1 2018-06-02 13:39:44

Answer 3
1 accepted 2018-06-02 13:43:19

Answer 4
0 2018-06-02 13:34:18
