How to add data of a row to another row when two rows have the same first value

I have an array like this:

array = [['page', 'pageviews'],
         ['page1', '65'],
         ['page2', '44'],
         ['page1', '40']]

How can I make the script go through the rows and produce a single row for 'page1', with the two values '65' and '40' summed together?

Using pandas (you say in the comments that you're open to using it), this becomes pretty straightforward:

import pandas as pd

df = pd.DataFrame(array[1:], columns=array[0])
df['pageviews'] = pd.to_numeric(df.pageviews)
summed = df.groupby('page').pageviews.sum()

This produces the following pandas Series:

page
page1    105
page2     44

You can easily index it by page name:

summed['page1']
# 105

At its core, this is a grouping problem. Grouping is easy with a defaultdict:

from collections import defaultdict

sums = defaultdict(int)
for page, views in array[1:]:
    sums[page] += int(views)

# result: defaultdict(<class 'int'>, {'page1': 105, 'page2': 44})

If you want the result in the same format as your input (a list of lists), convert the dict to a list with a list comprehension:

result = [[page, views] for page, views in sums.items()]
# result: [['page1', 105], ['page2', 44]]
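
If the header row and the string values also need to survive the round trip, the same defaultdict idea can be wrapped in a small helper. This is only a minimal sketch: the function name sum_by_first_column is hypothetical (not part of the original answer), and it assumes the first row is a header and every other row is a [key, numeric string] pair.

```python
from collections import defaultdict


def sum_by_first_column(rows):
    """Merge rows that share the same first value by summing the second.

    Assumption: rows[0] is a header row and every remaining row
    has the shape [key, numeric_string].
    """
    header, *data = rows
    sums = defaultdict(int)
    for key, value in data:
        sums[key] += int(value)
    # Dicts keep insertion order (Python 3.7+), so keys come out in
    # first-seen order; totals are turned back into strings to match the input.
    return [header] + [[key, str(total)] for key, total in sums.items()]


array = [['page', 'pageviews'],
         ['page1', '65'],
         ['page2', '44'],
         ['page1', '40']]

merged = sum_by_first_column(array)
print(merged)
# [['page', 'pageviews'], ['page1', '105'], ['page2', '44']]
```

Drop the str() call if integer totals are more convenient downstream.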

Here's a solution using pandas:

import pandas as pd

# read list of lists into pandas dataframe
df = pd.DataFrame(array[1:], columns=array[0])

# convert views from string to integer
df['pageviews'] = df['pageviews'].astype(int)

# group by page, sum pageviews, create list from results
lst = df.groupby('page')['pageviews'].sum()\
        .reset_index().values.tolist()

# add headers
res = [array[0]] + lst

print(res)

[['page', 'pageviews'],
 ['page1', 105],
 ['page2', 44]]

You need to sort it first; afterwards you can use itertools.groupby:

from itertools import groupby

array = [ 
    ['page', 'pageviews'],
    ['page1', '65'],
    ['page2', '44'],
    ['page1', '40']
]

# sort it on the first element of each item
array = sorted(array, key = lambda x: x[0])

# keys of interest
keys = ['page1', 'page2']

for k, v in groupby(array, key = lambda x: x[0]):
    if k in keys:
        s = sum([int(x[1]) for x in v])
        print("Key: {}, Sum: {}".format(k, s))

This would yield:

Key: page1, Sum: 105
Key: page2, Sum: 44
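
The hard-coded list of keys is mainly there to skip the header row. As a variation (a sketch under the assumption that the first row is always the header, not the answer's original code), the header can be split off before sorting, so that every page is summed without naming it in advance and the result comes back as a list of lists:

```python
from itertools import groupby
from operator import itemgetter

array = [
    ['page', 'pageviews'],
    ['page1', '65'],
    ['page2', '44'],
    ['page1', '40'],
]

# Assumption: the first row is the header; keep it out of the sort.
header, *rows = array

# groupby only merges adjacent items, so sort by page name first.
rows.sort(key=itemgetter(0))

grouped_result = [header] + [
    [page, sum(int(views) for _, views in group)]
    for page, group in groupby(rows, key=itemgetter(0))
]

print(grouped_result)
# [['page', 'pageviews'], ['page1', 105], ['page2', 44]]
```

Because of the sort, the pages come out in alphabetical order rather than in order of first appearance.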


