
How can I add a new column to a dataframe (df1) that is the sum of multiple lookup values from df1 in another dataframe (df2)

Say I have 2 dataframes:

df1:

     id       guid               name      item1        item2        item3        item4        item5         item6       item7        item8       item9
0  3031958124  85558-261955282  Alonso  85558-57439  85558-54608  85558-91361  85558-40647  85558-41305  85558-79979  85558-33076  85558-89956  85558-12554
1  3031958127  85558-261955282  Jeff    85558-57439  85558-39280  85558-91361  85558-55987  85558-83083  85558-79979  85558-33076  85558-41872  85558-12554
2  3031958129  85558-261955282  Mike    85558-57439  85558-39280  85558-91361  85558-55987  85558-40647  85558-79979  85558-33076  85558-88297  85558-12534
...

df2, where item_lookup is the index:

             item_type   cost  value  target 
item_lookup
85558-57439  item1       9500   25.1   1.9
85558-54608  item2       8000   18.7   0.0 
85558-91361  item3       7000   16.5   0.9
...

For each row of df1, I want to look up item1 through item9 in df2 via item_lookup, sum the cost, value, and target of those nine items, and store each sum as a new column on df1.

So the resulting df1 should look like:

     id       guid               name      item1        item2        item3        item4        item5         item6       item7        item8       item9       cost   value  target
0  3031958124  85558-261955282  Alonso  85558-57439  85558-54608  85558-91361  85558-40647  85558-41305  85558-79979  85558-33076  85558-89956  85558-12554  58000   192.5   38.3
1  3031958127  85558-261955282  Jeff    85558-57439  85558-39280  85558-91361  85558-55987  85558-83083  85558-79979  85558-33076  85558-41872  85558-12554  59400   183.2   87.7
2  3031958129  85558-261955282  Mike    85558-57439  85558-39280  85558-91361  85558-55987  85558-40647  85558-79979  85558-33076  85558-88297  85558-12534  58000   101.5   18.1
...

I've tried following similar solutions online that use .map, but those examples only handle a single column, whereas I am trying to sum values across 9 columns.

You can do this with df.apply: loop through the rows, then loop through the item columns in each row, and accumulate the sums.


Since I couldn't use your dataframes because they are incomplete, I made my own.

Given df1:

  item1 item2 item3
0     b     e     j
1     d     a     d
2     j     b     a
3     c     j     f
4     e     f     c
5     a     d     b
6     f     c     e

and df2:

             cost  value  target
item_lookup                     
a              19     20      12
b              16     14      14
c              20     18      18
d              17     12      14
e              20     15      17
f              19     20      12
j              11     17      12

you can use the following function to get what you need:

def add_items(row):
    row["cost"] = row["target"] = row["value"] = 0
    # get the columns that have "item" in the name
    cols = [col for col in df1.columns if "item" in col]
    # look each item up in df2 and add its values to the new columns
    for col in cols:
        item_lookup = row[col]
        lookup_result = df2.loc[item_lookup]
        row["cost"] += lookup_result["cost"]
        row["target"] += lookup_result["target"]
        row["value"] += lookup_result["value"]
    return row

and then apply it:

>>> df1.apply(add_items, axis=1)
  item1 item2 item3  cost  target  value
0     b     e     j    47      43     46
1     d     a     d    53      40     44
2     j     b     a    46      38     51
3     c     j     f    50      42     55
4     e     f     c    59      47     53
5     a     d     b    52      40     46
6     f     c     e    59      47     53
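A self-contained version of this answer, with the toy frames above rebuilt in code, makes the results easy to reproduce. It is a sketch with two small changes from the function above: the item columns are computed once outside the function instead of on every row, and the function works on a copy of the row rather than mutating the Series that apply passes in.

```python
import pandas as pd

# The toy frames from this answer
df1 = pd.DataFrame({
    "item1": list("bdjceaf"),
    "item2": list("eabjfdc"),
    "item3": list("jdafcbe"),
})
df2 = pd.DataFrame(
    {
        "cost": [19, 16, 20, 17, 20, 19, 11],
        "value": [20, 14, 18, 12, 15, 20, 17],
        "target": [12, 14, 18, 14, 17, 12, 12],
    },
    index=pd.Index(list("abcdefj"), name="item_lookup"),
)

# Columns whose name contains "item" hold the lookup keys
item_cols = [col for col in df1.columns if "item" in col]


def add_items(row):
    row = row.copy()  # avoid mutating the Series object that apply passes in
    row["cost"] = row["target"] = row["value"] = 0
    for col in item_cols:
        lookup_result = df2.loc[row[col]]  # the df2 row for this item code
        row["cost"] += lookup_result["cost"]
        row["target"] += lookup_result["target"]
        row["value"] += lookup_result["value"]
    return row


result = df1.apply(add_items, axis=1)
print(result)
```

Because df2.loc[row[col]] raises a KeyError for a code that is not in df2, this approach assumes every item code in df1 has a matching row in df2.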

I have a simpler solution. First, save the item_lookup values and their corresponding cost, value, and target to dictionaries, and then use .map() and .sum() to create the columns:

df2.reset_index(drop=False, inplace=True)

# one {item_lookup: number} dictionary per column to sum
map_cost = dict(zip(df2['item_lookup'], df2['cost']))
map_value = dict(zip(df2['item_lookup'], df2['value']))
map_target = dict(zip(df2['item_lookup'], df2['target']))

# map every cell of df1 through the dictionary; cells with no match
# (including id, guid and name) become NaN, which sum() skips
df1['cost'] = df1.apply(lambda x: x.map(map_cost)).sum(axis=1)
df1['value'] = df1.apply(lambda x: x.map(map_value)).sum(axis=1)
df1['target'] = df1.apply(lambda x: x.map(map_target)).sum(axis=1)


df1

Output:

           id             guid    name        item1        item2        item3     cost  value  target
0  3031958124  85558-261955282  Alonso  85558-57439  85558-54608  85558-91361  24500.0   60.3     2.8
1  3031958127  85558-261955282    Jeff  85558-57439  85558-39280  85558-91361  16500.0   41.6     2.8
2  3031958129  85558-261955282    Mike  85558-57439  85558-39280  85558-91361  16500.0   41.6     2.8
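A variant of the same idea, sketched under the assumption that df2 keeps item_lookup as its index: Series.map also accepts a Series, so the three dictionaries can be skipped, and restricting the lookup to the item columns avoids mapping id, guid, and name at all. The data below is a cut-down version of the question's frames (only the first three df2 rows, so 85558-39280 has no match, as in the output above).

```python
import pandas as pd

# First three rows of the question's df2, with item_lookup as the index
df2 = pd.DataFrame(
    {
        "item_type": ["item1", "item2", "item3"],
        "cost": [9500, 8000, 7000],
        "value": [25.1, 18.7, 16.5],
        "target": [1.9, 0.0, 0.9],
    },
    index=pd.Index(["85558-57439", "85558-54608", "85558-91361"], name="item_lookup"),
)
df1 = pd.DataFrame({
    "id": [3031958124, 3031958127, 3031958129],
    "name": ["Alonso", "Jeff", "Mike"],
    "item1": ["85558-57439"] * 3,
    "item2": ["85558-54608", "85558-39280", "85558-39280"],  # 85558-39280 is not in df2
    "item3": ["85558-91361"] * 3,
})

item_cols = [col for col in df1.columns if col.startswith("item")]
for col in ["cost", "value", "target"]:
    # Map each item column against df2[col]; unmatched codes become NaN,
    # and sum(axis=1) skips NaN by default
    df1[col] = df1[item_cols].apply(lambda s: s.map(df2[col])).sum(axis=1)

print(df1)
```

The resulting cost, value, and target columns match the output above: 24500, 60.3, 2.8 for Alonso and 16500, 41.6, 2.8 for Jeff and Mike.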

I prefer the following solution, which has elements of the solutions proposed by @ali bakhtiari and @zaki98 but is more explicit, performant, and flexible. Use applymap, since the item lookup is the same for all item columns. This assumes that item_lookup in df2 uniquely identifies each row (all the solutions assume this); however, the solution I propose also handles the case of an item_lookup in df1 that is not present in df2. For df1 and df2 as follows,

df1

df2

define the columns to sum, sum_cols, and the item columns in df1, item_cols, then append each summed column to df1 as follows:

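sum_cols = ['cost', 'value', 'target']
item_cols = [col for col in df1.columns if 'item' in col]

# only needed if item_lookup is a column of df2; in the question it is already the index
df2.set_index('item_lookup', inplace=True)
for sc in sum_cols:
    # look up each item code; codes missing from df2 contribute 0
    # (applymap is deprecated since pandas 2.1; df1[item_cols].map is the replacement)
    df1[sc] = df1[item_cols] \
                    .applymap(lambda x: df2.at[x, sc] if x in df2.index else 0) \
                    .sum(axis=1)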

This problem also seems like a good use case for at, which is more performant than loc when only a single value is looked up each time, as here (see this SO post). It's not necessary to set item_lookup as the index on df2, but doing so should improve performance on large datasets. If an item_lookup in df1 is not present in df2, you could also return NaN instead of 0, e.g. so that the number of missing item_lookup values can be counted with minimal additional effort.
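The NaN variant could look like the following sketch. It keeps this answer's df2.at lookup but returns NaN for codes missing from df2, so sum(axis=1) still skips them while isna() can count them. The n_missing column name, the elementwise helper, and the code 85558-00000 are hypothetical, and the frames are cut down from the question's data. Since applymap is deprecated from pandas 2.1 in favour of DataFrame.map, the helper uses whichever one the installed version provides.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {"cost": [9500, 8000, 7000], "value": [25.1, 18.7, 16.5], "target": [1.9, 0.0, 0.9]},
    index=pd.Index(["85558-57439", "85558-54608", "85558-91361"], name="item_lookup"),
)
df1 = pd.DataFrame({
    "name": ["Alonso", "Jeff"],
    "item1": ["85558-57439", "85558-57439"],
    "item2": ["85558-54608", "85558-39280"],  # 85558-39280 is not in df2
    "item3": ["85558-91361", "85558-00000"],  # hypothetical code, also not in df2
})

sum_cols = ["cost", "value", "target"]
item_cols = [col for col in df1.columns if "item" in col]


def elementwise(df, func):
    # DataFrame.map was added in pandas 2.1; older versions only have applymap
    return df.map(func) if hasattr(pd.DataFrame, "map") else df.applymap(func)


lookups = {}
for sc in sum_cols:
    # NaN instead of 0 for codes that df2 does not contain
    lookups[sc] = elementwise(
        df1[item_cols], lambda x: df2.at[x, sc] if x in df2.index else np.nan
    )
    df1[sc] = lookups[sc].sum(axis=1)  # NaN is skipped by default

# A code missing from df2 is NaN in every lookup, so one of them is enough to count
df1["n_missing"] = lookups["cost"].isna().sum(axis=1)
print(df1)
```

Here Alonso gets 24500, 60.3, 2.8 with n_missing 0, and Jeff gets 9500, 25.1, 1.9 with n_missing 2. Note that an existing df2 entry whose cost is itself NaN would also be counted as missing.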

Output df1:



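Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.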

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM