
Append pandas dataframe with column of averaged values from string matches in another dataframe

Right now I have two dataframes (let's call them A and B) built from Excel imports. They have different dimensions, and both contain some empty/NaN cells. Let's say A is data for individual model numbers and B is a set of order information. For every row (unique item) in A, I want to search B for the (possibly) multiple orders for that item number, average the corresponding prices, and append to A a column containing the average price for each item.

The item numbers are alphanumeric, so they have to be strings. Not every item will have orders/pricing information, and I'll be removing those items at the next step. This is a large amount of data, so efficiency matters and iterrows probably isn't the right choice. Thank you in advance!

Here's what I have so far:

import numpy as np

avgPrice = []
for index, row in dfA.iterrows():
    def avg_unit_price(item_no, unit_price):
        matchingOrders = []
        for item, price in zip(item_no, unit_price):
            if item == row['itemNumber']:
                matchingOrders.append(price)
        avgPrice.append(np.mean(matchingOrders))  
    avg_unit_price(dfB['item_no'], dfB['unit_price'])
dfA['avgPrice'] = avgPrice

In general, avoid explicit loops over dataframe rows, as they perform poorly. If you can't vectorise easily, then as a last resort you can try pd.Series.apply. In this case, neither is necessary: a groupby to compute the averages, followed by a map onto the item numbers, does the job.

import pandas as pd

# B: pricing data
df_b = pd.DataFrame([['I1', 34.1], ['I2', 541.31], ['I3', 451.3], ['I2', 644.3], ['I3', 453.2]],
                    columns=['item_no', 'unit_price'])

# create avg price dictionary
item_avg_price = df_b.groupby('item_no')['unit_price'].mean().to_dict()

# A: product data
df_a = pd.DataFrame([['I1'], ['I2'], ['I3'], ['I4']], columns=['item_no'])

# map price info to product data
df_a['avgPrice'] = df_a['item_no'].map(item_avg_price)

# remove unmapped items
df_a = df_a[pd.notnull(df_a['avgPrice'])]
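
As an alternative to the dictionary and map above, the averaging and the removal of unpriced items can probably be combined into a single inner merge. The following is only a sketch under assumptions: it reuses the sample data from this answer, it assumes both frames name the key column item_no (the question's own frames use itemNumber in A, which would need left_on/right_on instead of on), and the names avg_price and result are made up for illustration.

```python
import pandas as pd

# Sample data copied from the answer above (not real order data)
df_b = pd.DataFrame([['I1', 34.1], ['I2', 541.31], ['I3', 451.3], ['I2', 644.3], ['I3', 453.2]],
                    columns=['item_no', 'unit_price'])
df_a = pd.DataFrame([['I1'], ['I2'], ['I3'], ['I4']], columns=['item_no'])

# One row per item with its mean unit price; mean() skips NaN prices by default
avg_price = (df_b.groupby('item_no', as_index=False)['unit_price']
                 .mean()
                 .rename(columns={'unit_price': 'avgPrice'}))

# An inner join keeps only the items in A that have at least one order in B,
# so no separate "remove unmapped items" step is needed
result = df_a.merge(avg_price, on='item_no', how='inner')

print(result)
```

Here I4 has no orders, so it drops out of result. Using how='left' instead would keep it with a NaN avgPrice, which matches what the map approach gives before the unmapped rows are filtered out.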


