Using Pandas, how can you match from multiple indexes in a series, match to a DataFrame and replace multiple columns

I am trying to match a combination of values in one data frame to the same combination in another (essentially a lookup table). If I find a match in the lookup table, I want to replace the values in the original with those from the lookup. I have tried using replace, map, and loc, but I think I am only confusing myself more.

I have an example dataframe:

import pandas as pd

example1 = {
    'Code': ['99233','99233','99233','90732','93306','93306','93306'],
    'Modifier': ['','','','','','TC','26'],
    'W': ['0','0','0','0','0','0','0'],
    'P': ['0','0','0','0','0','0','0'],
    'M': ['0','0','0','0','0','0','0']
}
df1 = pd.DataFrame(example1)

Which looks like this:

    Code    Modifier    W   P   M
0   99233               0   0   0
1   99233               0   0   0
2   99233               0   0   0
3   90732               0   0   0
4   93306               0   0   0
5   93306   TC          0   0   0
6   93306   26          0   0   0

I would then use a lookup table like the following:

example2 = {
    'Code': ['99233','90732','93306','93306','93306'],
    'Modifier': ['','','','TC','26'],
    'W': ['2','0','1.5','0','1.5'],
    'P': ['0.81','0','4.29','3.76','0.53'],
    'M': ['0.13','0','0.7','0.2','0.05']
}
df2 = pd.DataFrame(example2)

Which appears like so:

    Code    Modifier    W   P       M
0   99233               2   0.81    0.13
1   90732               0   0       0
2   93306               1.5 4.29    0.7
3   93306   TC          0   3.76    0.2
4   93306   26          1.5 0.53    0.05

I want to be able to use the "Code" and "Modifier" fields to replace the values for W, P, and M in the main dataframe (df1).

I was able to match on one value by converting the lookup table to a series (I'm not sure if this is correct, but it made sense) and using the code in the dictionary as my index:

vdic = pd.Series(df2.W.values, index=df2.Code).to_dict()
df1.loc[df1.Code.isin(vdic.keys()), 'W'] = df1.loc[(df1.Code.isin(vdic.keys())), 'Code'].map(vdic)
df1

This gets me halfway there with the first column, but it obviously does not pick up on the modifier.

    Code    Modifier    W   P   M
0   99233               2   0   0
1   99233               2   0   0
2   99233               2   0   0
3   90732               0   0   0
4   93306               1.5 0   0
5   93306   TC          1.5 0   0
6   93306   26          1.5 0   0

I tried adding a second index to the dictionary:

vdic = pd.Series(df2.W.values, index=[df2.Code, df2.Modifier]).to_dict()

{('99233', ''): '2',
 ('90732', ''): '0',
 ('93306', ''): '1.5',
 ('93306', 'TC'): '0',
 ('93306', '26'): '1.5'}

I think this would work, but I must be making this more complicated than it actually is, and every attempt so far has failed. I checked other threads and the code is all over the place.
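For reference, a tuple-keyed dictionary like the one above can be applied by building the same (Code, Modifier) tuple for each row of df1 and looking it up. This is only a minimal sketch on a trimmed copy of the example data (W column only); the plain list comprehension and the fallback to the existing value for unmatched rows are assumptions, not part of the original attempt.

```python
import pandas as pd

# Trimmed copies of the example frames (only the W column is updated here)
df1 = pd.DataFrame({
    'Code': ['99233', '99233', '99233', '90732', '93306', '93306', '93306'],
    'Modifier': ['', '', '', '', '', 'TC', '26'],
    'W': ['0'] * 7,
})
df2 = pd.DataFrame({
    'Code': ['99233', '90732', '93306', '93306', '93306'],
    'Modifier': ['', '', '', 'TC', '26'],
    'W': ['2', '0', '1.5', '0', '1.5'],
})

# Dictionary keyed by (Code, Modifier) tuples, built as in the question
vdic = pd.Series(df2.W.values, index=[df2.Code, df2.Modifier]).to_dict()

# Look up each row's (Code, Modifier) pair; keep the old value if there is no match
pairs = zip(df1['Code'], df1['Modifier'])
df1['W'] = [vdic.get(pair, old) for pair, old in zip(pairs, df1['W'])]
```

The same pattern would have to be repeated for P and M, which is part of why a key-based join tends to be less work than per-column dictionaries.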

Any help or suggestions would be greatly appreciated.

I am also curious whether I can update all three columns (W, P, and M) in one pass, or whether this should be subdivided.
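One possible way to handle all three columns in a single pass is to index the lookup table by both key columns and reindex it against the key pairs of df1. The sketch below assumes each (Code, Modifier) pair appears only once in df2 and that every pair in df1 has a match (an unmatched pair would come back as NaN); the names `keys`, `cols`, `lookup`, and `matched` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Code': ['99233', '99233', '99233', '90732', '93306', '93306', '93306'],
    'Modifier': ['', '', '', '', '', 'TC', '26'],
    'W': ['0'] * 7,
    'P': ['0'] * 7,
    'M': ['0'] * 7,
})
df2 = pd.DataFrame({
    'Code': ['99233', '90732', '93306', '93306', '93306'],
    'Modifier': ['', '', '', 'TC', '26'],
    'W': ['2', '0', '1.5', '0', '1.5'],
    'P': ['0.81', '0', '4.29', '3.76', '0.53'],
    'M': ['0.13', '0', '0.7', '0.2', '0.05'],
})

keys = ['Code', 'Modifier']
cols = ['W', 'P', 'M']

# Lookup table indexed by the (Code, Modifier) pair; pairs must be unique here
lookup = df2.set_index(keys)[cols]

# One lookup row per df1 row, in df1's row order (duplicates in df1 are fine)
matched = lookup.reindex(pd.MultiIndex.from_frame(df1[keys]))

# Overwrite all three columns at once, positionally
df1[cols] = matched.to_numpy()
```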

Edit from the first answer by @user13802115 (which was awesome, BTW)

I should amend the question and ask whether it is possible to do the same operation when the data frames are of different sizes.

example3 = {
    'Other1': ['1','7','4','54','9','43','22'],
    'Other2': ['A','Z','Y','BB','7W','9','Left'],
    'Code': ['99233','99233','99233','90732','93306','93306','93306'],
    'Modifier': ['','','','','','TC','26'],
    'W': ['0','0','0','0','0','0','0'],
    'P': ['0','0','0','0','0','0','0'],
    'M': ['0','0','0','0','0','0','0']
}
df3 = pd.DataFrame(example3)

Essentially, I want to edit in place and update only the values that come from the lookup table, leaving the other columns of the first data frame untouched, however many there are.

Solution Below

Thanks to the answer by @user13802115, I used the following link, "Pandas merging on different size dataframes based on one column", to get what I needed. Using the amended dataframe (df3), I can run the following to merge my data, drop the old W, P, and M columns that came from my initial dataframe, and reindex so everything remains as originally created, with updated fields.

# Reindex against df3's own columns so that Other1 and Other2 are kept
df = (df3.merge(df2, on=['Code','Modifier'], how='left', suffixes=('_',''))
        .drop(['W_','P_','M_'], axis=1)
        .reindex(columns=df3.columns))
df
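Because this is a left merge, any Code/Modifier pair that is missing from the lookup table ends up with NaN in W, P, and M. A minimal sketch of keeping the original values in that case follows; the small frames, the made-up code '00000', the `_old` suffix, and the `validate='many_to_one'` guard are all assumptions added for illustration.

```python
import pandas as pd

df3 = pd.DataFrame({
    'Other1': ['1', '7', '4'],
    'Code': ['99233', '93306', '00000'],  # '00000' is a made-up code absent from df2
    'Modifier': ['', 'TC', ''],
    'W': ['0', '0', '9'],
    'P': ['0', '0', '9'],
    'M': ['0', '0', '9'],
})
df2 = pd.DataFrame({
    'Code': ['99233', '93306'],
    'Modifier': ['', 'TC'],
    'W': ['2', '0'],
    'P': ['0.81', '3.76'],
    'M': ['0.13', '0.2'],
})

keys = ['Code', 'Modifier']
cols = ['W', 'P', 'M']

# validate raises if a key pair is duplicated in the lookup table,
# which would otherwise silently multiply rows
merged = df3.merge(df2, on=keys, how='left',
                   suffixes=('_old', ''), validate='many_to_one')

# Where the lookup had no match (NaN), fall back to the original value
for col in cols:
    merged[col] = merged[col].fillna(merged[col + '_old'])

# Drop the old columns and restore df3's column order
result = merged.drop(columns=[c + '_old' for c in cols])[df3.columns.tolist()]
```

Note that merge returns a fresh 0..n-1 index, so a non-default index on df3 would need to be restored separately.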

I am not exactly sure, but I believe this is what you want.

# Left-join the lookup values onto df1's keys; unmatched keys become '0'
df3 = pd.merge(df1[['Code', 'Modifier']], df2, on=['Code', 'Modifier'], how='left').fillna('0')

I think this is what you are trying to do:

# Update only the first row of each (Code, Modifier) combination in df1
for code in df1['Code'].unique():
    for modifier in df1.loc[df1['Code'] == code, 'Modifier'].unique():
        # Index label of the first matching row in the base table and in the lookup table
        # (raises IndexError if the combination is missing from df2)
        row_to_modify = df1.index[(df1['Code'] == code) & (df1['Modifier'] == modifier)][0]
        lookup_row = df2.index[(df2['Code'] == code) & (df2['Modifier'] == modifier)][0]
        df1.loc[row_to_modify, ['W', 'P', 'M']] = df2.loc[lookup_row, ['W', 'P', 'M']]

This only modifies the first occurrence of each Code/Modifier combination in the base table, using the first matching row in the lookup table. It does not append lookup rows that are missing from the base table; I wasn't sure whether you wanted that or not.

Here is an example output dataframe using the dictionaries you provided:

    Code Modifier    W     P     M
0  99233             2  0.81  0.13
1  99233             0     0     0
2  99233             0     0     0
3  90732             0     0     0
4  93306           1.5  4.29   0.7
5  93306       TC    0  3.76   0.2
6  93306       26  1.5  0.53  0.05
