
pandas replace column values except one

Original dataframe:

    DocID   DocURL                       DocName    SiteURL LibraryURL
0   29806   path/to/doc/docname1.doc    docname1    web/url lib/url
1   29807   path/to/doc/docname2.doc    docname2    web/url lib/url

New dataframe:

    DocURL                   DocName    SiteURL LibraryURL
0   path/to/doc/newname.doc  newname    web/url lib/url

I want to replace the row with DocID == 29806 with this new row.

I have tried the following code without success:

df.loc[:, df.columns != 'DocID'].loc[row_index] = new_df.iloc[0]

And this:

df.loc[row_index][1:] = new_df.iloc[0]

For the first one I don't get any error or warning; for the second one I get:

A value is trying to be set on a copy of a slice from a DataFrame

I need the row in the original dataframe to be replaced with the row of the new dataframe, but the DocID must stay the same. I also need the result to be stored in the original dataframe.

One way is to create a list of the columns to replace and then use to_numpy to avoid any alignment issues, like:

cols_replace = ['DocURL','DocName','SiteURL','LibraryURL']
df.loc[row_index, cols_replace] = new_df.loc[0, cols_replace].to_numpy()
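
For reference, here is a minimal self-contained sketch of that approach. The sample data is copied from the question; looking up row_index from the DocID value is an assumption about how the asker obtains it.

```python
import pandas as pd

df = pd.DataFrame({
    "DocID": [29806, 29807],
    "DocURL": ["path/to/doc/docname1.doc", "path/to/doc/docname2.doc"],
    "DocName": ["docname1", "docname2"],
    "SiteURL": ["web/url", "web/url"],
    "LibraryURL": ["lib/url", "lib/url"],
})
new_df = pd.DataFrame({
    "DocURL": ["path/to/doc/newname.doc"],
    "DocName": ["newname"],
    "SiteURL": ["web/url"],
    "LibraryURL": ["lib/url"],
})

# Assumption: the row to replace is found by its DocID value.
row_index = df.index[df["DocID"] == 29806][0]

cols_replace = ["DocURL", "DocName", "SiteURL", "LibraryURL"]

# A single .loc call writes into df itself (no chained indexing), and
# to_numpy() drops the labels so the values are assigned by position.
df.loc[row_index, cols_replace] = new_df.loc[0, cols_replace].to_numpy()
```

Because the assignment goes through one .loc call on df, the original dataframe is modified and DocID is left untouched.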

Just use df.update() to get what you need.

Code:

import pandas as pd

df=pd.DataFrame({'DocID':[29806,29807],'DocURL':['path/to/doc/docname1.doc','path/to/doc/docname2.doc'],
                'DocName':['docname1','docname2'],'SiteURL':['web/url','web/url'],
                'LibraryURL':['lib/url','lib/url']})

df2=pd.DataFrame({'DocURL':['path/to/doc/newname.doc'],
                'DocName':['newname'],'SiteURL':['web/url'],
                'LibraryURL':['lib/url']})

# update() modifies df in place and returns None
df.update(df2)

Output:

    DocID   DocURL                   DocName       SiteURL  LibraryURL
0   29806   path/to/doc/newname.doc  newname       web/url  lib/url
1   29807   path/to/doc/docname2.doc docname2      web/url  lib/url

In this case df.update() overwrites the original values in df with the new values in df2. The update is aligned on the index, so make sure the index labels in df2 match those of the rows in df that you want to replace.
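
Since the update is aligned on the index, a new row that carries the default index 0 always lands on row 0. The sketch below shows one way to target a different row by relabelling df2 first; the target_id variable is introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "DocID": [29806, 29807],
    "DocURL": ["path/to/doc/docname1.doc", "path/to/doc/docname2.doc"],
    "DocName": ["docname1", "docname2"],
    "SiteURL": ["web/url", "web/url"],
    "LibraryURL": ["lib/url", "lib/url"],
})
df2 = pd.DataFrame({
    "DocURL": ["path/to/doc/newname.doc"],
    "DocName": ["newname"],
    "SiteURL": ["web/url"],
    "LibraryURL": ["lib/url"],
})

# Hypothetical target: replace the row whose DocID is 29807 (index label 1).
target_id = 29807

# Give df2 the index label of the matching row in df so update() aligns on it.
df2.index = df.index[df["DocID"] == target_id]

# Only columns present in df2 are updated, so DocID is left as it was.
df.update(df2)
```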

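Try this. DocID is compared as an integer here (use '29806' instead if the column holds strings), and to_numpy() makes pandas assign the values by position instead of aligning on labels:

cols = ['DocURL', 'DocName', 'SiteURL', 'LibraryURL']
df.loc[df['DocID'] == 29806, cols] = new_df.iloc[0][cols].to_numpy()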
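new_df["DocID"] = [29806]

# set_index returns a new frame by default, so apply it in place
old_df.set_index("DocID", inplace=True)
new_df.set_index("DocID", inplace=True)

old_df.update(new_df)
old_df.reset_index(inplace=True)  # turn DocID back into a regular column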

Your best bet is to add a DocID column to your new dataframe and populate it with the DocIDs of the rows in the old dataframe that you'd like to update. Then set DocID as the index on both dataframes. Finally, call .update, which aligns on the index, so only the rows whose DocID matches are overwritten.
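
A minimal end-to-end sketch of these steps, using the sample data from the question. It targets DocID 29807 (an arbitrary choice made here) to show that the match is made on DocID and not on row position.

```python
import pandas as pd

old_df = pd.DataFrame({
    "DocID": [29806, 29807],
    "DocURL": ["path/to/doc/docname1.doc", "path/to/doc/docname2.doc"],
    "DocName": ["docname1", "docname2"],
    "SiteURL": ["web/url", "web/url"],
    "LibraryURL": ["lib/url", "lib/url"],
})
new_df = pd.DataFrame({
    "DocURL": ["path/to/doc/newname.doc"],
    "DocName": ["newname"],
    "SiteURL": ["web/url"],
    "LibraryURL": ["lib/url"],
})

# Step 1: tag the new row with the DocID of the row it should replace.
new_df["DocID"] = [29807]

# Step 2: use DocID as the index on both frames.
old_df.set_index("DocID", inplace=True)
new_df.set_index("DocID", inplace=True)

# Step 3: update() aligns on the DocID index and modifies old_df in place.
old_df.update(new_df)

# Optional: restore DocID as an ordinary column.
old_df.reset_index(inplace=True)
```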
