
How to merge a column from one dataframe into another based on a condition?

I have the following csv files:

file1.csv #dataframe is named dfFile1
Id,name,pos_neg,line
1,abc,pos,cas
2,cde,neg,work
3,efg,pos,cat
4,abc,pos,job

file2.csv #dataframe is named dfFile2
Id,ref,names,other
c10,n1,www,10.5
c11,m4,efg,5.4
c12,m5,cde,9.8
c13,m9,hhh,6.7
c14,n4,abc,12.5
c15,n9,kkk,3.4

which I converted into dataframes using pandas. I would like to obtain a third dataframe that keeps the rows of dfFile2 whose names value matches one of the unique values in the name field of dfFile1, and that also adds the pos_neg column from file 1, so I will end up with:

dfNew
Id,ref,names,other,pos_neg
c11,m4,efg,5.4,pos
c12,m5,cde,9.8,neg
c14,n4,abc,12.5,pos

So far, I have done the following:

names = dfFile1["name"].unique()    # contains [abc, cde, efg]
dfFile2 = dfFile2[dfFile2.names.isin(names)]

but I do not know how to merge the column pos_neg from dfFile1. I tried the following:

dfNew=dfFile2.merge(dfFile2,dfFil1[["pos_neg"]],on=dfFile2)

but it does not work.

Any help?

Thanks

You were almost there; the DataFrame.merge call just needs some tweaking. You also need drop_duplicates here, since abc appears twice in dfFile1.

dfNew = (
    dfFile2.merge(dfFile1[['name', 'pos_neg']], 
                  left_on='names', 
                  right_on='name')
    .drop_duplicates()
    .drop(columns='name')
)

    Id ref names  other pos_neg
0  c11  m4   efg    5.4     pos
1  c12  m5   cde    9.8     neg
2  c14  n4   abc   12.5     pos
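
For a reproducible version of the above, here is a minimal sketch that rebuilds the two frames from the question's sample data and drops the duplicate (name, pos_neg) pairs before merging instead of after. The `lookup` name and the `validate="many_to_one"` argument are additions for illustration, not part of the answer above; `validate` makes pandas raise an error if a name still maps to more than one row on the right-hand side.

```python
import io

import pandas as pd

# Sample data copied from the question.
file1 = io.StringIO(
    "Id,name,pos_neg,line\n"
    "1,abc,pos,cas\n"
    "2,cde,neg,work\n"
    "3,efg,pos,cat\n"
    "4,abc,pos,job\n"
)
file2 = io.StringIO(
    "Id,ref,names,other\n"
    "c10,n1,www,10.5\n"
    "c11,m4,efg,5.4\n"
    "c12,m5,cde,9.8\n"
    "c13,m9,hhh,6.7\n"
    "c14,n4,abc,12.5\n"
    "c15,n9,kkk,3.4\n"
)
dfFile1 = pd.read_csv(file1)
dfFile2 = pd.read_csv(file2)

# One row per (name, pos_neg) pair, so each dfFile2 row matches at most once.
lookup = dfFile1[["name", "pos_neg"]].drop_duplicates()

dfNew = (
    dfFile2.merge(
        lookup,
        left_on="names",
        right_on="name",
        how="inner",
        validate="many_to_one",  # fail loudly if a name is not unique in lookup
    )
    .drop(columns="name")
)
print(dfNew)
```

Deduplicating first keeps the merge from producing repeated rows in the first place, and it surfaces the case where the same name carries both pos and neg, which drop_duplicates after the merge would silently keep as two rows.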

Sidenote: in Python we don't use camelCase for variable names but lowercase with underscores (snake_case). See the PEP 8 style guide:

Function names should be lowercase, with words separated by underscores as necessary to improve readability.

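You can iterate through your dataframe with iterrows (here df is dfFile1 and df2 is dfFile2):

names = df['name'].unique()
df3 = df2[df2.names.isin(names)].copy()  # copy, so the assignment below does not write to a view

for index, row in df3.iterrows():
    # pos_neg values of every row in df that has the same name
    matches = df[df['name'] == row['names']]['pos_neg']
    df3.loc[index, 'pos_neg'] = matches.iloc[0]

matches.iloc[0] takes the first of the rows in df that share the same name.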
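
The same "first match per name" lookup can also be written without an explicit loop. The following is a sketch only, assuming df holds the contents of file1.csv and df2 those of file2.csv; `first_pos_neg` is a name introduced here for illustration.

```python
import pandas as pd

# Sample data from the question (df is file1.csv, df2 is file2.csv).
df = pd.DataFrame(
    {
        "name": ["abc", "cde", "efg", "abc"],
        "pos_neg": ["pos", "neg", "pos", "pos"],
    }
)
df2 = pd.DataFrame(
    {
        "Id": ["c10", "c11", "c12", "c13", "c14", "c15"],
        "ref": ["n1", "m4", "m5", "m9", "n4", "n9"],
        "names": ["www", "efg", "cde", "hhh", "abc", "kkk"],
        "other": [10.5, 5.4, 9.8, 6.7, 12.5, 3.4],
    }
)

# Series indexed by name, holding the first pos_neg value seen for each name.
first_pos_neg = df.drop_duplicates(subset="name").set_index("name")["pos_neg"]

# Keep only rows whose name is known, then look each name up in the Series.
df3 = df2[df2["names"].isin(first_pos_neg.index)].copy()
df3["pos_neg"] = df3["names"].map(first_pos_neg)
print(df3)
```

Series.map does the per-row lookup in one vectorized call, which tends to matter once the frames grow beyond a few thousand rows. Note that df3 keeps the original row labels of df2 rather than a fresh 0-based index.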

Try:

dfNew = dfFile2.merge(dfFile1[["name", "pos_neg"]], how="inner", left_on="names", right_on="name")

Rearranging and/or renaming the columns shouldn't be difficult if the above works.
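
As a rough illustration of the renaming step, the key column can be renamed before the merge so that both frames share it and no extra name column appears in the result. This is a sketch under assumptions: the `right` variable and the final column order are made up for the example, and drop_duplicates is included because abc appears twice in the question's dfFile1, which would otherwise produce a repeated c14 row.

```python
import pandas as pd

# Sample data from the question.
dfFile1 = pd.DataFrame(
    {
        "Id": [1, 2, 3, 4],
        "name": ["abc", "cde", "efg", "abc"],
        "pos_neg": ["pos", "neg", "pos", "pos"],
        "line": ["cas", "work", "cat", "job"],
    }
)
dfFile2 = pd.DataFrame(
    {
        "Id": ["c10", "c11", "c12", "c13", "c14", "c15"],
        "ref": ["n1", "m4", "m5", "m9", "n4", "n9"],
        "names": ["www", "efg", "cde", "hhh", "abc", "kkk"],
        "other": [10.5, 5.4, 9.8, 6.7, 12.5, 3.4],
    }
)

# Rename the key so both frames use "names", and keep one row per pair.
right = (
    dfFile1[["name", "pos_neg"]]
    .rename(columns={"name": "names"})
    .drop_duplicates()
)

dfNew = dfFile2.merge(right, how="inner", on="names")

# Reordering is plain column selection with a list (order chosen arbitrarily).
dfReordered = dfNew[["Id", "names", "pos_neg", "ref", "other"]]
print(dfReordered)
```

With a shared key name, `on="names"` replaces the left_on/right_on pair and the result contains a single key column.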
