I have the following csv files:
file1.csv #dataframe is named dfFile1
Id,name,pos_neg,line
1,abc,pos,cas
2,cde,neg,work
3,efg,pos,cat
4,abc,pos,job
file2.csv #dataframe is named dfFile2
Id,ref,names,other
c10,n1,www,10.5
c11,m4,efg,5.4
c12,m5,cde,9.8
c13,m9,hhh,6.7
c14,n4,abc,12.5
c15,n9,kkk,3.4
which I converted into dataframes using pandas. I would like to obtain a third data frame that matches the rows of dfFile2 according to the unique values presented in the name field of dfFile1, and also add the pos_neg row from file 1, so I will end up with:
dfNew
Id,ref,names,other,pos_neg
c11,m4,efg,5.4,pos
c12,m5,cde,9.8,neg
c14,n4,abc,12.5,pos
So far, I have done the following:
list=[]
list=dfFile1["name"].unique() #contains [abc,cde,efg]
dfFile2=dfFile2[dfFile2.names.isin(list)]
but I just do not know how can I merge the column pos_neg from dfFile1, I tried the following:
dfNew=dfFile2.merge(dfFile2,dfFil1[["pos_neg"]],on=dfFile2)
,but it does not work.
Any help?
Thanks
You were almost there, just some tweaking with the DataFrame.merge
method, furthermore you need drop_duplicates
here, since abc
appears twice in the dfFile1
.
dfNew = (
dfFile2.merge(dfFile1[['name', 'pos_neg']],
left_on='names',
right_on='name')
.drop_duplicates()
.drop(columns='name')
)
Id ref names other pos_neg
0 c11 m4 efg 5.4 pos
1 c12 m5 cde 9.8 neg
2 c14 n4 abc 12.5 pos
Sidenote: in Python we don't use camelCase
for variable names but lowercase with underscore camel_case
. See PEP8 style guide
:
Function names should be lowercase, with words separated by underscores as necessary to improve readability.
You can iterate through your dataframe with iterrows
df3 = df2[df2.names.isin(names)]
for index, row in df3.iterrows():
row = df[row['names'] == df['name']]['pos_neg']
df3.loc[index,'pos_neg'] = row.iloc[0]
row.loc[0] stands for rows that has same 'name' field. Gets first of same named rows
Try:
dfNew = dfFile2.merge(dfFile1[["name", "pos_neg"]], how="inner", left_on="names", right_on="name")
Rearranging the columns and/or renaming them shouldn't be difficult if above works.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.