I am trying to understand why the merge function is duplicating values.
>>> c2.head()
Out[42]:
Bin Date/Time val
A 10/31/2017 15:53:57 0.77
A 10/31/2017 15:53:57 0.75
A 10/31/2017 15:53:57 0.79
A 10/31/2017 15:53:57 0.67
A 10/31/2017 15:53:57 0.72
>>> c1.head()
Out[44]:
Bin Date/Time code
A 10/31/2017 15:53:57 BYM
A 10/31/2017 15:53:57 CFS
A 10/31/2017 15:53:57 DFZ
A 10/31/2017 15:53:57 HKN
A 10/31/2017 15:53:57 RBF
I need to merge these 2 on Bin and Datetime.
>>> c= c1.merge(c2, on =['Bin','Date/Time'], how= 'left')
>>> c.head()
Out[50]:
Bin Date/Time Code Val
A 10/31/2017 15:53:57 BYM 0.77
A 10/31/2017 15:53:57 BYM 0.77
A 10/31/2017 15:53:57 BYM 0.77
A 10/31/2017 15:53:57 BYM 0.77
A 10/31/2017 15:53:57 BYM 0.77
So c has multiple entries for the same bin/datetime. I thought that maybe the datetime values look the same but are different. But that's not the case.
>>> c1['Date/Time'].iloc[0]
Out[46]: u'10/31/2017 15:53:57'
>>> c2['Date/Time'].iloc[0]
Out[47]: u'10/31/2017 15:53:57'
>>> c1['Date/Time'].iloc[0]==c2['Date/Time'].iloc[0]
Out[48]: True
In addition, even if datetime was different, there should be only 2 lines for each bin/datetime. Any idea what might be happening here?
My intended output is:
Bin Date/Time Code Val
A 10/31/2017 15:53:57 BYM 0.77
A 10/31/2017 15:53:57 CFS 0.75
A 10/31/2017 15:53:57 DFZ 0.79
A 10/31/2017 15:53:57 HKN 0.67
A 10/31/2017 15:53:57 RBF 0.72
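The duplication happens because the key pair ['Bin', 'Date/Time'] is not unique: the same key appears on several rows of both c1 and c2, so every row of c1 is matched with every row of c2 that carries that key.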
Simplified example:
>>> c1.head(1)
Bin Date/Time code
0 A 2017-10-31 15:53:57 BYM
Merge this one row with c2:
>>> c1.head(1).merge(c2, on=['Bin','Date/Time'], how='left')
Bin Date/Time code val
0 A 2017-10-31 15:53:57 BYM 0.77
1 A 2017-10-31 15:53:57 BYM 0.75
2 A 2017-10-31 15:53:57 BYM 0.79
3 A 2017-10-31 15:53:57 BYM 0.67
4 A 2017-10-31 15:53:57 BYM 0.72
You are merging on two keys, ['Bin', 'Date/Time'], and for each code in c1 the merge brings over every val from c2 that has the same key.
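To see the effect on the whole frame rather than a single row, here is a minimal sketch. The two frames are hypothetical miniatures rebuilt from the head() output in the question, assuming every row shares the same Bin and Date/Time. It also uses merge's validate argument, which raises an error when the keys are not unique.

```python
import pandas as pd

# Hypothetical miniature versions of c1 and c2: every row shares the same key.
c1 = pd.DataFrame({
    "Bin": ["A"] * 5,
    "Date/Time": ["10/31/2017 15:53:57"] * 5,
    "code": ["BYM", "CFS", "DFZ", "HKN", "RBF"],
})
c2 = pd.DataFrame({
    "Bin": ["A"] * 5,
    "Date/Time": ["10/31/2017 15:53:57"] * 5,
    "val": [0.77, 0.75, 0.79, 0.67, 0.72],
})

c = c1.merge(c2, on=["Bin", "Date/Time"], how="left")

# 5 matching rows on the left times 5 on the right gives 25 rows.
print(len(c))  # 25

# The first five rows all carry code BYM, one for each val in c2.
print(c.head())

# Asking merge to enforce unique keys surfaces the problem immediately.
try:
    c1.merge(c2, on=["Bin", "Date/Time"], how="left", validate="one_to_one")
    keys_unique = True
except pd.errors.MergeError:
    keys_unique = False
print(keys_unique)  # False
```

If the real data has more than one key value, the same multiplication happens within each group of rows sharing a key.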
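It doesn't appear you need a merge. If the two dataframes have the same size and index, you can simply assign one series to the other. Use bracket notation, because attribute-style assignment (c1.val = ...) does not create a new column:
c1['val'] = c2['val']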
Sometimes you may wish to copy several series from one dataframe to another. Instead of looping over multiple columns, this can be achieved via combine_first, which returns a new dataframe:
c = c1.combine_first(c2)
This gives priority to the non-null values of c1 wherever both dataframes have the same index and column, but that will not matter if the only difference is that one dataframe has an extra column.
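A minimal sketch of combine_first on frames shaped like the ones in the question. The data is a hypothetical reconstruction from the head() output, and both frames are assumed to share the default 0..4 index.

```python
import pandas as pd

# Hypothetical frames sharing the same default index (0..4).
c1 = pd.DataFrame({
    "Bin": ["A"] * 5,
    "Date/Time": ["10/31/2017 15:53:57"] * 5,
    "code": ["BYM", "CFS", "DFZ", "HKN", "RBF"],
})
c2 = pd.DataFrame({
    "Bin": ["A"] * 5,
    "Date/Time": ["10/31/2017 15:53:57"] * 5,
    "val": [0.77, 0.75, 0.79, 0.67, 0.72],
})

# Columns present in both frames come from c1; 'val' exists only in c2,
# so it is carried over row by row according to the index.
c = c1.combine_first(c2)
print(c)
```

The result has one row per index label, so no rows are duplicated. The column order of the result may differ from the order in c1.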
If the indices are different, you may wish to realign them via .reset_index() (passing drop=True to discard the old index rather than keep it as a column) before using either of the above methods.
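As a rough illustration of why the realignment matters, the sketch below gives the two frames deliberately mismatched indices. The index labels and data are made up for the example. Column assignment in pandas aligns on index labels, so the indices are reset first.

```python
import pandas as pd

# Hypothetical frames with the same length but different index labels.
c1 = pd.DataFrame(
    {"code": ["BYM", "CFS", "DFZ", "HKN", "RBF"]},
    index=[10, 11, 12, 13, 14],
)
c2 = pd.DataFrame(
    {"val": [0.77, 0.75, 0.79, 0.67, 0.72]},
    index=[0, 1, 2, 3, 4],
)

# Without realignment, no labels match and the new column is all NaN.
misaligned = c1.copy()
misaligned["val"] = c2["val"]
print(misaligned["val"].isna().all())  # True

# Give both frames a fresh 0..n-1 index, discarding the old labels.
c1 = c1.reset_index(drop=True)
c2 = c2.reset_index(drop=True)

# Rows now pair up by position: BYM with 0.77, CFS with 0.75, and so on.
c1["val"] = c2["val"]
print(c1)
```

This only makes sense when the row order of the two frames already corresponds, as it does in the intended output of the question.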