I have two pandas DataFrames that I need to merge on call_id. I have done this before with other DataFrames. However, this time when I try
df = pd.merge(labels, sequences, on = "call_id")
I get
The column label 'call_id' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.
In [231]: labels
Out[231]:
call_id confidences
1 6081bdea52c838000aaa53d3 {'1': 0.27, '2': 0.68, '0': 0.5}
2 6081c27bde933a000a4384b0 {'1': 0.73, '2': 0.27}
3 6081c54dd12abf000ab3c6f5 {'0': 0.66, '1': 0.67}
4 6081c666d7a1f7001cecce98 {'0': 0.22, '1': 0.82}
5 6081d8576eb5530043e3401f {'2': 0.33, '1': 0.66, '0': 0.23}
.. ... ...
480 transcript96 {'0': 0.38, '1': 0.73}
481 transcript97 {'0': 0.78, '2': 0.31}
482 transcript98 {'1': 0.65, '0': 0.46}
483 transcript99 {'2': 0.29, '1': 0.79}
484 trsc1 {'0': 0.42, '2': 0.27, '1': 0.44}
[484 rows x 2 columns]
In [232]: sequences
Out[232]:
call_id sentiments
1 6081c27bde933a000a4384b0 PENNNNNEENNPNPEPNPPNNNNNNNNNNN
2 6081c54dd12abf000ab3c6f5 NNPNNNPNNNPPNNN
3 6081c666d7a1f7001cecce98 NNNNNPP
4 6081d8576eb5530043e3401f NNNNPNNNNNNNNNNNNNNNNNNPPNNNNNNNNNENNNNNNENNNN...
5 6081d8fb0ef716000a2ef933 NNNNENNNPNEEENNNNNNNNNNNNNNNNNNPNE
.. ... ...
465 transcript96 NPN
466 transcript97 NNNNNEENNNNENPNNNNENNNNNPNNPNNNNNNNNPENNNPPPP
467 transcript98 NNNNNNNNENNNPPNNNENNENNENNNENENNNP
468 transcript99 PENNN
469 trsc1 NPNPEENEPPN
[469 rows x 2 columns]
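The wording of the error (in particular the multi-index hint) suggests that, in at least one of the frames, the label call_id matches more than one column. That can happen with duplicated column names or with MultiIndex columns. A minimal sketch of how this could be checked follows. The helper name `describe_columns` and the toy frame are hypothetical and not taken from the original data.

```python
import pandas as pd


def describe_columns(df):
    """Summarize column-label properties that can make a merge key ambiguous."""
    cols = df.columns
    return {
        # False if any column label occurs more than once
        "unique": bool(cols.is_unique),
        # labels that appear a second (or later) time
        "duplicates": cols[cols.duplicated()].tolist(),
        # True if the columns have several levels
        "multiindex": isinstance(cols, pd.MultiIndex),
    }


# Toy frame (assumption): two columns share the label 'call_id'
toy = pd.DataFrame([["x1", "x2", 0.5]], columns=["call_id", "call_id", "score"])
toy_info = describe_columns(toy)
print(toy_info)
```

Running `describe_columns(labels)` and `describe_columns(sequences)` on the real frames would show whether either one has this problem.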
You have to call the merge function differently:
labels.merge(sequences, how='inner', on='call_id')
Please look at the how= parameter here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html to be sure you understand the different options (keep all rows, keep only the rows from the right or left DataFrame, etc.).
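To illustrate the different options, here is a small sketch on made-up data. The toy values below are assumptions and only the column names come from the question. Note that how='inner' is also the default when how is omitted.

```python
import pandas as pd

# Toy stand-ins for the two frames (values are invented for illustration)
labels = pd.DataFrame({"call_id": ["a", "b", "c"],
                       "confidences": [0.1, 0.2, 0.3]})
sequences = pd.DataFrame({"call_id": ["b", "c", "d"],
                          "sentiments": ["NP", "PEN", "NNN"]})

# inner: only call_ids present in both frames
inner = labels.merge(sequences, how="inner", on="call_id")

# left: every row of labels, NaN where sequences has no match
left = labels.merge(sequences, how="left", on="call_id")

# right: every row of sequences, NaN where labels has no match
right = labels.merge(sequences, how="right", on="call_id")

# outer: all call_ids from either frame; indicator=True adds a
# '_merge' column telling which side each row came from
outer = labels.merge(sequences, how="outer", on="call_id", indicator=True)

for name, frame in [("inner", inner), ("left", left),
                    ("right", right), ("outer", outer)]:
    print(name, len(frame))
```

Comparing the row counts of the four results is a quick way to see how many call_id values are missing from one side or the other.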