I have two dataframes of different sizes:
      ConceptID1  ConceptID2
0           5743        4513
1           5743        7099
2           4513        7099
3          10242        7042
4          10242        7099
...          ...         ...
2601       12028       12043
2602       12371       12043
2603      266632       54106
2604      266632       51135
2605       54106       51135
      Gene1   Gene2
0      1535     353
1      9970     332
2     23581  112401
3       846  112401
4    150160  112401
..      ...     ...
384   79626   51284
385   79626   51311
386    7305   51311
387   80342   79626
388    7305   79626
I need to compare the two dataframes and find the matching pairs.
I tried this:
for index, row in sdfn.iterrows():
    for index, row in jdfn.iterrows():
        if ((sdfn['ConceptID1']==jdfn['Gene1']) and (sdfn['ConceptID2']==jdfn['Gene2'])) or (sdfn['ConceptID1']==jdfn['Gene2']) and ((sdfn['ConceptID2']==jdfn['Gene1'])):
            print(sdfn['ConceptID1'], jdfn['Gene1'], sdfn['ConceptID2'], jdfn['Gene2'])
The result:
Traceback (most recent call last):
  File "", line 3, in
    if ((sdfn['ConceptID1']==jdfn['Gene1']) and (sdfn['ConceptID2']==jdfn['Gene2'])) or (sdfn['ConceptID1']==jdfn['Gene2']) and ((sdfn['ConceptID2']==jdfn['Gene1'])):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/ops/__init__.py", line 1142, in wrapper
    raise ValueError("Can only compare identically-labeled " "Series objects")
ValueError: Can only compare identically-labeled Series objects
The issue here is that you are not naming or using your for loop variables correctly, and you are attempting to compare entire dataframe columns directly. sdfn['ConceptID1'], sdfn['ConceptID2'], jdfn['Gene1'], and jdfn['Gene2'] each refer to a whole dataframe column, which pandas represents as a Series object. The two dataframes have different numbers of rows, so their columns carry different index labels, hence the mention of a Series label mismatch in the error message.
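As a small illustration of where that message comes from, here is a minimal sketch. The Series a and b and their values are made up for this example; they only stand in for two columns of different lengths.

```python
import pandas as pd

# Hypothetical stand-ins for two columns taken from dataframes of different sizes.
a = pd.Series([1, 2, 3])
b = pd.Series([1, 2])

message = None
try:
    # Comparing two Series whose index labels differ raises ValueError.
    a == b
except ValueError as exc:
    message = str(exc)

print(message)

# Comparing single values taken from each Series works as expected.
print(a.iloc[0] == b.iloc[0])
```

Comparing individual values, which is what the row variables of iterrows() give you, avoids the problem.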
You will first need to give the two for loops distinct variable names, and then compare the values of the current rows rather than the whole columns:
for sind, srow in sdfn.iterrows():
    for jind, jrow in jdfn.iterrows():
        # Match the pair in either order: (Gene1, Gene2) or (Gene2, Gene1).
        if ((srow['ConceptID1'] == jrow['Gene1'] and srow['ConceptID2'] == jrow['Gene2']) or
                (srow['ConceptID1'] == jrow['Gene2'] and srow['ConceptID2'] == jrow['Gene1'])):
            print(srow['ConceptID1'], jrow['Gene1'], srow['ConceptID2'], jrow['Gene2'])
Note that in your posted code, the index and row variables are assigned in the outer loop and then reassigned in the inner loop. So instead of two pairs of loop variables there is only one pair, which the inner loop keeps overwriting, and the current row of sdfn is never available for the comparison.
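If the nested loops turn out to be slow (every row of sdfn is checked against every row of jdfn), one possible alternative is to let pandas do the matching with merge. This is only a sketch, not part of the fix above: the sample rows and the helper name find_matching_pairs are made up for illustration, and it assumes both dataframes hold the IDs with the same dtype.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
sdfn = pd.DataFrame({
    "ConceptID1": [5743, 4513, 10242, 12028],
    "ConceptID2": [4513, 7099, 7042, 12043],
})
jdfn = pd.DataFrame({
    "Gene1": [4513, 7042, 1535, 12028],
    "Gene2": [5743, 10242, 353, 12043],
})


def find_matching_pairs(sdfn, jdfn):
    """Return rows where a (ConceptID1, ConceptID2) pair equals a
    (Gene1, Gene2) pair in either order. Illustrative helper only."""
    # Same order: ConceptID1 == Gene1 and ConceptID2 == Gene2.
    same = sdfn.merge(
        jdfn, left_on=["ConceptID1", "ConceptID2"], right_on=["Gene1", "Gene2"]
    )
    # Swapped order: ConceptID1 == Gene2 and ConceptID2 == Gene1.
    swapped = sdfn.merge(
        jdfn, left_on=["ConceptID1", "ConceptID2"], right_on=["Gene2", "Gene1"]
    )
    # A pair with two identical IDs satisfies both conditions, so drop repeats.
    return pd.concat([same, swapped], ignore_index=True).drop_duplicates()


matches = find_matching_pairs(sdfn, jdfn)
print(matches)
```

Each merge is an inner join, so only rows present in both dataframes survive. Unlike the loop, drop_duplicates also collapses rows that are repeated in the inputs, which may or may not be what you want.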
Hope this helps!