
Display common elements and differences of 2 DataFrames with different size

I have two DataFrames containing string values, and they have different sizes. I would like to display the elements they have in common and the elements that differ between them.

My approach: I wrote a function compare(DataFrame1, DataFrame2) that compares the two DataFrames with the equals method. If they are the same, there are no differences to look for. Otherwise I need a second function that actually shows the differences between the DataFrames. Can someone help me continue?

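# Assumes `import pandas as pd` and an existing Neo4j connection object `graph`.
def test2_expansion():
    # Fetch the distinct child disease ids of C0039446, ordered by id.
    test1 = graph.run('match (n:Disease)-[:HAS_CHILD]->(m:Disease) where n.id ="C0039446" return distinct m.id order by m.id;')
    test1 = pd.DataFrame(test1.data())
    return test1

g = test2_expansion()
g = g.to_dict(orient='list')
print("The result of test 2 for expansion in Neo4j is ")
# Print every value of every column, one per line.
for key, value in g.items():
    for n in value:
        print(n)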


def compareResults(a, b):
    # DataFrame.equals already returns True or False.
    return a.equals(b)

def takeDifferences(a, b):
    message = "Search differences"
    # Pass both DataFrames through to the comparison.
    if compareResults(a, b):
        return "Amazing!"
    else:
        return message


DataFrame1       
C0494228             
C0272078
C2242772

DataFrame2
C2242772
C1882062
C1579212
C1541065
C1306459
C0442867
C0349036
C0343748
C0027651
C0272078

Display Common Elements: C0272078 C2242772
Elements found only in DataFrame1: C0494228
Elements found only in DataFrame2: C1882062
C1579212
C1541065
C1306459
C0442867
C0349036
C0343748
C0027651

Here is the generic function I ended up with, which answers my question:

def compare(a,b):
    if a.equals(b):
        print("SAME!")
    else:
        df = a.merge(b, how='outer',indicator=True)
        x = df.loc[df['_merge'] == 'both', 'm.id']
        y = df.loc[df['_merge'] == 'left_only', 'm.id']
        z = df.loc[df['_merge'] == 'right_only', 'm.id']
        print (f'Display Common Element: {", ".join(x)}')
        print (f'Elements found only in DataFrame1: {", ".join(y)}')
        print (f'Elements found only in DataFrame2: {", ".join(z)}')

At the moment my function returns None, because I don't know whether it should return something, but it works perfectly. Thank you @jezrael
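On the question of whether to return something: one option is to return the three groups and leave the printing to the caller. The following is only a sketch of that idea. The name compare_frames, the column parameter and the dictionary keys are made up for illustration, and the two small DataFrames are a shortened version of the sample data.

```python
import pandas as pd

def compare_frames(a, b, column="m.id"):
    """Return common and unique values of `column` instead of printing them."""
    # Outer merge on the shared column; `_merge` records where each row came from.
    merged = a.merge(b, how="outer", indicator=True)
    return {
        "common": merged.loc[merged["_merge"] == "both", column].tolist(),
        "only_first": merged.loc[merged["_merge"] == "left_only", column].tolist(),
        "only_second": merged.loc[merged["_merge"] == "right_only", column].tolist(),
    }

# Shortened sample data (hypothetical subset of the ids in the question).
df1 = pd.DataFrame({"m.id": ["C0494228", "C0272078", "C2242772"]})
df2 = pd.DataFrame({"m.id": ["C2242772", "C1882062", "C0272078"]})

result = compare_frames(df1, df2)
print(f'Display Common Element: {", ".join(result["common"])}')
```

Returning the lists makes the function easier to test and reuse, and an empty "only_first" and "only_second" tells the caller that both DataFrames hold the same values.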

If both DataFrames have the same column names (e.g. m.id), use DataFrame.merge with the indicator parameter:

df = df1.merge(df2, how='outer', indicator=True)
print (df)
        m.id      _merge
0   C0494228   left_only
1   C0272078        both
2   C2242772        both
3   C1882062  right_only
4   C1579212  right_only
5   C1541065  right_only
6   C1306459  right_only
7   C0442867  right_only
8   C0349036  right_only
9   C0343748  right_only
10  C0027651  right_only
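To try the merge without a Neo4j connection, the two DataFrames can be built by hand from the ids listed in the question. This is a minimal sketch under that assumption. The counts variable is added only to summarise the result, and the row order of the merged frame may differ between pandas versions.

```python
import pandas as pd

# Sample data copied from the question; the column name matches the query result.
df1 = pd.DataFrame({"m.id": ["C0494228", "C0272078", "C2242772"]})
df2 = pd.DataFrame({"m.id": ["C2242772", "C1882062", "C1579212", "C1541065",
                             "C1306459", "C0442867", "C0349036", "C0343748",
                             "C0027651", "C0272078"]})

# Outer merge keeps every id; indicator=True adds the `_merge` column.
df = df1.merge(df2, how="outer", indicator=True)
print(df)

# Number of rows per origin: both, left_only, right_only.
counts = df["_merge"].value_counts().to_dict()
print(counts)
```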

Then filter by boolean indexing:

a = df.loc[df['_merge'] == 'both', 'm.id']
b = df.loc[df['_merge'] == 'left_only', 'm.id']
c = df.loc[df['_merge'] == 'right_only', 'm.id']

Finally, join the values with f-strings:

print (f'Display Common Element: {", ".join(a)}')
Display Common Element: C0272078, C2242772

print (f'Elements found only in DataFrame1: {", ".join(b)}')
Elements found only in DataFrame1: C0494228

print (f'Elements found only in DataFrame2: {", ".join(c)}')
Elements found only in DataFrame2: C1882062, C1579212, C1541065, 
                                   C1306459, C0442867, C0349036, 
                                   C0343748, C0027651
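For a single column of ids, plain Python sets are a possible alternative to the merge. This is a different technique from the one above, shown only as a sketch. It assumes duplicates and the original order do not matter, and the variable names are chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"m.id": ["C0494228", "C0272078", "C2242772"]})
df2 = pd.DataFrame({"m.id": ["C2242772", "C1882062", "C1579212", "C1541065",
                             "C1306459", "C0442867", "C0349036", "C0343748",
                             "C0027651", "C0272078"]})

# Sets drop duplicates and ordering, so the results are sorted for stable output.
ids1 = set(df1["m.id"])
ids2 = set(df2["m.id"])

common = sorted(ids1 & ids2)      # in both DataFrames
only_df1 = sorted(ids1 - ids2)    # only in DataFrame1
only_df2 = sorted(ids2 - ids1)    # only in DataFrame2

print(f'Display Common Element: {", ".join(common)}')
print(f'Elements found only in DataFrame1: {", ".join(only_df1)}')
print(f'Elements found only in DataFrame2: {", ".join(only_df2)}')
```

The merge approach is the better fit when the DataFrames have several columns or when the origin of each row should stay in a DataFrame for further processing.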
