
Comparing two pandas DataFrames and outputting the index ranges where values differ (Python)

I want to write a piece of pandas code that takes two DataFrames, Unix and Unix2, compares them row by row, and outputs the ranges of indexes where they differ. For example, index 1 holds 1444311780 in Unix and 1444311790 in Unix2; the values differ, so index 1 starts a range. The range ends at the last consecutive index that still differs, which here is index 2, where Unix holds 1635686040 and Unix2 holds 1635686034.

import pandas as pd

# Two single-column DataFrames of Unix timestamps (the column label defaults to 0)
Unix = pd.DataFrame([1444311600, 1444311780, 1635686040, 1635686200, 1635686220])
Unix2 = pd.DataFrame([1444311600, 1444311790, 1635686034, 1635686200, 1635686230])

Expected Output:

first  last
1      2
4

If I understand you correctly, you want to find the start and end index of every unequal streak. Try this:

# Compare Unix to Unix2, row-by-row
s = Unix[0] != Unix2[0]

# Assign the group number. Every time `s` flips from True to False
# or vice-versa, make a new group
t = s.ne(s.shift()).cumsum()

# Keep only the rows where `s` is True; each group number that remains
# identifies one streak of consecutive unequal rows
u = t[s]

# For those groups, find the min and the max index of their members
result = u.index.to_series().groupby(u).agg(['min', 'max'])

Output:

   min  max
0          
2    1    2
4    4    4
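In the output above, the left-hand column is the internal group number from `t`, not a position in the data; the `min` and `max` columns hold the index range of each streak. To get closer to the layout in the question, the same steps can be wrapped in a function that renames the columns to `first` and `last` and drops the group numbers. This is only a sketch: `unequal_streaks` is a hypothetical helper name, and it assumes both inputs share the same index.

```python
import pandas as pd


def unequal_streaks(a: pd.Series, b: pd.Series) -> pd.DataFrame:
    """Return the first and last index label of each run of consecutive unequal rows.

    Hypothetical helper; assumes `a` and `b` have identical indexes.
    """
    s = a != b                    # True where the two Series differ
    t = s.ne(s.shift()).cumsum()  # group number, incremented whenever `s` flips
    u = t[s]                      # keep only the rows that differ
    result = u.index.to_series().groupby(u).agg(["min", "max"])
    result = result.rename(columns={"min": "first", "max": "last"})
    return result.reset_index(drop=True)  # drop the internal group numbers


Unix = pd.DataFrame([1444311600, 1444311780, 1635686040, 1635686200, 1635686220])
Unix2 = pd.DataFrame([1444311600, 1444311790, 1635686034, 1635686200, 1635686230])

streaks = unequal_streaks(Unix[0], Unix2[0])
print(streaks)
```

With the sample data this yields two rows: one with first 1 and last 2, and one with first 4 and last 4. A streak of length one reports the same index in both columns rather than leaving `last` blank as in the expected output.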
