
Comparing two pandas DataFrames of different sizes

I want to compare two dataframes that contain only 1s and 0s. I currently run for loops over every element of both dataframes. Wherever out holds a 1 and df also holds a 1 at the same row and column label, I want to replace the value in out with the letter d; wherever the values at the same label differ, I want to replace the value in out with the letter i. This code is too slow. Does anyone have an idea how to make it faster? Note that df is 420x420 and out is 410x410.

a1=out.columns.values
a2=df.columns.values
b1=out.index.values
b2=df.index.values

for a in a1:
    for b in b1:
        for c in a2:
            for d in b2:
                if a == c and b == d:
                    if out.loc[b,a] == 1 and df.loc[d,c]==1:
                        out.loc[b,a] = "d"
                    elif out.loc[b,a] != df.loc[d,c]:
                        out.loc[d,c] = "i"
                else:
                    pass

A small example for better understanding.

Dataframe df

1 2 3 4
1 0 1 1
2 1 0 0
3 1 0 0
4 0 0 0

Dataframe out

1 2 3 4
1 0 1 1
2 1 0 1
3 1 1 0
4 0 0 0

The resulting dataframe out should look like this:

1 2 3 4
1 0 d d
2 d 0 i
3 d i 0
4 0 0 0

I created your dataframes like this:

import numpy as np
import pandas as pd

# df creation
data1 = [
    [1, 0, 1, 1],
    [2, 1, 0, 0],
    [3, 1, 0, 0],
    [4, 0, 0, 0]
]

df = pd.DataFrame(data1, columns=[1, 2, 3, 4])
# df contents (shown without the default 0..3 index):
# 1 2 3 4
# 1 0 1 1
# 2 1 0 0
# 3 1 0 0
# 4 0 0 0

# df_out creation
data2 = [
    [1, 0, 1, 1],
    [2, 1, 0, 1],
    [3, 1, 1, 0],
    [4, 0, 0, 0]
]

df_out = pd.DataFrame(data2, columns=[1, 2, 3, 4])
# df_out contents (shown without the default 0..3 index):
# 1 2 3 4
# 1 0 1 1
# 2 1 0 1
# 3 1 1 0
# 4 0 0 0

# Then I used the np.where method on all intersecting columns.
intersected_columns = set(df.columns).intersection(df_out.columns)

for col in intersected_columns:
    if col != 1:  # I think the first column is the index
        df_out[col] = np.where(
            (df[col] == 1) & (df_out[col] == 1),  # first condition
            "d",  # if the first condition is true
            np.where(  # if the first condition is false, apply the second condition
                df[col] != df_out[col],
                "i",
                df_out[col],
            ),
        )

The output looks like this:
|   1 | 2   | 3   | 4   |
|----:|:----|:----|:----|
|   1 | 0   | d   | d   |
|   2 | d   | 0   | i   |
|   3 | d   | i   | 0   |
|   4 | 0   | 0   | 0   |
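Since your real frames have different shapes (420x420 and 410x410), the comparison probably has to be restricted to the row and column labels that both frames share, which is what the a == c and b == d test in your loops does. Here is a minimal sketch of that idea on whole frames instead of column by column. It assumes the first column of your example is really the row index and that the labels are unique. The helper name mark_matches, and the extra row and column labelled 5 in df, are made up for illustration only.

```python
import pandas as pd


def mark_matches(out: pd.DataFrame, df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `out` with 'd'/'i' marks on the labels shared with `df`.

    Hypothetical helper, not part of pandas. Assumes unique row/column labels.
    """
    # Compare only the labels present in both frames.
    rows = out.index.intersection(df.index)
    cols = out.columns.intersection(df.columns)

    out_part = out.loc[rows, cols]
    df_part = df.loc[rows, cols]

    both_one = (out_part == 1) & (df_part == 1)  # becomes "d"
    differ = out_part != df_part                 # becomes "i"

    # Work on an object-dtype copy so numbers and letters can coexist.
    result = out.astype(object)
    marked = out_part.astype(object).mask(both_one, "d").mask(differ, "i")
    result.loc[rows, cols] = marked
    return result


# Example data from the question, with the first column used as the index.
# Row 5 and column 5 in df are invented to mimic the size mismatch.
df = pd.DataFrame(
    [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 0], [0, 0, 0, 1], [1, 1, 0, 0]],
    index=[1, 2, 3, 4, 5],
    columns=[2, 3, 4, 5],
)
out = pd.DataFrame(
    [[0, 1, 1], [1, 0, 1], [1, 1, 0], [0, 0, 0]],
    index=[1, 2, 3, 4],
    columns=[2, 3, 4],
)

result = mark_matches(out, df)
print(result)
```

The function returns a new frame and leaves out untouched. Labels that exist in only one of the two frames are ignored, so cells of out with no counterpart in df keep their original values.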
