
How to compare columns between two Excel files in Python?

I have two Excel files.

Excel 1:

A,B,C

1,2,3

Excel 2:

A,C,B

1,3,2

How can I reorder the columns of Excel 2 based on the column order of Excel 1,

so that A,C,B becomes A,B,C?

I use the following code to check the column order:

comparison_Columns = pd.read_excel(xls).columns == pd.read_excel(xls2).columns
if all(comparison_Columns):
    pass
else:
    print('Wrong column order !!!!! ')
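
import pandas as pd

df1 = pd.read_excel(xls)   # Excel 1: the reference column order
df2 = pd.read_excel(xls2)  # Excel 2: the file to reorder

# Comparing plain lists avoids the error that an elementwise == raises
# when the two files have a different number of columns.
if list(df1.columns) != list(df2.columns):
    # Reorder Excel 2 so that its columns follow Excel 1.
    df2 = df2[df1.columns]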
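
An elementwise `==` between two column indexes raises a ValueError in pandas when the files have a different number of columns. A minimal sketch of a check that avoids this, assuming in-memory frames stand in for the two Excel files, uses `Index.equals` for the order-sensitive comparison and a set comparison to confirm that both files contain the same column names before reordering:

```python
import pandas as pd

# In-memory stand-ins for the two Excel files (assumption for illustration).
df1 = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
df2 = pd.DataFrame([[1, 3, 2]], columns=["A", "C", "B"])

# Index.equals is order-sensitive and returns False (no exception)
# when the two indexes differ in length.
same_order = df1.columns.equals(df2.columns)

# Same set of names, regardless of order.
same_names = set(df1.columns) == set(df2.columns)

if not same_order and same_names:
    # Reorder the second frame to follow the first frame's column order.
    df2 = df2[df1.columns]

print(df2)
```

If the names differ, the reordering step is skipped, which leaves room to report the mismatch instead of failing with a KeyError.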

It doesn't really matter whether the data comes from Excel or another format. If you know that both have the same columns and differ only in their order, you can simply index one frame with the other's columns:

import pandas as pd
df0 = pd.DataFrame([[1,2,3]], columns=["A","B","C"])
df1 = pd.DataFrame([[1,3,2]], columns=["A","C","B"])

print(df1[df0.columns])

   A  B  C
0  1  2  3

This code snippet checks whether two frames have the same columns:

def areColumnSame(df1, df2, checkTypes = True):
    if checkTypes:
        type1 = dict(df1.dtypes)
        type2 = dict(df2.dtypes)
        return type1 == type2

    else:
        col1 = list(df1.columns)
        col2 = list(df2.columns)
        col1.sort()
        col2.sort()
        return col1 == col2

To show how the above code works, let us look at some examples.

Consider three Excel files:

| A | B | C |
|---|---|---|
| 1 | 2 | 3 |
| 4 | 5 | 6 |

| A | C | B |
|---|---|---|
| 1 | 3 | 2 |
| 4 | 6 | 5 |

| A | B | C | A.1 | B.1 | C.1 |
|---|---|---|-----|-----|-----|
| 1 | 2 | 3 | 1   | 2   | 3   |
| 4 | 5 | 6 | 4   | 5   | 6   |

For the first file, dict(df.dtypes) is:

{'A': dtype('int64'),
 'B': dtype('int64'),
 'C': dtype('int64')}

Similarly, for the other two files:

{'A': dtype('int64'),
 'C': dtype('int64'),
 'B': dtype('int64')}

and

{'A': dtype('int64'),
 'B': dtype('int64'),
 'C': dtype('int64'),
 'A.1': dtype('int64'),
 'B.1': dtype('int64'),
 'C.1': dtype('int64')}

We just need to compare these dictionaries to get the result. The comparison also checks the data type of each column.

Because dictionary equality ignores key order, the comparison between the first two files returns True, whereas comparing either of them with the third returns False.

You can always disable the type check, in which case the function only checks whether [A, B, C] contains the same column names as [A, C, B], without comparing their types.
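
The behaviour described above can be reproduced without any Excel files. The sketch below builds the three tables in memory and uses a hypothetical `same_columns` function that restates the logic of `areColumnSame`. The `floats` frame is an extra case, not in the original answer, added to show what the type check contributes.

```python
import pandas as pd


def same_columns(df1, df2, check_types=True):
    """Hypothetical restatement of areColumnSame; column order is ignored."""
    if check_types:
        # Dict equality ignores insertion order, so only names and dtypes matter.
        return dict(df1.dtypes) == dict(df2.dtypes)
    # Names only: sort both lists so that order does not matter.
    return sorted(df1.columns) == sorted(df2.columns)


first = pd.DataFrame({"A": [1, 4], "B": [2, 5], "C": [3, 6]})
second = pd.DataFrame({"A": [1, 4], "C": [3, 6], "B": [2, 5]})
third = pd.DataFrame({
    "A": [1, 4], "B": [2, 5], "C": [3, 6],
    "A.1": [1, 4], "B.1": [2, 5], "C.1": [3, 6],
})
# Same names as `first`, but column B holds floats instead of integers.
floats = second.astype({"B": "float64"})

print(same_columns(first, second))                     # same names and dtypes
print(same_columns(first, third))                      # extra columns
print(same_columns(first, floats))                     # dtype of B differs
print(same_columns(first, floats, check_types=False))  # names only
```

With the type check enabled, a column whose dtype differs makes the comparison fail even when the names match. With it disabled, only the names are compared.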
