
Finding the difference between the columns of two Excel sheets

I have two Excel files that both have multiple sheets. The two files have some sheets in common, i.e. they have the same sheet name but different data and values. However, these sheets with the same name have more columns in one file than in the other. What I want to do is copy the extra columns from the sheet that has them to the sheet (in the other Excel file) that is missing them. Again, the data in the common columns is different, so I can't simply copy the bigger sheet into the smaller one.

First, reading the two files:

 import pandas as pd
 v8 = pd.read_excel('Revised_V8.xlsx', sheet_name=None)
 v9 = pd.read_excel('Revised_V9.xlsx', sheet_name=None)

Now reading one common sheet from both files:

  MAP_8 = v8['MAP']
  MAP_9 = v9['MAP']

Now both MAP_8 and MAP_9 are OrderedDict. I use this line to get the names of the extra columns in V9:

  d=set(MAP_9)-set(MAP_8)

I'm stuck here. My idea is to retrieve the data in those columns in d and then add that to the v8 dataframe:

  xtracol = MAP_9[d]    # I want to return the values of those columns saved in d

I get an error here: TypeError: unhashable type: 'set'

Sorry, but I have no idea how to fix this or get the extra columns without using set.

To summarize, let's say MAP_9 has three columns A, B, C whereas MAP_8 has only two columns A, B. The data in A and B is different between the two sheets. I only want to copy column C from MAP_9 and add it to MAP_8 without changing the values of A and B in MAP_8.

This is just a simple case, but I have more than a dozen common sheets, and some have tens of extra columns compared to the other.

Thank you in advance.

I do not know the syntax for operating on Excel with Python, but I do know a fair bit about Excel and Python. Now that you have the names of the columns that are missing in the other sheet, for every extra column add an empty column, under the same name, to the sheet that is missing it. Then load the data from the extra column into Python and write it into the new empty column. To repeat the process automatically, do some simple Python looping such as:

  for sheet in sheets:    # sheets: the sheet names common to both files
      MAP_8 = v8[sheet]
      MAP_9 = v9[sheet]

Etc. I can expand on this in comments if needs be.
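
The steps described above (find the extra columns, copy them over, repeat for every common sheet) could be sketched with pandas roughly as follows. This is only a sketch under some assumptions: the helper names `extra_columns` and `copy_extra_columns` are made up for illustration, rows are matched by position rather than by any key column, and the output file name in the usage comment is hypothetical. Note that the extra column names are kept in a list, since indexing a DataFrame with a set is what triggered the TypeError in the question.

```python
import pandas as pd


def extra_columns(small, big):
    """Return the columns of `big` that `small` lacks, in `big`'s column order."""
    return [col for col in big.columns if col not in small.columns]


def copy_extra_columns(v8, v9):
    """Append v9's extra columns to v8 for every sheet name present in both.

    v8 and v9 are dicts mapping sheet name -> DataFrame, as returned by
    pd.read_excel(..., sheet_name=None). Existing v8 columns are not modified.
    Assumption: rows correspond by position, not by index label or key column.
    """
    result = dict(v8)  # shallow copy, so sheets only in v8 pass through as-is
    for sheet in v8.keys() & v9.keys():  # sheet names common to both files
        small, big = v8[sheet], v9[sheet]
        extra = extra_columns(small, big)
        if not extra:
            continue
        # Select with a list of names (a set is not accepted as an indexer).
        to_add = big[extra].reset_index(drop=True)
        base = small.reset_index(drop=True)
        # concat aligns on the row index; if the row counts differ,
        # the shorter side is padded with NaN.
        result[sheet] = pd.concat([base, to_add], axis=1)
    return result


# Usage sketch (input names from the question; the output name is hypothetical):
# v8 = pd.read_excel('Revised_V8.xlsx', sheet_name=None)
# v9 = pd.read_excel('Revised_V9.xlsx', sheet_name=None)
# merged = copy_extra_columns(v8, v9)
# with pd.ExcelWriter('Revised_V8_merged.xlsx') as writer:
#     for name, frame in merged.items():
#         frame.to_excel(writer, sheet_name=name, index=False)
```

If the two sheets do not list their rows in the same order, positional matching would put values in the wrong rows; in that case merging on a shared key column would probably be the safer choice.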


 