I am looking for an algorithm to comapre two excel sheets, based on their column names, in Python.
I do not know what the columns are, so one sheet may have an additional column or both sheets can have several columns with the same name.
The easiest case is when a column in the first sheet corresponds to only one column in the second excel sheet. Then I can perform the diff on rows of that column using xlrd
. If the column name is not unique, I can verify if the columns have the same position.
Does anyone know of an already existing algorithm or have any experience in this domain?
Fast an dirty:
# Since order of the names doesn't matter, we can use the set() option
matching_names = set(sheet_one_names) & set(sheet_one_names)
...
# Here, order does matter since we're comparing rowdata..
# not just if they match at some point.
matching_rowdata = [i for i, j in zip(columndata_one, columndata_two) if i != j]
Note: This assumes that you've done a few things ahead,
xlrd
and same for the second sheet, This is to give you an idea.
Also note that doing the [...] option (second one) it's important that the rows are of the same length, otherwise it will be skipped. This is a MISS-MATCH scenario, reverse to get the matches in the data flow.
This is a slower but functional solution:
column_a_name = ['Location', 'Building', 'Location']
column_a_data = [['Floor 1', 'Main', 'Sweden'],
['Floor 2', 'Main', 'Sweden'],
['Floor 3', 'Main', 'Sweden']]
column_b_name = ['Location', 'Building']
column_b_data = [['Sweden', 'Main', 'Floor 1'],
['Norway', 'Main', 'Floor 2'],
['Sweden', 'Main', 'Floor 3']]
matching_names = []
for pos in range(0, len(column_a_name)):
try:
if column_a_name[pos] == column_b_name[pos]:
matching_names.append((column_a_name[pos], pos))
except:
pass # Index out of range, column length are not the same
mismatching_data = []
for row in range(0, len(column_a_data)):
rowa = column_a_data[row]
rowb = column_b_data[row]
for name, _id in matching_names:
if rowa[_id] != rowb[_id] and (rowa[_id] not in rowb or rowb[_id] not in rowa):
mismatching_data.append((row, rowa[_id], rowb[_id]))
print mismatching_data
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.