
Compare text with difflib and openpyxl in Python

I want to find some words in the cells of a certain column. The words to compare against are stored in another sheet. I'm trying to use a loop to go through all the cells in this column.

Here is my code:

    import difflib 
    import openpyxl 
    from openpyxl import load_workbook

    table = "C:\Users\Myname\Documents\Python Scripts\TRY.xlsx" 
    table = load_workbook(table)
    table.get_sheet_names()
    #  [u'Compared', u'To']

    work_sheet = table['Compared'] 
    compare_sheet = table['To']

    row_max = sum(1 for row in work_sheet)
    # count the number of rows 
    print ( row_max) # 8 

    liste = range(1,row_max+1)
    print liste


    for i in liste: 
        a = 'A'
        b = 'B'
        index = a + `i`
        comp = b + `i`
        column1 = ''.join(["'", index,"'"]) # Ref to the Cell which will be compared
        column2 = ''.join(["'",comp,"'"]) # Ref to the word I want to find 
        print (column1)
        print (column2)
        diff = difflib.SequenceMatcher(None,work_sheet[{}].format(column1),compare_sheet[{}].value.format(column2)).ratio()
        print diff


    #     ERROR
    #      File "C:\Users\Myname\AppData\Local\Continuum\Anaconda2\lib\site-packages\openpyxl-2.3.4-py2.7.egg\openpyxl\utils\__init__.py", line 39, in coordinate_from_string
    #        match = COORD_RE.match(coord_string.upper())

I looked at line 39 of the file "__init__.py" and found this:

def coordinate_from_string(coord_string):
    """Convert a coordinate string like 'B12' to a tuple ('B', 12)"""
    match = COORD_RE.match(coord_string.upper())
    if not match:
        msg = 'Invalid cell coordinates (%s)' % coord_string
        raise CellCoordinatesException(msg)
    column, row = match.groups()
    row = int(row)
    if not row:
        msg = "There is no row 0 (%s)" % coord_string
        raise CellCoordinatesException(msg)
    return (column, row)
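
To see which keys this function accepts, here is a minimal stand-alone sketch that mimics it. The `COORD_RE` pattern is an assumed approximation (column letters followed by row digits), not copied from openpyxl, and `ValueError` stands in for `CellCoordinatesException`; `describe` is a hypothetical helper added only for the demonstration.

```python
import re

# Assumed approximation of openpyxl's pattern: column letters, then row digits.
COORD_RE = re.compile(r'^[$]?([A-Z]+)[$]?(\d+)$')


def coordinate_from_string(coord_string):
    """Mimic of the function above; ValueError replaces CellCoordinatesException."""
    match = COORD_RE.match(coord_string.upper())
    if not match:
        raise ValueError('Invalid cell coordinates (%s)' % coord_string)
    column, row = match.groups()
    return (column, int(row))


def describe(key):
    """Return the parsed tuple, or the name of the exception that was raised."""
    try:
        return coordinate_from_string(key)
    except (ValueError, AttributeError) as exc:
        return type(exc).__name__


print(describe('A2'))    # a plain coordinate string
print(describe("'A2'"))  # a string that contains extra quote characters
print(describe({}))      # an empty dict, which has no .upper() method
```

Only the plain string parses. A string with embedded quotes is rejected as an invalid coordinate, and a non-string key fails earlier, at the `.upper()` call on line 39.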

But if I write the cell references by hand, it works:

diff = difflib.SequenceMatcher(None, work_sheet['A2'].value, compare_sheet['B2'].value).ratio()
print diff
# 0.133333333333

You can see the table here : https://docs.google.com/spreadsheets/d/1Mckc6YXeWQQ0CrnLKFqH5jeUMUn9CjvW_pTYs1rvw1A/edit?usp=sharing

Can someone explain where this error comes from?

(The whole traceback:

    Traceback (most recent call last):

  File "<ipython-input-14-ed4b6265ee5a>", line 43, in <module>
    diff = difflib.SequenceMatcher(None,work_sheet[{}].format(column1),compare_sheet[{}].value.format(column2)).ratio()

  File "C:\Users\Myname\AppData\Local\Continuum\Anaconda2\lib\site-packages\openpyxl-2.3.4-py2.7.egg\openpyxl\worksheet\worksheet.py", line 338, in __getitem__
    row, column = coordinate_to_tuple(key)

  File "C:\Users\Myname\AppData\Local\Continuum\Anaconda2\lib\site-packages\openpyxl-2.3.4-py2.7.egg\openpyxl\utils\__init__.py", line 162, in coordinate_to_tuple
    col, row = coordinate_from_string(coordinate)

  File "C:\Users\Myname\AppData\Local\Continuum\Anaconda2\lib\site-packages\openpyxl-2.3.4-py2.7.egg\openpyxl\utils\__init__.py", line 39, in coordinate_from_string
    match = COORD_RE.match(coord_string.upper())

)

As was pointed out, you don't show the code that causes the traceback, and you don't show the traceback itself. You should copy and paste both. Your code has a lot of problems; I have edited and annotated it below.

import difflib
# line below never used 
# import openpyxl 
from openpyxl import load_workbook

table = "C:\Users\Myname\Documents\Python Scripts\TRY.xlsx" 
table = load_workbook(table)
table.get_sheet_names()
#  [u'Compared', u'To']

work_sheet = table['Compared'] 
compare_sheet = table['To']

# is there not an easier way to find out how many rows?
row_max = sum(1 for row in work_sheet)
# count the number of rows 
print ( row_max) # 8 

# remove these lines
# liste = range(1,row_max+1)
# print liste


# for i in liste:
for i in range(1, row_max+1): 
    # a = 'A'
    # b = 'B'
    # what do you think index and comp are after these two lines?
    # index = a + `i`
    # comp = b + `i`
    index = 'A' + str(i)
    comp = 'B' + str(i)
    column1 = ''.join(["'", index,"'"]) # Ref to the Cell which will be compared
    column2 = ''.join(["'",comp,"'"]) # Ref to the word I want to find 
    print (column1)
    print (column2)
    diff = difflib.SequenceMatcher(None,work_sheet[{}].format(column1),compare_sheet[{}].value.format(column2)).ratio()
    print diff


#     ERROR
#  match = COORD_RE.match(coord_string.upper())
#
#AttributeError: 'dict' object has no attribute 'upper'
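
The `AttributeError` comes from `work_sheet[{}]`: here `{}` is not a format placeholder but an empty dict literal, so the worksheet is indexed with a dict and openpyxl then calls `.upper()` on it. The `.format(...)` call would only run afterwards, on whatever the lookup returned. Wrapping the coordinate in extra quotes with `''.join(["'", index, "'"])` is a second problem, because the key must be the bare string `A2`, not `'A2'` with quote characters inside it. A minimal sketch of the loop with both issues removed might look like this; `row_ratios` is a hypothetical helper name, and the `Cell` namedtuple and the two dicts are stand-ins for openpyxl objects so the example runs without a workbook.

```python
import difflib
from collections import namedtuple


def row_ratios(work_sheet, compare_sheet, row_max):
    """Return the SequenceMatcher ratio of A<i> vs B<i> for rows 1..row_max."""
    ratios = []
    for i in range(1, row_max + 1):
        # Build the coordinate string first, then index the sheet with it.
        left = work_sheet['A{}'.format(i)].value
        right = compare_sheet['B{}'.format(i)].value
        if left is None or right is None:
            ratios.append(None)  # empty cell: nothing to compare
            continue
        # str() because a cell may hold a number rather than text.
        matcher = difflib.SequenceMatcher(None, str(left), str(right))
        ratios.append(matcher.ratio())
    return ratios


# Stand-ins for the two worksheets (hypothetical data, for illustration only).
Cell = namedtuple('Cell', 'value')
sheet_a = {'A1': Cell('apple pie'), 'A2': Cell(None)}
sheet_b = {'B1': Cell('apple pie'), 'B2': Cell('pear')}

print(row_ratios(sheet_a, sheet_b, 2))
```

With the real workbook, calling `row_ratios(table['Compared'], table['To'], work_sheet.max_row)` should behave the same way, assuming an openpyxl version that provides the `max_row` property. That would also replace the `sum(1 for row in work_sheet)` row count.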

