Some Hyperlinks not opening with Openpyxl

Question

I have a few hundred Excel files containing data and hyperlinks. I was trying to load them and append them to a single DataFrame when I realized that pandas was not reading any of the hyperlinks.

I then tried to use openpyxl to read the hyperlinks in the input Excel files and write a new column into each file containing the hyperlink text, which pandas could then read into my DataFrame.

However, I am running into issues while testing the openpyxl code. It is able to read and write some of the hyperlinks but not others.

My sample file has three data rows (India, China, and Google), each with a hyperlink in the first column.

The hyperlinks in my actual data are created the same way as the one for "Google" in my test data set.

I inserted the other two hyperlinks in my test data by right-clicking the cell and pasting the link.

Sample Test file here: Text.xlsx

Here is the code I wrote to read the hyperlink and paste it in a new column. It works for the first two rows (India and China) but fails for the third row (Google). It's unfortunate because all of my actual data is of that type. Can someone please help me figure it out?

import openpyxl 

wb = openpyxl.load_workbook('test.xlsx')
ws = wb.active

column_indices = [1]
max_col = ws.max_column

ws.cell(row=1,column = max_col+1).value = "Hyperlink Text"
for row in range(2,ws.max_row+1): 
    for col in column_indices:
        print(ws.cell(row, column=1).hyperlink.target)
        ws.cell(column=max_col+1,row=row).value = ws.cell(row, column=1).hyperlink.target


wb.save('test.xlsx')

Answer 1

Cells that use the HYPERLINK function (like the google.com one) are not of type hyperlink, so openpyxl has no hyperlink object for them. The formula is stored as the cell's value instead, and you will need to process those cells yourself using re or a similar function. The values look like this:

>>> ws.cell(2,1).value
'China'
>>> ws.cell(3,1).value
'India'
>>> ws.cell(4,1).value
'=HYPERLINK("www.google.com","google")'

Suggested code to handle HYPERLINK:

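val = ws.cell(row, column).value
# A HYPERLINK formula is stored as a plain string; val may also be None or a number
if isinstance(val, str) and val.upper().startswith("=HYPERLINK"):
    hyplink = val  # raw formula text; use the re module to pull out the URL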

Note: The inner for loop over columns does not seem to be needed, since you always use column=1.
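
The re-based check mentioned above could look roughly like the sketch below. The helper names formula_target and link_target are made up for illustration, and the pattern assumes the first argument of HYPERLINK is a quoted string literal rather than a cell reference. It also assumes the workbook was loaded without data_only=True, so that formulas come back as text.

```python
import re

# Matches =HYPERLINK("target", ...) and captures the quoted first argument.
# Assumption: the target is a string literal, not a cell reference.
HYPERLINK_RE = re.compile(r'^=\s*HYPERLINK\(\s*"([^"]*)"', re.IGNORECASE)


def formula_target(value):
    """Return the link target from a HYPERLINK formula string, else None."""
    if not isinstance(value, str):
        return None
    match = HYPERLINK_RE.match(value)
    return match.group(1) if match else None


def link_target(cell):
    """Return the link for a cell, however the link was created.

    `cell` is expected to behave like an openpyxl cell, i.e. to have
    `.hyperlink` and `.value` attributes.
    """
    # Links inserted through Excel's dialog are exposed as cell.hyperlink.
    if cell.hyperlink is not None:
        return cell.hyperlink.target
    # Links written with the HYPERLINK function exist only as formula text.
    return formula_target(cell.value)
```

In the loop from the question, the new column could then be filled with link_target(ws.cell(row=row, column=1)), which yields None for cells that have no link at all instead of raising an error.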

solution1
1 ACCEPTED 2020-07-28 17:15:28