
Some Hyperlinks not opening with Openpyxl

I have a few hundred Excel files containing data and hyperlinks. I was loading them and appending them to a single DataFrame when I realized that pandas was not reading any of the hyperlinks.

I then tried using openpyxl to read the hyperlinks in the input Excel files and write the hyperlink text to a new column in each file, which pandas should then be able to read into my DataFrame.

However, I am running into issues while testing the openpyxl code. It reads and writes some of the hyperlinks but not others.

My sample file has three rows and looks like this:

[Image: input data]

In my actual data, the hyperlinks are entered the same way as the "Google" one in my test data set.

[Image: hyperlink style 1]

I inserted the other two hyperlinks in my test data by right-clicking the cell and pasting the link.

[Image: second type of hyperlink input]

Sample Test file here: Text.xlsx

Here is the code I wrote to read the hyperlink and paste it in a new column. It works for the first two rows (India and China) but fails for the third row (Google). It's unfortunate because all of my actual data is of that type. Can someone please help me figure it out?

import openpyxl 

wb = openpyxl.load_workbook('test.xlsx')
ws = wb.active

column_indices = [1]
max_col = ws.max_column

ws.cell(row=1,column = max_col+1).value = "Hyperlink Text"
for row in range(2,ws.max_row+1): 
    for col in column_indices:
        print(ws.cell(row, column=1).hyperlink.target)
        ws.cell(column=max_col+1,row=row).value = ws.cell(row, column=1).hyperlink.target


wb.save('test.xlsx')


Cells that use the HYPERLINK function (like the google.com one) are not of type hyperlink, so their hyperlink attribute is None. You will need to process those cells yourself, using re or a similar approach. The values look like this:

>>> ws.cell(2,1).value
'China'
>>> ws.cell(3,1).value
'India'
>>> ws.cell(4,1).value
'=HYPERLINK("www.google.com","google")'

Suggested code to handle HYPERLINK:

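val = ws.cell(row, column).value
# A formula cell holds the formula text as a string; empty cells hold None
if isinstance(val, str) and val.upper().startswith("=HYPERLINK"):
    hyplink = val  # Or use the re module to extract just the URL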
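To illustrate the `re` option mentioned in the comment, here is a minimal sketch that pulls the link target out of a formula string. The names `HYPERLINK_RE` and `hyperlink_formula_target` are made up for this example, and the pattern assumes the first argument is a plain double-quoted string, as in the sample value above. It does not handle cell references, concatenation, or escaped quotes.

```python
import re

# Matches =HYPERLINK("target") or =HYPERLINK("target","label") where the
# first argument is a double-quoted literal. Matching is case-insensitive.
HYPERLINK_RE = re.compile(r'^\s*=\s*HYPERLINK\(\s*"([^"]*)"', re.IGNORECASE)


def hyperlink_formula_target(value):
    """Return the link target from a HYPERLINK formula string, or None."""
    if not isinstance(value, str):
        # Empty cells (None) and numeric cells cannot be formulas.
        return None
    match = HYPERLINK_RE.match(value)
    return match.group(1) if match else None


print(hyperlink_formula_target('=HYPERLINK("www.google.com","google")'))
print(hyperlink_formula_target('China'))
```

For the sample values shown earlier, the first call yields the URL text and the second yields `None`, because 'China' is not a formula.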

Note: The inner for loop over columns does not seem to be needed, since you always use column=1.
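Putting the pieces together, the loop from the question could be reworked roughly as follows. This is only a sketch: `link_target` and `add_link_column` are hypothetical helper names, the example.com URLs are placeholders, and the workbook is built in memory to mimic the three-row sample rather than loaded from test.xlsx. It handles both inserted hyperlinks and HYPERLINK formulas with a quoted first argument, and it uses a single loop over rows.

```python
import re

import openpyxl

# Assumes the first HYPERLINK argument is a double-quoted literal.
FORMULA_RE = re.compile(r'=\s*HYPERLINK\(\s*"([^"]*)"', re.IGNORECASE)


def link_target(cell):
    """Return the link target of a cell, or None if it has no link."""
    if cell.hyperlink is not None:
        # Link inserted through Excel's hyperlink dialog or by pasting.
        return cell.hyperlink.target
    if isinstance(cell.value, str):
        # Link produced by the HYPERLINK worksheet function.
        match = FORMULA_RE.match(cell.value)
        if match:
            return match.group(1)
    return None


def add_link_column(ws, source_col=1, header="Hyperlink Text"):
    """Write the link target of each cell in source_col to a new last column."""
    out_col = ws.max_column + 1
    ws.cell(row=1, column=out_col).value = header
    for row in range(2, ws.max_row + 1):
        source = ws.cell(row=row, column=source_col)
        ws.cell(row=row, column=out_col).value = link_target(source)
    return out_col


# In-memory stand-in for the sample file (placeholder URLs).
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Name"])
ws.append(["China"])
ws.append(["India"])
ws.append(['=HYPERLINK("www.google.com","google")'])
ws["A2"].hyperlink = "https://example.com/china"
ws["A3"].hyperlink = "https://example.com/india"

out_col = add_link_column(ws)
for row in range(2, ws.max_row + 1):
    print(ws.cell(row=row, column=1).value, "->", ws.cell(row=row, column=out_col).value)
```

With a real file, the workbook would come from `openpyxl.load_workbook('test.xlsx')` as in the question, followed by `wb.save(...)`. The workbook must be loaded with the default `data_only=False` so that formula cells return the formula text rather than the cached value.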
