
Reading .xlsx format in Python

I have to read an .xlsx file every 10 minutes in Python.
What is the most efficient way to do this?
I've tried using xlrd, but it doesn't read .xlsx. According to the documentation it does, but I can't get it to work: I keep getting "Unsupported format, or corrupt file" exceptions.
What is the best way to read xlsx?
I need to read comments in cells too.

xlrd hasn't yet released a version that reads xlsx. Until then, Eric Gazoni built a package called openpyxl, which reads xlsx files and does limited writing of them.

Use openpyxl. Some basic examples:

import openpyxl

# Open Workbook
wb = openpyxl.load_workbook(filename='example.xlsx', data_only=True)

# Get all sheet names (wb.sheetnames replaces the deprecated get_sheet_names())
a_sheet_names = wb.sheetnames
print(a_sheet_names)

# Get a sheet object by name (indexing replaces the deprecated get_sheet_by_name())
o_sheet = wb["Sheet1"]
print(o_sheet)

# Get Cell Values
o_cell = o_sheet['A1']
print(o_cell.value)

o_cell = o_sheet.cell(row=2, column=1)
print(o_cell.value)

o_cell = o_sheet['H1']
print(o_cell.value)

# Sheet Maximum filled Rows and columns
print(o_sheet.max_row)
print(o_sheet.max_column)
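
To go from single cells to the whole sheet, something like the following might work. It is only a sketch: `sheet_rows` and `read_sheet` are hypothetical helper names, not part of openpyxl, and it assumes a reasonably recent openpyxl in which `iter_rows()` accepts `values_only=True`.

```python
from openpyxl import load_workbook


def sheet_rows(ws):
    """Return every row of a worksheet as a list of plain Python values."""
    # values_only=True yields tuples of cell values instead of Cell objects
    return [list(row) for row in ws.iter_rows(values_only=True)]


def read_sheet(path, sheet_name):
    """Hypothetical helper: open the workbook at path and dump one sheet."""
    # data_only=True returns cached formula results rather than formula text
    wb = load_workbook(filename=path, data_only=True)
    return sheet_rows(wb[sheet_name])
```

Calling `read_sheet("example.xlsx", "Sheet1")` would then return one list per row, covering the range bounded by `max_row` and `max_column`.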

There are multiple ways to read XLSX-formatted files using Python. Two are illustrated below. Both require that you install at least openpyxl, and if you want to parse directly into pandas you also need to install pandas, e.g. pip install pandas openpyxl

Option 1: pandas direct

Primary use case: load just the data for further processing.

Using the read_excel() function in pandas would be your best choice. Note that pandas should fall back to openpyxl automatically, but in the event of format issues it's best to specify the engine directly.

import pandas as pd
df_pd = pd.read_excel("path/file_name.xlsx", engine="openpyxl")
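
Since the question mentions re-reading the file every 10 minutes, one way to avoid needless parsing might be to check the file's modification time first and reload only when it has changed. This is a sketch under that assumption, not something taken from the answers here; `reload_if_changed` and its `loader` parameter are hypothetical names.

```python
import os


def reload_if_changed(path, last_mtime, loader):
    """Call loader(path) only when the file's modification time has changed.

    Returns (data, mtime). data is None when the file is unchanged since
    last_mtime; pass last_mtime=None to force the first load.
    """
    mtime = os.path.getmtime(path)
    if last_mtime is not None and mtime == last_mtime:
        # File untouched since the previous check: skip the expensive parse
        return None, last_mtime
    return loader(path), mtime
```

In a polling loop, the loader could be `lambda p: pd.read_excel(p, engine="openpyxl")`, with `time.sleep(600)` between checks and the returned `mtime` fed back in as `last_mtime` on the next pass.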

Option 2 - openpyxl direct

Primary use case: getting or editing specific Excel document elements such as comments (requested by OP), formatting properties or formulas.

Load the workbook with load_workbook(), then read the comment attribute of each cell, as follows.

from openpyxl import load_workbook
wb = load_workbook(filename = "path/file_name.xlsx")
ws = wb.active
ws["A1"].comment # <- loop through row & columns to extract all comments
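
The loop hinted at in the comment above could be sketched like this. `collect_comments` and `read_comments` are hypothetical helper names; the sketch assumes the workbook is loaded in the default (not read-only) mode.

```python
from openpyxl import load_workbook


def collect_comments(ws):
    """Map cell coordinates (e.g. "A1") to the text of their comments."""
    comments = {}
    for row in ws.iter_rows():
        for cell in row:
            # cell.comment is None when the cell has no comment attached
            if cell.comment is not None:
                comments[cell.coordinate] = cell.comment.text
    return comments


def read_comments(path):
    """Hypothetical helper: load the workbook and scan its active sheet."""
    wb = load_workbook(filename=path)
    return collect_comments(wb.active)
```

`read_comments("path/file_name.xlsx")` would then return a dictionary holding only the cells that carry a comment.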
