In pandas, what is the equivalent of read_csv()'s "nrows" for read_excel()?

Question

I want to import only a certain range of data from an Excel spreadsheet (.xlsm format, as it has macros) into a pandas DataFrame. I was doing it this way:

data = pd.read_excel(filepath, header=0, skiprows=4, nrows=20, parse_cols="A:D")

But it seems that nrows works only with read_csv()? What would be the equivalent for read_excel()?

Answer 1

If you know the number of rows in your Excel sheet, you can use the skip_footer parameter (spelled skipfooter since pandas 0.23) to read only the first n - skip_footer rows of your file, where n is the total number of rows.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

Usage:

data = pd.read_excel(filepath, header=0, parse_cols = "A:D", skip_footer=80)

Assuming your Excel sheet has 100 rows, this line would parse the first 20 rows (the header row included).
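A minimal sketch of this calculation, written for a recent pandas (where the parameters are spelled skipfooter and usecols) and using an in-memory workbook instead of a real file. The 100-row layout, the column names and openpyxl as the Excel engine are assumptions for illustration. Because the header is one of the 20 sheet rows that remain, the DataFrame ends up with 19 data rows.

```python
import io

import pandas as pd

# Hypothetical 100-row sheet: a header on row 1 and 99 data rows (rows 2-100),
# spread over columns A:E
rows = [["a", "b", "c", "d", "e"]] + [[i] * 5 for i in range(1, 100)]

buf = io.BytesIO()
# header=False and index=False so each list entry becomes exactly one sheet row
pd.DataFrame(rows).to_excel(buf, header=False, index=False, engine="openpyxl")
buf.seek(0)

# skipfooter=80 drops sheet rows 21-100; row 1 becomes the header,
# so rows 2-20 (19 rows) end up as data, restricted to columns A:D
df = pd.read_excel(buf, header=0, usecols="A:D", skipfooter=80, engine="openpyxl")
print(len(df), list(df.columns))  # 19 ['a', 'b', 'c', 'd']
```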

Answer 2

As noted in the documentation, as of pandas 0.23 this is a built-in option, and it works almost exactly as the OP wrote it.

The code

data = pd.read_excel(filepath, header=0, skiprows=4, nrows=20, usecols="A:D")

will now read the Excel file, take data from the first sheet (the default), skip 4 rows, use the next line (i.e., the fifth line of the sheet) as the header, read the following 20 rows of data into the DataFrame (lines 6-25), and use only columns A:D. Note that usecols is now the parameter to use, as parse_cols is deprecated.
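A self-contained sketch of the built-in nrows option, assuming a recent pandas with openpyxl installed. It builds a hypothetical sheet in memory (4 filler rows, a header on sheet row 5, then 30 data rows across columns A:E) and reads it back with the same options as the answer, so the row and column window can be checked directly.

```python
import io

import pandas as pd

# Hypothetical sheet layout: 4 filler rows, a header on sheet row 5,
# then 30 data rows (sheet rows 6-35) spread over columns A:E
filler = pd.DataFrame([["note"] * 5] * 4)
header = pd.DataFrame([["a", "b", "c", "d", "e"]])
body = pd.DataFrame([[i, i * 2, i * 3, i * 4, i * 5] for i in range(1, 31)])
sheet = pd.concat([filler, header, body], ignore_index=True)

buf = io.BytesIO()
# header=False and index=False so the frame's rows map 1:1 onto sheet rows 1-35
sheet.to_excel(buf, header=False, index=False, engine="openpyxl")
buf.seek(0)

# skiprows=4 skips sheet rows 1-4, header=0 makes row 5 the header,
# nrows=20 reads data rows 6-25, usecols="A:D" drops column E
df = pd.read_excel(buf, header=0, skiprows=4, nrows=20, usecols="A:D", engine="openpyxl")
print(df.shape)          # (20, 4)
print(list(df.columns))  # ['a', 'b', 'c', 'd']
```

Unlike the skipfooter approach, this does not need the total row count of the sheet.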

Answer 3

I'd like to extend @Erol's answer to make it a bit more flexible.

Assuming that we DON'T know the total number of rows in the Excel sheet:

import pandas as pd

xl = pd.ExcelFile(filepath)

# parsing first (index: 0) sheet
# (with the xlrd engine, xl.book is an xlrd Book, which provides sheet_by_index())
total_rows = xl.book.sheet_by_index(0).nrows

skiprows = 4
nrows = 20

# calc number of footer rows
# (-1) - for the header row
skipfooter = total_rows - nrows - skiprows - 1

# usecols replaces the deprecated parse_cols
df = xl.parse(0, skiprows=skiprows, skipfooter=skipfooter, usecols="A:D") \
       .dropna(axis=1, how='all')

.dropna(axis=1, how='all') will drop all columns that contain only NaN values.
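The footer arithmetic from this answer can be isolated into a small helper. footer_rows_to_skip is a hypothetical function (not part of pandas) that generalises the "-1 for the header row" term to any number of header rows and clamps the result at zero, since the original formula goes negative when the sheet has fewer than skiprows + 1 + nrows rows.

```python
def footer_rows_to_skip(total_rows: int, skiprows: int, nrows: int,
                        header_rows: int = 1) -> int:
    """Hypothetical helper: how many trailing sheet rows to skip so that, after
    skipping `skiprows` rows and reading `header_rows` header row(s), at most
    `nrows` data rows are left."""
    if min(total_rows, skiprows, nrows, header_rows) < 0:
        raise ValueError("row counts must be non-negative")
    # Rows consumed from the top: skipped rows, header row(s), wanted data rows
    used = skiprows + header_rows + nrows
    # A sheet shorter than `used` has nothing left to trim at the bottom
    return max(total_rows - used, 0)


print(footer_rows_to_skip(100, skiprows=4, nrows=20))  # 75
print(footer_rows_to_skip(20, skiprows=4, nrows=20))   # 0
```

The returned value is what the answer passes as skipfooter to xl.parse.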

3 answers

Answer 1
13 votes, accepted, 2016-03-02 13:27:40

Answer 2
9 votes, 2018-06-28 18:49:04

Answer 3
6 votes, 2017-04-12 12:39:52