How to read xlsx or ods columns as NumPy arrays in Python

Right now I am using the code below, but it is very slow and returns the columns as lists. I am also adding the columns to my list manually. Is there a more efficient way to read the columns directly as numpy arrays?

If not, I was thinking of converting the file to .txt or .csv, since those are easier to read. Which option would be the most efficient?

Also, I have the same file in .ods and .xlsx, so using either one is fine.

import xlrd

# Note: xlrd 2.0 and later read only .xls files; .xlsx needs an older xlrd.
workbook = xlrd.open_workbook("Folds5x2_pp.xlsx")
sheets = workbook.sheet_names()
print(sheets)
required_data = []
for sheet_name in sheets:
    sh = workbook.sheet_by_name(sheet_name)
    for rownum in range(sh.nrows):
        row_val = sh.row_values(rownum)
        # Keep only the first two columns of each row
        required_data.append((row_val[0], row_val[1]))
print(required_data)

Try openpyxl in read-only mode, which streams rows instead of loading the whole workbook into memory:

>>> import numpy as np
>>> from openpyxl import load_workbook
>>> wb = load_workbook('Folds5x2_pp.xlsx', read_only=True)
>>> print(wb.sheetnames)
['Sheet1', 'Sheet2', 'Sheet3']
>>> ws = wb['Sheet1']  # get_sheet_by_name() is deprecated
>>> cols = 0  # zero-based index of the column to read
>>> x2 = np.array([r[cols].value for r in ws.iter_rows()])

Alternatively, use pandas: read_excel loads the sheet into a DataFrame, and to_records converts it to a NumPy record array:

import pandas as pd
df = pd.read_excel('Folds5x2_pp.xlsx')
x2 = df.to_records()
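
If you want one plain NumPy array per column rather than a single record array, the pandas route can be extended as in the sketch below. The helper names columns_as_arrays and load_columns are hypothetical, not part of pandas. The sketch assumes the first row of the sheet holds the column headers and that openpyxl is installed so that pandas can parse .xlsx files.

```python
import numpy as np
import pandas as pd


def columns_as_arrays(df):
    """Return a dict mapping each column name to a 1-D NumPy array."""
    return {name: df[name].to_numpy() for name in df.columns}


def load_columns(path, sheet_name=0):
    """Read one sheet of a workbook and return its columns as arrays.

    Hypothetical helper: assumes the first row contains column headers.
    """
    df = pd.read_excel(path, sheet_name=sheet_name)
    return columns_as_arrays(df)
```

For example, load_columns('Folds5x2_pp.xlsx') would return a dict keyed by whatever headers the sheet contains. To get the first two columns as one 2-D array instead, df.iloc[:, :2].to_numpy() should work on the DataFrame directly.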
