How to read an Excel file and convert the content to a list of lists in Python?
I have this data in an Excel file (each line in a cell):
#module 0 size: 9 bs: 2.27735e-08
1 35 62 93 116 167 173 176 182
#module 1 size: 5 bs: 0.00393944
2 11 29 128 130
#module 2 size: 13 bs: 1.00282e-07
8 19 20 25 26 58 67 132 150 153 185 187 188
I want to read the data from the Excel file and make a list of lists out of the even lines.
Desired output:
[[1,35,62,93,116,167,173,176,182],
[2,11,29,128,130],
[8,19,20,25,26,58,67,132,150,153,185,187,188]]
Look into OpenPyXL; I use it often to work with complex workbooks at my job. Once it is imported and a workbook is loaded, the rows of a worksheet can be appended to a list like so:
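rowValuesList = []
for row in worksheet.rows:
    # each row is a tuple of Cell objects; cell.value holds the content
    rowValuesList.append([cell.value for cell in row])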
Each cell is its own value in the list. Then you could append rowValuesList to a master list to create your list of lists.
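Putting these steps together for the data in the question, a minimal sketch might look like the following. It assumes the data sits in the first column of the active sheet and that header cells can be recognised by their leading `#`. The helper names `parse_cells` and `read_modules` are illustrative and not part of OpenPyXL.

```python
def parse_cells(cells):
    """Turn an iterable of cell values into a list of integer lists.

    Empty cells and header cells (assumed to start with '#') are skipped.
    """
    result = []
    for value in cells:
        if value is None:
            continue
        text = str(value).strip()
        if not text or text.startswith("#"):
            continue
        result.append([int(token) for token in text.split()])
    return result


def read_modules(path):
    """Read column A of the active sheet and parse it with parse_cells."""
    # Imported here so parse_cells stays usable without openpyxl installed.
    from openpyxl import load_workbook

    worksheet = load_workbook(path).active
    # values_only=True yields plain values instead of Cell objects.
    rows = worksheet.iter_rows(min_col=1, max_col=1, values_only=True)
    return parse_cells(row[0] for row in rows)
```

Calling `read_modules` with the path to your workbook should then return the list of lists. Filtering on the `#` prefix rather than on row position is a design choice here, so that a stray blank row does not shift which lines are picked.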
You could try this using xlrd instead of pandas:
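import xlrd
# Note: xlrd 2.0 and later only read .xls files; an .xlsx file needs an older xlrd
workbook = xlrd.open_workbook(r'Book1.xlsx')
sheet = workbook.sheet_by_index(0)
# read the first column as strings
cells = [str(sheet.cell_value(i, 0)) for i in range(sheet.nrows)]
# drop the "#module ..." lines and convert the rest to lists of ints
ls = [list(map(int, c.split())) for c in cells if 'module' not in c]
print(ls)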
Output:
[[1, 35, 62, 93, 116, 167, 173, 176, 182], [2, 11, 29, 128, 130], [8, 19, 20, 25, 26, 58, 67, 132, 150, 153, 185, 187, 188]]
The library 'xlrd' works well for reading Excel files.
import xlrd

def main():
    # Path to Excel file
    file_path = 'PATH_TO_FILE'
    # Import complete Excel workbook
    excel_workbook = xlrd.open_workbook(file_path)
    # Import specific sheet by index
    excel_sheet = excel_workbook.sheet_by_index(0)
    # Create array for each row
    relevantData = []
    # Loop through each row of Excel sheet
    for row in range(excel_sheet.nrows):  # nrows returns number of rows
        # Row indices are zero-based, so odd indices are the even lines
        if row % 2 != 0:
            # Convert row to array and append to relevantData array
            relevantData.append(rowToArray(excel_sheet, row))
    print(relevantData)

def rowToArray(excel_sheet, row):
    """
    excel_sheet.cell_value(row, 0) -> get the data in the given row (first column)
    .split() -> returns list of strings, split at the white spaces
    map(int, <>) -> map all values in list to integers
    list(map(<>)) -> converts the result back into a list
    """
    return list(map(int, excel_sheet.cell_value(row, 0).split()))

main()
Output:
[[1, 35, 62, 93, 116, 167, 173, 176, 182], [2, 11, 29, 128, 130], [8, 19, 20, 25, 26, 58, 67, 132, 150, 153, 185, 187, 188]]
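The `row % 2 != 0` test works because row indices start at zero, so the second, fourth and sixth lines of the sheet have indices 1, 3 and 5. The same selection can be sketched with a slice on a plain list of cell strings. The `cells` list below simply hard-codes the sample data from the question in place of values read from a workbook.

```python
# Sample cell contents from the question, one string per row.
cells = [
    "#module 0 size: 9 bs: 2.27735e-08",
    "1 35 62 93 116 167 173 176 182",
    "#module 1 size: 5 bs: 0.00393944",
    "2 11 29 128 130",
    "#module 2 size: 13 bs: 1.00282e-07",
    "8 19 20 25 26 58 67 132 150 153 185 187 188",
]

# Start at index 1 and take every second element: the "even lines".
data_rows = cells[1::2]

# Split each row on whitespace and convert the tokens to integers.
relevant_data = [[int(token) for token in row.split()] for row in data_rows]
print(relevant_data)
```

This only holds while header and data rows strictly alternate; if the sheet can contain blank or extra rows, filtering on the row content is the safer choice.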
Try importing into pandas, remove the lines that contain the string "module", and then split the values.
EDIT: forgot to get the list part.
import pandas as pd
# if it's a CSV you can define sep=' '. Change to your file location.
df = pd.read_excel(".//book.xlsx", header=None)
# name the columns to filter results
df.columns = ['temp']
# search for rows where "module" exists in string and get the opposite (the ~ before)
df = df[~df['temp'].str.contains('module')].reset_index()
# split() the values of the column expanding into new ones
df = df['temp'].str.split(" ", expand=True)
# transform into list
list_values = df.values.tolist()
# Filter Nones
filtered_list = [list(filter(None, l)) for l in list_values]
print(filtered_list)
# >>> [['1', '35', '62', '93', '116', '167', '173', '176', '182'],
# >>> ['2', '11', '29', '128', '130'],
# >>> ['8', '19', '20', '25', '26', '58', '67', '132', '150', '153', '185', '187', '188']]
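Note that the pandas version produces lists of strings, while the desired output in the question holds integers. One possible final step, sketched here on a shortened hard-coded copy of `filtered_list` so that it runs on its own, is a nested list comprehension. The name `int_lists` is illustrative.

```python
# Shortened stand-in for the filtered_list produced by the pandas code.
filtered_list = [
    ['1', '35', '62', '93'],
    ['2', '11', '29', '128', '130'],
]

# Convert every string in every inner list to an integer.
int_lists = [[int(value) for value in row] for row in filtered_list]
print(int_lists)
```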