
How to read an Excel file and convert the content to a list of lists in Python?

I have this data in an Excel file (each line is in its own cell):

#module 0 size: 9 bs: 2.27735e-08 
1 35 62 93 116 167 173 176 182 
#module 1 size: 5 bs: 0.00393944 
2 11 29 128 130 
#module 2 size: 13 bs: 1.00282e-07 
8 19 20 25 26 58 67 132 150 153 185 187 188 

I want to read the data from the Excel file and build a list of lists from the even lines (the 2nd, 4th, 6th, ... lines, which hold the numbers).
Desired output:

[[1,35,62,93,116,167,173,176,182],
[2,11,29,128,130],
[8,19,20,25,26,58,67,132,150,153,185,187,188]]

Look into OpenPyXL; I use it often to work with complex workbooks at my job. Once a workbook is loaded, the rows of a worksheet can be collected into lists like so:

rowValuesList = []
for row in worksheet.rows:
    # worksheet.rows yields tuples of Cell objects, so take each cell's .value
    rowValuesList.append([cell.value for cell in row])

Each cell's value becomes its own element in that row's list, so rowValuesList ends up as a list of lists, one inner list per row.
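For the question's data each row holds a single cell with a whole line of text, so the row values still have to be filtered and split into numbers. A minimal openpyxl sketch, assuming the lines sit in column A of the active sheet and that header lines start with `#` as in the sample; `rows_to_int_lists` and `read_number_lines` are hypothetical names:

```python
def rows_to_int_lists(cell_texts):
    """Skip empty cells and '#module ...' header lines; split the rest into ints."""
    result = []
    for text in cell_texts:
        if text is None:
            continue  # empty cell
        text = str(text).strip()
        # Assumption: header lines start with '#', as in the question's sample
        if not text or text.startswith("#"):
            continue
        result.append([int(token) for token in text.split()])
    return result


def read_number_lines(path):
    """Load the workbook with openpyxl and parse column A of the active sheet."""
    # Imported here so rows_to_int_lists() can be used without openpyxl installed
    from openpyxl import load_workbook

    worksheet = load_workbook(path).active
    # values_only=True yields plain cell values instead of Cell objects
    column_a = (row[0] for row in worksheet.iter_rows(min_col=1, max_col=1, values_only=True))
    return rows_to_int_lists(column_a)


# Usage (hypothetical file name): print(read_number_lines("Book1.xlsx"))
```

Keeping the parsing separate from the file loading makes the filtering logic easy to check on plain strings before pointing it at a real workbook.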

You could try this using xlrd instead of pandas:

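import xlrd

# Note: xlrd >= 2.0 opens only .xls files; reading this .xlsx needs xlrd < 2.0
workbook = xlrd.open_workbook(r'Book1.xlsx')
sheet = workbook.sheet_by_index(0)

# Keep the text of every cell in column A that is not a "#module" header line
ls = [str(sheet.cell_value(i, 0)) for i in range(sheet.nrows) if 'module' not in str(sheet.cell_value(i, 0))]
# split() with no argument also copes with trailing spaces; convert each token to int
ls = [list(map(int, i.split())) for i in ls]
print(ls)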

Output:

[[1, 35, 62, 93, 116, 167, 173, 176, 182], [2, 11, 29, 128, 130], [8, 19, 20, 25, 26, 58, 67, 132, 150, 153, 185, 187, 188]]

The xlrd library works well for reading Excel files (it is read-only, and xlrd 2.0 and later open only the legacy .xls format, not .xlsx).

import xlrd

def main():
    # Path to the Excel file (xlrd >= 2.0 reads only legacy .xls files)
    file_path = 'PATH_TO_FILE'

    # Open the complete Excel workbook
    excel_workbook = xlrd.open_workbook(file_path)
    # Select a specific sheet by index
    excel_sheet = excel_workbook.sheet_by_index(0)

    # One inner list per number line
    relevantData = []
    # Loop through each row of the sheet; nrows is the number of rows
    for row in range(excel_sheet.nrows):
        # Row indices are 0-based, so odd indices are the 2nd, 4th, 6th, ... lines
        if row % 2 != 0:
            # Convert the row to a list of ints and append it to relevantData
            relevantData.append(rowToArray(excel_sheet, row))

    print(relevantData)

def rowToArray(excel_sheet, row):
    """
        excel_sheet.cell_value(row, 0) -> get the text in column A of the given row
        .split()       -> split the string on whitespace into a list of strings
        map(int, <>)   -> convert every string to an integer
        list(map(<>))  -> turn the map object back into a list
    """
    return list(map(int, excel_sheet.cell_value(row, 0).split()))


if __name__ == '__main__':
    main()

Output:

[[1, 35, 62, 93, 116, 167, 173, 176, 182], [2, 11, 29, 128, 130], [8, 19, 20, 25, 26, 58, 67, 132, 150, 153, 185, 187, 188]]
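The `row % 2 != 0` test works only if header lines and number lines strictly alternate. As a hedged sketch (not part of the original answer), the same alternation can be expressed with slices, pairing each header with the line after it and checking the `size:` field against the count of numbers. The header pattern and the `parse_modules` name are assumptions based on the sample data:

```python
import re

# Assumed header format, taken from the sample: "#module <id> size: <n> bs: <float>"
HEADER = re.compile(r"#module\s+(\d+)\s+size:\s*(\d+)")


def parse_modules(lines):
    """Pair each header line with the number line after it and check the size field."""
    modules = {}
    # lines[0::2] are the headers (1st, 3rd, ... lines); lines[1::2] are the number lines
    for header, numbers in zip(lines[0::2], lines[1::2]):
        match = HEADER.match(header.strip())
        if match is None:
            raise ValueError(f"unexpected header line: {header!r}")
        module_id, size = int(match.group(1)), int(match.group(2))
        values = [int(token) for token in numbers.split()]
        if len(values) != size:
            raise ValueError(f"module {module_id}: expected {size} values, got {len(values)}")
        modules[module_id] = values
    return modules
```

`list(parse_modules(lines).values())` then gives the list of lists in sheet order, since dicts keep insertion order (Python 3.7+), and a misaligned sheet raises an error instead of silently producing wrong rows.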

Try importing the data into pandas, removing the lines that contain the string "module", and then splitting the values.

EDIT: forgot to get the list part.

import pandas as pd

# If it's a CSV you can use pd.read_csv and define sep=' '. Change to your file location.
df = pd.read_excel("book.xlsx", header=None)
# Name the single column so the results can be filtered
df.columns = ['temp']
# Find rows whose text contains "module" and keep the opposite (the ~ negates the mask)
df = df[~df['temp'].str.contains('module')].reset_index(drop=True)
# split() the values of the column, expanding them into new columns
df = df['temp'].str.split(" ", expand=True)
# Transform into a list of row lists
list_values = df.values.tolist()
# Drop the None padding and empty strings, then convert the rest to int
filtered_list = [[int(v) for v in filter(None, l)] for l in list_values]
print(filtered_list)
# >>> [[1, 35, 62, 93, 116, 167, 173, 176, 182],
# >>>  [2, 11, 29, 128, 130],
# >>>  [8, 19, 20, 25, 26, 58, 67, 132, 150, 153, 185, 187, 188]]
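Why the `filter(None, ...)` step is there: splitting on a literal space keeps an empty string when a cell ends with a space (the sample lines in the question appear to), and `int('')` raises `ValueError`. A small plain-Python illustration using one sample line:

```python
line = "1 35 62 93 116 167 173 176 182 "  # note the trailing space

# Splitting on a literal " " keeps an empty string after the trailing space
by_space = line.split(" ")
print(by_space[-2:])  # ['182', '']

# split() with no argument splits on runs of whitespace and drops empty strings
by_whitespace = line.split()
print(by_whitespace[-1])  # 182

# filter(None, ...) drops falsy items such as '' (and None padding)
filtered = [int(v) for v in filter(None, by_space)]
print(filtered)  # [1, 35, 62, 93, 116, 167, 173, 176, 182]
```

Either `split()` with no argument or an explicit filter avoids the empty tokens; the filter also removes the `None` values pandas uses to pad shorter rows when `expand=True`.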
