read_excel with a list of lists for usecols

Question

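I have to read several Excel files (from URL links) that all contain the same fixed set of 5 columns. However, the column names can vary slightly, e.g. ('foo','fo','f(0)'), because people rename them.

Is there a way to pass a list of lists, like [['foo','fo','f(0)'],['foo2','f02','f(o)2'],...], to usecols?

Right now I am using this code: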

links = df['column_I_need'].str.join(sep='')
col_names = ['foo','fo','f(0)']
for i in links:
    try:
        name = i[50:]
        df = pd.read_excel(i, header = 1, names = col_names, encoding = 'utf-8') #usecols = names)
        file_name = r"%s\%s" %(pasta_sol,name)

        writer = pd.ExcelWriter(file_name , engine='xlsxwriter')
        df.to_excel(writer, header = True, index = True)
        writer.close()

    except (TypeError, IndexError, ValueError, XLRDError, BadZipFile, urllib.error.URLError) as e:
        erros.append((i, e.args[0]))

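The information in each column of each file corresponds to a specific field in the system.

I really couldn't find anything on this. In most files the values in the cells are correct, but people change the column names.

If anyone has any ideas, I would appreciate it.

Thanks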

Answer 1

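This is a rough version of a function I used in a previous role (I hadn't gotten around to Git back then, so I had no version control and didn't save all my stuff).

It iterates over the directory you choose and returns a dictionary that maps the path of each Excel file to the columns in it that match.

Once the dictionary is returned, you can iterate over the file paths and use each value as the usecols argument.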

for path,column in return_value.items():
    df = pd.read_excel(path,usecols=column)
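
As a possible alternative to precomputing the columns per file, pandas documents that usecols may also be a callable that is evaluated against each column name. The sketch below is only an illustration of that idea for the list-of-lists case in the question: the helper names make_usecols_filter and read_with_aliases are hypothetical, and the alias groups are the example names from the question.

```python
def make_usecols_filter(alias_groups):
    """Return a predicate that accepts any header listed in alias_groups."""
    # Flatten the list of lists into one set of accepted header names.
    allowed = {alias for group in alias_groups for alias in group}

    def keep(column_name):
        # pandas calls this once per header; str() makes strip() safe
        # for non-string headers such as numbers.
        return str(column_name).strip() in allowed

    return keep


def read_with_aliases(source, alias_groups, **read_excel_kwargs):
    """Read only the columns whose header matches one of the aliases."""
    # Imported here so make_usecols_filter can be used without pandas.
    import pandas as pd

    return pd.read_excel(
        source,
        usecols=make_usecols_filter(alias_groups),
        **read_excel_kwargs,
    )


# Example alias groups taken from the question (hypothetical data).
ALIAS_GROUPS = [["foo", "fo", "f(0)"], ["foo2", "f02", "f(o)2"]]
keep = make_usecols_filter(ALIAS_GROUPS)
```

With this, a call such as read_with_aliases(link, ALIAS_GROUPS, header=1) would keep whichever variant of each name a given file happens to use. The matching here is exact after stripping whitespace, which is a design choice rather than a requirement.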

In action:

return_value = find_common_column(r"C:\Users\datanovice\Documents\Python Scripts\Test"
,sheetname='Sheet1'
,col_list=['dat','test'])

print(return_value)

{WindowsPath('C:/Users/datanovice/Documents/Python Scripts/Test/doc_1.xlsx'): Index(['data'], dtype='object')}
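
Because the matched columns keep whatever name each file used ('data' in the output above), it may help to map them back to one canonical name per field before writing the files out. The following is a minimal sketch under the assumption that the first alias in each group is the canonical name; build_rename_map is a hypothetical helper, not part of pandas.

```python
def build_rename_map(actual_columns, alias_groups):
    """Map each matched column to the first (canonical) alias of its group."""
    rename_map = {}
    for group in alias_groups:
        canonical = group[0]  # assumption: first alias is the canonical name
        for column in actual_columns:
            if column in group:
                rename_map[column] = canonical
                break  # use at most one column per group
    return rename_map


# Example alias groups taken from the question (hypothetical data).
ALIAS_GROUPS = [["foo", "fo", "f(0)"], ["foo2", "f02", "f(o)2"]]
rename_map = build_rename_map(["fo", "f(o)2", "extra"], ALIAS_GROUPS)
```

The resulting dictionary can then be passed to df.rename(columns=rename_map) so that every file ends up with the same headers.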

Modules

import pandas as pd 
import numpy 
from pathlib import Path
from xlrd import XLRDError

Function

def find_common_column(path,sheetname,col_list):
    """
    Takes in three arguments and returns a 
    dictionary of paths and common columns

    path : Path to your excel files.  
    sheetname : the sheet we will use. 
    col_list : column names (treated as regex patterns) to look for in each sheet.
    """
    excel_dict = {f : pd.ExcelFile(f) for f in Path(path).glob('*.xlsx')}

    pat = '|'.join(col_list)

    dfs = {}
    for filename,each_excel in excel_dict.items():
        try:
            df = pd.read_excel(each_excel,sheet_name=sheetname,nrows=1)
            cols = df.filter(regex=pat,axis=1).columns
            dfs[filename] = cols

        except XLRDError as err:
            pass
    return dfs
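
One caveat with '|'.join(col_list): DataFrame.filter(regex=...) treats the joined string as a regular expression, so a name from the question such as 'f(0)' is read as a group and will not match the literal header 'f(0)'. A hedged sketch of one way around this is to escape each name first; build_column_pattern and its exact flag are hypothetical names introduced for illustration.

```python
import re


def build_column_pattern(col_list, exact=False):
    """Join column names into a regex that matches them literally."""
    # re.escape neutralises characters such as '(' and ')' in the names.
    alternation = "|".join(re.escape(name) for name in col_list)
    if exact:
        # Anchor the pattern so 'fo' does not also match 'foo'.
        return f"^(?:{alternation})$"
    return alternation


pattern = build_column_pattern(["foo", "fo", "f(0)"])
exact_pattern = build_column_pattern(["fo", "f(0)"], exact=True)
```

Inside find_common_column, the pattern would replace pat, as in df.filter(regex=pattern, axis=1).columns. Leaving exact=False keeps the original substring behaviour (where 'dat' matches 'data'), while exact=True only accepts full header names.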

Solution 1
1 accepted, 2020-05-15 21:38:24