Organizing data read from Excel into a Pandas DataFrame

Question

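My goals with this script are to:

1. Read time-series data from an Excel file (> 100,000k rows), along with the headers (labels, units).
2. Convert the Excel numeric dates to the best datetime object for a pandas DataFrame.
3. Be able to reference rows by timestamp and columns by series label.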

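So far, I use xlrd to read the Excel data into lists. I make a pandas Series from each list, using the time list as the index. I combine the Series with the series headers to make a Python dictionary, then pass the dictionary to a pandas DataFrame. Despite my efforts, df.index seems to be set to the column headers, and I'm not sure when to convert the dates to datetime objects.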

I just started using Python 3 days ago, so any advice would be great! Here is my code:

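    import xlrd
    import pandas as pd
    from pandas import Series

    #Open excel workbook and first sheet
    #(raw string so the backslashes in the Windows path are not treated as escapes)
    wb = xlrd.open_workbook(r"C:\GreenCSV\Calgary\CWater.xlsx")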
    sh = wb.sheet_by_index(0)

    #Read rows containing labels and units
    Labels = sh.row_values(1, start_colx=0, end_colx=None)
    Units = sh.row_values(2, start_colx=0, end_colx=None)

    #Initialize list to hold data
    Data = [None] * (sh.ncols)

    #read column by column and store in list
    for colnum in range(sh.ncols):
        Data[colnum] = sh.col_values(colnum, start_rowx=5, end_rowx=None)

    #Delete unnecessary rows and columns
    del Labels[3], Labels[0:2], Units[3], Units[0:2], Data[3], Data[0:2]

    #Create Pandas Series
    s = [None] * (sh.ncols - 4)
    for colnum in range(sh.ncols - 4):
        s[colnum] = Series(Data[colnum+1], index=Data[0])

    #Create Dictionary of Series
    dictionary = {}
    for i in range(sh.ncols-4):
        dictionary[i]= {Labels[i] : s[i]}

    #Pass Dictionary to Pandas DataFrame
    df = pd.DataFrame.from_dict(dictionary)
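
A possible explanation for the df.index symptom, offered as a sketch rather than a diagnosis of the exact workbook: `dictionary[i] = {Labels[i]: s[i]}` stores a one-item dictionary under each integer key, and pandas reads a dictionary of dictionaries as columns (outer keys) by rows (inner keys). The labels "Flow" and "Level" and all values below are made up for illustration.

```python
import pandas as pd

# Made-up time axis and two made-up series sharing it.
times = [41473.0, 41473.25]
flow = pd.Series([1.2, 1.5], index=times)
level = pd.Series([3.0, 3.1], index=times)

# Pattern from the question: integer keys, each value a one-item dict.
# pandas treats the outer keys (0, 1) as columns and the inner keys
# ("Flow", "Level") as the row index.
nested = {0: {"Flow": flow}, 1: {"Level": level}}
df_nested = pd.DataFrame.from_dict(nested)

# Keying the dictionary by label instead gives one column per label
# and keeps the shared time axis as the row index.
flat = {"Flow": flow, "Level": level}
df_flat = pd.DataFrame(flat)

print(df_nested.index.tolist())
print(df_flat.columns.tolist())
```

With the flat dictionary, the labels become the columns and the time values stay on the index.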

Answer 1

You can use pandas directly here. I usually like to create a dictionary of DataFrames (keyed by sheet name):

In [11]: xl = pd.ExcelFile(r"C:\GreenCSV\Calgary\CWater.xlsx")

In [12]: xl.sheet_names  # in your example it may be different
Out[12]: [u'Sheet1', u'Sheet2', u'Sheet3']

In [13]: dfs = {sheet: xl.parse(sheet) for sheet in xl.sheet_names}

In [14]: dfs['Sheet1'] # access DataFrame by sheet name

See the documentation for parse, which offers more options (for example, skiprows) that give you more control when parsing individual sheets...
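
For the date conversion and timestamp indexing the question asks about, here is a minimal sketch. It assumes the time column holds Excel serial day numbers from the 1900 date system, and it uses a small hand-built frame as a stand-in for what `xl.parse(...)` would return. The column names, the values, and the helper `index_by_excel_time` are hypothetical.

```python
import pandas as pd

# Excel's 1900 date system counts days from 1899-12-30
# (this origin holds for serial numbers after February 1900).
EXCEL_EPOCH = "1899-12-30"


def index_by_excel_time(df, time_col):
    """Return a copy of df indexed by datetimes decoded from Excel serial numbers."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col], unit="D", origin=EXCEL_EPOCH)
    return out.set_index(time_col)


# Stand-in for a parsed sheet; names and values are made up.
raw = pd.DataFrame({
    "Time": [41473.0, 41473.25, 41473.5],
    "Flow": [1.2, 1.5, 1.1],
    "Pressure": [30.0, 31.5, 29.8],
})

df = index_by_excel_time(raw, "Time")

# Rows are referenced by timestamp, columns by series label.
noon_flow = df.loc[pd.Timestamp("2013-07-18 12:00:00"), "Flow"]
print(df)
print(noon_flow)
```

If pandas already recognises the column as dates when parsing the sheet, the conversion step is unnecessary and only `set_index` is needed.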

Solution 1
10 · Accepted · 2013-07-18 10:29:55
