Reading an Excel file in Python with pandas and multiple indexes

Question

I'm new to Python, so please forgive the basic question. My .xlsx file looks like this:

Unnamed:1     A     Unnamed:2     B
2015-01-01    10    2015-01-01    10
2015-01-02    20    2015-01-01    20
2015-01-03    30    NaT           NaN

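When I read it in Python with pandas.read_excel(...), pandas automatically uses the first column as the time index.

Is there a one-liner that tells pandas that every second column is the time index for the time series right next to it?

The desired output looks like this: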

date          A     B
2015-01-01    10    10
2015-01-02    20    20
2015-01-03    30    NaN

Answer 1

To parse blocks of adjacent columns and align them on their respective datetime indexes, you can do the following.

Starting with df:

Int64Index: 3 entries, 0 to 2
Data columns (total 4 columns):
Unnamed: 0    3 non-null datetime64[ns]
A             3 non-null int64
Unnamed: 1    2 non-null datetime64[ns]
B             2 non-null float64
dtypes: datetime64[ns](2), float64(1), int64(1)

You can iterate over the columns in chunks of 2 and merge on the index like this:

def chunks(l, n):
    """ Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

merged = df.loc[:, list(df)[:2]].set_index(list(df)[0])
for cols in chunks(list(df)[2:], 2):
    merged = merged.merge(df.loc[:, cols].set_index(cols[0]).dropna(), left_index=True, right_index=True, how='outer')

to get:

             A   B
2015-01-01  10  10
2015-01-01  10  20
2015-01-02  20 NaN
2015-01-03  30 NaN
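For anyone who wants to run the loop above without an Excel file, here is a minimal sketch that rebuilds the sample frame in memory. The values are copied from the question; the function name merge_pairs is made up for this illustration and is not part of pandas.

```python
import pandas as pd

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

# Sample data from the question, built in memory (assumption: this mirrors
# what read_excel returns for the sheet).
df = pd.DataFrame({
    "Unnamed: 0": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "A": [10, 20, 30],
    "Unnamed: 1": pd.to_datetime(["2015-01-01", "2015-01-01", None]),
    "B": [10.0, 20.0, None],
})

def merge_pairs(frame):
    """Outer-merge each (date, value) column pair on its own date index."""
    cols = list(frame)
    merged = frame.loc[:, cols[:2]].set_index(cols[0])
    for pair in chunks(cols[2:], 2):
        right = frame.loc[:, pair].set_index(pair[0]).dropna()
        merged = merged.merge(right, left_index=True, right_index=True, how="outer")
    return merged

merged = merge_pairs(df)
print(merged)
```

The repeated 2015-01-01 row appears because that date occurs twice in the index of B, so the outer merge pairs the single A value with both B values.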

Unfortunately, pd.concat does not work here because it cannot handle duplicate index entries. Otherwise you could use a list comprehension with pd.concat:

pd.concat([df.loc[:, cols].set_index(cols[0]) for cols in chunks(list(df), 2)], axis=1)
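If it is acceptable to keep only one value per date, the duplicates can be dropped before concatenating. The following is a sketch under that assumption, not a general fix: keeping the first occurrence is an arbitrary choice, and the names align_pairs and "date" are introduced only for this example.

```python
import pandas as pd

def chunks(seq, n):
    """Yield successive n-sized chunks from seq."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

# Sample data from the question, built in memory.
df = pd.DataFrame({
    "Unnamed: 0": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "A": [10, 20, 30],
    "Unnamed: 1": pd.to_datetime(["2015-01-01", "2015-01-01", None]),
    "B": [10.0, 20.0, None],
})

def align_pairs(frame):
    """Turn each (date, value) pair into a Series and align them by date."""
    series = []
    for date_col, value_col in chunks(list(frame), 2):
        # Drop padding rows (NaT / NaN), then index the values by their dates.
        s = frame[[date_col, value_col]].dropna().set_index(date_col)[value_col]
        # Assumption: when a date repeats, keep only its first value.
        s = s[~s.index.duplicated(keep="first")]
        s.index.name = "date"
        series.append(s)
    return pd.concat(series, axis=1)

aligned = align_pairs(df)
print(aligned)
```

With unique indexes, pd.concat aligns the series on the union of their dates and fills the gaps with NaN. Whether dropping repeated dates is appropriate depends on the data; aggregating them instead (for example with a groupby on the index) is another option.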

Answer 2

I import the data with xlrd and then use pandas to display it:

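import xlrd
import pandas as pd

xls_name = "data.xls"  # placeholder path; point this at your own workbook
# Note: xlrd 2.0 and later read only .xls files, not .xlsx.
workbook = xlrd.open_workbook(xls_name, encoding_override="cp1252")
worksheet = workbook.sheet_by_index(0)

# The first row holds the column names.
first_row = [worksheet.cell_value(0, col) for col in range(worksheet.ncols)]

# Build one dict per data row, keyed by column name.
first_data_row = 1  # the original used 10; set this to where your data starts
data = []
for row in range(first_data_row, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]] = worksheet.cell_value(row, col)
    data.append(elm)

# Use three separate lists; a chained "a = b = c = []" would alias one list.
first_column, second_column, third_column = [], [], []
for elm in data:
    # elm is a dict, so index it with [] rather than calling it.
    first_column.append(elm[first_row[0]])
    second_column.append(elm[first_row[1]])
    third_column.append(elm[first_row[2]])

dict1 = {}
dict1[first_row[0]] = first_column
dict1[first_row[1]] = second_column
dict1[first_row[2]] = third_column
# The column labels must match the dict keys, otherwise the frame is all NaN.
res = pd.DataFrame(dict1, columns=first_row[:3])
print(res)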
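Since data is already a list of per-row dictionaries, the intermediate per-column lists may not be needed: pd.DataFrame accepts such a list directly. The sketch below uses hypothetical in-memory rows in place of the worksheet, and the column names "date", "A" and "B" are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows shaped like the `data` list built from the worksheet:
# one dict per row, keyed by column name.
data = [
    {"date": "2015-01-01", "A": 10, "B": 10},
    {"date": "2015-01-02", "A": 20, "B": 20},
    {"date": "2015-01-03", "A": 30, "B": None},
]

# `columns` fixes the column order; the labels must match the dict keys.
res = pd.DataFrame(data, columns=["date", "A", "B"])
print(res)
```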

Answer 1
1 2016-02-01 17:44:37

Answer 2
0 2016-02-02 14:58:48
