Python Pandas - Can a DataFrame have multiple indexes?

Question

I have a dataset in CSV format, which I read with:

df = pd.read_csv(requestfile, header=[0,1], parse_dates= [0])

The resulting DataFrame has the following form, with rows [0..8759]:

                   time output direct diffuse temperature
                 UTC     kW  kW/m2   kW/m2       deg C
0    2014-01-01 00:00:00  0.000  0.000   0.000       1.495
1    2014-01-01 01:00:00  0.000  0.000   0.000       1.543
2    2014-01-01 02:00:00  0.000  0.000   0.000       1.517

Now I want to process it with https://github.com/renewables-ninja/gsee (gsee.pv.run_plant_model), but I get the following error:

File "C:\Data\Solar\gsee-master\gsee\trigon.py", line 183, in aperture_irradiance
sunrise_set_times = sun_rise_set_times(direct.index, coords)

File "C:\Data\Solar\gsee-master\gsee\trigon.py", line 56, in sun_rise_set_times
dtindex = pd.DatetimeIndex(datetime_index.to_series().map(pd.Timestamp.date).unique())

File "C:\Users\XX\Anaconda3\lib\site-packages\pandas\core\series.py", line 2177, in map
new_values = map_f(values, arg)

File "pandas\src\inference.pyx", line 1207, in pandas.lib.map_infer (pandas\lib.c:66124)
TypeError: descriptor 'date' requires a 'datetime.datetime' object but received a 'int'
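The traceback complains that an 'int' arrived where a datetime was expected. A minimal sketch of how to check what the index actually contains after the first read_csv call, assuming a small in-memory CSV (the csv_text sample below is made up to mirror the layout shown above, and parse_dates is left out because only the index type matters here):

```python
import io

import pandas as pd

# Hypothetical sample mirroring the layout in the question:
# one row of column names, one row of units, then hourly data.
csv_text = """time,output,direct,diffuse,temperature
UTC,kW,kW/m2,kW/m2,deg C
2014-01-01 00:00:00,0.000,0.000,0.000,1.495
2014-01-01 01:00:00,0.000,0.000,0.000,1.543
2014-01-01 02:00:00,0.000,0.000,0.000,1.517
"""

# Two header rows and no index_col: pandas builds a default integer index.
df_default = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# The row labels are plain integers, not timestamps.
print(type(df_default.index).__name__)
print(list(df_default.index))

# The two header rows become a two-level column index.
print(df_default.columns.nlevels)
```

With a default index, anything that maps a date function over df.index receives integers, which is consistent with the TypeError above.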

So I assumed the error was in my default index, and I changed the CSV read to use the 'time' column as the index:

df = pd.read_csv(requestfile, header=[0,1], index_col=0, parse_dates= [0])

time                output direct diffuse temperature
UTC                     kW  kW/m2   kW/m2       deg C
2014-01-01 00:00:00  0.000  0.000   0.000       1.495
2014-01-01 01:00:00  0.000  0.000   0.000       1.543

Now I get the following error:

File "C:\Users\XX\Anaconda3\lib\site-packages\pandas\core\frame.py", line 402, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)

File "C:\Users\XX\Anaconda3\lib\site-packages\pandas\core\frame.py", line 5398, in _arrays_to_mgr
index = extract_index(arrays)

File "C:\Users\XX\Anaconda3\lib\site-packages\pandas\core\frame.py", line 5437, in extract_index
raise ValueError('If using all scalar values, you must pass'

ValueError: If using all scalar values, you must pass an index

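So, if I understand correctly, the first error occurs because my index is just the integers [0..8759] when it should be in datetime format, and the second error occurs because my index is in datetime format and

index = extract_index(arrays)

does not have the original index [0..8759]. Or have I completely misunderstood the scalar values error? Is it possible to give a DataFrame two indexes, one [0..8759] and the other the 'time' column? How would I do that with pd.read_csv or some other method?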
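On the literal question of two indexes: pandas can stack several index levels into a MultiIndex. The following is only a sketch with made-up data (the level name "row" is hypothetical). The first traceback suggests the library maps a date function over a plain datetime index, so a two-level index is probably not what it needs; this is shown for illustration only.

```python
import pandas as pd

# Made-up frame with a default integer index and a 'time' column.
df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2014-01-01 00:00:00", "2014-01-01 01:00:00", "2014-01-01 02:00:00"]
        ),
        "output": [0.0, 0.0, 0.0],
    }
)

# append=True keeps the existing integer index and adds 'time' as a second level.
df_multi = df.set_index("time", append=True)
df_multi = df_multi.rename_axis(["row", "time"])  # "row" is a hypothetical name
print(df_multi.index.nlevels)

# Dropping the integer level leaves a plain DatetimeIndex again.
df_time = df_multi.droplevel("row")
print(type(df_time.index).__name__)
```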

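In case it helps, I also do the following with the DataFrame (the results do not show up when I display df, probably because of some beginner mistake, but they are used by the run_plant_model function):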

df.global_horizontal = df.direct + df.diffuse
df.diffuse_fraction = df.diffuse / df.global_horizontal
df.diffuse_fraction = df.diffuse_fraction.fillna(0)

EDIT: I now add the new columns to the DataFrame correctly. It made no difference to the error.
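For reference, assigning to a new attribute name such as df.global_horizontal does not create a column, which is why nothing showed up when displaying df; bracket assignment does. A minimal sketch, assuming flat (single-level) column names and made-up values:

```python
import pandas as pd

# Made-up irradiance values on an hourly datetime index.
df = pd.DataFrame(
    {"direct": [0.0, 0.2, 0.5], "diffuse": [0.0, 0.1, 0.1]},
    index=pd.to_datetime(
        ["2014-01-01 00:00:00", "2014-01-01 01:00:00", "2014-01-01 02:00:00"]
    ),
)

# Bracket assignment creates real columns.
df["global_horizontal"] = df["direct"] + df["diffuse"]

# 0 / 0 yields NaN for hours without irradiance; fillna(0) replaces it with 0.
df["diffuse_fraction"] = (df["diffuse"] / df["global_horizontal"]).fillna(0)

print(df.columns.tolist())
```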

Function call:

gsee.pv.run_plant_model(df, site.coords, angle, azimuth, tracking, 
                        capacity, technology, system_loss, 
                        angles=None, include_raw_data=False)

I believe the underlying problem may be this warning:

C:\Users\XX\Anaconda3\lib\site-packages\pandas\indexes\base.py:2683: RuntimeWarning: Cannot compare type 'Timestamp' with type 'str', sort order is undefined for incomparable objects
return this.join(other, how=how, return_indexers=return_indexers)

So I have a 'str' where I should have a 'Timestamp'?

Answer 1

OK, I found the error; the original question was poorly posed.

Solution:

df = pd.read_csv(requestfile, index_col=[0], parse_dates=[0], skiprows=[1])

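I dropped the header argument and told read_csv to skip the row containing the 'str' units. The problem was that one of the functions used was trying to compare a 'Timestamp' with the units row ('str').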
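The effect of that call can be checked on a small in-memory CSV. This is a sketch with a made-up sample (csv_text is hypothetical and only mirrors the layout from the question):

```python
import io

import pandas as pd

# Hypothetical sample: names row, units row, then hourly data.
csv_text = """time,output,direct,diffuse,temperature
UTC,kW,kW/m2,kW/m2,deg C
2014-01-01 00:00:00,0.000,0.000,0.000,1.495
2014-01-01 01:00:00,0.000,0.000,0.000,1.543
2014-01-01 02:00:00,0.000,0.000,0.000,1.517
"""

# skiprows=[1] drops the units row (file line index 1), so the first line
# alone becomes the header and every data column parses as a number.
df = pd.read_csv(
    io.StringIO(csv_text), index_col=[0], parse_dates=[0], skiprows=[1]
)

print(type(df.index).__name__)
print(df.columns.tolist())
print(df.dtypes)
```

With the units row skipped, the index holds only timestamps and the column labels are single strings rather than (name, unit) pairs.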

(Answer 1: 2017-02-06 08:16:01)