Python Pandas - Can a DataFrame have multiple indexes?
I have a dataset in CSV which I read with:
df = pd.read_csv(requestfile, header=[0,1], parse_dates= [0])
The resulting DataFrame has the following format (rows 0..8759):
time output direct diffuse temperature
UTC kW kW/m2 kW/m2 deg C
0 2014-01-01 00:00:00 0.000 0.000 0.000 1.495
1 2014-01-01 01:00:00 0.000 0.000 0.000 1.543
2 2014-01-01 02:00:00 0.000 0.000 0.000 1.517
Now I want to process it with https://github.com/renewables-ninja/gsee (gsee.pv.run_plant_model), but I receive the following error:
File "C:\Data\Solar\gsee-master\gsee\trigon.py", line 183, in aperture_irradiance
sunrise_set_times = sun_rise_set_times(direct.index, coords)
File "C:\Data\Solar\gsee-master\gsee\trigon.py", line 56, in sun_rise_set_times
dtindex = pd.DatetimeIndex(datetime_index.to_series().map(pd.Timestamp.date).unique())
File "C:\Users\XX\Anaconda3\lib\site-packages\pandas\core\series.py", line 2177, in map
new_values = map_f(values, arg)
File "pandas\src\inference.pyx", line 1207, in pandas.lib.map_infer (pandas\lib.c:66124)
TypeError: descriptor 'date' requires a 'datetime.datetime' object but received a 'int'
I assumed the fault was in my default index, so I modified the CSV reading to use the 'time' column as the index:
df = pd.read_csv(requestfile, header=[0,1], index_col=0, parse_dates= [0])
time output direct diffuse temperature
UTC kW kW/m2 kW/m2 deg C
2014-01-01 00:00:00 0.000 0.000 0.000 1.495
2014-01-01 01:00:00 0.000 0.000 0.000 1.543
Now the error I get is the following:
File "C:\Users\XX\Anaconda3\lib\site-packages\pandas\core\frame.py", line 402, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "C:\Users\XX\Anaconda3\lib\site-packages\pandas\core\frame.py", line 5398, in _arrays_to_mgr
index = extract_index(arrays)
File "C:\Users\XX\Anaconda3\lib\site-packages\pandas\core\frame.py", line 5437, in extract_index
raise ValueError('If using all scalar values, you must pass'
ValueError: If using all scalar values, you must pass an index
So if I understood correctly, the first error occurs because my index is just the integers [0..8759] when it should be in datetime format, and the second error occurs because my index is in datetime format and
index = extract_index(arrays)
doesn't have the original index [0..8759]. Or have I completely misunderstood the scalar value error? Would it be possible to have two indexes for the DataFrame, one [0..8759] and the other the 'time' column? How would this be expressed in the pd.read_csv call, or done by some other method?
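On the literal question of two indexes: pandas can stack the integer row index and the 'time' column into one MultiIndex. The sketch below uses a small made-up frame, not the real dataset. Judging from the first traceback, gsee maps pd.Timestamp.date over the index, so a plain DatetimeIndex (the second variant) looks like what it expects; that is an inference from the traceback, not something checked against gsee itself.

```python
import pandas as pd

# Made-up three-row frame standing in for the real CSV data.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2014-01-01 00:00:00",
        "2014-01-01 01:00:00",
        "2014-01-01 02:00:00",
    ]),
    "output": [0.0, 0.0, 0.0],
})

# Keep the integer index and add 'time' as a second level (a MultiIndex).
both = df.set_index("time", append=True)

# Replace the integer index with 'time' only (a plain DatetimeIndex).
only_time = df.set_index("time")

print(both.index.nlevels)           # number of index levels
print(type(only_time.index).__name__)
```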
If it is any help, I also do the following with the DataFrame. The new columns do not show up when I print df, probably because of some beginner mistake, but they are used by the run_plant_model function:
df.global_horizontal = df.direct + df.diffuse
df.diffuse_fraction = df.diffuse / df.global_horizontal
df.diffuse_fraction = df.diffuse_fraction.fillna(0)
EDIT: I have now properly added the new columns to the DataFrame. It did not have any effect on the error.
Function call:
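For reference, the columns probably did not show up because assigning to a new attribute name (df.global_horizontal = ...) only sets a Python attribute on the object; pandas creates a column only through bracket assignment. A minimal sketch on made-up values, assuming the same column names as above:

```python
import warnings
import pandas as pd

# Made-up values; only the column names follow the question.
df = pd.DataFrame({"direct": [0.0, 0.2], "diffuse": [0.0, 0.1]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas emits a UserWarning for this pattern
    df.global_horizontal = df.direct + df.diffuse  # attribute only, no column

attr_created_column = "global_horizontal" in df.columns  # False

# Bracket assignment actually adds the columns.
df["global_horizontal"] = df["direct"] + df["diffuse"]
df["diffuse_fraction"] = (df["diffuse"] / df["global_horizontal"]).fillna(0)

print(list(df.columns))
```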
gsee.pv.run_plant_model(df, site.coords, angle, azimuth, tracking,
capacity, technology, system_loss,
angles=None, include_raw_data=False)
I believe the original question might have been bad:
C:\Users\XX\Anaconda3\lib\site-packages\pandas\indexes\base.py:2683: RuntimeWarning: Cannot compare type 'Timestamp' with type 'str', sort order is undefined for incomparable objects
return this.join(other, how=how, return_indexers=return_indexers)
So I have 'str' where I should have 'Timestamp'?
OK, I found the error, and the original question was bad:
Solution:
df = pd.read_csv(requestfile, index_col=[0], parse_dates=[0], skiprows=[1])
The header=[0,1] argument was left out, and I told read_csv to skip the row containing the units ('str'). So the problem was that one of the functions used was trying to compare 'Timestamp' with the unit row ('str').
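The difference between the two read_csv calls can be reproduced on a few lines of inline CSV. This is only a sketch: the sample text is reconstructed from the printed frame above, and the comma delimiter is an assumption. parse_dates is left out of the two-level call because only the column layout matters there.

```python
import io
import pandas as pd

# Sample rebuilt from the printed DataFrame; the comma separator is assumed.
csv_text = (
    "time,output,direct,diffuse,temperature\n"
    "UTC,kW,kW/m2,kW/m2,deg C\n"
    "2014-01-01 00:00:00,0.000,0.000,0.000,1.495\n"
    "2014-01-01 01:00:00,0.000,0.000,0.000,1.543\n"
)

# Two header rows: the units row becomes a second column level.
two_level = pd.read_csv(io.StringIO(csv_text), header=[0, 1], index_col=0)

# The solution: skip the units row, keep one header row, parse the index as dates.
flat = pd.read_csv(
    io.StringIO(csv_text), index_col=[0], parse_dates=[0], skiprows=[1]
)

print(type(two_level.columns).__name__)  # column index type with units row kept
print(list(flat.columns))                # flat column names
print(type(flat.index).__name__)         # index type after parse_dates
```

With the flat layout, flat["direct"] is a single Series of floats, whereas two_level["direct"] is a one-column DataFrame keyed by the unit string.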