Pandas: appending values to a series using iterrows() and pd.Series

Question

My input data looks like this:

   cat  start               target
0   1   2016-09-01 00:00:00 4.370279
1   1   2016-09-01 00:00:00 1.367778
2   1   2016-09-01 00:00:00 0.385834

I want to build a series using "start" as the start date and "target" as the series values. iterrows() pulls the correct value into "imp" on each row, but when I append to time_series, only the first value is carried through to all series points. Why does "data=imp" pull the 0th row every time?

import pandas as pd

t0 = model_input_test['start'][0] # t0 = 2016-09-01 00:00:00
num_ts = len(model_input_test.index) # num_ts = 1348
time_series = []
for i, row in model_input_test.iterrows():
    imp = row.loc['target']
    print(imp)
    # pd.DatetimeIndex(start=..., freq=..., periods=...) was removed in pandas 1.0;
    # pd.date_range accepts the same arguments
    index = pd.date_range(start=t0, freq='H', periods=num_ts)
    time_series.append(pd.Series(data=imp, index=index))

A screenshot can be seen here.

Series "time_series" should look like this:

2016-09-01 00:00:00    4.370279
2016-09-01 01:00:00    1.367778
2016-09-01 02:00:00    0.385834

But it ends up looking like this:

2016-09-01 00:00:00    4.370279
2016-09-01 01:00:00    4.370279
2016-09-01 02:00:00    4.370279

I'm using Jupyter with the conda_python3 kernel on SageMaker.
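The following is a minimal sketch of why the loop behaves this way. It assumes a hypothetical three-row frame "df" built from the sample data above, standing in for "model_input_test". When "data" is a single scalar, pd.Series repeats that scalar at every position of "index". Each loop iteration therefore produces a whole constant series, and time_series ends up as a list of such series rather than one series. The last lines show one possible fix that keeps the loop: collect the scalars first, then build a single Series after the loop.

```python
import pandas as pd

# Hypothetical frame mirroring the three sample rows in the question
df = pd.DataFrame({
    "cat": [1, 1, 1],
    "start": ["2016-09-01 00:00:00"] * 3,
    "target": [4.370279, 1.367778, 0.385834],
})

t0 = pd.to_datetime(df["start"].iloc[0])
# Hourly frequency ("H" in the question; newer pandas prefers "h")
index = pd.date_range(start=t0, freq="h", periods=len(df))

time_series = []
for _, row in df.iterrows():
    # row["target"] is one float, so pd.Series broadcasts it
    # to every timestamp in the index
    time_series.append(pd.Series(data=row["target"], index=index))

print(len(time_series))         # one Series per row, not one Series overall
print(time_series[0].tolist())  # first row's value repeated
print(time_series[1].tolist())  # second row's value repeated

# One possible fix that keeps the loop: gather the scalars,
# then build a single Series over the hourly index
values = [row["target"] for _, row in df.iterrows()]
fixed = pd.Series(data=values, index=index)
print(fixed)
```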

Answer 1

When using dataframes, there are usually better ways to go about a task than iterating through the dataframe. For example, in your case, you can create your series like this:

time_series = df.set_index(
    pd.date_range(pd.to_datetime(df.start).iloc[0], periods=len(df), freq='H')
)['target']


>>> time_series
2016-09-01 00:00:00    4.370279
2016-09-01 01:00:00    1.367778
2016-09-01 02:00:00    0.385834
Freq: H, Name: target, dtype: float64
>>> type(time_series)
<class 'pandas.core.series.Series'>

Essentially, this says: "set the index to a date range that starts at your first date and increments hourly, then take the target column."
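Below is a self-contained version of the approach above. It assumes the three sample rows from the question and uses the lowercase "h" alias, which newer pandas prefers over "H". It can be pasted into a fresh session to check the result.

```python
import pandas as pd

# Sample rows from the question (the real data has 1348 rows)
df = pd.DataFrame({
    "cat": [1, 1, 1],
    "start": ["2016-09-01 00:00:00"] * 3,
    "target": [4.370279, 1.367778, 0.385834],
})

# Hourly index starting at the first timestamp, one entry per row
hourly = pd.date_range(pd.to_datetime(df["start"]).iloc[0], periods=len(df), freq="h")

# Replace the default RangeIndex with the hourly index, then keep only 'target'
time_series = df.set_index(hourly)["target"]
print(time_series)
```

set_index accepts an array-like of the same length as the frame, not only a column name. That is why the DatetimeIndex can be passed directly.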

Answer 2

Given a dataframe df with columns start and target, you can simply use set_index:

time_series = df.set_index('start')['target']
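This approach keeps whatever timestamps are already in "start". The sketch below checks that, assuming the three sample rows from the question, where every row has the same start value. For that data, the resulting index contains duplicates rather than the hourly sequence the question asks for. The approach fits best when "start" already holds one distinct timestamp per row.

```python
import pandas as pd

# Sample rows from the question: every row shares the same start timestamp
df = pd.DataFrame({
    "cat": [1, 1, 1],
    "start": ["2016-09-01 00:00:00"] * 3,
    "target": [4.370279, 1.367778, 0.385834],
})
df["start"] = pd.to_datetime(df["start"])  # parse strings into Timestamps

time_series = df.set_index("start")["target"]
print(time_series)
print(time_series.index.is_unique)  # False: all three rows share one timestamp
```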

2 answers

Answer 1
Score 1, accepted, 2018-07-23 17:33:18

Answer 2
Score 0, 2018-07-23 17:37:01
