Appending a DataFrame in a for loop

Question

I have a pandas DataFrame with three columns: id, start_time, end_time. I would like to transform it into a DataFrame with two columns, id and time, with one row for every time value from start_time to end_time inclusive.

For example, from [001, 1, 3][002, 3, 4] to [001, 1][001, 2][001, 3][002, 3][002, 4].

Currently I use a for loop and append to the DataFrame in each iteration, but it is very slow. Is there a faster method?
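For context: the loop itself was not posted, so the following is only a sketch of what it presumably looks like (both function names are made up for illustration). Concatenating inside the loop copies all accumulated rows on every pass, which is the usual reason this pattern is slow. Collecting the pieces in a list and concatenating once avoids the repeated copying, even before switching to a vectorized method.

```python
import pandas as pd

df = pd.DataFrame([['001', 1, 3], ['002', 3, 4]],
                  columns=['id', 'start_time', 'end_time'])

def expand_slow(frame):
    # Hypothetical reconstruction of the slow pattern: the result is
    # re-concatenated (and fully copied) on every iteration.
    out = None
    for row in frame.itertuples(index=False):
        piece = pd.DataFrame({'id': row.id,
                              'time': list(range(row.start_time, row.end_time + 1))})
        out = piece if out is None else pd.concat([out, piece], ignore_index=True)
    return out

def expand_collect(frame):
    # Same loop, but the pieces are gathered in a list and
    # concatenated a single time at the end.
    pieces = [
        pd.DataFrame({'id': row.id,
                      'time': list(range(row.start_time, row.end_time + 1))})
        for row in frame.itertuples(index=False)
    ]
    return pd.concat(pieces, ignore_index=True)

slow = expand_slow(df)
fast = expand_collect(df)
print(fast)
```

Both functions return the same rows; only the amount of copying differs.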

Answer 1

If start_time and end_time are timedeltas, use:

import pandas as pd

df = pd.DataFrame([['001', 1, 3],['002', 3, 4]], 
                  columns=['id','start_time','end_time'])
print (df)
    id  start_time  end_time
0  001           1         3
1  002           3         4

#stack columns
df1 = pd.melt(df, id_vars='id', value_name='time').drop('variable', axis=1)
#convert int to timedelta 
df1['time'] = pd.to_timedelta(df1.time, unit='s')
df1.set_index('time', inplace=True)
print (df1)
           id
time         
00:00:01  001
00:00:03  002
00:00:03  001
00:00:04  002

#groupby by id and resample by one second
print (df1.groupby('id')
          .resample('1S')
          .ffill()
          .reset_index(drop=True, level=0)
          .reset_index())

      time   id
0 00:00:01  001
1 00:00:02  001
2 00:00:03  001
3 00:00:03  002
4 00:00:04  002
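If the columns are plain integers, as in the question's example, the conversion to timedelta can be skipped. The following is a minimal sketch using numpy.repeat. It is not part of the original answer, and it assumes end_time >= start_time on every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['001', 1, 3], ['002', 3, 4]],
                  columns=['id', 'start_time', 'end_time'])

# Number of output rows each input row expands to (inclusive range).
lengths = (df['end_time'] - df['start_time'] + 1).to_numpy()

# Repeat id and start_time once per output row.
ids = np.repeat(df['id'].to_numpy(), lengths)
starts = np.repeat(df['start_time'].to_numpy(), lengths)

# Position of each output row within its own group: 0, 1, ..., length - 1.
group_first = np.repeat(np.cumsum(lengths) - lengths, lengths)
offsets = np.arange(lengths.sum()) - group_first

result = pd.DataFrame({'id': ids, 'time': starts + offsets})
print(result)
```

No Python-level loop runs over the rows, so this should scale better than per-row appends on large inputs.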

If start_time and end_time are datetimes, use:

df = pd.DataFrame([['001', '2016-01-01', '2016-01-03'],
                   ['002', '2016-01-03', '2016-01-04']], 
                  columns=['id','start_time','end_time'])
print (df)
    id  start_time    end_time
0  001  2016-01-01  2016-01-03
1  002  2016-01-03  2016-01-04

df1 = pd.melt(df, id_vars='id', value_name='time').drop('variable', axis=1)
#convert to datetime
df1['time'] = pd.to_datetime(df1.time)
df1.set_index('time', inplace=True)
print (df1)
             id
time           
2016-01-01  001
2016-01-03  002
2016-01-03  001
2016-01-04  002

#groupby by id and resample by one day
print (df1.groupby('id')
          .resample('1D')
          .ffill()
          .reset_index(drop=True, level=0)
          .reset_index())

        time   id
0 2016-01-01  001
1 2016-01-02  001
2 2016-01-03  001
3 2016-01-03  002
4 2016-01-04  002
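An alternative sketch for the datetime case builds one pd.date_range per row and flattens the result with DataFrame.explode. This swaps resample for a different technique and assumes pandas 1.1 or later (for the ignore_index argument of explode) and a daily step.

```python
import pandas as pd

df = pd.DataFrame([['001', '2016-01-01', '2016-01-03'],
                   ['002', '2016-01-03', '2016-01-04']],
                  columns=['id', 'start_time', 'end_time'])

starts = pd.to_datetime(df['start_time'])
ends = pd.to_datetime(df['end_time'])

# One list of daily timestamps per input row (both endpoints included).
ranges = pd.Series(
    [list(pd.date_range(s, e, freq='D')) for s, e in zip(starts, ends)],
    index=df.index,
)

# explode turns each list element into its own row, repeating id.
result = df[['id']].assign(time=ranges).explode('time', ignore_index=True)

# explode leaves an object column; convert it back to datetime64.
result['time'] = pd.to_datetime(result['time'])
print(result)
```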

Answer 2

Here is my take on your question:

df.set_index('id', inplace=True)

reshaped = df.apply(lambda x: pd.Series(range(x['start time'], x['end time']+1)), axis=1).\
    stack().reset_index().drop('level_1', axis=1)
reshaped.columns = ['id', 'time']
reshaped

Test

Input:

import pandas as pd
from io import StringIO

data = StringIO("""id,start time,end time
001, 1, 3
002, 3, 4""")

df = pd.read_csv(data, dtype={'id':'object'})
df.set_index('id', inplace=True)
print("In\n", df)

reshaped = df.apply(lambda x: pd.Series(range(x['start time'], x['end time']+1)), axis=1).\
    stack().reset_index().drop('level_1', axis=1)
reshaped.columns = ['id', 'time']
print("Out\n", reshaped)

Output:

In
    start time  end time
id      
001 1           3
002 3           4

Out
    id  time
0   001 1
1   001 2
2   001 3
3   002 3
4   002 4
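On newer pandas versions the same per-row range idea can be written without apply and stack. The following is a sketch, assuming pandas 1.1 or later (for explode with ignore_index) and the column names used in the test above.

```python
import pandas as pd

df = pd.DataFrame([['001', 1, 3], ['002', 3, 4]],
                  columns=['id', 'start time', 'end time'])

# One list of integers per row, covering start..end inclusive.
df['time'] = [list(range(s, e + 1))
              for s, e in zip(df['start time'], df['end time'])]

# explode yields one output row per list element; the exploded
# column has object dtype, so cast it back to integers.
reshaped = (df[['id', 'time']]
            .explode('time', ignore_index=True)
            .astype({'time': 'int64'}))
print(reshaped)
```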

2 answers

Answer 1: score 1, accepted, 2016-08-12 05:42:52
Answer 2: score 0, 2016-08-12 07:49:15
