Python Pandas replicate rows in dataframe
If the dataframe looks like:
Store,Dept,Date,Weekly_Sales,IsHoliday
1,1,2010-02-05,24924.5,FALSE
1,1,2010-02-12,46039.49,TRUE
1,1,2010-02-19,41595.55,FALSE
1,1,2010-02-26,19403.54,FALSE
1,1,2010-03-05,21827.9,FALSE
1,1,2010-03-12,21043.39,FALSE
1,1,2010-03-19,22136.64,FALSE
1,1,2010-03-26,26229.21,FALSE
1,1,2010-04-02,57258.43,FALSE
And I want to duplicate the rows where IsHoliday is TRUE. I can do:
is_hol = df['IsHoliday'] == True
df_try = df[is_hol]
df=df.append(df_try*10)
But is there a better way to do this? I need to duplicate the holiday rows 5 times, and with the approach above I would have to append 5 times.
You can put df_try inside a list and then do what you have in mind:
>>> df.append([df_try]*5,ignore_index=True)
    Store  Dept        Date  Weekly_Sales  IsHoliday
0       1     1  2010-02-05      24924.50      False
1       1     1  2010-02-12      46039.49       True
2       1     1  2010-02-19      41595.55      False
3       1     1  2010-02-26      19403.54      False
4       1     1  2010-03-05      21827.90      False
5       1     1  2010-03-12      21043.39      False
6       1     1  2010-03-19      22136.64      False
7       1     1  2010-03-26      26229.21      False
8       1     1  2010-04-02      57258.43      False
9       1     1  2010-02-12      46039.49       True
10      1     1  2010-02-12      46039.49       True
11      1     1  2010-02-12      46039.49       True
12      1     1  2010-02-12      46039.49       True
13      1     1  2010-02-12      46039.49       True
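DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the call above only runs on older versions. A minimal sketch of the same idea with pd.concat, assuming a shortened three-row sample in the shape of the question's data (the sample frame and the name result are illustrative, not from the original answer):

```python
import pandas as pd

# Shortened sample in the shape of the question's data (assumption: three rows only)
df = pd.DataFrame({
    "Store": [1, 1, 1],
    "Dept": [1, 1, 1],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-19"],
    "Weekly_Sales": [24924.5, 46039.49, 41595.55],
    "IsHoliday": [False, True, False],
})

# Rows where IsHoliday is True
df_try = df[df["IsHoliday"]]

# Original frame followed by five extra copies of the holiday rows,
# with a fresh 0..n-1 index
result = pd.concat([df] + [df_try] * 5, ignore_index=True)
print(result)
```

With this sample the result has 8 rows: the 3 original rows plus 5 extra copies of the single holiday row.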
Another way is to use the concat() function:
import pandas as pd
In [603]: df = pd.DataFrame({'col1':list("abc"),'col2':range(3)},index = range(3))
In [604]: df
Out[604]:
  col1  col2
0    a     0
1    b     1
2    c     2
In [605]: pd.concat([df]*3, ignore_index=True) # Ignores the index
Out[605]:
  col1  col2
0    a     0
1    b     1
2    c     2
3    a     0
4    b     1
5    c     2
6    a     0
7    b     1
8    c     2
In [606]: pd.concat([df]*3)
Out[606]:
  col1  col2
0    a     0
1    b     1
2    c     2
0    a     0
1    b     1
2    c     2
0    a     0
1    b     1
2    c     2
This is an old question, but since it still comes up at the top of my Google results, here's another way.
import pandas as pd
import numpy as np
df = pd.DataFrame({'col1':list("abc"),'col2':range(3)},index = range(3))
Say you want to replicate the rows where col1 == "b".
reps = [3 if val=="b" else 1 for val in df.col1]
df.loc[np.repeat(df.index.values, reps)]
You could replace the 3 if val=="b" else 1 in the list comprehension with a function that returns 3 if val=="b", 4 if val=="c", and so on, so it's pretty flexible.
Appending and concatenating are usually slow in pandas, so I recommend building a new list of rows and turning that into a dataframe (unless you are appending a single row or concatenating just a few dataframes).
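As a sketch of that flexibility, the per-value repeat counts can come from a lookup table instead of an inline conditional. The dict counts and the default of 1 for unlisted values are assumptions made for illustration; Index.repeat is used here in place of np.repeat, which should behave the same way for this purpose.

```python
import pandas as pd

df = pd.DataFrame({"col1": list("abc"), "col2": range(3)})

# Hypothetical lookup table: how many times each col1 value should appear
counts = {"b": 3, "c": 4}

# Values missing from the table map to NaN, so default them to 1 (kept once)
reps = df["col1"].map(counts).fillna(1).astype(int)

# Repeat each index label by its count, then select those labels
result = df.loc[df.index.repeat(reps.to_numpy())]
print(result)
```

The original index labels are repeated along with the rows; chain .reset_index(drop=True) if a fresh index is wanted.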
import pandas as pd
df = pd.DataFrame([
[1,1,'2010-02-05',24924.5,False],
[1,1,'2010-02-12',46039.49,True],
[1,1,'2010-02-19',41595.55,False],
[1,1,'2010-02-26',19403.54,False],
[1,1,'2010-03-05',21827.9,False],
[1,1,'2010-03-12',21043.39,False],
[1,1,'2010-03-19',22136.64,False],
[1,1,'2010-03-26',26229.21,False],
[1,1,'2010-04-02',57258.43,False]
], columns=['Store','Dept','Date','Weekly_Sales','IsHoliday'])
temp_df = []
for row in df.itertuples(index=False):
    if row.IsHoliday:
        temp_df.extend([list(row)]*5)
    else:
        temp_df.append(list(row))
df = pd.DataFrame(temp_df, columns=df.columns)
You can do it in one line (this relies on DataFrame.append, which was removed in pandas 2.0):
df.append([df[df['IsHoliday'] == True]] * 5, ignore_index=True)
or
df.append([df[df['IsHoliday']]] * 5, ignore_index=True)
Another alternative to append() is to first replace the values of a column with a list of entries and then explode() (with or without ignore_index=True, depending on what you want):
df['IsHoliday'] = df['IsHoliday'].apply(lambda x: 5*[x] if (x == True) else x)
df.explode('IsHoliday', ignore_index=True)
The nice thing about this one is that you can already use the list in the apply() call to build copies of rows with modified values in a column, in case you wanted to do that later anyway.
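A minimal sketch of that last point, assuming a shortened three-row sample and a hypothetical helper column copy_id that numbers each copy (neither is part of the original answer):

```python
import pandas as pd

# Shortened sample in the shape of the question's data (assumption: three rows only)
df = pd.DataFrame({
    "Date": ["2010-02-05", "2010-02-12", "2010-02-19"],
    "Weekly_Sales": [24924.5, 46039.49, 41595.55],
    "IsHoliday": [False, True, False],
})

# copy_id is a hypothetical column: holiday rows get [0, 1, 2, 3, 4], other rows get [0]
out = df.assign(
    copy_id=df["IsHoliday"].apply(lambda h: list(range(5)) if h else [0])
)

# explode() emits one row per list element and copies the other columns unchanged
out = out.explode("copy_id", ignore_index=True)
print(out)
```

Here the holiday row appears 5 times in total, each copy with a different copy_id, while the other rows appear once. Note that an exploded column comes back with object dtype, so cast it afterwards if the type matters.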