Pandas create dataframe from one row
Let us say I have some dataframe df and I want to create a new dataframe new_df with n rows, each being the same as row idx from df. Is there a faster way compared to:
import pandas as pd
df = pd.DataFrame()
new_df = pd.DataFrame()
for i in range(n):
    new_df.loc[i] = df.iloc[idx]
Thanks
You can use repeat:
N = 5
new_df = df.loc[df.index.repeat(N)]
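# or for a particular row idx (the list in df.loc[[idx]] keeps a one-row
# DataFrame, so .index is the row label rather than the column names)
new_df = df.loc[df.loc[[idx]].index.repeat(N)]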
Or, for a new index, use reset_index with drop=True:
new_df = df.loc[df.index.repeat(N)].reset_index(drop=True)
# or for a particular row idx
new_df = df.loc[df.loc[[idx]].index.repeat(N)].reset_index(drop=True)
NB. If you have many rows in the input and only want to repeat one or some of them, replace df.index.repeat(N) with df.loc[[idx]].index.repeat(N) or df.loc[['idx1', 'idx2', 'idx3']].index.repeat(N).
Example input:
df = pd.DataFrame([['A', 'B', 'C']])
Output:
   0  1  2
0  A  B  C
1  A  B  C
2  A  B  C
3  A  B  C
4  A  B  C
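To make the single-row case concrete, here is a minimal sketch. The frame contents, the string row labels, and the names idx and N are illustrative assumptions, not taken from the question. It relies on df.loc[[idx]] (with a list) returning a one-row DataFrame, so that .index holds the row label; df.loc[idx] with a scalar would return a Series indexed by the column names instead.

```python
import pandas as pd

# Hypothetical input: three rows with string labels
df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]},
                  index=["r0", "r1", "r2"])
idx = "r1"  # label of the row to repeat (assumed)
N = 4       # number of copies (assumed)

# The list selection keeps a DataFrame, so .index is the row label
row_labels = df.loc[[idx]].index.repeat(N)

# Select the repeated label, then replace it with a fresh 0..N-1 index
new_df = df.loc[row_labels].reset_index(drop=True)
print(new_df)
```

Every row of new_df is a copy of row "r1", and the index runs from 0 to N - 1 because of reset_index(drop=True).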
Sample:
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=list('ABCDE'))
print (df)
   A  B  C  D  E
0  8  8  3  7  7
1  0  4  2  5  2
2  2  2  1  0  8
3  4  0  9  6  2
4  4  1  5  3  4
You can create a dictionary or list from row idx and call the DataFrame constructor:
idx = 2
N = 10
# from a dictionary of the row's values
df1 = pd.DataFrame(df.loc[idx].to_dict(), index=range(N))
# or from a one-row list of the row's values
df1 = pd.DataFrame([df.loc[idx].tolist()], index=range(N), columns=df.columns)
print (df1)
   A  B  C  D  E
0  2  2  1  0  8
1  2  2  1  0  8
2  2  2  1  0  8
3  2  2  1  0  8
4  2  2  1  0  8
5  2  2  1  0  8
6  2  2  1  0  8
7  2  2  1  0  8
8  2  2  1  0  8
9  2  2  1  0  8
Another solution uses numpy.repeat and DataFrame.loc; for a default index, use DataFrame.reset_index with drop=True:
idx = 2
N = 10
df1 = df.loc[np.repeat(idx, N)].reset_index(drop=True)
print (df1)
   A  B  C  D  E
0  2  2  1  0  8
1  2  2  1  0  8
2  2  2  1  0  8
3  2  2  1  0  8
4  2  2  1  0  8
5  2  2  1  0  8
6  2  2  1  0  8
7  2  2  1  0  8
8  2  2  1  0  8
9  2  2  1  0  8
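The question selects the row by position (df.iloc[idx]), while the snippet above selects by label with DataFrame.loc. The two agree only when the index is the default 0..n-1. If the index has other labels, a positional variant could look like the sketch below. The sample frame and the names pos and N are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1
df = pd.DataFrame({"A": [8, 0, 2], "B": [8, 4, 2]}, index=["x", "y", "z"])
pos = 2  # position of the row to repeat (assumed)
N = 3    # number of copies (assumed)

# np.repeat(pos, N) builds array([2, 2, 2]); iloc selects by position
df1 = df.iloc[np.repeat(pos, N)].reset_index(drop=True)
print(df1)
```

Here df.loc[np.repeat(pos, N)] would raise a KeyError, because 2 is not one of the labels "x", "y", "z".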
Performance comparison (with my data; it is best to test on your real data):
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=list('ABCDE'))
print (df)
idx = 2
N = 10000
In [260]: %timeit pd.DataFrame([df.loc[idx].tolist()], index=range(N), columns=df.columns)
690 µs ± 44.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [261]: %timeit pd.DataFrame(df.loc[idx].to_dict(), index=range(N))
786 µs ± 106 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [262]: %timeit df.loc[np.repeat(idx, N)].reset_index(drop=True)
796 µs ± 26.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
@mozway's solution (as written, this repeats all 5 rows of df, so the result has 5 * N rows):
In [263]: %timeit df.loc[df.index.repeat(N)].reset_index(drop=True)
3.62 ms ± 178 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Original solution from the question:
In [264]: %%timeit
     ...: new_df = pd.DataFrame(columns=df.columns)
     ...: for i in range(N):
     ...:     new_df.loc[i] = df.iloc[idx]
     ...:
2.44 s ± 274 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
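The timings above come from IPython's %timeit. To repeat the comparison in a plain script, one option is the standard-library timeit module, as in the rough sketch below. The candidates dictionary, its labels, and the repeat counts are my own assumptions. The third candidate uses the single-row form df.loc[[idx]] rather than repeating every row, so it is an adaptation and not the exact statement timed above. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5, 5)), columns=list("ABCDE"))
idx = 2
N = 10000

# Each candidate builds an N-row frame from row idx (labels are illustrative)
candidates = {
    "dict constructor": lambda: pd.DataFrame(df.loc[idx].to_dict(), index=range(N)),
    "np.repeat + loc": lambda: df.loc[np.repeat(idx, N)].reset_index(drop=True),
    "index.repeat + loc": lambda: df.loc[df.loc[[idx]].index.repeat(N)].reset_index(drop=True),
}

timings = {}
for name, func in candidates.items():
    # Best of 5 repeats of 20 calls each, converted to microseconds per call
    best = min(timeit.repeat(func, number=20, repeat=5)) / 20
    timings[name] = best * 1e6
    print(f"{name}: {timings[name]:.0f} µs per call")
```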