Pandas: create a dataframe from one row

Question

Let us say I have some dataframe df and I want to create a new dataframe new_df with n rows, each being the same as row idx from df. Is there a faster way than the following?

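import pandas as pd

df = pd.DataFrame()  # placeholder for an existing dataframe
new_df = pd.DataFrame(columns=df.columns)

# n is the number of copies, idx the position of the row to copy
for i in range(n):
    new_df.loc[i] = df.iloc[idx]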

Thanks.

Answer 1

You can use repeat:

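N = 5
# repeat every row of df N times
new_df = df.loc[df.index.repeat(N)]
# or for a particular row idx (double brackets keep a one-row DataFrame,
# so .index holds the row label rather than the column names)
new_df = df.loc[df.loc[[idx]].index.repeat(N)]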

Or, for a new index, add reset_index with drop=True:

new_df = df.loc[df.index.repeat(N)].reset_index(drop=True)
# or for a particular row idx
new_df = df.loc[df.loc[[idx]].index.repeat(N)].reset_index(drop=True)

NB. If you have many rows in the input and only want to repeat one or a few of them, replace df.index.repeat(N) with df.loc[[idx]].index.repeat(N) or df.loc[['idx1', 'idx2', 'idx3']].index.repeat(N).

Example input:

df = pd.DataFrame([['A', 'B', 'C']])

Output:

   0  1  2
0  A  B  C
1  A  B  C
2  A  B  C
3  A  B  C
4  A  B  C
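To make the note about repeating only one or a few rows concrete, here is a minimal sketch. The sample data and the labels 'x', 'y', 'z' are made up for illustration and are not from the answer.

```python
import pandas as pd

# Hypothetical sample data; the labels 'x', 'y', 'z' are made up for illustration.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": ["a", "b", "c"]},
    index=["x", "y", "z"],
)
N = 3

# Repeat a single row: df.loc[["y"]] keeps a one-row DataFrame,
# so its index holds the row label rather than the column names.
one_row = df.loc[df.loc[["y"]].index.repeat(N)].reset_index(drop=True)

# Repeat a few rows: each selected label is repeated N times in turn.
some_rows = df.loc[df.loc[["x", "z"]].index.repeat(N)]

print(one_row)
print(some_rows)
```

Without reset_index, the repeated rows keep their original labels, which can be handy if you need to trace each copy back to its source row.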

Answer 2

Sample:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=list('ABCDE'))
print (df)
   A  B  C  D  E
0  8  8  3  7  7
1  0  4  2  5  2
2  2  2  1  0  8
3  4  0  9  6  2
4  4  1  5  3  4

You can create a dictionary or list from row idx and call the DataFrame constructor:

idx = 2
N = 10
# from a dictionary of the row's values
df1 = pd.DataFrame(df.loc[idx].to_dict(), index=range(N))
# or from a one-row list
df1 = pd.DataFrame([df.loc[idx].tolist()], index=range(N), columns=df.columns)
print (df1)
   A  B  C  D  E
0  2  2  1  0  8
1  2  2  1  0  8
2  2  2  1  0  8
3  2  2  1  0  8
4  2  2  1  0  8
5  2  2  1  0  8
6  2  2  1  0  8
7  2  2  1  0  8
8  2  2  1  0  8
9  2  2  1  0  8

Another solution uses numpy.repeat and DataFrame.loc; for a default index, use DataFrame.reset_index with drop=True:

idx = 2
N = 10
df1 = df.loc[np.repeat(idx, N)].reset_index(drop=True)
print (df1)
   A  B  C  D  E
0  2  2  1  0  8
1  2  2  1  0  8
2  2  2  1  0  8
3  2  2  1  0  8
4  2  2  1  0  8
5  2  2  1  0  8
6  2  2  1  0  8
7  2  2  1  0  8
8  2  2  1  0  8
9  2  2  1  0  8
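One detail worth checking on your own data: the question selects the row by position with iloc, while the solutions here select by label with loc. With the default index the two coincide, but they can differ otherwise. A small sketch, using a made-up frame whose labels are deliberately out of order:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose labels differ from the row positions.
df = pd.DataFrame({"A": [10, 20, 30]}, index=[2, 0, 1])
idx = 2
N = 4

# Label-based: picks the row labelled 2, which is the first row here.
by_label = df.loc[np.repeat(idx, N)].reset_index(drop=True)

# Position-based: picks the third row, as df.iloc[idx] in the question does.
by_position = df.iloc[np.repeat(idx, N)].reset_index(drop=True)

print(by_label["A"].tolist())
print(by_position["A"].tolist())
```

If idx is meant as a position, swapping loc for iloc in the np.repeat solution should keep the behaviour of the original loop.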

Performance comparison (with my data; best to test on your real data):

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=list('ABCDE'))
print (df)

idx = 2
N = 10000

In [260]: %timeit pd.DataFrame([df.loc[idx].tolist()], index=range(N), columns=df.columns)
690 µs ± 44.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [261]: %timeit pd.DataFrame(df.loc[idx].to_dict(), index=range(N))
786 µs ± 106 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [262]: %timeit df.loc[np.repeat(idx, N)].reset_index(drop=True)
796 µs ± 26.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

@mozway solution (note: this repeats all 5 rows of df, so it builds 5 * N rows)
In [263]: %timeit df.loc[df.index.repeat(N)].reset_index(drop=True)
3.62 ms ± 178 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

@original solution
In [264]: %%timeit
     ...: new_df = pd.DataFrame(columns=df.columns)
     ...: for i in range(N):
     ...:     new_df.loc[i] = df.iloc[idx]
     ...:     
2.44 s ± 274 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
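To rerun a comparison like this outside IPython, something along these lines should work with the standard timeit module. It is only a sketch: the candidate names are labels chosen here, the third candidate selects the row as a one-row DataFrame with df.loc[[idx]] so that only that row is repeated, and the numbers will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5, 5)), columns=list("ABCDE"))
idx = 2
N = 10000

# Candidates taken from the answers above; each builds N copies of row idx.
candidates = {
    "dict constructor": lambda: pd.DataFrame(df.loc[idx].to_dict(), index=range(N)),
    "np.repeat + loc": lambda: df.loc[np.repeat(idx, N)].reset_index(drop=True),
    "index.repeat + loc": lambda: df.loc[df.loc[[idx]].index.repeat(N)].reset_index(drop=True),
}

results = {}
for name, build in candidates.items():
    # Keep one result per candidate so the outputs can be compared.
    results[name] = build()
    # Best of 3 repeats of 20 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(build, number=20, repeat=3)) / 20
    print(f"{name}: {best * 1e3:.3f} ms per call")
```

Checking that every candidate returns the same frame before comparing timings helps avoid measuring approaches that do different amounts of work.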
