Fastest way to create pandas dataframe rows for combination of values from lists
Let's say I have three lists:
listA = ['a','b','c', 'd']
listP = ['p', 'q', 'r']
listX = ['x', 'z']
So the dataframe will have 4*3*2 = 24 rows. The simplest way to solve this problem is to do this:
df = pd.DataFrame(columns=['A','P','X'])
for val1 in listA:
    for val2 in listP:
        for val3 in listX:
            df.loc[<indexvalue>] = [val1,val2,val3]
Now, in the real scenario I will have about 800k rows and 12 columns (so 12 levels of nesting in the loops). Is there any way I can create this dataframe much faster?
You could use itertools.product:
import pandas as pd
from itertools import product
listA = ['a', 'b', 'c', 'd']
listP = ['p', 'q', 'r']
listX = ['x', 'z']
df = pd.DataFrame(data=list(product(listA, listP, listX)), columns=['A','P','X'])
print(df.head(10))
Output
   A  P  X
0  a  p  x
1  a  p  z
2  a  q  x
3  a  q  z
4  a  r  x
5  a  r  z
6  b  p  x
7  b  p  z
8  b  q  x
9  b  q  z
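For the real case with 12 columns, the same call should generalize by unpacking a collection of lists instead of naming each one. A minimal sketch, assuming the lists are kept in a dict keyed by column name (the `values_by_column` dict is illustrative and not part of the question):

```python
import pandas as pd
from itertools import product

# Hypothetical mapping of column name -> allowed values (illustrative only);
# in the real scenario this would hold all 12 columns.
values_by_column = {
    'A': ['a', 'b', 'c', 'd'],
    'P': ['p', 'q', 'r'],
    'X': ['x', 'z'],
}

# product(*lists) yields one tuple per combination, with the last list
# varying fastest, so no nested loops are needed.
rows = list(product(*values_by_column.values()))
df = pd.DataFrame(rows, columns=list(values_by_column))

print(df.shape)
```

Because all rows are built first and handed to the DataFrame constructor in one call, this avoids growing the frame row by row with df.loc.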
Similar discussion here. Apparently np.meshgrid is more efficient for large data (as an alternative to itertools.product).
Application:
import numpy as np
import pandas as pd
v = np.stack([i.ravel() for i in np.meshgrid(listA, listP, listX)]).T
df = pd.DataFrame(v, columns=['A', 'P', 'X'])
>>    A  P  X
   0  a  p  x
   1  a  p  z
   2  b  p  x
   3  b  p  z
   4  c  p  x
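Note that the rows above come out in a different order than with itertools.product (column A changes before column P). If the order matters, passing indexing='ij' to np.meshgrid should give the same ordering as product. A small sketch to check this on the example lists (the names `df_mesh`, `df_prod` and `same_order` are only for illustration):

```python
import numpy as np
import pandas as pd
from itertools import product

listA = ['a', 'b', 'c', 'd']
listP = ['p', 'q', 'r']
listX = ['x', 'z']

# indexing='ij' keeps the first list on the slowest-varying axis,
# which is the ordering itertools.product uses.
grids = np.meshgrid(listA, listP, listX, indexing='ij')
v = np.stack([g.ravel() for g in grids]).T
df_mesh = pd.DataFrame(v, columns=['A', 'P', 'X'])

df_prod = pd.DataFrame(list(product(listA, listP, listX)), columns=['A', 'P', 'X'])

# Compare the raw values row by row rather than relying on matching dtypes.
same_order = df_mesh.values.tolist() == df_prod.values.tolist()
print(same_order)
```

Whether meshgrid is actually faster than product for about 800k rows is worth timing on the real data rather than assuming.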