
Fastest way to create pandas DataFrame rows for combinations of values from lists

Let's say I have three lists:

listA = ['a','b','c', 'd']
listP = ['p', 'q', 'r']
listX = ['x', 'z']

I want a DataFrame with one row for every combination of values, so it will have 4*3*2 = 24 rows. The simplest way to solve this problem is:

import pandas as pd

df = pd.DataFrame(columns=['A','P','X'])

for val1 in listA:
    for val2 in listP:
        for val3 in listX:
            # append one row per combination; len(df) stands in for <indexvalue>
            df.loc[len(df)] = [val1, val2, val3]

In the real scenario I will have about 800k rows and 12 columns (so 12 nested loops). Is there any way I can create this DataFrame much faster?

You could use itertools.product:

import pandas as pd
from itertools import product

listA = ['a', 'b', 'c', 'd']
listP = ['p', 'q', 'r']
listX = ['x', 'z']

df = pd.DataFrame(data=list(product(listA, listP, listX)), columns=['A','P','X'])
print(df.head(10))

Output:

   A  P  X
0  a  p  x
1  a  p  z
2  a  q  x
3  a  q  z
4  a  r  x
5  a  r  z
6  b  p  x
7  b  p  z
8  b  q  x
9  b  q  z
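
For the real scenario with 12 columns, the same call should generalize by unpacking a collection of lists into product. A minimal sketch, assuming the lists are kept in a dict keyed by column name (the `columns` dict and the `build_combinations` helper are hypothetical names, not part of the original answer):

```python
import pandas as pd
from itertools import product


def build_combinations(columns):
    """Return a DataFrame with one row per combination of the value lists.

    `columns` maps column name -> list of values (hypothetical helper).
    """
    # product(*lists) yields tuples in the same order as nested for-loops
    rows = list(product(*columns.values()))
    return pd.DataFrame(rows, columns=list(columns.keys()))


columns = {
    'A': ['a', 'b', 'c', 'd'],
    'P': ['p', 'q', 'r'],
    'X': ['x', 'z'],
}
df = build_combinations(columns)
print(df.shape)
```

Adding a 4th or 12th column then only means adding another entry to the dict, with no change to the loop structure.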

Similar discussion here. Apparently np.meshgrid is more efficient for large data, as an alternative to itertools.product.

Application:

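import numpy as np
import pandas as pd

# stack the raveled grids as rows, then transpose so each row is one combination
# (np.stack needs a list here, not a generator)
v = np.stack([i.ravel() for i in np.meshgrid(listA, listP, listX)]).T
df = pd.DataFrame(v, columns=['A', 'P', 'X'])
print(df.head())

Output:

   A  P  X
0  a  p  x
1  a  p  z
2  b  p  x
3  b  p  z
4  c  p  x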
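
Note that the rows above come out in a different order from the itertools.product version (A changes before P), because np.meshgrid defaults to indexing='xy', which swaps the first two axes. If the nested-loop order matters, passing indexing='ij' should reproduce it. A small sketch to check this on the example lists (the names `grids`, `df_mesh` and `df_prod` are only for illustration):

```python
import numpy as np
import pandas as pd
from itertools import product

listA = ['a', 'b', 'c', 'd']
listP = ['p', 'q', 'r']
listX = ['x', 'z']

# indexing='ij' keeps the axes in argument order, so ravel() walks
# A slowest and X fastest, like the nested for-loops
grids = np.meshgrid(listA, listP, listX, indexing='ij')
v = np.stack([g.ravel() for g in grids]).T
df_mesh = pd.DataFrame(v, columns=['A', 'P', 'X'])

df_prod = pd.DataFrame(list(product(listA, listP, listX)), columns=['A', 'P', 'X'])

# compare the two results row by row
same_order = df_mesh.values.tolist() == df_prod.values.tolist()
print(same_order)
```

Whether meshgrid is actually faster than product will depend on the data size and types, so it is worth timing both on a realistic input before committing to one.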


