More efficient way to assign group for each row in pandas
I have a data frame with more than 1000 columns, and I have a predefined group list. I would like to compare each cell value with each group's boundaries and create a new column that holds the group name. I have written for loops, but they took more than 5 minutes to run. Is there a more efficient way to achieve this? Thanks
Here is my data frame:
Frequency
21.0
18.0
16.0
10.0
10.0
9.0
10.0
10.0
5.0
8.0
And my predefined group list:
> groups
[(3, 5), (6, 10), (11, 30)]
What I would like to get is:
Frequency Group
21.0 11-30
18.0 11-30
16.0 11-30
10.0 6-10
10.0 6-10
9.0 6-10
10.0 6-10
10.0 6-10
5.0 3-5
8.0 6-10
Here is my code:
for i in range(0, len(fre_table["Frequency"])):
    for j in range(0, len(groups)):
        if fre_table["Frequency"][i] >= groups[j][0] and fre_table["Frequency"][i] <= groups[j][1]:
            break
    fre_table['Group'][i] = "{}-{}".format(groups[j][0], groups[j][1])
Measuring the efficiency of the solution suggested by @BallpointBen in the comment section.
Data:
import numpy as np
import pandas as pd
fre_table = pd.DataFrame({'Index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                          'Frequency': [21.0, 18.0, 16.0, 10.0, 10.0, 9.0, 10.0, 10.0, 5.0, 8.0]})
groups = [(3, 5), (6, 10), (11, 30)]
Time taken by the initial solution: 0.5420s
import timeit
start_time = timeit.default_timer()
fre_table['Group'] = 0
for i in range(0, len(fre_table["Frequency"])):
    for j in range(0, len(groups)):
        if fre_table["Frequency"][i] >= groups[j][0] and fre_table["Frequency"][i] <= groups[j][1]:
            break
    fre_table['Group'][i] = "{}-{}".format(groups[j][0], groups[j][1])
elapsed_time = timeit.default_timer() - start_time
Time taken by the final solution: 0.0043s
import timeit
start_time = timeit.default_timer()
bins = pd.IntervalIndex.from_tuples(groups)
fre_table['Group'] = pd.cut(fre_table['Frequency'], bins)
elapsed_time = timeit.default_timer() - start_time
About 100 times faster!
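One caveat: `pd.cut` with an `IntervalIndex` returns interval objects such as `(11, 30]` rather than the `11-30` strings asked for in the question. By default those intervals are closed only on the right, so a value equal to a lower bound (3, 6 or 11) would not match any group. A minimal sketch of one way to get the requested labels follows. It assumes the groups are non-overlapping and should include both endpoints, as the `>=` / `<=` test in the original loop does. The `labels` name is introduced here for illustration.

```python
import pandas as pd

fre_table = pd.DataFrame(
    {"Frequency": [21.0, 18.0, 16.0, 10.0, 10.0, 9.0, 10.0, 10.0, 5.0, 8.0]}
)
groups = [(3, 5), (6, 10), (11, 30)]

# closed="both" makes each interval include both of its endpoints,
# matching the >= / <= comparison in the original loop.
bins = pd.IntervalIndex.from_tuples(groups, closed="both")

# One text label per group, in the same order as the bins.
labels = [f"{low}-{high}" for low, high in groups]

# pd.cut returns a categorical column whose categories are the intervals;
# renaming the categories swaps each interval for its text label.
fre_table["Group"] = pd.cut(fre_table["Frequency"], bins).cat.rename_categories(labels)

print(fre_table)
```

Values that fall outside every group come back as NaN here, whereas the original loop would silently assign them to the last group.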