I have a data frame with more than 1000 columns and a predefined list of groups. I would like to compare each cell value with each group's boundaries and create a new column holding the name of the matching group. I have written nested for loops, but they take more than 5 minutes to run. Is there a more efficient way to achieve this? Thanks.
Here is my data frame:
Frequency
21.0
18.0
16.0
10.0
10.0
9.0
10.0
10.0
5.0
8.0
And my predefined group list
> groups
[(3, 5), (6, 10), (11, 30)]
What I would like to get is
Frequency Group
21.0 11-30
18.0 11-30
16.0 11-30
10.0 6-10
10.0 6-10
9.0 6-10
10.0 6-10
10.0 6-10
5.0 3-5
8.0 6-10
Here is my code
for i in range(0, len(fre_table["Frequency"])):
    for j in range(0, len(groups)):
        if fre_table["Frequency"][i] >= groups[j][0] and fre_table["Frequency"][i] <= groups[j][1]:
            break
    fre_table['Group'][i] = "{}-{}".format(groups[j][0], groups[j][1])
Here is a timing comparison that checks the efficiency of the solution suggested by @BallpointBen in the comments.
Data:
import numpy as np
import pandas as pd
fre_table = pd.DataFrame({'Index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                          'Frequency': [21.0, 18.0, 16.0, 10.0, 10.0, 9.0, 10.0, 10.0, 5.0, 8.0]})
groups = [(3, 5), (6, 10), (11, 30)]
Time taken by the initial solution: 0.5420 s
import timeit
start_time = timeit.default_timer()
fre_table['Group'] = 0
for i in range(0, len(fre_table["Frequency"])):
    for j in range(0, len(groups)):
        if fre_table["Frequency"][i] >= groups[j][0] and fre_table["Frequency"][i] <= groups[j][1]:
            break
    fre_table['Group'][i] = "{}-{}".format(groups[j][0], groups[j][1])
elapsed_time = timeit.default_timer() - start_time
Time taken by the final solution: 0.0043 s
import timeit
start_time = timeit.default_timer()
bins = pd.IntervalIndex.from_tuples(groups)
fre_table['Group'] = pd.cut(fre_table['Frequency'], bins)
elapsed_time = timeit.default_timer() - start_time
More than 100 times faster (0.5420 s vs. 0.0043 s)!
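One caveat: pd.IntervalIndex.from_tuples defaults to closed='right', so the pd.cut call above yields interval labels such as (11, 30] and leaves out each lower bound (a value of exactly 6.0 would get no group). If you need the "11-30" string labels and the inclusive bounds used in the question, the following is a minimal sketch of one alternative that uses plain NumPy broadcasting instead of pd.cut. The helper name assign_groups is made up for illustration, and returning None for values outside every group is an assumption (the original loop would silently give such values the last group).

```python
import numpy as np
import pandas as pd


def assign_groups(values, groups):
    """Return a "lo-hi" label per value, with both bounds inclusive.

    `assign_groups` is a hypothetical helper, not part of pandas.
    Values that fall in no group get None (an assumption).
    """
    values = np.asarray(values, dtype=float)
    lows = np.array([lo for lo, _ in groups], dtype=float)
    highs = np.array([hi for _, hi in groups], dtype=float)

    # Boolean matrix of shape (n_values, n_groups): True where lo <= value <= hi.
    inside = (values[:, None] >= lows) & (values[:, None] <= highs)

    # Index of the first matching group per row (0 if nothing matched).
    idx = inside.argmax(axis=1)
    matched = inside.any(axis=1)

    labels = np.array(["{}-{}".format(lo, hi) for lo, hi in groups], dtype=object)
    # Mask out the rows that matched no group.
    return np.where(matched, labels[idx], None)


fre_table = pd.DataFrame(
    {'Frequency': [21.0, 18.0, 16.0, 10.0, 10.0, 9.0, 10.0, 10.0, 5.0, 8.0]}
)
groups = [(3, 5), (6, 10), (11, 30)]
fre_table['Group'] = assign_groups(fre_table['Frequency'], groups)
print(fre_table)
```

The comparison builds an n_values by n_groups matrix, which is cheap for a handful of groups. With a very large number of groups, pd.cut or np.searchsorted is likely the better fit.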