Pandas: Applying a filter function to a column whose type is a list
My problem is this: I have created a DataFrame with a column ('Interval_range') that zips the values of two other columns ('Start_interval' and 'End_interval') into a third one, whose type is a list. Now I would like to create a fourth column ('Absolute_frequency') that counts all the values from an external list ('dataset') that are greater than the first element of the 'Interval_range' column and less than its second element. How can I do that? I tried to use .apply() and then filter(), but in the end I got nothing. Here is all my code:
import pandas as pd

dataset = [8, 30, 30, 50, 86, 94, 102, 110, 169, 170, 176, 236, 240, 241, 242, 255, 262, 276, 279, 282]
# Width of each of the six equal intervals
interval = (max(dataset) - min(dataset)) / 6
int1 = [round(min(dataset), 2), round(min(dataset) + interval, 2)]
int2 = [round(int1[1], 2), round(int1[1] + interval, 2)]
int3 = [round(int2[1], 2), round(int2[1] + interval, 2)]
int4 = [round(int3[1], 2), round(int3[1] + interval, 2)]
int5 = [round(int4[1], 2), round(int4[1] + interval, 2)]
int6 = [round(int5[1], 2), round(max(dataset), 2)]
intervals = [int1, int2, int3, int4, int5, int6]
analiza = pd.DataFrame(intervals, index='A B C D E F'.split(), columns='Start_interval End_interval'.split())
# Pair each start with its end; zip() yields tuples
analiza['Interval_range'] = list(zip(analiza['Start_interval'], analiza['End_interval']))
If you convert dataset to a NumPy array, you can iterate over the 'Interval_range' column and check which values fall within the range of each tuple using apply() and a lambda expression:
import numpy as np

dataset = [8, 30, 30, 50, 86, 94, 102, 110, 169, 170, 176, 236, 240, 241, 242, 255, 262, 276, 279, 282]
arr = np.array(dataset)
....
analiza['Absolute_frequency'] = analiza['Interval_range'].apply(lambda x: ((x[0] <= arr) & (arr <= x[1])).sum())
Output
   Start_interval  End_interval    Interval_range  Absolute_frequency
A            8.00         53.67      (8.0, 53.67)                   4
B           53.67         99.34    (53.67, 99.34)                   2
C           99.34        145.01   (99.34, 145.01)                   2
D          145.01        190.68  (145.01, 190.68)                   3
E          190.68        236.35  (190.68, 236.35)                   1
F          236.35        282.00   (236.35, 282.0)                   8
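To make the counting step in this answer easier to follow, here is a minimal standalone sketch of the boolean-mask idea, using only NumPy. The interval bounds are copied from the output above. The helper name count_in_interval and its inclusive switch are assumptions added for illustration and are not part of the original answer. The question literally asks for values strictly greater than the start and strictly less than the end, while the answer uses inclusive bounds, so the sketch shows both.

```python
import numpy as np

# Values and interval bounds copied from the example above.
dataset = [8, 30, 30, 50, 86, 94, 102, 110, 169, 170,
           176, 236, 240, 241, 242, 255, 262, 276, 279, 282]
intervals = [(8.0, 53.67), (53.67, 99.34), (99.34, 145.01),
             (145.01, 190.68), (190.68, 236.35), (236.35, 282.0)]

arr = np.array(dataset)


def count_in_interval(values, lower, upper, inclusive=True):
    """Count the entries of `values` lying between `lower` and `upper`.

    `inclusive` is a hypothetical switch added for this sketch:
    True uses <= on both ends (as in the answer), False uses <.
    """
    if inclusive:
        mask = (lower <= values) & (values <= upper)
    else:
        mask = (lower < values) & (values < upper)
    # The mask is a boolean array; each True counts as 1 when summed.
    return int(mask.sum())


inclusive_counts = [count_in_interval(arr, lo, hi) for lo, hi in intervals]
strict_counts = [count_in_interval(arr, lo, hi, inclusive=False)
                 for lo, hi in intervals]

print(inclusive_counts)  # [4, 2, 2, 3, 1, 8]
print(strict_counts)     # [3, 2, 2, 3, 1, 7]
```

With strict bounds the minimum (8) and maximum (282) of the dataset drop out of the first and last intervals. With inclusive bounds on both ends, a value that falls exactly on a shared boundary such as 53.67 would be counted in two adjacent intervals; no value in this dataset does.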
These lines of code create a new column named "Absolute_frequency" and calculate the frequency as you described.
# Create the new column up front so every row has a slot to fill
analiza["Absolute_frequency"] = [None for i in range(len(analiza.index))]

def calculate_abs_freq(row):
    lower = row["Start_interval"]
    upper = row["End_interval"]
    counter = 0
    # Count the dataset values inside [lower, upper], bounds included
    for data in dataset:
        if lower <= data <= upper:
            counter += 1
    row["Absolute_frequency"] = counter
    return row

# axis=1 passes each row to the function
analiza = analiza.apply(calculate_abs_freq, axis=1)
analiza
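The same row-wise idea can be sketched more compactly by having the applied function return only the count, so the column does not need to be created in advance. This is an illustrative variant and not the answerer's code: the helper name count_between is an assumption, and the DataFrame is rebuilt here from the interval bounds shown earlier so that the snippet runs on its own.

```python
import pandas as pd

dataset = [8, 30, 30, 50, 86, 94, 102, 110, 169, 170,
           176, 236, 240, 241, 242, 255, 262, 276, 279, 282]

# Interval bounds copied from the output shown earlier.
analiza = pd.DataFrame(
    {
        "Start_interval": [8.0, 53.67, 99.34, 145.01, 190.68, 236.35],
        "End_interval": [53.67, 99.34, 145.01, 190.68, 236.35, 282.0],
    },
    index=list("ABCDEF"),
)


def count_between(row):
    """Return how many dataset values fall inside this row's interval."""
    # Hypothetical helper; bounds are inclusive, as in the answer above.
    return sum(
        row["Start_interval"] <= value <= row["End_interval"]
        for value in dataset
    )


# With axis=1 the function receives one row at a time; because it returns
# a scalar, apply() yields a Series aligned with the DataFrame index.
analiza["Absolute_frequency"] = analiza.apply(count_between, axis=1)
print(analiza)
```

Returning a scalar avoids modifying the row inside the function, which keeps the helper free of side effects and makes it easy to reuse on other DataFrames with the same two columns.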