
Pandas: Applying a filter function on a column whose type is a list

I have created a DataFrame in which one column ('Interval_range') zips the values of two other columns ('Start_interval' and 'End_interval') into (start, end) pairs. Now I would like to create a fourth column ('Absolute_frequency') that counts all the values from an external list ('dataset') that are greater than the first element of 'Interval_range' and less than its second element. How can I do that? I tried to use .apply() and then filter(), but in the end I got nothing. Here is all my code:

import pandas as pd

dataset = [8, 30, 30, 50, 86, 94, 102, 110, 169, 170, 176, 236, 240, 241, 242, 255, 262, 276, 279, 282]

# Width of each of the six equal intervals.
interval = (max(dataset) - min(dataset)) / 6

# Each interval starts where the previous one ends.
int1 = [round(min(dataset), 2), round(min(dataset) + interval, 2)]
int2 = [round(int1[1], 2), round(int1[1] + interval, 2)]
int3 = [round(int2[1], 2), round(int2[1] + interval, 2)]
int4 = [round(int3[1], 2), round(int3[1] + interval, 2)]
int5 = [round(int4[1], 2), round(int4[1] + interval, 2)]
int6 = [round(int5[1], 2), round(max(dataset), 2)]

intervals = [int1, int2, int3, int4, int5, int6]

analiza = pd.DataFrame(intervals, index='A B C D E F'.split(), columns='Start_interval End_interval'.split())
analiza['Interval_range'] = list(zip(analiza['Start_interval'], analiza['End_interval']))

If you convert dataset to a NumPy array, you can iterate over the 'Interval_range' column with apply() and a lambda expression, and count how many values fall within the range of each tuple.

import numpy as np

dataset = [8, 30, 30, 50, 86, 94, 102, 110, 169, 170, 176, 236, 240, 241, 242, 255, 262, 276, 279, 282]
arr = np.array(dataset)
....
analiza['Absolute_frequency'] = analiza['Interval_range'].apply(lambda x: ((x[0] <= arr) & (arr <= x[1])).sum())

Output

   Start_interval  End_interval    Interval_range  Absolute_frequency
A            8.00         53.67      (8.0, 53.67)                   4
B           53.67         99.34    (53.67, 99.34)                   2
C           99.34        145.01   (99.34, 145.01)                   2
D          145.01        190.68  (145.01, 190.68)                   3
E          190.68        236.35  (190.68, 236.35)                   1
F          236.35        282.00   (236.35, 282.0)                   8
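
Note that the answer above counts with inclusive bounds (<=), while the question asks for values strictly greater than the lower bound and strictly less than the upper bound. The following is a minimal sketch of the difference, not part of the original answer. It assumes the interval bounds shown in the output table, and the column names 'Inclusive_count' and 'Strict_count' are made up for illustration.

```python
import numpy as np
import pandas as pd

dataset = [8, 30, 30, 50, 86, 94, 102, 110, 169, 170, 176,
           236, 240, 241, 242, 255, 262, 276, 279, 282]
arr = np.array(dataset)

# Interval bounds copied from the output table above.
bounds = [(8.0, 53.67), (53.67, 99.34), (99.34, 145.01),
          (145.01, 190.68), (190.68, 236.35), (236.35, 282.0)]
analiza = pd.DataFrame(bounds, index=list("ABCDEF"),
                       columns=["Start_interval", "End_interval"])
analiza["Interval_range"] = list(zip(analiza["Start_interval"],
                                     analiza["End_interval"]))

# Inclusive bounds, as in the answer: lower <= value <= upper.
# 'Inclusive_count' is a hypothetical column name.
analiza["Inclusive_count"] = analiza["Interval_range"].apply(
    lambda x: int(((x[0] <= arr) & (arr <= x[1])).sum()))

# Strict bounds, as the question words it: lower < value < upper.
# 'Strict_count' is a hypothetical column name.
analiza["Strict_count"] = analiza["Interval_range"].apply(
    lambda x: int(((x[0] < arr) & (arr < x[1])).sum()))

print(analiza[["Inclusive_count", "Strict_count"]])
```

The two columns differ only in rows A and F, because 8 and 282 sit exactly on the outer bounds: the strict version gives 3 and 7 there instead of 4 and 8. Which one is wanted depends on whether the minimum and maximum of the data should be counted.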

These lines of code create a new column named "Absolute_frequency" and calculate the frequency as you described.

# Create the new column up front so every row has a slot to fill.
analiza["Absolute_frequency"] = None

def calculate_abs_freq(row):
    lower = row["Start_interval"]
    upper = row["End_interval"]
    counter = 0
    # Count the dataset values inside the closed interval [lower, upper].
    for data in dataset:
        if lower <= data <= upper:
            counter += 1
    row["Absolute_frequency"] = counter
    return row

analiza = analiza.apply(calculate_abs_freq, axis=1)
analiza

[Image: abs_frequency result]
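
As an alternative to both answers, the same inclusive count can probably be obtained without apply() or an explicit loop by letting NumPy broadcast the interval bounds against the data. This is only a sketch of a different technique, not what either answer does. It assumes the interval bounds from the first answer's output, and the variable names lower, upper and inside are illustrative.

```python
import numpy as np
import pandas as pd

dataset = [8, 30, 30, 50, 86, 94, 102, 110, 169, 170, 176,
           236, 240, 241, 242, 255, 262, 276, 279, 282]
arr = np.array(dataset)

# Interval bounds assumed to match the ones computed in the question.
analiza = pd.DataFrame(
    [(8.0, 53.67), (53.67, 99.34), (99.34, 145.01),
     (145.01, 190.68), (190.68, 236.35), (236.35, 282.0)],
    index=list("ABCDEF"),
    columns=["Start_interval", "End_interval"],
)

# Turn the bounds into column vectors of shape (6, 1) so that comparing
# them with arr (shape (20,)) broadcasts to a (6, 20) boolean matrix.
lower = analiza["Start_interval"].to_numpy()[:, None]
upper = analiza["End_interval"].to_numpy()[:, None]
inside = (lower <= arr) & (arr <= upper)

# Each row of `inside` marks the values that fall in that interval.
analiza["Absolute_frequency"] = inside.sum(axis=1)
print(analiza)
```

For six intervals and twenty values the difference is negligible, so this is mainly worth considering when the DataFrame or the dataset is large.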


 