
Extract rows where a list-valued column contains certain values in a pandas DataFrame

I have a dataframe that looks like this:

    ID   AgeGroups        PaperIDs
0   1    [3, 3, 10]       [A, B, C]
1   2    [5]              [D]
2   3    [4, 12]          [A, D]
3   4    [2, 6, 13, 12]   [X, Z, T, D]

I would like to extract the rows where the list in the AgeGroups column has at least 2 values less than 7 and at least 1 value greater than 8.

So the result should look like this:

    ID   AgeGroups        PaperIDs
0   1    [3, 3, 10]       [A, B, C]
3   4    [2, 6, 13, 12]   [X, Z, T, D]
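To reproduce the example, the input frame can be built directly with list-valued cells. This is a minimal sketch that assumes the IDs are integers and that the cells hold real Python lists (not strings), exactly as displayed above:

```python
import pandas as pd

# Each cell of AgeGroups and PaperIDs holds a Python list
df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'AgeGroups': [[3, 3, 10], [5], [4, 12], [2, 6, 13, 12]],
    'PaperIDs': [['A', 'B', 'C'], ['D'], ['A', 'D'], ['X', 'Z', 'T', 'D']],
})
print(df)
```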

I'm not sure how to do it.

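First create a helper DataFrame with one column per list position and compare it with DataFrame.lt and DataFrame.gt, then count the matches per row with sum(axis=1), compare those counts with Series.ge, and chain the two masks with & (bitwise AND):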

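import ast
import pandas as pd

# If AgeGroups holds strings such as "[3, 3, 10]" instead of lists, parse them first:
# df['AgeGroups'] = df['AgeGroups'].apply(ast.literal_eval)

# One column per list position; shorter lists are padded with NaN, which compares as False.
# index=df.index keeps the mask aligned with df even if df does not have a default RangeIndex.
df1 = pd.DataFrame(df['AgeGroups'].tolist(), index=df.index)
df = df[df1.lt(7).sum(axis=1).ge(2) & df1.gt(8).sum(axis=1).ge(1)]
print(df)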
   ID       AgeGroups      PaperIDs
0   1      [3, 3, 10]     [A, B, C]
3   4  [2, 6, 13, 12]  [X, Z, T, D]
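A variant of the same counting idea avoids building the wide helper DataFrame: Series.explode puts one list element per row while repeating the original row label, so the per-row counts come from a groupby on the index level. This is a sketch, not part of the original answer; it assumes the index of df is unique, and astype(float) is an assumption added so that empty lists (which explode to NaN) do not break the comparisons.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'AgeGroups': [[3, 3, 10], [5], [4, 12], [2, 6, 13, 12]],
    'PaperIDs': [['A', 'B', 'C'], ['D'], ['A', 'D'], ['X', 'Z', 'T', 'D']],
})

# One row per list element; the original row label is repeated for each element
ages = df['AgeGroups'].explode().astype(float)

# Count matching elements per original row (NaN compares False, so it is never counted).
# sort=False keeps the groups in the original row order.
low = ages.lt(7).groupby(level=0, sort=False).sum()
high = ages.gt(8).groupby(level=0, sort=False).sum()

out = df[low.ge(2) & high.ge(1)]
print(out)
```

Compared with the helper-DataFrame version, the memory used grows with the total number of list elements rather than with the number of rows times the longest list.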

Or use a list comprehension: convert each list to a NumPy array, compare it, count the matches with sum, and chain the two conditions with and (not &), because each comparison yields a single scalar:

import numpy as np

# One boolean per row: count the values < 7 and > 8 in each list
m = [(np.array(x) < 7).sum() >= 2 and (np.array(x) > 8).sum() >= 1 for x in df['AgeGroups']]

df = df[m]
print(df)
   ID       AgeGroups      PaperIDs
0   1      [3, 3, 10]     [A, B, C]
3   4  [2, 6, 13, 12]  [X, Z, T, D]
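The same test also works without NumPy: a generator expression inside sum counts the matching values, because True adds as 1. A sketch with a hypothetical helper name, meets_criteria, applied with Series.map so the result is a boolean mask aligned with df:

```python
import pandas as pd


def meets_criteria(ages):
    """True if at least 2 values are < 7 and at least 1 value is > 8 (hypothetical helper)."""
    return sum(a < 7 for a in ages) >= 2 and sum(a > 8 for a in ages) >= 1


df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'AgeGroups': [[3, 3, 10], [5], [4, 12], [2, 6, 13, 12]],
    'PaperIDs': [['A', 'B', 'C'], ['D'], ['A', 'D'], ['X', 'Z', 'T', 'D']],
})

# map calls meets_criteria once per list and returns a boolean Series
out = df[df['AgeGroups'].map(meets_criteria)]
print(out)
```

For short lists like these, skipping the np.array conversion avoids per-row array allocation; for long lists the NumPy version is likely faster.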

A simple if/else approach, applied to each row with DataFrame.apply; a list comprehension over the rows would also work.

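import pandas as pd

data = {'ID': ['1', '2', '3', '4'],
        'AgeGroups': [[3, 3, 10], [2], [4, 12], [2, 6, 13, 12]],
        'PaperIDs': [['A', 'B', 'C'], ['D'], ['A', 'D'], ['X', 'Z', 'T', 'D']]}
df = pd.DataFrame(data)

def extract_age(row):
    """Return True if AgeGroups has at least 2 values < 7 and at least 1 value > 8."""
    count_low = 0   # values below 7
    count_high = 0  # values above 8
    for i in row['AgeGroups']:
        if i < 7:
            count_low += 1
        elif i > 8:
            count_high += 1
    return count_low >= 2 and count_high >= 1


# apply(axis=1) returns a boolean Series, which can be used directly as a row mask
result = df[df.apply(extract_age, axis=1)]
for _, row in result.iterrows():
    print(row['AgeGroups'], row['PaperIDs'])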

Output

[3, 3, 10] ['A', 'B', 'C']
[2, 6, 13, 12] ['X', 'Z', 'T', 'D']
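All of the answers above assume AgeGroups holds real Python lists. If the frame was loaded from a CSV file, each cell is a string such as "[3, 3, 10]" and has to be parsed first, which is what the commented ast.literal_eval line in the first answer is for. A minimal sketch, assuming the CSV quotes each list so that its commas are not treated as field separators (the inline CSV text is made up for illustration):

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content; the list-valued fields are quoted
csv_text = '''ID,AgeGroups
1,"[3, 3, 10]"
2,"[5]"
3,"[4, 12]"
4,"[2, 6, 13, 12]"
'''

df = pd.read_csv(io.StringIO(csv_text))
# read_csv leaves each list as a string; literal_eval safely turns it back into a list
df['AgeGroups'] = df['AgeGroups'].apply(ast.literal_eval)

# Same criterion as above: at least 2 values < 7 and at least 1 value > 8
mask = [sum(a < 7 for a in x) >= 2 and sum(a > 8 for a in x) >= 1
        for x in df['AgeGroups']]
print(df[mask])
```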

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM