pandas - find rows where a column of lists (object dtype) contains a specific value

Given a data frame like the one below:

   A  B  C-1  D  BTP           Type C1           Type C2
0  0  1    0  0    0               NaN          [Type B]
1  0  2    1  1   14          [Type B]          [Type B]
2  0  3    2  2   28          [Type A]          [Type B]
3  0  4    3  3   42  [Type A, Type B]  [Type A, Type B]
4  0  5    4  4   56          [Type A]  [Type A, Type B]

I want to fetch the rows that contain Type A in column Type C1 and have 42 in column BTP, which should return the row with index 3.

I tried the following, but it raises KeyError: False:

df.loc[(df['BTP'] == 42) & ('Type A' in df['Type C1'])]

What I'm ultimately trying to do is fetch the row that matches the above condition (there would be a single row) and extract the values of columns B and C-1 as a dict like {'B_val': 4, 'C_val': 3}.

Use Series.str.join to join the lists in column Type C1 into strings, then use Series.str.contains on the result to check whether the given string, i.e. Type A, is present in each row. Finally, filter the rows of the dataframe with the boolean mask:

# na=False makes rows where Type C1 is NaN (row 0) count as non-matching
mask = df['BTP'].eq(42) & df['Type C1'].str.join('-').str.contains(r'\bType A\b', na=False)
df = df[mask]

Result:

# print(df)

   A  B  C-1  D  BTP           Type C1           Type C2
3  0  4    3  3   42  [Type A, Type B]  [Type A, Type B]
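The question also asks for the values of B and C-1 as a dict, which the filter alone does not produce. Here is a minimal sketch of that last step. It assumes the sample frame from the question (rebuilt here with only the relevant columns) and that exactly one row matches; the names row and result are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (only the columns needed here)
df = pd.DataFrame({
    'B': [1, 2, 3, 4, 5],
    'C-1': [0, 1, 2, 3, 4],
    'BTP': [0, 14, 28, 42, 56],
    'Type C1': [np.nan, ['Type B'], ['Type A'],
                ['Type A', 'Type B'], ['Type A']],
})

# Same mask as above; na=False turns the NaN produced for row 0 into False
mask = df['BTP'].eq(42) & df['Type C1'].str.join('-').str.contains(r'\bType A\b', na=False)

# Assumes exactly one row matches; .iloc[0] raises IndexError if none does
row = df.loc[mask, ['B', 'C-1']].iloc[0]

# int() converts the NumPy integers to plain Python ints
result = {'B_val': int(row['B']), 'C_val': int(row['C-1'])}
print(result)
```

If more than one row could match, df.loc[mask, ['B', 'C-1']].to_dict('records') would return one dict per row instead.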

You can expand the lists into columns with apply(pd.Series) and then check every cell:

>>> type_a = df['Type C1'].apply(pd.Series).eq('Type A').any(axis=1)
>>> df[df['BTP'].eq(42) & type_a]
   A  B  C-1  D  BTP           Type C1           Type C2
3  0  4    3  3   42  [Type A, Type B]  [Type A, Type B]
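The same "does any element equal Type A" check can also be sketched with Series.explode instead of widening the column into a frame. This is a swapped-in alternative, not the code from the answer above, and it only rebuilds the Type C1 column from the question.

```python
import numpy as np
import pandas as pd

# Only the list column from the question is needed for this sketch
type_c1 = pd.Series([np.nan, ['Type B'], ['Type A'],
                     ['Type A', 'Type B'], ['Type A']], name='Type C1')

# explode() gives each list element its own row and repeats the original index label
exploded = type_c1.explode()

# Compare element-wise, then collapse back to one flag per original row
type_a = exploded.eq('Type A').groupby(level=0).any()
print(type_a.tolist())
```

The resulting type_a is indexed like the original column, so it can be combined with df['BTP'].eq(42) in the same way as before.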

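I solved this with a custom function that returns one True/False value per row, based on whether the list in that row contains 'Type A' or not.

import pandas as pd

# Check, row by row, whether elem is present in the lists of column 'col'
def has_elem(col, elem):
    result = []
    for c in col:
        # Non-list entries such as NaN (row 0) count as "not present"
        result.append(isinstance(c, list) and elem in c)
    # Return a Series so the flags stay aligned with the frame's index
    return pd.Series(result, index=col.index)

# Filter (the column is named 'Type C1', with a space)
df.loc[(df['BTP'] == 42) & has_elem(df['Type C1'], 'Type A'), :]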

Your code doesn't work because the second filter clause, 'Type A' in df['Type C1'], tests membership of the string 'Type A' in the Series object df['Type C1'] itself (for a Series, in checks the index labels, not the values) and consequently returns a single False. Instead, you need a sequence of True/False values, one for each row in your dataframe.
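To see the difference between the single in test and a per-row test, here is a small sketch. It rebuilds only the Type C1 column from the question; single_flag, label_hit and per_row are illustrative names.

```python
import numpy as np
import pandas as pd

type_c1 = pd.Series([np.nan, ['Type B'], ['Type A'],
                     ['Type A', 'Type B'], ['Type A']], name='Type C1')

# `in` on a Series tests the index labels (0..4 here), so this is one bool
single_flag = 'Type A' in type_c1
# An index label, by contrast, is found
label_hit = 3 in type_c1

# One flag per row; the isinstance guard skips the NaN in row 0
per_row = type_c1.apply(lambda v: isinstance(v, list) and 'Type A' in v)
print(single_flag, label_hit, per_row.tolist())
```

Only per_row has the shape that boolean indexing needs, one value per row.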
