pandas - find rows if values in column of dtype list (object) has specific value
Given a data frame like the one below:
   A  B  C-1  D  BTP           Type C1           Type C2
0  0  1    0  0    0               NaN          [Type B]
1  0  2    1  1   14          [Type B]          [Type B]
2  0  3    2  2   28          [Type A]          [Type B]
3  0  4    3  3   42  [Type A, Type B]  [Type A, Type B]
4  0  5    4  4   56          [Type A]  [Type A, Type B]
I want to fetch the rows that have the value Type A in column Type C1 and 42 in column BTP, which should return row index 3.
I tried the following, but it gives the error KeyError: False:
df.loc[(df['BTP'] == 42) & ('Type A' in df['Type C1'])]
What I'm ultimately trying to do is fetch the row that matches the above condition (there will be only one) and extract the values of columns B and C-1 as a dict like {'B_val': 4, 'C_val': 3}.
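For anyone who wants to experiment, the frame in the question can be rebuilt roughly as follows. This is only a sketch: it assumes the NaN cell is np.nan and that the bracketed cells are Python lists of strings, which the printed output suggests but does not confirm.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'A': [0, 0, 0, 0, 0],
    'B': [1, 2, 3, 4, 5],
    'C-1': [0, 1, 2, 3, 4],
    'D': [0, 1, 2, 3, 4],
    'BTP': [0, 14, 28, 42, 56],
    # Assumption: the first cell is NaN and every other cell holds a Python list.
    'Type C1': [np.nan, ['Type B'], ['Type A'], ['Type A', 'Type B'], ['Type A']],
    'Type C2': [['Type B'], ['Type B'], ['Type B'],
                ['Type A', 'Type B'], ['Type A', 'Type B']],
})

print(df)
```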
Use Series.str.join to join the lists in column Type C1 into strings, then use Series.str.contains on the result to check whether the given string, i.e. Type A, is present in each row. Finally, filter the rows of the dataframe using the boolean mask:
mask = df['BTP'].eq(42) & df['Type C1'].str.join('-').str.contains(r'\bType A\b')
df = df[mask]
Result:
# print(df)
   A  B  C-1  D  BTP           Type C1           Type C2
3  0  4    3  3   42  [Type A, Type B]  [Type A, Type B]
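The question also asks for the values of B and C-1 as a dict. One possible way to get there from the mask is sketched below. The cut-down frame, the na=False argument (added so the NaN row counts as "no match") and the names row and result are assumptions made for this illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

# Cut-down version of the question's frame (only the columns needed here).
df = pd.DataFrame({
    'B': [1, 2, 3, 4, 5],
    'C-1': [0, 1, 2, 3, 4],
    'BTP': [0, 14, 28, 42, 56],
    'Type C1': [np.nan, ['Type B'], ['Type A'], ['Type A', 'Type B'], ['Type A']],
})

# Mask built as in the answer; na=False is an addition for this sketch.
mask = df['BTP'].eq(42) & df['Type C1'].str.join('-').str.contains(r'\bType A\b', na=False)

# Rename the two columns to the desired keys, then take the single matching row.
row = df.loc[mask, ['B', 'C-1']].rename(columns={'B': 'B_val', 'C-1': 'C_val'}).iloc[0]

# Convert the NumPy integers to plain Python ints.
result = {key: int(value) for key, value in row.items()}
print(result)  # {'B_val': 4, 'C_val': 3}
```

Note that .iloc[0] raises an IndexError if no row matches, so a real script would probably check mask.any() first.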
You can use:
>>> type_a = df['Type C1'].apply(pd.Series).eq('Type A').any(axis=1)
>>> df[df['BTP'].eq(42) & type_a]
   A  B  C-1  D  BTP           Type C1           Type C2
3  0  4    3  3   42  [Type A, Type B]  [Type A, Type B]
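A related way to build the same kind of per-row mask, swapping the widening step for Series.explode, might look like the sketch below. This is a different technique from the one in the answer; the cut-down frame and the name exploded are assumptions for the illustration.

```python
import numpy as np
import pandas as pd

# Cut-down version of the question's frame (only the columns needed here).
df = pd.DataFrame({
    'BTP': [0, 14, 28, 42, 56],
    'Type C1': [np.nan, ['Type B'], ['Type A'], ['Type A', 'Type B'], ['Type A']],
})

# explode() gives every list element its own row and repeats the original index label.
exploded = df['Type C1'].explode()

# Compare each element, then collapse back to one bool per original row.
type_a = exploded.eq('Type A').groupby(level=0).any()

print(df[df['BTP'].eq(42) & type_a])
```

Because the grouped result carries the original index labels, it lines up with df['BTP'].eq(42) when the two are combined with &.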
I solved this with a custom function that returns one True/False value per row, based on whether the list under consideration contains 'Type A' or not.
# Check whether elem is present in each cell of column 'col'
def has_elem(col, elem):
    result = []
    for c in col:
        # Cells that are not lists (such as NaN) cannot contain elem
        result.append(isinstance(c, list) and elem in c)
    # Return a Series aligned with col so it can be combined using &
    return pd.Series(result, index=col.index)
# Filter
df.loc[(df['BTP'] == 42) & has_elem(df['Type C1'], 'Type A'), :]
The reason your code doesn't work is that the second filter clause, 'Type A' in df['Type C1'], tests for membership of the string 'Type A' in the Series object df['Type C1'] itself (which checks the index labels, not the lists stored in the cells) and consequently returns a single False. Instead, you need to return a sequence of True/False values, one for each row in your dataframe.
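The difference between the two kinds of test can be seen in a small sketch. The Series s below is a hypothetical stand-in for the Type C1 column, and per_row is a name chosen for this illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the 'Type C1' column from the question.
s = pd.Series([np.nan, ['Type B'], ['Type A'], ['Type A', 'Type B'], ['Type A']],
              name='Type C1')

# `in` on a Series tests the index labels (0..4 here), not the stored values,
# so it yields one plain bool for the whole column.
print('Type A' in s)  # False
print(3 in s)         # True, because 3 is an index label

# A per-row test gives one bool per row instead.
per_row = s.map(lambda cell: isinstance(cell, list) and 'Type A' in cell)
print(per_row.tolist())  # [False, False, True, True, True]
```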