Check value present in dataframe
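I want to check whether a column contains certain values and, if it does, look for those values in another column and mark the matching rows as Yes, using Python.
We have the following dataframe: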
import pandas as pd
import numpy as np
a1=["Highschool.sg","school","school.sggs","school.coep","school.mit","address","address.pune","address.Nanded","address.mumbai"]
a2=[34,56,55,34,23,60,34,56,100]
a3=[np.nan,str(["sggs","coep","mit"]),np.nan,np.nan,np.nan,str(["pune","Nanded"]),np.nan,np.nan,np.nan]
df =pd.DataFrame(list(zip(a1,a2,a3)),columns=['data','id','required'])
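Here, wherever 'required' holds a value such as ['sggs','coep','mit'], we need to check whether any of those keywords matches a value in the 'data' column, and mark the matching rows as YES.
Expected output: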
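You can extract the words with str.extractall (or use any other method to get the individual strings), join them to 'data', and use isin with the resulting list of words to find the matches for boolean indexing: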
# Pull each keyword out of the stringified lists and prefix it with the parent row's 'data' value
target = (df['data']+'.'
          +df['required'].str.extractall(r'(\w+)')[0].droplevel(1)
         ).dropna()
# ['school.sggs', 'school.coep', 'school.mit', 'address.pune', 'address.Nanded']
df.loc[df['data'].isin(target), 'required'] = 'Yes'
Output:
             data   id                 required
0   Highschool.sg   34                      NaN
1          school   56  ['sggs', 'coep', 'mit']
2     school.sggs   55                      Yes
3     school.coep   34                      Yes
4      school.mit   23                      Yes
5         address   60       ['pune', 'Nanded']
6    address.pune   34                      Yes
7  address.Nanded   56                      Yes
8  address.mumbai  100                      NaN
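The approach above leans on pandas aligning a unique index against one with repeated labels. A more explicit version of the same idea, shown here only as a minimal sketch, parses each stringified list and builds the 'parent.keyword' targets one by one. The names keywords, targets and mask are illustrative and not part of the original answer, and the sketch assumes every non-null 'required' cell is a valid Python list literal.

```python
import ast

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "data": ["Highschool.sg", "school", "school.sggs", "school.coep", "school.mit",
             "address", "address.pune", "address.Nanded", "address.mumbai"],
    "id": [34, 56, 55, 34, 23, 60, 34, 56, 100],
    "required": [np.nan, str(["sggs", "coep", "mit"]), np.nan, np.nan, np.nan,
                 str(["pune", "Nanded"]), np.nan, np.nan, np.nan],
})

# One keyword per row; the index still points at the row that listed it.
keywords = df["required"].dropna().apply(ast.literal_eval).explode()

# Build "parent.keyword" strings, e.g. "school" + "." + "sggs".
targets = {f"{df.at[idx, 'data']}.{kw}" for idx, kw in keywords.items()}

# Flag rows whose 'data' value is exactly one of the targets.
mask = df["data"].isin(targets)
df.loc[mask, "required"] = "Yes"
```

Because each keyword is only combined with the row that listed it, a value such as address.sggs would not be marked here.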
You can use ast.literal_eval to convert your stringified lists into actual lists, explode them, and use str.contains with the resulting values from df.required:
from ast import literal_eval
required = df.required.dropna().apply(literal_eval).explode()
df.loc[df.data.str.contains('|'.join(required)).values, 'required'] = 'Yes'
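Note that str.contains treats the joined string as a regular expression and matches it anywhere in the value, so a keyword containing regex metacharacters, or a longer value such as school.sggs2, could give surprising matches. One possible tightening, sketched below under the assumption that the keyword is always the part after the last dot, is to escape each keyword and anchor the pattern at the end. The helper name mark_required is made up for illustration.

```python
import re
from ast import literal_eval

import numpy as np
import pandas as pd


def mark_required(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where rows whose 'data' ends in '.<keyword>' are marked 'Yes'."""
    out = df.copy()
    required = out["required"].dropna().apply(literal_eval).explode()
    if required.empty:
        return out  # nothing to look for
    # Escape regex metacharacters and require the keyword to be the final dotted part.
    pattern = r"\.(?:" + "|".join(map(re.escape, required)) + r")$"
    mask = out["data"].str.contains(pattern, regex=True)
    out.loc[mask, "required"] = "Yes"
    return out


df = pd.DataFrame({
    "data": ["Highschool.sg", "school", "school.sggs", "school.coep", "school.mit",
             "address", "address.pune", "address.Nanded", "address.mumbai"],
    "id": [34, 56, 55, 34, 23, 60, 34, 56, 100],
    "required": [np.nan, str(["sggs", "coep", "mit"]), np.nan, np.nan, np.nan,
                 str(["pune", "Nanded"]), np.nan, np.nan, np.nan],
})
result = mark_required(df)
```

Like the original one-liner, this still does not check that a keyword belongs to the row that listed it, so it suits data where keywords are unique across groups.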