
Check value present in dataframe

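I want to check, using Python, whether certain values are present in one column and, if they are, check them against another column and mark the matching rows as Yes.

We have the following dataframe: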

import pandas as pd
import numpy as np
a1=["Highschool.sg","school","school.sggs","school.coep","school.mit","address","address.pune","address.Nanded","address.mumbai"]
a2=[34,56,55,34,23,60,34,56,100]
a3=[np.nan,str(["sggs","coep","mit"]),np.nan,np.nan,np.nan,str(["pune","Nanded"]),np.nan,np.nan,np.nan]
df =pd.DataFrame(list(zip(a1,a2,a3)),columns=['data','id','required'])

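Here, whenever the 'required' column holds a list such as ['sggs','coep','mit'], we need to check whether any of those keywords match a value in the 'data' column, and mark the matching rows as YES.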

Expected output:

[image: expected output table]

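You can extract the words (or use any other method to get the individual strings), join them to 'data', and use the resulting list of full names to find the matches with boolean indexing: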

target = (df['data'] + '.'
          + df['required'].str.extractall(r'(\w+)')[0].droplevel(1)
         ).dropna()
# ['school.sggs', 'school.coep', 'school.mit', 'address.pune', 'address.Nanded']

df.loc[df['data'].isin(target), 'required'] = 'Yes'

output:

             data   id                 required
0   Highschool.sg   34                      NaN
1          school   56  ['sggs', 'coep', 'mit']
2     school.sggs   55                      Yes
3     school.coep   34                      Yes
4      school.mit   23                      Yes
5         address   60       ['pune', 'Nanded']
6    address.pune   34                      Yes
7  address.Nanded   56                      Yes
8  address.mumbai  100                      NaN
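
For readers who want to see the intermediate steps, here is a minimal, self-contained sketch of the same idea. It is not the answer's exact code: it uses str.findall plus explode in place of extractall, and the names parents, pairs and keyword are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

a1 = ["Highschool.sg", "school", "school.sggs", "school.coep", "school.mit",
      "address", "address.pune", "address.Nanded", "address.mumbai"]
a2 = [34, 56, 55, 34, 23, 60, 34, 56, 100]
a3 = [np.nan, str(["sggs", "coep", "mit"]), np.nan, np.nan, np.nan,
      str(["pune", "Nanded"]), np.nan, np.nan, np.nan]
df = pd.DataFrame(list(zip(a1, a2, a3)), columns=["data", "id", "required"])

# Rows that carry a keyword list (the "parent" rows: school, address).
parents = df.dropna(subset=["required"])

# One row per (parent, keyword) pair; findall pulls the words out of the string.
pairs = parents.assign(
    keyword=parents["required"].str.findall(r"\w+")
).explode("keyword")

# Full names that should be flagged, built as "<parent>.<keyword>".
target = (pairs["data"] + "." + pairs["keyword"]).tolist()

# Flag only the rows whose 'data' value is exactly one of those full names.
df.loc[df["data"].isin(target), "required"] = "Yes"
print(df)
```

Because the match is an exact comparison on the full name, a keyword listed under 'school' would not flag a row that starts with 'address'.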

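You can use ast.literal_eval to convert your stringified lists into actual lists, explode them into individual values, and use str.contains with those values to update df.required: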

from ast import literal_eval
required = df.required.dropna().apply(literal_eval).explode()
df.loc[df.data.str.contains('|'.join(required)).values, 'required'] = 'Yes'
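
The snippet above can be run end to end as follows. This is a sketch under one added assumption: each keyword is passed through re.escape before being joined, so that a keyword containing regex metacharacters (such as '.' or '+') would still be matched literally. The escaping step and the names keywords, pattern and mask are not part of the original answer.

```python
import re
from ast import literal_eval

import numpy as np
import pandas as pd

a1 = ["Highschool.sg", "school", "school.sggs", "school.coep", "school.mit",
      "address", "address.pune", "address.Nanded", "address.mumbai"]
a2 = [34, 56, 55, 34, 23, 60, 34, 56, 100]
a3 = [np.nan, str(["sggs", "coep", "mit"]), np.nan, np.nan, np.nan,
      str(["pune", "Nanded"]), np.nan, np.nan, np.nan]
df = pd.DataFrame(list(zip(a1, a2, a3)), columns=["data", "id", "required"])

# Parse each stringified list, then flatten to one keyword per row.
keywords = df["required"].dropna().apply(literal_eval).explode()

# Escape the keywords (an addition to the original answer) and build
# a single alternation pattern: "sggs|coep|mit|pune|Nanded".
pattern = "|".join(re.escape(k) for k in keywords)

# True for every 'data' value that contains any keyword as a substring.
mask = df["data"].str.contains(pattern)
df.loc[mask, "required"] = "Yes"
print(df)
```

Note that str.contains does substring matching across all keywords at once, so a keyword is not tied to the row it came from: a value such as 'address.mit' would also be flagged here, which the isin-based approach avoids.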

