
Find elements from a list in a dataframe

I have a dataframe "df1":

adj           response

beautiful    ["She's a beautiful girl/woman, and also a good teacher."]
good         ["She's a beautiful girl/woman, and also a good teacher."]
hideous      ["This city is hideous, let's move to the countryside."]

And here's the object list:

object=["girl","teacher","city","countryside","woman"]

Code:

df1['response_split']=df1['response'].str.split(",")

After splitting, the dataframe looks like this:

adj           response_split

beautiful    ["She's a beautiful girl/woman", " and also a good teacher."]
good         ["She's a beautiful girl/woman", " and also a good teacher."]
hideous      ["This city is hideous", " let's move to the countryside."]
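A quick check of what Series.str.split(",") returns for one of the sample responses (copied from the question, and assumed to be a plain string). The split happens on the comma alone, so the second fragment keeps its leading space.

```python
import pandas as pd

# One response string from the question
s = pd.Series(["She's a beautiful girl/woman, and also a good teacher."])

# str.split(",") returns one list per row; the space after the comma is kept
parts = s.str.split(",")
print(parts[0])  # ["She's a beautiful girl/woman", ' and also a good teacher.']
```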

I want to add another column, "response_object": when the adj is found in a fragment of the response, find the object(s) from the object list that appear in that same fragment. Expected result:

adj           response_split                                               response_object

beautiful    ["She's a beautiful girl/woman", " and also a good teacher."]        girl
beautiful    ["She's a beautiful girl/woman", " and also a good teacher."]        woman
good         ["She's a beautiful girl/woman", " and also a good teacher."]        teacher
hideous      ["This city is hideous", " let's move to the countryside."]          city

Code:

for i in df1['response_split']:
    if df1['adj'] in i:
        if any(x in i and x in object):
            match = list(filter(lambda x: x in i, object))
            df1['response_object']=match

It raises a ValueError.
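A minimal reproduction, assuming the error comes from the line if df1['adj'] in i: (the first check the loop runs). The in operator on a list compares each element with the whole df1['adj'] Series using ==. That comparison yields a boolean Series, and pandas refuses to collapse it to a single True/False. The small frame below is a hypothetical cut-down of the question's data. Testing one scalar adjective at a time avoids the error.

```python
import pandas as pd

# Hypothetical cut-down version of the question's data
df1 = pd.DataFrame({"adj": ["beautiful", "good"]})
fragments = ["She's a beautiful girl/woman", " and also a good teacher."]

try:
    # `in` compares each fragment with the whole Series via ==, giving a
    # boolean Series whose truth value pandas treats as ambiguous
    df1["adj"] in fragments
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Checking one scalar adjective at a time works
adj = df1["adj"].iloc[0]
found = any(adj in frag for frag in fragments)
print(found)  # True
```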

First, object is a Python builtin, so it is better not to use it as a variable name; here it is renamed to L:

L=["girl","teacher","city","countryside","woman"]
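To illustrate why the rename helps: once object is bound to the list, any later code in the same module that expects the builtin class gets the list instead. The isinstance call below is only a hypothetical example of such code. The original class is still reachable as builtins.object.

```python
import builtins

# Rebinding `object` hides the builtin class for the rest of this module
object = ["girl", "teacher", "city", "countryside", "woman"]
shadowed = object is not builtins.object

try:
    isinstance("girl", object)  # arg 2 is now a list, not a type
    raised = False
except TypeError:
    raised = True

# The builtin can still be reached explicitly through the builtins module
print(shadowed, raised, isinstance("girl", builtins.object))  # True True True
```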

Then zip the split column with adj, loop over the resulting tuples, loop over each fragment and each value in L, and keep a match only when both the object and the adjective occur in the same fragment (two in tests joined with and):

df1['response_split']=df1['response'].str.split(",")
L1 = [(a, b, o) for a, b in zip(df1['adj'], df1['response_split']) 
                for r in b 
                for o in L 
                if (o in r) and (a in r)]
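A self-contained version of the comprehension for trying it end to end. It assumes the response cells are plain strings (the brackets in the question look like display notation); the sample rows are copied from the question. Note that o in r and a in r are substring tests, so the adjective and the object only need to appear somewhere inside the same comma-separated fragment.

```python
import pandas as pd

# Sample rows from the question; `response` is assumed to hold plain strings
df1 = pd.DataFrame({
    "adj": ["beautiful", "good", "hideous"],
    "response": [
        "She's a beautiful girl/woman, and also a good teacher.",
        "She's a beautiful girl/woman, and also a good teacher.",
        "This city is hideous, let's move to the countryside.",
    ],
})
L = ["girl", "teacher", "city", "countryside", "woman"]

df1["response_split"] = df1["response"].str.split(",")

# Keep (adj, fragments, object) when adj and object share a fragment
L1 = [(a, b, o) for a, b in zip(df1["adj"], df1["response_split"])
                for r in b
                for o in L
                if (o in r) and (a in r)]

for a, _, o in L1:
    print(a, o)
# beautiful girl
# beautiful woman
# good teacher
# hideous city
```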

The same thing can be rewritten with explicit loops:

df1['response_split']=df1['response'].str.split(",")

L1 = []
for a, b in zip(df1['adj'], df1['response_split']):
    for r in b:
        for o in L:
            if (o in r) and (a in r):
                L1.append((a, b, o))
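A quick check, on the same sample rows from the question (again assumed to be plain strings), that the explicit loops and the list comprehension build identical lists of (adj, response_split, object) tuples.

```python
import pandas as pd

df1 = pd.DataFrame({
    "adj": ["beautiful", "good", "hideous"],
    "response": [
        "She's a beautiful girl/woman, and also a good teacher.",
        "She's a beautiful girl/woman, and also a good teacher.",
        "This city is hideous, let's move to the countryside.",
    ],
})
L = ["girl", "teacher", "city", "countryside", "woman"]
df1["response_split"] = df1["response"].str.split(",")

# Explicit nested loops
L1 = []
for a, b in zip(df1["adj"], df1["response_split"]):
    for r in b:
        for o in L:
            if (o in r) and (a in r):
                L1.append((a, b, o))

# The same filter written as a comprehension
L1_comp = [(a, b, o) for a, b in zip(df1["adj"], df1["response_split"])
           for r in b for o in L if (o in r) and (a in r)]

print(L1 == L1_comp)  # True
```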

Finally, build the new DataFrame with the DataFrame constructor:

import pandas as pd

df2 = pd.DataFrame(L1, columns=['adj','response_split','response_object'])
print(df2)
         adj                                     response_split  \
0  beautiful  [She's a beautiful girl/woman,  and also a goo...   
1  beautiful  [She's a beautiful girl/woman,  and also a goo...   
2       good  [She's a beautiful girl/woman,  and also a goo...   
3    hideous  [This city is hideous,  let's move to the coun...   

  response_object  
0            girl  
1           woman  
2         teacher  
3            city  
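For comparison, here is a sketch of a different technique, not the one used in this answer: DataFrame.explode (available since pandas 0.25) first expands each response into one row per fragment, keeps the fragments that mention the adjective, and then expands again into one row per matched object. It assumes the same sample rows as plain strings and should give the same four rows as df2 above.

```python
import pandas as pd

df1 = pd.DataFrame({
    "adj": ["beautiful", "good", "hideous"],
    "response": [
        "She's a beautiful girl/woman, and also a good teacher.",
        "She's a beautiful girl/woman, and also a good teacher.",
        "This city is hideous, let's move to the countryside.",
    ],
})
L = ["girl", "teacher", "city", "countryside", "woman"]
df1["response_split"] = df1["response"].str.split(",")

# One row per (adj, fragment); keep only fragments containing the adjective
frag = df1.assign(fragment=df1["response_split"]).explode("fragment")
frag = frag[[a in f for a, f in zip(frag["adj"], frag["fragment"])]]

# List the objects found in each fragment, then one row per object
frag = frag.assign(response_object=[[o for o in L if o in f]
                                    for f in frag["fragment"]])
df2 = (frag.explode("response_object")
           .dropna(subset=["response_object"])
           [["adj", "response_split", "response_object"]]
           .reset_index(drop=True))
print(df2[["adj", "response_object"]])
```

Fragments with no matching object explode to NaN, which is why dropna follows the second explode.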

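Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0; if you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Filing No. 18138465 © 2020-2024 STACKOOM.COM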