Using .find() on a pd.DataFrame Series

I have the following df:

import pandas as pd

data = {'Org':  ['<a href="/00xO" target="_blank">Chocolate</a>'],
        'Owner': ['Charlie']
        }

df = pd.DataFrame(data)

print(df)

When I apply the lambda function below, instead of giving me 'Chocolate' it returns 0.

df['Correct Org']=df['Org'].apply(lambda st: st[st.find(">"):st.find("<")])

I've tried adding 'str' as follows:

df['Correct Org']=df['Org'].str.apply(lambda st: st[st.find(">")+1:st.find("<")])

and get the following error:

AttributeError: 'StringMethods' object has no attribute 'apply'

You're getting an empty string back because df['Org'][0].find(">") returns 31 but df['Org'][0].find("<") returns 0, so st[st.find(">"):st.find("<")] is st[31:0], which selects nothing. You can use bs4.BeautifulSoup to create a soup object and get the text inside the <a> tag directly:

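from bs4 import BeautifulSoup
# Naming the parser explicitly avoids bs4's "no parser was explicitly specified" warning.
df['Org'] = df['Org'].apply(lambda x: BeautifulSoup(x, "html.parser").text)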

Output:

         Org    Owner
0  Chocolate  Charlie
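
To see why the slice in the question comes back empty, here is a minimal sketch on the same string, without pandas. It assumes each cell holds exactly one tag; the variable names are only for illustration.

```python
st = '<a href="/00xO" target="_blank">Chocolate</a>'

start = st.find(">")   # 31: the ">" that ends the opening tag
end = st.find("<")     # 0: the "<" that starts the opening tag

# A slice whose stop comes before its start is empty.
empty = st[start:end]

# Skip past the ">" and look for the next "<" only after it.
start = st.find(">") + 1
end = st.find("<", start)
text = st[start:end]

print(repr(empty), text)
```

The same two-step lookup can be wrapped in a small function and passed to df['Org'].apply(...).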

Use BeautifulSoup for parsing HTML tags:

from bs4 import BeautifulSoup

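# features="lxml" requires the lxml package; find_all(string=True) is the current spelling of findAll(text=True).
df['Correct Org'] = df['Org'].apply(lambda st: ','.join(BeautifulSoup(st, features="lxml").find_all(string=True)))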

If you don't want to use BeautifulSoup, I wrote a simple function for you.

A FUNCTION FOR GETTING THE LINK TEXT

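def getOrg(link):
    # Return the text between the first '>' and the next '</'.
    link = str(link)
    start = link.find('>') + 1
    end = link.find('</', start)
    if start == 0 or end == -1:  # no tag found: return the input unchanged
        return link
    return link[start:end]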

FOR EXAMPLE

import pandas as pd

data = {'Org':  ['<a href="/00xO" target="_blank">Chocolate</a>'],
        'Owner': ['Charlie']
        }

df = pd.DataFrame(data)


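# Function call: apply getOrg to each value rather than to the whole Series
df['Correct Org'] = df['Org'].apply(getOrg)
print(df['Correct Org'][0])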

OUTPUT

Chocolate
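
The .str accessor tried in the question has no apply, but it does provide vectorized string methods. As a rough alternative (not taken from the answers above), Series.str.extract with a regular expression could pull out the link text. This sketch assumes each cell holds a single simple tag with no nested markup; for anything more complex an HTML parser is the safer choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Org': ['<a href="/00xO" target="_blank">Chocolate</a>'],
    'Owner': ['Charlie'],
})

# Capture everything between the first ">" and the next "<".
# expand=False returns a Series instead of a one-column DataFrame.
df['Correct Org'] = df['Org'].str.extract(r'>([^<]*)<', expand=False)

print(df['Correct Org'].tolist())
```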
