Apply BeautifulSoup function to Pandas DataFrame

I have a Pandas DataFrame that I got from reading a CSV file, and the file contains HTML tags that I want to remove. I want to remove the tags with BeautifulSoup because it is more reliable than a simple regex like <.*?>.

I usually remove HTML tags from strings by executing:

text = BeautifulSoup(text, 'html.parser').get_text()

Now I want to do this with every element in my DataFrame, so I tried the following:

df.apply(lambda text: BeautifulSoup(text, 'html.parser').get_text())

But that raises the following error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index id')

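Use applymap. DataFrame.apply passes each whole column (a Series) to the function, while applymap calls the function on each individual element.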

Example:

import pandas as pd
from bs4 import BeautifulSoup


df = pd.DataFrame({"a": ["<a>Hello</a>"], "b":["<c>World</c>"]})
print(df.applymap(lambda text: BeautifulSoup(text, 'html.parser').get_text()))

Output:

       a      b
0  Hello  World
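A DataFrame read from a CSV file often also contains numbers or missing values, which BeautifulSoup cannot parse. The following is a minimal sketch of one way to handle that, not part of the original answer: the helper names strip_html and strip_html_frame and the sample data are hypothetical. It also assumes that newer pandas releases (2.1 and later) provide DataFrame.map as the replacement for the deprecated applymap, and falls back to applymap otherwise.

```python
import pandas as pd
from bs4 import BeautifulSoup


def strip_html(value):
    """Return the text of an HTML string; pass non-string values through unchanged."""
    # Hypothetical helper: only strings are handed to BeautifulSoup,
    # so numbers and missing values (None/NaN) are left as they are.
    if isinstance(value, str):
        return BeautifulSoup(value, "html.parser").get_text()
    return value


def strip_html_frame(frame):
    """Apply strip_html to every cell of a DataFrame."""
    # DataFrame.map is the element-wise method in pandas 2.1+;
    # older versions only offer applymap.
    if hasattr(pd.DataFrame, "map"):
        return frame.map(strip_html)
    return frame.applymap(strip_html)


# Illustrative data with a numeric column and a missing value.
df = pd.DataFrame({
    "id": [1, 2],
    "a": ["<a>Hello</a>", None],
    "b": ["<c>World</c>", "<p>plain <b>text</b></p>"],
})
clean = strip_html_frame(df)
print(clean)
```

If only one column contains HTML, calling apply on that column alone (for example df["a"].apply(strip_html)) should also work, because Series.apply operates element by element.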
