Apply BeautifulSoup function to Pandas DataFrame
I have a Pandas DataFrame that I got from reading a CSV file, and the file contains HTML tags that I want to remove. I want to remove the tags with BeautifulSoup because it is more reliable than a simple regex like <.*?>.
I usually remove HTML tags from strings by executing:
text = BeautifulSoup(text, 'html.parser').get_text()
Now I want to do this with every element in my DataFrame, so I tried the following:
df.apply(lambda text: BeautifulSoup(text, 'html.parser').get_text())
But that raises the following error:
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index id')
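The "occurred at index id" part of the message suggests what is going wrong: by default, DataFrame.apply calls the function once per column and passes the whole column as a Series, so BeautifulSoup most likely receives a Series rather than a string. A minimal sketch of an element-wise alternative follows. The sample DataFrame, its column names, and the helper strip_tags are hypothetical stand-ins for the real CSV data, and passing non-string values through unchanged is an assumption about the desired behavior.

```python
import pandas as pd
from bs4 import BeautifulSoup


def strip_tags(value):
    """Return value with HTML tags removed; non-strings pass through unchanged."""
    # Assumption: numeric or missing cells (e.g. the id column) should be left alone.
    if not isinstance(value, str):
        return value
    return BeautifulSoup(value, "html.parser").get_text()


# Hypothetical sample data standing in for the contents of the CSV file.
df = pd.DataFrame({
    "id": [1, 2],
    "body": ["<p>Hello <b>world</b></p>", "plain text"],
})

# DataFrame.apply hands each column to the function as a Series,
# so map strip_tags over the elements of every column instead.
cleaned = df.apply(lambda column: column.map(strip_tags))
print(cleaned)
```

If only one column contains HTML, applying the helper to that column alone, for example df["body"].map(strip_tags), avoids parsing cells that cannot contain tags.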