Applying a BeautifulSoup function to a Pandas DataFrame

Question

I have a Pandas DataFrame that I read from a CSV file, and the file contains HTML tags that I want to remove. I want to remove the tags with BeautifulSoup because it is more reliable than a simple regex like <.*?>.

I usually remove HTML tags from strings by executing:

text = BeautifulSoup(text, 'html.parser').get_text()

Now I want to do this with every element in my DataFrame, so I tried the following:

df.apply(lambda text: BeautifulSoup(text, 'html.parser').get_text())

But that returns the following error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index id')

Answer 1

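Use applymap. DataFrame.apply passes each whole column (a Series) to the function, which is why BeautifulSoup raises the error above; applymap calls the function on each individual element instead.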

Example:

import pandas as pd
from bs4 import BeautifulSoup


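# Each cell holds a string containing HTML tags
df = pd.DataFrame({"a": ["<a>Hello</a>"], "b": ["<c>World</c>"]})
# applymap calls the function on every element, not on whole columns
print(df.applymap(lambda text: BeautifulSoup(text, 'html.parser').get_text()))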

Output:

       a      b
0  Hello  World
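
The error in the question mentions an id column, so the real DataFrame probably holds non-string values as well, and passing those to BeautifulSoup would fail. Here is a minimal sketch of one way to guard against that, assuming non-string cells should be left unchanged. The helper names strip_html and strip_html_frame and the sample data are made up for illustration. It also prefers DataFrame.map when available, since applymap was deprecated in pandas 2.1 in favor of it.

```python
import pandas as pd
from bs4 import BeautifulSoup


def strip_html(value):
    """Remove HTML tags from a string; return any other value unchanged."""
    if not isinstance(value, str):
        # Assumption: numbers, None and NaN (empty CSV cells) are kept as-is
        return value
    return BeautifulSoup(value, "html.parser").get_text()


def strip_html_frame(df):
    """Apply strip_html to every cell of df (hypothetical helper)."""
    # DataFrame.map replaces applymap from pandas 2.1 onward;
    # fall back to applymap on older versions.
    if hasattr(pd.DataFrame, "map"):
        return df.map(strip_html)
    return df.applymap(strip_html)


# Made-up sample data with an integer column and a missing value
df = pd.DataFrame({"id": [1, 2], "text": ["<p>Hello <b>World</b></p>", None]})
clean = strip_html_frame(df)
print(clean)
```

If only a few columns contain HTML, calling df["text"].map(strip_html) on just those columns avoids parsing every cell.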

Answer 1, 2018-11-07 12:34:19

将BeautifulSoup函数应用于Pandas DataFrame

问题描述

1 个解决方案

解决方案1 1 2018-11-07 12:34:19

解决方案1
1 2018-11-07 12:34:19