Scraping issue with BeautifulSoup and Selenium

Question

I am just starting to code on my own and I am stuck on one line of code. Can you explain what is going wrong?

I want to scrape information from this div tag:

role = experience1_div('span', {'class' : 'mr1 t-bold'})
print(role)

Output:

[ Automation Engineer - InternAutomation Engineer - Intern ]

How can I get only the text, "Automation Engineer - Intern"?

I tried .get_text().strip(), but it seems that the span tag is blocking the call.

Answer 1

I don't know what experience1_div is, but to get all of the text, use role.text:

role = experience1_div.find('span', {'class' : 'mr1 t-bold'}) 
print(role.text)

Output: Automation Engineer - InternAutomation Engineer - Intern

To get the text from the first nested span only, use role.span.text

or, for the second nested span, role.contents[2].text (the exact index depends on whether whitespace nodes sit between the tags).
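A minimal sketch of the three lookups above. The question never shows the real markup, so the HTML string here is an assumption: an outer span holding two nested spans with the same text, which would explain the doubled output. The attribute names on the inner spans are made up for illustration, and find_all('span')[1] is swapped in for contents[2] because it does not depend on whitespace nodes.

```python
from bs4 import BeautifulSoup

# Assumed markup (not from the question): two nested spans carrying the
# same text inside the outer span.mr1.t-bold.
html = (
    '<div id="experience">'
    '<span class="mr1 t-bold">'
    '<span aria-hidden="true">Automation Engineer - Intern</span>'
    '<span class="visually-hidden">Automation Engineer - Intern</span>'
    '</span>'
    '</div>'
)

experience1_div = BeautifulSoup(html, "html.parser").find("div")

# find() returns a single Tag, so .text is available on it.
role = experience1_div.find("span", {"class": "mr1 t-bold"})

full_text = role.text                         # both nested spans, concatenated
first_text = role.span.text                   # first nested span only
second_text = role.find_all("span")[1].text   # second nested span only

print(full_text)
print(first_text)
print(second_text)
```

If the real page has a different structure, only the html string and the indexes should need adjusting.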

Answer 2

The main issue in the code you provided is that it produces a ResultSet. To get the text, you have to pick an element from it directly or iterate over it.

role[0].span.get_text(strip=True)

or

for e in role:
    print(e.span.get_text(strip=True))

Output:

Automation Engineer - Intern
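To see why the original call fails, here is a small sketch under assumed markup (the HTML string is invented for illustration, since the question does not include it). Calling a tag like a function is shorthand for find_all(), so role is a list-like ResultSet rather than a single Tag, and calling get_text() on it raises AttributeError.

```python
from bs4 import BeautifulSoup

# Assumed markup (not from the question).
html = (
    '<div>'
    '<span class="mr1 t-bold">'
    '<span aria-hidden="true">Automation Engineer - Intern</span>'
    '<span class="visually-hidden">Automation Engineer - Intern</span>'
    '</span>'
    '</div>'
)

experience1_div = BeautifulSoup(html, "html.parser").find("div")

# Same call as in the question: tag(...) behaves like tag.find_all(...).
role = experience1_div("span", {"class": "mr1 t-bold"})
print(type(role).__name__)

# A ResultSet is a list of tags, so it has no get_text() of its own.
try:
    role.get_text()
    failed = False
except AttributeError:
    failed = True

# Iterating (or indexing) gives back Tag objects, which do have get_text().
texts = [e.span.get_text(strip=True) for e in role]
print(failed, texts)
```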

A better approach would be to select the element more specifically (based on your example):

experience1_div.select_one('span.mr1.t-bold > span').get_text(strip=True)
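One caveat with select_one is that it returns None when nothing matches, so chaining get_text() directly can raise AttributeError on pages where the element is missing. A possible guard is sketched below; the helper name first_role_text and the HTML string are hypothetical, not taken from the question.

```python
from bs4 import BeautifulSoup

# Assumed markup (not from the question).
html = (
    '<div>'
    '<span class="mr1 t-bold">'
    '<span aria-hidden="true">Automation Engineer - Intern</span>'
    '<span class="visually-hidden">Automation Engineer - Intern</span>'
    '</span>'
    '</div>'
)


def first_role_text(container):
    """Hypothetical helper: text of the first span directly inside
    span.mr1.t-bold, or None if the selector matches nothing."""
    node = container.select_one("span.mr1.t-bold > span")
    return node.get_text(strip=True) if node is not None else None


experience1_div = BeautifulSoup(html, "html.parser").find("div")
empty_div = BeautifulSoup("<div></div>", "html.parser").find("div")

found = first_role_text(experience1_div)
missing = first_role_text(empty_div)
print(found, missing)
```

The child combinator (>) keeps the match to spans that are direct children of the outer span, and select_one returns only the first of them.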

Answer 3

This is the simplest technique to achieve your aim.

role = experience1_div.select_one('span.mr1.t-bold > span').get_text(strip=True)
print(role)

Answer 1
0 2022-05-23 20:06:19

Answer 2
0 accepted 2022-05-23 20:12:26

Answer 3
0 2022-05-24 18:02:36
