Scraping issue using BeautifulSoup and Selenium
I am just starting to code on my own and I am stuck on one line of code. Can you give me some explanation?
I want to scrape information from this div tag:
role = experience1_div('span', {'class' : 'mr1 t-bold'})
print(role)
Output:
[<span class="mr1 t-bold"> <span aria-hidden="true"><!-- -->Automation Engineer - Intern<!-- --></span><span class="visually-hidden"><!-- -->Automation Engineer - Intern<!-- --></span> </span>]
How can I get only the text "Automation Engineer - Intern"?
I tried .get_text().strip(), but it seems that the span tag is blocking my function.
I don't know what experience1_div is, but to get all of the text use role.text:
role = experience1_div.find('span', {'class' : 'mr1 t-bold'})
print(role.text)
Output: Automation Engineer - InternAutomation Engineer - Intern
To get the text from the first nested span only, use role.span.text,
or from the second nested span, role.contents[2].text (index 2 because, in the markup shown in the question, role.contents[0] is the whitespace string before the first nested span).
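The three lookups above can be compared side by side in a small self-contained sketch. The html string and the wrapping div with id "experience1" are hypothetical stand-ins built from the output shown in the question; on the real page, experience1_div would come from whatever the asker parsed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the span markup is copied
# from the output printed in the question.
html = (
    '<div id="experience1">'
    '<span class="mr1 t-bold"> '
    '<span aria-hidden="true"><!-- -->Automation Engineer - Intern<!-- --></span>'
    '<span class="visually-hidden"><!-- -->Automation Engineer - Intern<!-- --></span>'
    ' </span>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
experience1_div = soup.find("div", id="experience1")

# find() returns a single Tag (or None), not a list of tags.
role = experience1_div.find("span", {"class": "mr1 t-bold"})

all_text = role.text.strip()          # both nested spans, concatenated
first_text = role.span.text           # first nested span only
second_text = role.contents[2].text   # contents[0] is a whitespace string

print(all_text)
print(first_text)
print(second_text)
```

Note that role.contents[2] depends on the exact whitespace in the markup, so role.span.text is likely the more robust of the two.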
The main issue in the information provided is that you have generated a ResultSet (a list of matching tags), not a single tag. To get the text, you have to pick an element directly or iterate over the ResultSet:
role[0].span.get_text(strip=True)
or
for e in role:
    print(e.span.get_text(strip=True))
Output:
Automation Engineer - Intern
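To see why the original call fails, here is a minimal sketch, assuming a simplified copy of the question's markup (the html string is a hypothetical stand-in). Calling a tag like a function is shorthand for find_all(), so the result is a ResultSet, and a ResultSet has no get_text() of its own.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified copy of the markup from the question.
html = (
    '<div>'
    '<span class="mr1 t-bold"> '
    '<span aria-hidden="true">Automation Engineer - Intern</span>'
    '<span class="visually-hidden">Automation Engineer - Intern</span>'
    ' </span>'
    '</div>'
)
experience1_div = BeautifulSoup(html, "html.parser").div

# tag(...) is equivalent to tag.find_all(...), so role is a ResultSet.
role = experience1_div("span", {"class": "mr1 t-bold"})
result_type = type(role).__name__

# Treating the ResultSet like a single tag raises AttributeError.
try:
    role.get_text()
    error_raised = False
except AttributeError:
    error_raised = True

# Iterating works: each element is a Tag with its own get_text().
titles = [e.span.get_text(strip=True) for e in role]

print(result_type, error_raised, titles)
```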
A better approach would be to select the element more specifically (based on your example):
experience1_div.select_one('span.mr1.t-bold > span').get_text(strip=True)
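One caveat with the one-liner: select_one() returns None when nothing matches, so chaining .get_text() directly would raise an AttributeError on a page without that span. A minimal sketch with a guard follows; the helper name first_text and the html string are hypothetical and not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page, based on the question's output.
html = (
    '<div>'
    '<span class="mr1 t-bold"> '
    '<span aria-hidden="true"><!-- -->Automation Engineer - Intern<!-- --></span>'
    '<span class="visually-hidden"><!-- -->Automation Engineer - Intern<!-- --></span>'
    ' </span>'
    '</div>'
)
experience1_div = BeautifulSoup(html, "html.parser").div


def first_text(parent, css):
    """Return the stripped text of the first CSS match, or None if no match."""
    node = parent.select_one(css)
    return node.get_text(strip=True) if node is not None else None


# "> span" keeps only direct child spans; select_one takes the first of them.
role = first_text(experience1_div, "span.mr1.t-bold > span")
missing = first_text(experience1_div, "span.no-such-class > span")

print(role)
print(missing)
```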
This is the simplest way to achieve your aim:
role = experience1_div.select_one('span.mr1.t-bold > span').get_text(strip=True)
print(role)