BeautifulSoup: extract class values in Python
I'm trying to extract values from an HTML page using BeautifulSoup.
I updated Jack's code, and now it extracts the ratings in the reviews. But I have two issues:
1. It extracts ratings only from the first 10 reviews.
2. I would also like to add a third column to the extraction: the date, which is located in the upper left of each review.
Could you please help me?
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
url = 'https://www.kununu.com/de/allianz-deutschland/kommentare'
page = requests.get(url)
soup = bs(page.text, 'html.parser')
divs = soup.find_all(class_="col-xs-12 col-lg-12")
titles = [] #this initializes a list of titles
badges = [] #this initializes a list of badges
for item in divs[0].find_all('span', class_="rating-title"):
    titles.append(item.text.strip())
for item in divs[0].find_all('span', class_="rating-badge"):
    badges.append(item.text.strip())
my_list = list(zip(titles, badges)) #this takes the two lists, zips them and converts the zip element back to a list
df = pd.DataFrame(my_list, columns = ['rating-title', 'rating-badge'])
print(df)
Output:
rating-title rating-badge
0 Arbeitsatmosphäre 5,00
1 Vorgesetztenverhalten 2,00
2 Kollegenzusammenhalt 5,00
3 Interessante Aufgaben 4,00
4 Kommunikation 3,00
.. ... ...
125 Gehalt / Sozialleistungen 4,00
126 Arbeitsbedingungen 4,00
127 Umwelt- / Sozialbewusstsein 3,00
128 Work-Life-Balance 5,00
129 Image 4,00
[130 rows x 2 columns]
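Both issues come from collecting titles and badges into two flat, page-wide lists: the date never enters those lists, later pages are never fetched, and nothing ties a rating back to the review it came from. Below is a minimal sketch of a per-review approach. It assumes hypothetical selectors (`div.review` for one review container, `span.review-date` for the date) and a hypothetical `?page=N` query parameter for pagination; none of these were checked against the live kununu page, so inspect the HTML and the site's "next page" link and substitute the real values. The fetching function is passed in, so the parsing can be tried on saved or sample HTML first.

```python
from bs4 import BeautifulSoup

# ASSUMPTION: hypothetical selectors; replace them with the real class names
# after inspecting the page source.
REVIEW_SELECTOR = "div.review"      # one container per review (hypothetical)
DATE_SELECTOR = "span.review-date"  # date at the top of a review (hypothetical)


def parse_reviews(html):
    """Return one row per (review, rating category), tagged with the review date."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for review in soup.select(REVIEW_SELECTOR):
        date_tag = review.select_one(DATE_SELECTOR)
        date = date_tag.get_text(strip=True) if date_tag else None
        # Titles and badges are collected inside this review only, so a
        # rating can never be attributed to a neighbouring review.
        titles = review.select("span.rating-title")
        badges = review.select("span.rating-badge")
        for title, badge in zip(titles, badges):
            rows.append({
                "date": date,
                "rating-title": title.get_text(strip=True),
                "rating-badge": badge.get_text(strip=True),
            })
    return rows


def scrape_all_pages(fetch, base_url, max_pages=50):
    """Request successive pages until one yields no reviews.

    ASSUMPTION: the page number is sent as a hypothetical ?page=N parameter.
    `fetch` is any callable that maps a URL to HTML text.
    """
    all_rows = []
    for page in range(1, max_pages + 1):
        rows = parse_reviews(fetch(f"{base_url}?page={page}"))
        if not rows:
            break  # an empty page is taken to mean there are no more reviews
        all_rows.extend(rows)
    return all_rows
```

Against the live site this might be called as `pd.DataFrame(scrape_all_pages(lambda u: requests.get(u).text, url))`, which would give `date`, `rating-title` and `rating-badge` columns. `max_pages` caps the loop in case the assumed parameter is ignored and every request returns the same first page.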
You haven't gone into the nested elements; you just grabbed and printed the parent element.
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://www.kununu.com/de/allianz-deutschland/kommentare'
page = requests.get(url).text
soup = BeautifulSoup(page, 'html.parser')
div = soup.find(class_="col-md-9 col-sm-12 col-xs-12 flex-left")
row = div.find('div', {'class':'row'})
titles = [ x.text.strip() for x in row.find_all('span', {'class':'rating-title'}) ]
ratings = [ x.text.strip() for x in row.find_all('div', {'class':'rating-stars'}) ]
data_tuples = list(zip(titles,ratings))
df = pd.DataFrame(data_tuples, columns=['title', 'ratings'])
print(df)
Output:
title ratings
0 Arbeitsatmosphäre 3,62
1 Vorgesetztenverhalten 3,49
2 Kollegenzusammenhalt 3,92
3 Interessante Aufgaben 3,78
4 Kommunikation 3,44
5 Arbeitsbedingungen 3,70
6 Umwelt- / Sozialbewusstsein 3,76
7 Work-Life-Balance 3,54
8 Gleichberechtigung 3,94
9 Umgang mit älteren Kollegen 3,88
10 Karriere / Weiterbildung 3,52
11 Gehalt / Sozialleistungen 3,60
12 Image 3,80
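To make the difference concrete, here is a small self-contained illustration on made-up markup (not the real kununu HTML) that reuses the `rating-title` and `rating-badge` class names: the text of the parent element comes back as one run-together string, while `find_all` on the nested spans keeps each value separate.

```python
from bs4 import BeautifulSoup

# Made-up, simplified markup for illustration only.
html = """
<div class="row">
  <div><span class="rating-title">Image</span><span class="rating-badge">3,80</span></div>
  <div><span class="rating-title">Kommunikation</span><span class="rating-badge">3,44</span></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
row = soup.find("div", class_="row")

# Text of the parent: every descendant string joined together, structure lost.
parent_text = row.get_text(" ", strip=True)

# Descending into the nested spans keeps titles and badges apart.
titles = [s.get_text(strip=True) for s in row.find_all("span", class_="rating-title")]
badges = [s.get_text(strip=True) for s in row.find_all("span", class_="rating-badge")]
```

Here `parent_text` is `"Image 3,80 Kommunikation 3,44"`, whereas `titles` and `badges` are the two separate lists that the answer zips into a DataFrame.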
The following should get the data into a pandas DataFrame:
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs
url = 'https://www.kununu.com/de/allianz-deutschland/kommentare'
page = requests.get(url)
soup = bs(page.text, 'html.parser')
divs = soup.find_all(class_="col-md-9 col-sm-12 col-xs-12 flex-left")
titles = [] #this initializes a list of titles
badges = [] #this initializes a list of badges
for item in divs[0].find_all('span',class_="rating-title"):
    titles.append(item.text.strip())
for item in divs[0].find_all('span',class_="rating-badge"):
    badges.append(item.text.strip())
my_list = list(zip(titles, badges)) #this takes the two lists, zips them and converts the zip element back to a list
df = pd.DataFrame(my_list, columns = ['rating-title', 'rating-badge'])
print(df)
Output:
rating-title rating-badge
0 Arbeitsatmosphäre 3,62
1 Vorgesetztenverhalten 3,49
2 Kollegenzusammenhalt 3,92
etc.
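One caveat with building `titles` and `badges` as two independent lists and zipping them: `zip` pairs purely by position, so if any category lacks a badge, every later title gets the wrong value and the last title is dropped without an error. A sketch of pairing within the shared parent instead, on made-up markup and assuming (not verified for the real page) that each title and its badge sit inside the same element:

```python
from bs4 import BeautifulSoup

# Made-up markup: the second rating is missing its badge on purpose.
html = """
<div class="col-md-9 col-sm-12 col-xs-12 flex-left">
  <div><span class="rating-title">Arbeitsatmosphäre</span><span class="rating-badge">3,62</span></div>
  <div><span class="rating-title">Vorgesetztenverhalten</span></div>
  <div><span class="rating-title">Kollegenzusammenhalt</span><span class="rating-badge">3,92</span></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Flat lists + zip: the missing badge shifts every later pair.
titles = [t.get_text(strip=True) for t in soup.find_all("span", class_="rating-title")]
badges = [b.get_text(strip=True) for b in soup.find_all("span", class_="rating-badge")]
zipped = list(zip(titles, badges))

# Looking up the badge inside each title's own parent keeps pairs aligned.
paired = []
for title in soup.find_all("span", class_="rating-title"):
    badge = title.parent.find("span", class_="rating-badge")
    paired.append((title.get_text(strip=True), badge.get_text(strip=True) if badge else None))
```

`zipped` ends up pairing "Vorgesetztenverhalten" with "3,92" and loses the third title, while `paired` keeps all three with `None` for the gap; `pd.DataFrame(paired, columns=['rating-title', 'rating-badge'])` then builds the frame as before.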
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.kununu.com/de/allianz-deutschland/kommentare')
soup = BeautifulSoup(r.text, 'html.parser')
rates = []
stars = []
# Collect the rating titles from every block with this exact class string
for rate in soup.find_all('div', class_='col-lg-6 col-md-12 col-sm-12 col-xs-12'):
    for item in rate.find_all('span', class_='rating-title'):
        rates.append(item.text.strip())
# Collect the rating badges from the same blocks
for star in soup.find_all('div', class_='col-lg-6 col-md-12 col-sm-12 col-xs-12'):
    for item in star.find_all('span', class_='rating-badge'):
        stars.append(item.text.strip())
for a, b in zip(rates, stars):
    print("Name: {:<30} Stars: {:>5}".format(a, b))
Output:
Name: Arbeitsatmosphäre Stars: 3,62
Name: Vorgesetztenverhalten Stars: 3,49
Name: Kollegenzusammenhalt Stars: 3,92
Name: Interessante Aufgaben Stars: 3,78
Name: Kommunikation Stars: 3,44
Name: Arbeitsbedingungen Stars: 3,70
Name: Umwelt- / Sozialbewusstsein Stars: 3,76
Name: Work-Life-Balance Stars: 3,54
Name: Gleichberechtigung Stars: 3,94
Name: Umgang mit älteren Kollegen Stars: 3,88
Name: Karriere / Weiterbildung Stars: 3,52
Name: Gehalt / Sozialleistungen Stars: 3,60
Name: Image Stars: 3,80
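Matching a div by the full class string only works when the attribute value is exactly that string, in that order; a div listing the same classes in a different order is skipped. A CSS selector passed to `select()` matches on the classes themselves, regardless of order, and can reach the nested spans in one step. A sketch on made-up markup:

```python
from bs4 import BeautifulSoup

# Made-up markup: the same four classes, listed in a different order.
html = """
<div class="col-lg-6 col-md-12 col-sm-12 col-xs-12"><span class="rating-title">Image</span></div>
<div class="col-md-12 col-lg-6 col-sm-12 col-xs-12"><span class="rating-title">Kommunikation</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string matches only the exact attribute value, order included.
exact = soup.find_all("div", class_="col-lg-6 col-md-12 col-sm-12 col-xs-12")

# A CSS selector matches any div that has all four classes, in any order.
by_selector = soup.select("div.col-lg-6.col-md-12.col-sm-12.col-xs-12")

# Descendant selectors go straight to the nested spans.
titles = [s.get_text(strip=True) for s in soup.select("div.col-lg-6 span.rating-title")]
```

Here `exact` finds one div, `by_selector` finds both, and `titles` is `["Image", "Kommunikation"]`.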