Python: Adding values to empty dictionary
I have scraped data from a website and I would like to save all of it. However, my code only saves the last value of the data. I made an empty dictionary, but I'm struggling to add elements to it.
Here's my code:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy

try:
    source = requests.get('https://www.imdb.com/chart/top/')
    source.raise_for_status()
    soup = BeautifulSoup(source.text, 'html.parser')
    movies = soup.find('tbody', class_="lister-list").find_all('tr')
    data = {}
    for movie in movies:
        name = movie.find('td', class_='titleColumn').a.text
        rank = movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0]
        year = movie.find('td', class_="titleColumn").span.text.strip('()')
        rating = movie.find('td', class_="ratingColumn imdbRating").strong.text
except Exception as e:
    print(e)
print(data)
You are close to your goal: simply collect the information in a dict and append it to a list on each iteration. Then you can create a dataframe:
for movie in movies:
    data.append({
        'name': movie.find('td', class_='titleColumn').a.text,
        'rank': movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0],
        'year': movie.find('td', class_="titleColumn").span.text.strip('()'),
        'rating': movie.find('td', class_="ratingColumn imdbRating").strong.text
    })
from bs4 import BeautifulSoup
import requests
import pandas as pd

source = requests.get('https://www.imdb.com/chart/top/')
source.raise_for_status()
soup = BeautifulSoup(source.text, 'html.parser')
movies = soup.find('tbody', class_="lister-list").find_all('tr')
data = []
for movie in movies:
    data.append({
        'name': movie.find('td', class_='titleColumn').a.text,
        'rank': movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0],
        'year': movie.find('td', class_="titleColumn").span.text.strip('()'),
        'rating': movie.find('td', class_="ratingColumn imdbRating").strong.text
    })
pd.DataFrame(data)
| | name | rank | year | rating |
|---|---|---|---|---|
| 0 | Die Verurteilten | 1 | 1994 | 9.2 |
| 1 | Der Pate | 2 | 1972 | 9.2 |
| 2 | The Dark Knight | 3 | 2008 | 9 |
| 3 | Der Pate 2 | 4 | 1974 | 9 |
| 4 | Die zwölf Geschworenen | 5 | 1957 | 8.9 |
...
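As a side note, the scraped values all come back as strings; if you want to sort or aggregate numerically, you could convert the columns after building the frame. A minimal sketch with hard-coded sample rows in the same shape the scraper produces:

```python
import pandas as pd

# Sample rows shaped like the scraper's output (every value is a string).
data = [
    {'name': 'Die Verurteilten', 'rank': '1', 'year': '1994', 'rating': '9.2'},
    {'name': 'Der Pate', 'rank': '2', 'year': '1972', 'rating': '9.2'},
]

df = pd.DataFrame(data)
# Convert the numeric columns from str so sorting/aggregation behave correctly.
df = df.astype({'rank': int, 'year': int, 'rating': float})
print(df.dtypes)
```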
You can replace your for loop with this one to build nested dictionaries, so you can look up a movie by name and then pull whatever info you want from it:
for movie in movies:
    name = movie.find('td', class_='titleColumn').a.text
    data[name] = {}
    rank = movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0]
    year = movie.find('td', class_="titleColumn").span.text.strip('()')
    rating = movie.find('td', class_="ratingColumn imdbRating").strong.text
    data[name]["rank"] = rank
    data[name]["year"] = year
    data[name]["rating"] = rating
print(data)
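With that nested layout, a lookup is just two key accesses. A small sketch with hard-coded sample data standing in for the scraped dict:

```python
# Same shape the loop above builds: movie name -> dict of details.
data = {
    'The Dark Knight': {'rank': '3', 'year': '2008', 'rating': '9.0'},
    'Der Pate': {'rank': '2', 'year': '1972', 'rating': '9.2'},
}

# Look a movie up by name, then pull the field you want.
year = data['The Dark Knight']['year']

# Use .get() to avoid a KeyError when a title might be missing.
missing = data.get('Unknown Film', {}).get('rating', 'n/a')
```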
I would suggest you store the current movie in data, but make the name of the movie the key:
from bs4 import BeautifulSoup
import requests
import pandas as pd

try:
    source = requests.get('https://www.imdb.com/chart/top/')
    source.raise_for_status()
    soup = BeautifulSoup(source.text, 'html.parser')
    movies = soup.find('tbody', class_="lister-list").find_all('tr')
    data = {}
    for movie in movies:
        name = movie.find('td', class_='titleColumn').a.text
        rank = movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0]
        year = movie.find('td', class_="titleColumn").span.text.strip('()')
        rating = movie.find('td', class_="ratingColumn imdbRating").strong.text
        cur = {
            'name': name,
            'rank': rank,
            'year': year,
            'rating': rating
        }
        # storing the current movie in data, with the movie's name as the key
        data[name] = cur
except Exception as e:
    print(e)
print(data)
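If you later want a dataframe from this name-keyed dict, pandas can build one directly with the names as the row index. A sketch using `pd.DataFrame.from_dict` with `orient='index'` on hard-coded sample data:

```python
import pandas as pd

# Same shape this answer builds: movie name -> dict of details.
data = {
    'Die Verurteilten': {'name': 'Die Verurteilten', 'rank': '1', 'year': '1994', 'rating': '9.2'},
    'Der Pate': {'name': 'Der Pate', 'rank': '2', 'year': '1972', 'rating': '9.2'},
}

# orient='index' turns the outer keys (movie names) into the row index.
df = pd.DataFrame.from_dict(data, orient='index')
print(df)
```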