Python: Adding values to an empty dictionary

I have scraped data from a website and would like to save all of it. However, only the last value is saved. I have created an empty dictionary, but I'm struggling to add elements to it.

Here's my code:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy

try:
    source = requests.get('https://www.imdb.com/chart/top/')
    source.raise_for_status()

    soup = BeautifulSoup(source.text,'html.parser')


    movies = soup.find('tbody', class_="lister-list").find_all('tr')    
    
    data = {}

    for movie in movies: 
        
        name = movie.find('td', class_='titleColumn').a.text
        
        rank = movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0] 

        year = movie.find('td', class_="titleColumn").span.text.strip('()')

        rating = movie.find('td', class_="ratingColumn imdbRating").strong.text
        
except Exception as e:
    print(e)

print(data)

You are close to your goal: build a dict for each movie and append it to a list on every iteration. Note that `data` must be a list (`data = []`) rather than a dict for `append` to work. You can then create a DataFrame from that list:

for movie in movies:

    data.append({
        'name': movie.find('td', class_='titleColumn').a.text,
        'rank': movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0],
        'year': movie.find('td', class_="titleColumn").span.text.strip('()'),
        'rating': movie.find('td', class_="ratingColumn imdbRating").strong.text
    })

Example

from bs4 import BeautifulSoup
import requests
import pandas as pd

source = requests.get('https://www.imdb.com/chart/top/')
source.raise_for_status()

soup = BeautifulSoup(source.text,'html.parser')

movies = soup.find('tbody', class_="lister-list").find_all('tr')
data = []

for movie in movies:

    data.append({
        'name': movie.find('td', class_='titleColumn').a.text,
        'rank': movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0],
        'year': movie.find('td', class_="titleColumn").span.text.strip('()'),
        'rating': movie.find('td', class_="ratingColumn imdbRating").strong.text
    })

pd.DataFrame(data)

Output

   name                    rank  year  rating
0  Die Verurteilten        1     1994  9.2
1  Der Pate                2     1972  9.2
2  The Dark Knight         3     2008  9
3  Der Pate 2              4     1974  9
4  Die zwölf Geschworenen  5     1957  8.9
...
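
The original loop kept only the last movie because `name`, `rank`, `year` and `rating` are plain variables that are overwritten on every pass, and nothing is ever stored in `data`. A minimal offline sketch of the append pattern follows. It assumes a hand-typed `rows` list (a stand-in for the parsed table rows, with values copied from the output above) so that it runs without a network request:

```python
# Stand-in for the scraped rows: (name, rank, year, rating) as strings,
# copied from the sample output above. This is an assumption for
# illustration; the real values come from the parsed <tr> elements.
rows = [
    ("Die Verurteilten", "1", "1994", "9.2"),
    ("Der Pate", "2", "1972", "9.2"),
    ("The Dark Knight", "3", "2008", "9"),
]

data = []  # a list, so every iteration adds a new entry

for name, rank, year, rating in rows:
    # Build one dict per movie and append it; earlier entries are kept.
    data.append({"name": name, "rank": rank, "year": year, "rating": rating})

print(len(data))          # number of movies collected
print(data[-1]["name"])   # title of the last movie appended
```

Each element of `data` is an independent dict, which is the shape `pd.DataFrame(data)` expects for one row per movie.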

You can replace your for loop with this one to build nested dictionaries. That way you can look up a movie by name and then pick the field you want from it:

for movie in movies:
    
    name = movie.find('td', class_='titleColumn').a.text

    data[name] = {}
    
    rank = movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0] 

    year = movie.find('td', class_="titleColumn").span.text.strip('()')

    rating = movie.find('td', class_="ratingColumn imdbRating").strong.text

    data[name]["rank"] = rank
    data[name]["year"] = year
    data[name]["rating"] = rating

print(data)
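
Once the nested dictionary is built, a lookup might look like the sketch below. The `data` literal here is a hand-typed stand-in with values taken from the sample output earlier on this page, and `"Some Unlisted Title"` is a hypothetical key used only to show the missing-key case:

```python
# Hand-typed stand-in for the nested dictionary built by the loop above.
data = {
    "Die Verurteilten": {"rank": "1", "year": "1994", "rating": "9.2"},
    "The Dark Knight": {"rank": "3", "year": "2008", "rating": "9"},
}

# Look up one field of one movie by its title.
year = data["The Dark Knight"]["year"]

# dict.get() returns None instead of raising KeyError when the
# title is not present (hypothetical title, for illustration only).
missing = data.get("Some Unlisted Title")

print(year)     # 2008
print(missing)  # None
```

Because titles are used as keys, two movies with the same title would overwrite each other; the list-of-dicts approach does not have that limitation.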

I would suggest storing each movie's details in a dict (`cur`) and adding it to `data` with the movie name as the key:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy

try:
    source = requests.get('https://www.imdb.com/chart/top/')
    source.raise_for_status()

    soup = BeautifulSoup(source.text,'html.parser')


    movies = soup.find('tbody', class_="lister-list").find_all('tr')    
    
    data = {}

    for movie in movies: 
        
        name = movie.find('td', class_='titleColumn').a.text
        
        rank = movie.find('td', class_="titleColumn").get_text(strip=True).split('.')[0] 

        year = movie.find('td', class_="titleColumn").span.text.strip('()')

        rating = movie.find('td', class_="ratingColumn imdbRating").strong.text
        cur = {
            'name': name,
            'rank': rank,
            'year': year,
            'rating': rating
        }
        # storing the cur movie in data but name of the movie as a key 
        data[name] = cur
        
except Exception as e:
    print(e)

print(data)
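
Since the goal was to save all of the data, one possible next step is writing the collected entries out as CSV with the standard library. This is only a sketch under assumptions: `data` is a hand-typed stand-in shaped like the dictionary built above, and an in-memory `io.StringIO` buffer is used in place of a real file (the file name `movies.csv` in the comment is hypothetical):

```python
import csv
import io

# Hand-typed stand-in for the name-keyed dictionary built above.
data = {
    "Die Verurteilten": {"name": "Die Verurteilten", "rank": "1",
                         "year": "1994", "rating": "9.2"},
    "Der Pate": {"name": "Der Pate", "rank": "2",
                 "year": "1972", "rating": "9.2"},
}

# In-memory buffer for the sketch; to write a real file, use something like
# open("movies.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()

writer = csv.DictWriter(buffer, fieldnames=["name", "rank", "year", "rating"])
writer.writeheader()
writer.writerows(data.values())  # one CSV row per movie dict

csv_text = buffer.getvalue()
print(csv_text)
```

Each inner dict already carries the `name` field, so the outer keys are not needed when writing rows.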
