
How do I scrape data from web pages using BeautifulSoup in Python?

I am trying to scrape the data from the link given below: a link

And I am saving it into a CSV file.

I got all the movie names, but in the wrong format. This is what I am getting in the CSV:

T h e " " S h a w s h a n k " " R e d e m p t i o n

T h e " " G o d f a t h e r

T h e " " G o d f a t h e r : " " P a r t " " I I

T h e " " D a r k " " K n i g h t

1 2 " " A n g r y " " M e n

S c h i n d l e r ' s " " L i s t

It should be:

The Shawshank Redemption

The Godfather

The Godfather: Part II

The Dark Knight

I tried:

from bs4 import BeautifulSoup
import requests
import csv

url = 'https://www.imdb.com/chart/top'
res = requests.get(url)
soup = BeautifulSoup(res.text)
movie = soup.find_all(class_='titleColumn')

for names in movie:
    for name in names.find_all('a'):
        movies=list(name.text)
        # print(movies)

        # IN CSV
        with open('TopMovies.csv', 'a') as csvFile:
            writer = csv.writer(csvFile, delimiter = ' ')
            writer.writerow(movies)
        csvFile.close()
        print(movies)

print("Successfully inserted")

Please let me know what I should change in my code.

Thanks

The problem is in the line movies=list(name.text): you are creating a list in which each item is a single character of the string name.text. The CSV writer then treats every character as a separate field, and quotes the space characters because a space is also the delimiter.
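The effect can be reproduced without any scraping. The following is a minimal sketch, assuming a hard-coded sample title in place of name.text and an in-memory buffer in place of TopMovies.csv; the helper write_row is hypothetical and exists only for this illustration.

```python
import csv
import io

# Sample value standing in for name.text (assumption for illustration).
title = "The Godfather"


def write_row(row):
    """Write one row with a space delimiter and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=' ').writerow(row)
    return buffer.getvalue()


# list() on a string yields one item per character, so every character
# becomes its own CSV field, and each space is quoted as " ".
per_character = write_row(list(title))

# Wrapping the whole string in a list keeps it as a single field.
single_field = write_row([title])

print(repr(per_character))
print(repr(single_field))
```

The first result shows the same spaced-out pattern as in the question, while the second keeps the title intact (quoted, because it contains the space delimiter).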

Instead of list(), you can use a list comprehension, movies = [name.text for name in names.find_all('a')]:

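from bs4 import BeautifulSoup
import requests
import csv

url = 'https://www.imdb.com/chart/top'
res = requests.get(url)
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(res.text, 'html.parser')
movie = soup.find_all(class_='titleColumn')

# Open the file once; newline='' lets the csv module handle line endings.
# The with block closes the file, so no explicit close() is needed.
with open('TopMovies.csv', 'a', newline='') as csvFile:
    writer = csv.writer(csvFile, delimiter = ' ')
    for names in movie:
        # One list item per <a> tag, each holding the whole title string
        movies = [name.text for name in names.find_all('a')]
        writer.writerow(movies)
        print(movies)

print("Successfully inserted")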

This will create TopMovies.csv correctly.
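As a possible variation, if the quotes around titles containing spaces are unwanted, the default comma delimiter can be used instead of a space. This is only a sketch under stated assumptions: the titles list stands in for the scraped name.text values, and the file is written to a temporary directory rather than the working directory.

```python
import csv
import os
import tempfile

# Sample titles standing in for the scraped name.text values (assumption).
titles = ["The Shawshank Redemption", "The Godfather", "12 Angry Men"]

# Hypothetical output location; a temporary directory keeps the sketch
# from touching any existing TopMovies.csv.
path = os.path.join(tempfile.mkdtemp(), "TopMovies.csv")

# Mode 'w' starts a fresh file on every run; newline='' lets the csv
# module control the line endings.
with open(path, "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)  # default delimiter is a comma
    # Each row is a one-item list, so a title is never split into characters.
    writer.writerows([title] for title in titles)

# Read the file back to confirm that each row holds one whole title.
with open(path, newline="", encoding="utf-8") as csv_file:
    rows = list(csv.reader(csv_file))

print(rows)
```

Using mode 'w' rather than 'a' means repeated runs do not accumulate duplicate rows; keep 'a' if appending is the intended behaviour.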

[Screenshot from LibreOffice]

