How do I scrape data from web pages using BeautifulSoup in Python?
I am trying to scrape data from the link given below, and I am saving it to a CSV file.
I get all the movie names, but in the wrong format. This is what ends up in the CSV:
T h e " " S h a w s h a n k " " R e d e m p t i o n
T h e " " G o d f a t h e r
T h e " " G o d f a t h e r : " " P a r t " " I I
T h e " " D a r k " " K n i g h t
1 2 " " A n g r y " " M e n
S c h i n d l e r ' s " " L i s t
It should be:
The Shawshank Redemption
The Godfather
The Godfather: Part II
The Dark Knight
I tried:
from bs4 import BeautifulSoup
import requests
import csv

url = 'https://www.imdb.com/chart/top'
res = requests.get(url)
soup = BeautifulSoup(res.text)
movie = soup.find_all(class_='titleColumn')

for names in movie:
    for name in names.find_all('a'):
        movies=list(name.text)
        # print(movies)

        # IN CSV
        with open('TopMovies.csv', 'a') as csvFile:
            writer = csv.writer(csvFile, delimiter = ' ')
            writer.writerow(movies)
        csvFile.close()
        print(movies)
print("Successfully inserted")
Please let me know if my code needs any changes.
Thanks
The problem is in movies=list(name.text): you are creating a list in which each item is a single character of the string name.text.
Instead of list(), you can use a list comprehension, movies = [name.text for name in names.find_all('a')]:
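from bs4 import BeautifulSoup
import requests
import csv

url = 'https://www.imdb.com/chart/top'
res = requests.get(url)
# Naming the parser explicitly avoids bs4's "no parser specified" warning
soup = BeautifulSoup(res.text, 'html.parser')
movie = soup.find_all(class_='titleColumn')

for names in movie:
    # Keep each title as one whole string instead of splitting it into characters
    movies = [name.text for name in names.find_all('a')]

    # Append the row to the CSV; the with block closes the file on exit,
    # so no explicit close() is needed
    with open('TopMovies.csv', 'a', newline='') as csvFile:
        writer = csv.writer(csvFile, delimiter=' ')
        writer.writerow(movies)
    print(movies)
print("Successfully inserted")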
This creates TopMovies.csv correctly.
Screenshot from LibreOffice: [image]
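To see why list(name.text) produced the spaced-out rows, here is a small standalone sketch that needs no network access. The helper to_csv_line and the hard-coded title are made up for illustration; only csv.writer and its delimiter argument come from the code above.

```python
import csv
import io

title = "The Godfather"

# list() on a string yields one item per character, spaces included.
as_chars = list(title)

# Wrapping the string in a list keeps it as a single field.
as_field = [title]


def to_csv_line(row):
    """Render one row with a space delimiter, as the script above does.

    Hypothetical helper: it writes to an in-memory buffer instead of a file.
    """
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=' ').writerow(row)
    return buffer.getvalue().rstrip('\r\n')


chars_line = to_csv_line(as_chars)
field_line = to_csv_line(as_field)

print(chars_line)  # T h e " " G o d f a t h e r
print(field_line)  # "The Godfather"
```

The space character inside the title collides with the space delimiter, so the writer quotes it. That is where the " " pieces in the broken output come from, and it is also why the whole title is wrapped in quotes in the fixed version.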
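As a possible variation (not part of the answer above), the titles could be collected first and the file opened only once, using the default comma delimiter so that spaces in titles need no quoting. This is only a sketch: write_titles, sample_titles and the temp-directory output path are assumptions for illustration, and in the real script the list would hold the scraped name.text values.

```python
import csv
import os
import tempfile


def write_titles(titles, path):
    """Write one title per row, overwriting any existing file at path."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.writer(csv_file)
        for title in titles:
            writer.writerow([title])


# Hypothetical sample data standing in for the scraped titles.
sample_titles = [
    "The Shawshank Redemption",
    "The Godfather",
    "The Godfather: Part II",
]

# Written to the temp directory here so the sketch does not touch an existing file.
output_path = os.path.join(tempfile.gettempdir(), 'TopMovies.csv')
write_titles(sample_titles, output_path)
```

Opening the file in 'w' mode also means rerunning the script replaces the old rows rather than appending duplicates, which the 'a' mode in the original would do.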