[英]Same python function giving different output
我正在用python制作一個抓取腳本。 我首先收集電影的鏈接,我必須從那里刪除歌曲列表。 這是包含電影鏈接的movie.txt列表
https://www.lyricsbogie.com/category/movies/a-flat-2010 https://www.lyricsbogie.com/category/movies/a-night-in-calcutta-1970 https://www.lyricsbogie。 com/category/movies/a-scandall-2016 https://www.lyricsbogie.com/category/movies/a-strange-love-story-2011 https://www.lyricsbogie.com/category/movies/a- sublime-love-story-barsaat-2005 https://www.lyricsbogie.com/category/movies/a-wednesday-2008 https://www.lyricsbogie.com/category/movies/aa-ab-laut-chalen- 1999https://www.lyricsbogie.com/category/movies/aa-dekhen-zara-2009 https://www.lyricsbogie.com/category/movies/aa-gale-lag-jaa-1973 https://www .lyricsbogie.com/category/movies/aa-gale-lag-jaa-1994 https://www.lyricsbogie.com/category/movies/aabra-ka-daabra-2004 https://www.lyricsbogie.com/category /movies/aabroo-1943 https://www.lyricsbogie.com/category/movies/aabroo-1956 https://www.lyricsbogie.com/category/movies/aabroo-1968 https://www.lyricsbogie.com/類別/電影/aabshar-1953
這是我的第一個 python 函數:
import requests
from bs4 import BeautifulSoup as bs
def get_songs_links_for_movies1():
url='https://www.lyricsbogie.com/category/movies/a-flat-2010'
source_code = requests.get(url)
plain_text = source_code.text
soup = bs(plain_text,"html.parser")
for link in soup.find_all('h3',class_='entry-title'):
href = link.a.get('href')
href = href+"\n"
print(href)
上述函數的輸出:
https://www.lyricsbogie.com/movies/a-flat-2010/pyar-itna-na-kar.html
https://www.lyricsbogie.com/movies/a-flat-2010/chal-halke-halke.html
https://www.lyricsbogie.com/movies/a-flat-2010/meetha-sa-ishq.html
https://www.lyricsbogie.com/movies/a-flat-2010/dil-kashi.html
https://www.lyricsbogie.com/movies/ae-dil-hai-mushkil-2016/ae-dil-hai-mushkil-title.html
https://www.lyricsbogie.com/movies/m-s-dhoni-the-untold-story-2016/kaun-tujhe.html
https://www.lyricsbogie.com/movies/raaz-reboot-2016/raaz-aankhein-teri.html
https://www.lyricsbogie.com/albums/akira-2016/baadal-2.html
https://www.lyricsbogie.com/movies/baar-baar-dekho-2016/sau-aasmaan.html
https://www.lyricsbogie.com/albums/gajanan-2016/gajanan-title.html
https://www.lyricsbogie.com/movies/days-of-tafree-2016/jeeley-yeh-lamhe.html
https://www.lyricsbogie.com/tv-shows/coke-studio-pakistan-season-9-2016/ala-baali.html
https://www.lyricsbogie.com/albums/piya-2016/piya-title.html
https://www.lyricsbogie.com/albums/sach-te-supna-2016/sach-te-supna-title.html
它成功獲取了指定鏈接的歌曲 url。 但是現在當我嘗試自動化該過程並傳遞一個文件movie.txt來一一讀取url並獲得結果時,其輸出與上面我自己一一添加url的函數不匹配。 此功能也不會獲取歌曲網址。 這是我的功能無法正常工作。
import requests
from bs4 import BeautifulSoup as bs
def get_songs_links_for_movies():
file = open("movie.txt","r")
for url in file:
source_code = requests.get(url)
plain_text = source_code.text
soup = bs(plain_text,"html.parser")
for link in soup.find_all('h3',class_='entry-title'):
href = link.a.get('href')
href = href+"\n"
print(href)
上述函數的輸出
https://www.lyricsbogie.com/movies/ae-dil-hai-mushkil-2016/ae-dil-hai-mushkil-title.html
https://www.lyricsbogie.com/movies/m-s-dhoni-the-untold-story-2016/kaun-tujhe.html
https://www.lyricsbogie.com/movies/raaz-reboot-2016/raaz-aankhein-teri.html
https://www.lyricsbogie.com/albums/akira-2016/baadal-2.html
https://www.lyricsbogie.com/movies/baar-baar-dekho-2016/sau-aasmaan.html
https://www.lyricsbogie.com/albums/gajanan-2016/gajanan-title.html
https://www.lyricsbogie.com/movies/days-of-tafree-2016/jeeley-yeh-lamhe.html
https://www.lyricsbogie.com/tv-shows/coke-studio-pakistan-season-9-2016/ala-baali.html
https://www.lyricsbogie.com/albums/piya-2016/piya-title.html
https://www.lyricsbogie.com/albums/sach-te-supna-2016/sach-te-supna-title.html
https://www.lyricsbogie.com/movies/ae-dil-hai-mushkil-2016/ae-dil-hai-mushkil-title.html
https://www.lyricsbogie.com/movies/m-s-dhoni-the-untold-story-2016/kaun-tujhe.html
https://www.lyricsbogie.com/movies/raaz-reboot-2016/raaz-aankhein-teri.html
https://www.lyricsbogie.com/albums/akira-2016/baadal-2.html
https://www.lyricsbogie.com/movies/baar-baar-dekho-2016/sau-aasmaan.html
https://www.lyricsbogie.com/albums/gajanan-2016/gajanan-title.html
https://www.lyricsbogie.com/movies/days-of-tafree-2016/jeeley-yeh-lamhe.html
https://www.lyricsbogie.com/tv-shows/coke-studio-pakistan-season-9-2016/ala-baali.html
https://www.lyricsbogie.com/albums/piya-2016/piya-title.html
https://www.lyricsbogie.com/albums/sach-te-supna-2016/sach-te-supna-title.html
https://www.lyricsbogie.com/movies/ae-dil-hai-mushkil-2016/ae-dil-hai-mushkil-title.html
https://www.lyricsbogie.com/movies/m-s-dhoni-the-untold-story-2016/kaun-tujhe.html
https://www.lyricsbogie.com/movies/raaz-reboot-2016/raaz-aankhein-teri.html
https://www.lyricsbogie.com/albums/akira-2016/baadal-2.html
https://www.lyricsbogie.com/movies/baar-baar-dekho-2016/sau-aasmaan.html
https://www.lyricsbogie.com/albums/gajanan-2016/gajanan-title.html
https://www.lyricsbogie.com/movies/days-of-tafree-2016/jeeley-yeh-lamhe.html
https://www.lyricsbogie.com/tv-shows/coke-studio-pakistan-season-9-2016/ala-baali.html
https://www.lyricsbogie.com/albums/piya-2016/piya-title.html
https://www.lyricsbogie.com/albums/sach-te-supna-2016/sach-te-supna-title.html
等等..........
通過比較第一功能輸出和第二功能輸出。 您清楚地看到,沒有函數 1 獲取的歌曲 url,函數 2 也沒有一次又一次地重復相同的輸出。
任何人都可以幫助我,為什么會這樣。
要了解發生了什么,您可以打印從for
循環中的文件讀取的 url 的表示:
for url in file:
print(repr(url))
...
打印此表示(而不僅僅是字符串)可以更容易地查看特殊字符。 在這種情況下,輸出給出了'https://www.lyricsbogie.com/category/movies/a-flat-2010\\n'
。 如您所見,url 中存在換行符,因此獲取的 url 不正確。
例如使用rstrip()
方法來刪除換行符,通過替換url
由url.rstrip()
我懷疑您的文件不是作為一行讀取的,可以肯定的是,您可以測試以下代碼:
import requests
from bs4 import BeautifulSoup as bs
def get_songs_links_for_movies(url):
print("##Getting songs from %s" % url)
source_code = requests.get(url)
plain_text = source_code.text
soup = bs(plain_text,"html.parser")
for link in soup.find_all('h3',class_='entry-title'):
href = link.a.get('href')
href = href+"\n"
print(href)
def get_urls_from_file(filename):
with open(filename, 'r') as f:
return [url for url in f.readlines()]
urls = get_urls_from_file("movie.txt")
for url in urls:
get_songs_links_for_movies(url)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.