Same Python function giving different output

I am writing a scraping script in Python. I first collect the links of the movies whose song lists I have to scrape. Here is the movie.txt file containing the movie links:

https://www.lyricsbogie.com/category/movies/a-flat-2010
https://www.lyricsbogie.com/category/movies/a-night-in-calcutta-1970
https://www.lyricsbogie.com/category/movies/a-scandall-2016
https://www.lyricsbogie.com/category/movies/a-strange-love-story-2011
https://www.lyricsbogie.com/category/movies/a-sublime-love-story-barsaat-2005
https://www.lyricsbogie.com/category/movies/a-wednesday-2008
https://www.lyricsbogie.com/category/movies/aa-ab-laut-chalen-1999
https://www.lyricsbogie.com/category/movies/aa-dekhen-zara-2009
https://www.lyricsbogie.com/category/movies/aa-gale-lag-jaa-1973
https://www.lyricsbogie.com/category/movies/aa-gale-lag-jaa-1994
https://www.lyricsbogie.com/category/movies/aabra-ka-daabra-2004
https://www.lyricsbogie.com/category/movies/aabroo-1943
https://www.lyricsbogie.com/category/movies/aabroo-1956
https://www.lyricsbogie.com/category/movies/aabroo-1968
https://www.lyricsbogie.com/category/movies/aabshar-1953

Here is my first Python function:

import requests
from bs4 import BeautifulSoup as bs

def get_songs_links_for_movies1():
    url='https://www.lyricsbogie.com/category/movies/a-flat-2010'
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = bs(plain_text,"html.parser")
    for link in soup.find_all('h3',class_='entry-title'):
        href = link.a.get('href')
        href = href+"\n"
        print(href)

Output of the above function:

https://www.lyricsbogie.com/movies/a-flat-2010/pyar-itna-na-kar.html
https://www.lyricsbogie.com/movies/a-flat-2010/chal-halke-halke.html
https://www.lyricsbogie.com/movies/a-flat-2010/meetha-sa-ishq.html
https://www.lyricsbogie.com/movies/a-flat-2010/dil-kashi.html
https://www.lyricsbogie.com/movies/ae-dil-hai-mushkil-2016/ae-dil-hai-mushkil-title.html
https://www.lyricsbogie.com/movies/m-s-dhoni-the-untold-story-2016/kaun-tujhe.html
https://www.lyricsbogie.com/movies/raaz-reboot-2016/raaz-aankhein-teri.html
https://www.lyricsbogie.com/albums/akira-2016/baadal-2.html
https://www.lyricsbogie.com/movies/baar-baar-dekho-2016/sau-aasmaan.html
https://www.lyricsbogie.com/albums/gajanan-2016/gajanan-title.html
https://www.lyricsbogie.com/movies/days-of-tafree-2016/jeeley-yeh-lamhe.html
https://www.lyricsbogie.com/tv-shows/coke-studio-pakistan-season-9-2016/ala-baali.html
https://www.lyricsbogie.com/albums/piya-2016/piya-title.html
https://www.lyricsbogie.com/albums/sach-te-supna-2016/sach-te-supna-title.html

It successfully fetches the song URLs for the specified link. But when I try to automate the process by passing the file movie.txt and reading the URLs one by one, the output does not match that of the function above, where I enter each URL by hand. This function also does not get the song URLs. Here is the function that does not work correctly:

import requests
from bs4 import BeautifulSoup as bs

def get_songs_links_for_movies():
    file = open("movie.txt","r")
    for url in file:
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = bs(plain_text,"html.parser")
        for link in soup.find_all('h3',class_='entry-title'):
            href = link.a.get('href')
            href = href+"\n"
            print(href)

Output of the above function:

https://www.lyricsbogie.com/movies/ae-dil-hai-mushkil-2016/ae-dil-hai-mushkil-title.html
https://www.lyricsbogie.com/movies/m-s-dhoni-the-untold-story-2016/kaun-tujhe.html
https://www.lyricsbogie.com/movies/raaz-reboot-2016/raaz-aankhein-teri.html
https://www.lyricsbogie.com/albums/akira-2016/baadal-2.html
https://www.lyricsbogie.com/movies/baar-baar-dekho-2016/sau-aasmaan.html
https://www.lyricsbogie.com/albums/gajanan-2016/gajanan-title.html
https://www.lyricsbogie.com/movies/days-of-tafree-2016/jeeley-yeh-lamhe.html
https://www.lyricsbogie.com/tv-shows/coke-studio-pakistan-season-9-2016/ala-baali.html
https://www.lyricsbogie.com/albums/piya-2016/piya-title.html
https://www.lyricsbogie.com/albums/sach-te-supna-2016/sach-te-supna-title.html
https://www.lyricsbogie.com/movies/ae-dil-hai-mushkil-2016/ae-dil-hai-mushkil-title.html
https://www.lyricsbogie.com/movies/m-s-dhoni-the-untold-story-2016/kaun-tujhe.html
https://www.lyricsbogie.com/movies/raaz-reboot-2016/raaz-aankhein-teri.html
https://www.lyricsbogie.com/albums/akira-2016/baadal-2.html
https://www.lyricsbogie.com/movies/baar-baar-dekho-2016/sau-aasmaan.html
https://www.lyricsbogie.com/albums/gajanan-2016/gajanan-title.html
https://www.lyricsbogie.com/movies/days-of-tafree-2016/jeeley-yeh-lamhe.html
https://www.lyricsbogie.com/tv-shows/coke-studio-pakistan-season-9-2016/ala-baali.html
https://www.lyricsbogie.com/albums/piya-2016/piya-title.html
https://www.lyricsbogie.com/albums/sach-te-supna-2016/sach-te-supna-title.html
https://www.lyricsbogie.com/movies/ae-dil-hai-mushkil-2016/ae-dil-hai-mushkil-title.html
https://www.lyricsbogie.com/movies/m-s-dhoni-the-untold-story-2016/kaun-tujhe.html
https://www.lyricsbogie.com/movies/raaz-reboot-2016/raaz-aankhein-teri.html
https://www.lyricsbogie.com/albums/akira-2016/baadal-2.html
https://www.lyricsbogie.com/movies/baar-baar-dekho-2016/sau-aasmaan.html
https://www.lyricsbogie.com/albums/gajanan-2016/gajanan-title.html
https://www.lyricsbogie.com/movies/days-of-tafree-2016/jeeley-yeh-lamhe.html
https://www.lyricsbogie.com/tv-shows/coke-studio-pakistan-season-9-2016/ala-baali.html
https://www.lyricsbogie.com/albums/piya-2016/piya-title.html
https://www.lyricsbogie.com/albums/sach-te-supna-2016/sach-te-supna-title.html

and so on...

Comparing the output of the first function with that of the second, you can clearly see that the second contains none of the song URLs that function 1 fetches, and that function 2 repeats the same output again and again.

Can anyone help me understand why this is happening?

To understand what is happening, you can print the representation of the URL read from the file in the for loop:

for url in file:
    print(repr(url))
    ...

Printing this representation (and not just the string) makes it easier to see special characters. In this case, the output was 'https://www.lyricsbogie.com/category/movies/a-flat-2010\n'. As you can see, there is a line break at the end of the URL, so the fetched URL is not correct.
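The effect can be reproduced without touching the network. The following is a minimal sketch, assuming the file holds one URL per line; the in-memory `fake_file` is a hypothetical stand-in for movie.txt and is not part of the original script.

```python
import io

# Hypothetical stand-in for movie.txt: one URL per line.
fake_file = io.StringIO(
    "https://www.lyricsbogie.com/category/movies/a-flat-2010\n"
    "https://www.lyricsbogie.com/category/movies/a-wednesday-2008\n"
)

# Iterating over a file object yields each line with its line ending kept.
lines = list(fake_file)

for line in lines:
    # repr() shows the trailing newline explicitly as \n.
    print(repr(line))
    # True when the raw line differs from its stripped form.
    print(line != line.rstrip())
```

Each line read this way still carries its newline, which is why the string passed to requests.get differs from the hand-typed URL in the first function.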

For instance, use the rstrip() method to remove the newline character, by replacing url with url.rstrip().
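One way to apply this is to clean the lines once, when the file is read, and hand only clean URLs to the scraping function. This is a sketch under assumptions: the helper name `read_urls` is hypothetical, and the temporary file only stands in for movie.txt so the example runs on its own.

```python
import os
import tempfile


def read_urls(path):
    """Return the URLs listed in a text file, one per line.

    Surrounding whitespace (including the trailing newline) is removed
    and blank lines are skipped.
    """
    with open(path, "r") as f:
        return [line.strip() for line in f if line.strip()]


# Write a small temporary file that stands in for movie.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("https://www.lyricsbogie.com/category/movies/a-flat-2010\n")
    tmp.write("\n")  # a stray blank line, which read_urls skips
    tmp.write("https://www.lyricsbogie.com/category/movies/a-wednesday-2008\n")
    tmp_path = tmp.name

urls = read_urls(tmp_path)
os.remove(tmp_path)

for url in urls:
    print(repr(url))  # no trailing \n any more
```

Each element of `urls` can then be passed to requests.get(url) exactly as the hard-coded URL was in the first function.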

I suspect that your file is not being read one line at a time. To be sure, can you test this code:

import requests
from bs4 import BeautifulSoup as bs

def get_songs_links_for_movies(url):
    print("##Getting songs from %s" % url)
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = bs(plain_text,"html.parser")
    for link in soup.find_all('h3',class_='entry-title'):
        href = link.a.get('href')
        href = href+"\n"
        print(href)

def get_urls_from_file(filename):
    with open(filename, 'r') as f:
        # strip the trailing newline from each line and skip blank lines
        return [url.strip() for url in f.readlines() if url.strip()]

urls = get_urls_from_file("movie.txt")
for url in urls:
    get_songs_links_for_movies(url)

 