Same python function giving different output

I am writing a scraping script in Python. I first collect the links of the movies whose song lists I need to scrape. Here is movie.txt, the file containing the movie links:

https://www.lyricsbogie.com/category/movies/a-flat-2010
https://www.lyricsbogie.com/category/movies/a-night-in-calcutta-1970
https://www.lyricsbogie.com/category/movies/a-scandall-2016
https://www.lyricsbogie.com/category/movies/a-strange-love-story-2011
https://www.lyricsbogie.com/category/movies/a-sublime-love-story-barsaat-2005
https://www.lyricsbogie.com/category/movies/a-wednesday-2008
https://www.lyricsbogie.com/category/movies/aa-ab-laut-chalen-1999
https://www.lyricsbogie.com/category/movies/aa-dekhen-zara-2009
https://www.lyricsbogie.com/category/movies/aa-gale-lag-jaa-1973
https://www.lyricsbogie.com/category/movies/aa-gale-lag-jaa-1994
https://www.lyricsbogie.com/category/movies/aabra-ka-daabra-2004
https://www.lyricsbogie.com/category/movies/aabroo-1943
https://www.lyricsbogie.com/category/movies/aabroo-1956
https://www.lyricsbogie.com/category/movies/aabroo-1968
https://www.lyricsbogie.com/category/movies/aabshar-1953

Here is my first python function:

import requests
from bs4 import BeautifulSoup as bs

def get_songs_links_for_movies1():
    url='https://www.lyricsbogie.com/category/movies/a-flat-2010'
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = bs(plain_text,"html.parser")
    for link in soup.find_all('h3',class_='entry-title'):
        href = link.a.get('href')
        href = href+"\n"
        print(href)

Output of the above function:

https://www.lyricsbogie.com/movies/a-flat-2010/pyar-itna-na-kar.html
https://www.lyricsbogie.com/movies/a-flat-2010/chal-halke-halke.html
https://www.lyricsbogie.com/movies/a-flat-2010/meetha-sa-ishq.html
https://www.lyricsbogie.com/movies/a-flat-2010/dil-kashi.html
https://www.lyricsbogie.com/movies/ae-dil-hai-mushkil-2016/ae-dil-hai-mushkil-title.html
https://www.lyricsbogie.com/movies/m-s-dhoni-the-untold-story-2016/kaun-tujhe.html
https://www.lyricsbogie.com/movies/raaz-reboot-2016/raaz-aankhein-teri.html
https://www.lyricsbogie.com/albums/akira-2016/baadal-2.html
https://www.lyricsbogie.com/movies/baar-baar-dekho-2016/sau-aasmaan.html
https://www.lyricsbogie.com/albums/gajanan-2016/gajanan-title.html
https://www.lyricsbogie.com/movies/days-of-tafree-2016/jeeley-yeh-lamhe.html
https://www.lyricsbogie.com/tv-shows/coke-studio-pakistan-season-9-2016/ala-baali.html
https://www.lyricsbogie.com/albums/piya-2016/piya-title.html
https://www.lyricsbogie.com/albums/sach-te-supna-2016/sach-te-supna-title.html

It successfully fetches the song URLs for the specified link. But when I try to automate the process by passing the file movie.txt and reading the URLs one by one, the output does not match that of the function above, where I enter the URL by hand. This function also does not fetch the song URLs. Here is the function that does not work correctly:

import requests
from bs4 import BeautifulSoup as bs

def get_songs_links_for_movies():
    file = open("movie.txt","r")
    for url in file:
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = bs(plain_text,"html.parser")
        for link in soup.find_all('h3',class_='entry-title'):
            href = link.a.get('href')
            href = href+"\n"
            print(href)

Output of the above function:

https://www.lyricsbogie.com/movies/ae-dil-hai-mushkil-2016/ae-dil-hai-mushkil-title.html
https://www.lyricsbogie.com/movies/m-s-dhoni-the-untold-story-2016/kaun-tujhe.html
https://www.lyricsbogie.com/movies/raaz-reboot-2016/raaz-aankhein-teri.html
https://www.lyricsbogie.com/albums/akira-2016/baadal-2.html
https://www.lyricsbogie.com/movies/baar-baar-dekho-2016/sau-aasmaan.html
https://www.lyricsbogie.com/albums/gajanan-2016/gajanan-title.html
https://www.lyricsbogie.com/movies/days-of-tafree-2016/jeeley-yeh-lamhe.html
https://www.lyricsbogie.com/tv-shows/coke-studio-pakistan-season-9-2016/ala-baali.html
https://www.lyricsbogie.com/albums/piya-2016/piya-title.html
https://www.lyricsbogie.com/albums/sach-te-supna-2016/sach-te-supna-title.html
https://www.lyricsbogie.com/movies/ae-dil-hai-mushkil-2016/ae-dil-hai-mushkil-title.html
https://www.lyricsbogie.com/movies/m-s-dhoni-the-untold-story-2016/kaun-tujhe.html
https://www.lyricsbogie.com/movies/raaz-reboot-2016/raaz-aankhein-teri.html
https://www.lyricsbogie.com/albums/akira-2016/baadal-2.html
https://www.lyricsbogie.com/movies/baar-baar-dekho-2016/sau-aasmaan.html
https://www.lyricsbogie.com/albums/gajanan-2016/gajanan-title.html
https://www.lyricsbogie.com/movies/days-of-tafree-2016/jeeley-yeh-lamhe.html
https://www.lyricsbogie.com/tv-shows/coke-studio-pakistan-season-9-2016/ala-baali.html
https://www.lyricsbogie.com/albums/piya-2016/piya-title.html
https://www.lyricsbogie.com/albums/sach-te-supna-2016/sach-te-supna-title.html
https://www.lyricsbogie.com/movies/ae-dil-hai-mushkil-2016/ae-dil-hai-mushkil-title.html
https://www.lyricsbogie.com/movies/m-s-dhoni-the-untold-story-2016/kaun-tujhe.html
https://www.lyricsbogie.com/movies/raaz-reboot-2016/raaz-aankhein-teri.html
https://www.lyricsbogie.com/albums/akira-2016/baadal-2.html
https://www.lyricsbogie.com/movies/baar-baar-dekho-2016/sau-aasmaan.html
https://www.lyricsbogie.com/albums/gajanan-2016/gajanan-title.html
https://www.lyricsbogie.com/movies/days-of-tafree-2016/jeeley-yeh-lamhe.html
https://www.lyricsbogie.com/tv-shows/coke-studio-pakistan-season-9-2016/ala-baali.html
https://www.lyricsbogie.com/albums/piya-2016/piya-title.html
https://www.lyricsbogie.com/albums/sach-te-supna-2016/sach-te-supna-title.html

and so on..........

Comparing the outputs of the two functions, you can clearly see that the second function's output contains none of the a-flat-2010 song URLs that the first function fetches, and that it repeats the same output again and again.

Can anyone help me understand why this is happening?

To understand what is happening, you can print the representation of the url read from the file in the for loop:

for url in file:
    print(repr(url))
    ...

Printing the representation (rather than the string itself) makes special characters easier to see. In this case, the output is 'https://www.lyricsbogie.com/category/movies/a-flat-2010\n'. As you can see, the URL ends with a line break, so the URL that gets fetched is not the correct one.
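As a small illustration of why repr() helps, here is a sketch that uses an in-memory io.StringIO object as a stand-in for movie.txt (the stand-in and the names fake_file and lines are assumptions made for this example, not part of the original script):

```python
import io

# Stand-in for movie.txt: two URLs, one per line (illustrative only)
fake_file = io.StringIO(
    "https://www.lyricsbogie.com/category/movies/a-flat-2010\n"
    "https://www.lyricsbogie.com/category/movies/a-wednesday-2008\n"
)

lines = list(fake_file)
for url in lines:
    # repr() shows the trailing newline that print(url) would hide
    print(repr(url))
```

Iterating over a file object yields each line with its line ending still attached, which is why the URL passed to requests.get() carries a trailing newline.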

To remove the newline character, use for instance the rstrip() method: replace url with url.rstrip().
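A minimal sketch of that cleanup step, kept separate from the network code so it can be checked on its own. The helper name read_urls and the sample list are hypothetical, and strip() is used here instead of rstrip() so that leading whitespace is removed as well:

```python
def read_urls(lines):
    """Return the URLs from an iterable of lines, without surrounding whitespace."""
    urls = []
    for line in lines:
        url = line.strip()  # removes the trailing "\n" (and "\r\n" from Windows files)
        if url:             # skip blank lines
            urls.append(url)
    return urls


# Sample input mimicking what iterating over movie.txt yields (illustrative only)
sample = [
    "https://www.lyricsbogie.com/category/movies/a-flat-2010\n",
    "\n",
    "https://www.lyricsbogie.com/category/movies/a-wednesday-2008\n",
]
urls = read_urls(sample)
print(urls)
```

With the real file, the same helper can be fed the open file object, for example `with open("movie.txt") as f: urls = read_urls(f)`, and each cleaned URL can then be passed to requests.get().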

I am not sure whether your file is being read as a single line. To be sure, can you test this code:

import requests
from bs4 import BeautifulSoup as bs

def get_songs_links_for_movies(url):
    print("##Getting songs from %s" % url)
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = bs(plain_text,"html.parser")
    for link in soup.find_all('h3',class_='entry-title'):
        href = link.a.get('href')
        href = href+"\n"
        print(href)

def get_urls_from_file(filename):
    with open(filename, 'r') as f:
        # Strip the trailing newline from each line and skip blank lines
        return [line.strip() for line in f if line.strip()]

urls = get_urls_from_file("movie.txt")
for url in urls:
    get_songs_links_for_movies(url)
