
read line in file into string

I want to replace "Song Title" and "Song Artist" in the call below

find_Lyrics("Song Title", "Song Artist")

with the song titles and song artists that I have in two txt files. The contents of /artistchart.txt:

DaBaby
Jack Harlow
DJ Khaled
The Weeknd
SAINt JHN
Megan Thee Stallion
Harry Styles
DJ Khaled
Juice WRLD
Chris Brown
Lil Mosey
Jawsh 685
Juice WRLD
Lady Gaga
Harry Styles
Gabby Barrett
Dua Lipa
Post Malone
Lewis Capaldi
Lil Baby
Doja Cat
Justin Bieber
Pop Smoke
StaySolidRocky
Luke Bryan
Miranda Lambert
Dua Lipa
Future
Powfu
Trevor Daniel
Maren Morris
Pop Smoke
Sam Hunt
Roddy Ricch
Maddie & Tae
Juice WRLD
Lil Baby
Juice WRLD
Morgan Wallen
Surfaces
Rod Wave
Juice WRLD
Lil Baby
Moneybagg Yo
Drake
Megan Thee Stallion
BENEE
NLE Choppa
Juice WRLD
LOCASH
Juice WRLD
JP Saxe
Jason Aldean
Florida Georgia Line
Pop Smoke
Chris Janson
Doja Cat
Ariana Grande
Thomas Rhett
Young T
Marshmello
Juice WRLD
Black Eyed Peas
Juice WRLD
Kane Brown
Saweetie
Keith Urban
Juice WRLD
Lee Brice
Pop Smoke
Justin Moore
Luke Combs
Kane Brown
THE SCOTTS
Pop Smoke
Migos
Juice WRLD
Juice WRLD
Juice WRLD
Morgan Wallen
Jhene Aiko
Don Toliver
Trevor Daniel
surf mesa
Rod Wave
HARDY
Lil Durk
Luke Combs
Juice WRLD
AJR
Ashley McBryde
Juice WRLD
Drake
Polo G
Juice WRLD
Gunna
Topic
Pop Smoke
Parker McCollum
J. Cole

and the contents of /songchart.txt:

Rockstar
Whats Poppin
Popstar
Blinding Lights
Roses
Savage
Watermelon Sugar
Greece
Come & Go
Go Crazy
Blueberry Faygo
Savage Love
Wishing Well
Rain On Me
Adore You
I Hope
Break My Heart
Circles
Before You Go
We Paid
Say So
Intentions
For The Night
Party Girl
One Margarita
Bluebird
Dont Start Now
Life Is Good
Death Bed
Falling
The Bones
The Woo
Hard To Forget
The Box
Die From A Broken Heart
Hate The Other Side
The Bigger Picture
Conversations
Chasin You
Sunday Best
Rags2Riches
Lifes A Mess
Emotionally Scarred
Said Sum
Toosie Slide
Girls In The Hood
Supalonely
Walk Em Down
Blood On My Jeans
One Big Country Song
Righteous
If The World Was Ending
Got What I Got
I Love My Country
Got It On Me
Done
Like That
Stuck With U
Be A Light
Dont Rush
Be Kind
Titanic
Mamacita
Stay High
Be Like That
Tap In
God Whispered Your Name
Bad Energy
One Of Them Girls
Mood Swings
Why We Drink
Lovin On You
Cool Again
The Scotts
Something Special
Need It
Tell Me U Luv Me
Up Up And Away
Fighting Demons
More Than My Hometown
B.S.
After Party
Past Life
ily
Girl Of My Dreams
One Beer
3 Headed Goat
Does To Me
Man Of The Year
Bang!
One Night Standards
Cant Die
Chicago Freestyle
Flex
Screw Juice
Dollaz On My Head
Breaking Me
Enjoy Yourself
Pretty Heart
The Climb Back

This is my code:

import requests
from bs4 import BeautifulSoup as Parse


def make_soup(url):
    """
    Parse a web page info html
     """
    user_agent = {
        'User-Agent': "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
    }
    r = requests.get(url, headers=user_agent)
    html = Parse(r.content, "html.parser")
    return html


def format_url(string):
    """
    Replace les spaces with '%20'
    """
    return string.replace(" ", "%20")


def get_song_url(html):
    song_url = html.find("a", {"class": "title"})["href"]
    return song_url


def find_Lyrics(titre, artiste):
    url = f"https://www.musixmatch.com/fr/search/{artiste}%20{titre}/tracks"

    url = format_url(url)
    pageweb = make_soup(url)

    # Retrieve the song link
    song_url = pageweb.find("a", {"class": "title"})["href"]
    song_url = "https://www.musixmatch.com" + song_url


# Retrieve the lyrics
    pageweb = make_soup(song_url)
    paroles = list()
    for span in pageweb.find_all("span", {"class": "lyrics__content__ok"}):
        # open file and print to it
        file1 = open('newlyrics.txt', 'a')
    print(span.text, file=file1)


filepath1 = '/home/redapemusic35/VimWiki/subjects/projects/tutorial/songchart.txt'
filepath2 = '/home/redapemusic35/VimWiki/subjects/projects/tutorial/artistchart.txt'

with open(filepath1) as fb, open(filepath2) as hp:
    for song, artist in zip(fb, hp):
        find_Lyrics(song.strip(), artist.strip())

If I reduce the input files to only the first few items, the code works as I want it to. However, if I try to run both entire txt files, I get this error:

Traceback (most recent call last):
  File "tutorial/spiders/musicmatchapi2.py", line 54, in <module>
    find_Lyrics(song.strip(), artist.strip())
  File "tutorial/spiders/musicmatchapi2.py", line 46, in find_Lyrics
    print(span.text, file=file1)
UnboundLocalError: local variable 'span' referenced before assignment

I am fairly sure the error comes from somewhere in one of my two input files, because the code works fine when each list contains only a few artists and songs. But I do not think it is caused by a song not matching up with its artist, because that situation produces a different error.

How can I find what is causing the error without running each artist and song combination separately?

Passing the file object into the find_Lyrics() function is the problem. So all I did was open the two files simultaneously, read them line by line, and pass the strings into the function.

with open(filepath1) as fb, open(filepath2) as hp:
    for song, artist in zip(fb, hp):
        find_Lyrics(song.strip(), artist.strip())

So your scraper would look like,

import requests
from bs4 import BeautifulSoup as Parse


def make_soup(url):
    """
    Parse a web page info html
     """
    user_agent = {
        'User-Agent': "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
    }
    r = requests.get(url, headers=user_agent)
    html = Parse(r.content, "html.parser")
    return html


def format_url(string):
    """
    Replace les spaces with '%20'
    """
    return string.replace(" ", "%20")


def get_song_url(html):
    song_url = html.find("a", {"class": "title"})["href"]
    return song_url


def find_Lyrics(titre, artiste):
    url = f"https://www.musixmatch.com/fr/search/{artiste}%20{titre}/tracks"

    url = format_url(url)
    pageweb = make_soup(url)

    # Retrieve the song link
    song_url = pageweb.find("a", {"class": "title"})["href"]
    song_url = "https://www.musixmatch.com" + song_url


    # Retrieve the lyrics and append them to the output file.
    # Opening the file once with a context manager, and printing inside the
    # loop, also avoids the UnboundLocalError when no matching span is found.
    pageweb = make_soup(song_url)
    with open('newlyrics.txt', 'a') as file1:
        for span in pageweb.find_all("span", {"class": "lyrics__content__ok"}):
            print(span.text, file=file1)


filepath1 = 'countrysongs.txt'
filepath2 = 'countryartists.txt'

with open(filepath1) as fb, open(filepath2) as hp:
    for song, artist in zip(fb, hp):
        find_Lyrics(song.strip(), artist.strip())

Hope this gives the expected output.
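One caveat worth noting: format_url only replaces spaces, so entries containing characters like & (for example "Maddie & Tae") or # still produce a malformed search URL. A sketch of a safer version using urllib.parse.quote — a drop-in replacement I am suggesting, not something the original code does:

```python
from urllib.parse import quote

def format_url(string):
    """Percent-encode the whole fragment, covering '&', '#', etc."""
    # safe="" also encodes '/', so the fragment stays a single path segment
    return quote(string, safe="")

print(format_url("Maddie & Tae"))  # Maddie%20%26%20Tae
```

The naive `.replace(" ", "%20")` would leave the `&` intact, and Musixmatch would treat everything after it as a different query component.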


Update

It seems the contents of the artist and song lists are not always compatible with the song and artist names used on the website. So you need to update your lists, or you can handle the exception so the program won't terminate.

Note: this is a temporary fix with only basic exception handling. To fix it properly, correct your lists manually or write a program to scrape the correct names from the website.

with open(filepath1) as fb, open(filepath2) as hp:
    for song, artist in zip(fb, hp):
        try:
            find_Lyrics(song.strip(), artist.strip())
        except Exception:
            print("URL Not Found")

To cut down on copy-pasting: the real issue here is how you are passing info into your find_Lyrics function.

artistlist = "artists.txt"
songlist = "songs.txt"
artists = []
with open(artistlist) as al:
   artists = [a.strip() for a in al]
songs = []
with open(songlist) as sl:
   songs = [s.strip() for s in sl]

tuples = [(songs[i], artists[i]) for i in range(0, len(artists))]
# tuples = list(zip(songs, artists))

for row in tuples:
  find_lyrics(*row)
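One more defensive step: the indexed version above raises IndexError if the song file is shorter than the artist file, while zip silently drops the extras. A small sketch with inline sample data showing a length check before pairing:

```python
# Sample data standing in for the two input files; note the mismatch.
songs = ["Rockstar", "Whats Poppin", "Popstar"]
artists = ["DaBaby", "Jack Harlow"]  # one entry short

if len(songs) != len(artists):
    print(f"Warning: {len(songs)} songs vs {len(artists)} artists; "
          f"zip() will ignore the extras.")

pairs = list(zip(songs, artists))  # pairs only up to the shorter list
print(pairs)
```

On Python 3.10+, `zip(songs, artists, strict=True)` turns the silent truncation into a ValueError instead.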
