
Read line in file into string

I want to replace "Song Title" and "Song Artist" in the code below

find_Lyrics("Song Title", "Song Artist")

with the song titles and song artists from my two txt files. The contents of /artistchart.txt:

DaBaby
Jack Harlow
DJ Khaled
The Weeknd
SAINt JHN
Megan Thee Stallion
Harry Styles
DJ Khaled
Juice WRLD
Chris Brown
Lil Mosey
Jawsh 685
Juice WRLD
Lady Gaga
Harry Styles
Gabby Barrett
Dua Lipa
Post Malone
Lewis Capaldi
Lil Baby
Doja Cat
Justin Bieber
Pop Smoke
StaySolidRocky
Luke Bryan
Miranda Lambert
Dua Lipa
Future
Powfu
Trevor Daniel
Maren Morris
Pop Smoke
Sam Hunt
Roddy Ricch
Maddie & Tae
Juice WRLD
Lil Baby
Juice WRLD
Morgan Wallen
Surfaces
Rod Wave
Juice WRLD
Lil Baby
Moneybagg Yo
Drake
Megan Thee Stallion
BENEE
NLE Choppa
Juice WRLD
LOCASH
Juice WRLD
JP Saxe
Jason Aldean
Florida Georgia Line
Pop Smoke
Chris Janson
Doja Cat
Ariana Grande
Thomas Rhett
Young T
Marshmello
Juice WRLD
Black Eyed Peas
Juice WRLD
Kane Brown
Saweetie
Keith Urban
Juice WRLD
Lee Brice
Pop Smoke
Justin Moore
Luke Combs
Kane Brown
THE SCOTTS
Pop Smoke
Migos
Juice WRLD
Juice WRLD
Juice WRLD
Morgan Wallen
Jhene Aiko
Don Toliver
Trevor Daniel
surf mesa
Rod Wave
HARDY
Lil Durk
Luke Combs
Juice WRLD
AJR
Ashley McBryde
Juice WRLD
Drake
Polo G
Juice WRLD
Gunna
Topic
Pop Smoke
Parker McCollum
J. Cole

and the contents of /songchart.txt:

Rockstar
Whats Poppin
Popstar
Blinding Lights
Roses
Savage
Watermelon Sugar
Greece
Come & Go
Go Crazy
Blueberry Faygo
Savage Love
Wishing Well
Rain On Me
Adore You
I Hope
Break My Heart
Circles
Before You Go
We Paid
Say So
Intentions
For The Night
Party Girl
One Margarita
Bluebird
Dont Start Now
Life Is Good
Death Bed
Falling
The Bones
The Woo
Hard To Forget
The Box
Die From A Broken Heart
Hate The Other Side
The Bigger Picture
Conversations
Chasin You
Sunday Best
Rags2Riches
Lifes A Mess
Emotionally Scarred
Said Sum
Toosie Slide
Girls In The Hood
Supalonely
Walk Em Down
Blood On My Jeans
One Big Country Song
Righteous
If The World Was Ending
Got What I Got
I Love My Country
Got It On Me
Done
Like That
Stuck With U
Be A Light
Dont Rush
Be Kind
Titanic
Mamacita
Stay High
Be Like That
Tap In
God Whispered Your Name
Bad Energy
One Of Them Girls
Mood Swings
Why We Drink
Lovin On You
Cool Again
The Scotts
Something Special
Need It
Tell Me U Luv Me
Up Up And Away
Fighting Demons
More Than My Hometown
B.S.
After Party
Past Life
ily
Girl Of My Dreams
One Beer
3 Headed Goat
Does To Me
Man Of The Year
Bang!
One Night Standards
Cant Die
Chicago Freestyle
Flex
Screw Juice
Dollaz On My Head
Breaking Me
Enjoy Yourself
Pretty Heart
The Climb Back

Here is my code:

import requests
from bs4 import BeautifulSoup as Parse


def make_soup(url):
    """
    Parse a web page info html
     """
    user_agent = {
        'User-Agent': "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
    }
    r = requests.get(url, headers=user_agent)
    html = Parse(r.content, "html.parser")
    return html


def format_url(string):
    """
    Replace les spaces with '%20'
    """
    return string.replace(" ", "%20")


def get_song_url(html):
    song_url = html.find("a", {"class": "title"})["href"]
    return song_url


def find_Lyrics(titre, artiste):
    url = f"https://www.musixmatch.com/fr/search/{artiste}%20{titre}/tracks"

    url = format_url(url)
    pageweb = make_soup(url)

    # Get the song link
    song_url = pageweb.find("a", {"class": "title"})["href"]
    song_url = "https://www.musixmatch.com" + song_url


    # Get the lyrics
    pageweb = make_soup(song_url)
    paroles = list()
    for span in pageweb.find_all("span", {"class": "lyrics__content__ok"}):
        # open file and print to it
        file1 = open('newlyrics.txt', 'a')
    print(span.text, file=file1)


filepath1 = '/home/redapemusic35/VimWiki/subjects/projects/tutorial/songchart.txt'
filepath2 = '/home/redapemusic35/VimWiki/subjects/projects/tutorial/artistchart.txt'

with open(filepath1) as fb, open(filepath2) as hp:
    for song, artist in zip(fb, hp):
        find_Lyrics(song.strip(), artist.strip())

If I reduce the input files to just the first few items, the code works the way I want. But if I try to run the entire txt files, I get this error:

Traceback (most recent call last):
  File "tutorial/spiders/musicmatchapi2.py", line 54, in <module>
    find_Lyrics(song.strip(), artist.strip())
  File "tutorial/spiders/musicmatchapi2.py", line 46, in find_Lyrics
    print(span.text, file=file1)
UnboundLocalError: local variable 'span' referenced before assignment

I am fairly sure the error comes from one of my two input files, because the code works fine when there are only a handful of artists and songs in each list. But I don't think it is caused by one of the songs not matching one of the artists, because I get a different error when that happens.

Is there any way to find what is causing the error without running each artist and song combination individually?
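One minimal way to narrow this down without testing every pair by hand, sketched on top of the driver loop above, is to print each pair before the call; the last pair printed before the traceback is then the offending combination:

with open(filepath1) as fb, open(filepath2) as hp:
    for i, (song, artist) in enumerate(zip(fb, hp), start=1):
        # show which pair is about to be processed, so a crash points straight at it
        print(f"{i}: {song.strip()} - {artist.strip()}")
        find_Lyrics(song.strip(), artist.strip())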

Passing the file object into the find_Lyrics() function was the problem. So what I did was open both files at the same time, read them line by line, and pass the strings into the function.

with open(filepath1) as fb, open(filepath2) as hp:
    for song, artist in zip(fb, hp):
        find_Lyrics(song.strip(), artist.strip())

So your scraper would look like this:

import requests
from bs4 import BeautifulSoup as Parse


def make_soup(url):
    """
    Parse a web page info html
     """
    user_agent = {
        'User-Agent': "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
    }
    r = requests.get(url, headers=user_agent)
    html = Parse(r.content, "html.parser")
    return html


def format_url(string):
    """
    Replace les spaces with '%20'
    """
    return string.replace(" ", "%20")


def get_song_url(html):
    song_url = html.find("a", {"class": "title"})["href"]
    return song_url


def find_Lyrics(titre, artiste):
    url = f"https://www.musixmatch.com/fr/search/{artiste}%20{titre}/tracks"

    url = format_url(url)
    pageweb = make_soup(url)

    # Get the song link
    song_url = pageweb.find("a", {"class": "title"})["href"]
    song_url = "https://www.musixmatch.com" + song_url


    # Get the lyrics
    pageweb = make_soup(song_url)
    # open the output file once and append every lyrics block to it
    with open('newlyrics.txt', 'a') as file1:
        for span in pageweb.find_all("span", {"class": "lyrics__content__ok"}):
            print(span.text, file=file1)


filepath1 = 'countrysongs.txt'
filepath2 = 'countryartists.txt'

with open(filepath1) as fb, open(filepath2) as hp:
    for song, artist in zip(fb, hp):
        find_Lyrics(song.strip(), artist.strip())

Hopefully it gives the expected output.


UPDATE

It seems that the contents of the artist and song lists are not compatible with the song and artist names used on the website. So you need to update your lists, or you can handle the exception so that the program does not terminate.

Note: this is a temporary solution with only basic exception handling. You will need to update your lists manually, or you could write a program that scrapes the correct names from the website.

with open(filepath1) as fb, open(filepath2) as hp:
    for song, artist in zip(fb, hp):
        try:
            find_Lyrics(song.strip(), artist.strip())
        except Exception:
            print("URL Not Found")

Less copy-pasting. The real problem here is how you are passing information into your find_Lyrics function.

artistlist = "artists.txt"
songlist = "songs.txt"

with open(artistlist) as al:
    artists = [a.strip() for a in al]

with open(songlist) as sl:
    songs = [s.strip() for s in sl]

tuples = [(songs[i], artists[i]) for i in range(len(artists))]
# tuples = list(zip(songs, artists))

for row in tuples:
    find_Lyrics(*row)
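One thing to watch with either pairing approach: the index-based comprehension raises an IndexError if the song list is shorter than the artist list, while zip() silently stops at the shorter list, so mismatched files can drop entries without any warning. A small sketch of a sanity check before pairing:

if len(songs) != len(artists):
    # a mismatch usually means one input file has extra or missing lines
    raise ValueError(f"{len(songs)} songs vs {len(artists)} artists - check the input files")

for song, artist in zip(songs, artists):
    find_Lyrics(song, artist)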
