Create list names dynamically from for loop
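I'm trying to scrape the book descriptions of the books listed in the Goodreads Choice Awards. I'm using the following function to get the individual URLs listed for a given genre: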
import requests
from bs4 import BeautifulSoup as bs

def get_genre_url(genre):
    all_links = []
    for year in range(2011, 2022):
        url = 'https://www.goodreads.com/choiceawards/best-' + genre + '-books-' + str(year)
        page = requests.get(url)
        soup = bs(page.content, 'html.parser')
        # Each nominated book on the awards page links to its own book page
        for link in soup.find_all('a', {'class': 'pollAnswer__bookLink'}):
            all_links.append('https://www.goodreads.com' + link.get('href'))
    return all_links
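The URL construction inside that loop can be pulled out into a small helper, which makes it easy to check the generated URLs and to limit the years while experimenting. A minimal sketch; build_award_urls is a hypothetical name, and it assumes the best-<genre>-books-<year> URL pattern used above still holds:

```python
def build_award_urls(genre, years=range(2011, 2022)):
    """Return one Choice Awards page URL per year, following the
    'best-<genre>-books-<year>' pattern used in get_genre_url."""
    return [
        f"https://www.goodreads.com/choiceawards/best-{genre}-books-{year}"
        for year in years
    ]

# While testing, a single year avoids downloading every awards page
print(build_award_urls('fiction', years=[2011]))
# → ['https://www.goodreads.com/choiceawards/best-fiction-books-2011']
```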
After getting the book URLs, I go on to scrape those URLs to get the book descriptions:
import pandas as pd

def get_description(genre_list):
    authors = []
    titles = []
    descriptions = []
    for url in genre_list:
        page = requests.get(url)
        soup = bs(page.content, 'html.parser')
        # The page <title> has the form "<book title> by <author>"
        page_title = soup.find('title').get_text()
        titles.append(page_title.split(' by ')[0])
        authors.append(page_title.split(' by ')[1])
        # Older page layout: 'readable stacked'; newer layout: TruncatedText div
        description = soup.find('div', {'class': 'readable stacked'})
        if description is None:
            description = soup.find('div', {'class': 'TruncatedText__text TruncatedText__text--5'})
        # Store the text, not the Tag object (None if neither div exists)
        descriptions.append(description.get_text() if description is not None else None)
    # Build the DataFrame once, after the loop, instead of on every iteration
    description_df = pd.DataFrame({'author': authors, 'title': titles, 'description': descriptions})
    return description_df
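Splitting the page title on ' by ' goes wrong when a book title itself contains " by ", and the author can keep a trailing "| Goodreads" suffix (as in one of the outputs in the answer below). A sketch of a more defensive parser, assuming the title has the form "<book title> by <author>", optionally followed by " | Goodreads"; parse_page_title is a hypothetical helper, not part of any library:

```python
def parse_page_title(page_title):
    """Split 'Book by Author | Goodreads' into (book, author).

    Assumes the author follows the LAST ' by ', so book titles that
    contain ' by ' stay intact.
    """
    text = page_title.strip()
    # Drop the site suffix if present (assumed format)
    suffix = " | Goodreads"
    if text.endswith(suffix):
        text = text[: -len(suffix)]
    book, sep, author = text.rpartition(" by ")
    if not sep:
        # No ' by ' at all: keep the whole string as the title
        return text, None
    return book.strip(), author.strip()

print(parse_page_title("Smokin' Seventeen (Stephanie Plum, #17) by Janet Evanovich | Goodreads"))
# → ("Smokin' Seventeen (Stephanie Plum, #17)", 'Janet Evanovich')
```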
To get the final dataframe I would call, for example:
mystery_thriller_list = get_genre_url('mystery-thriller')
description_mystery_thriller = get_description(mystery_thriller_list)
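However, I'd like to pass a list of genres (e.g. genres = ['fiction', 'mystery-thriller']) into the function and create a final dataframe for each genre, named according to the convention description_<selected genre>. So far I haven't been able to figure this out, and the for loop takes a while because it loads the information for 220 books per genre.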
You can store all the dataframes in a dictionary, using their genre names as keys.
all_genres_descriptions = {}
genres = ['fiction', 'mystery-thriller']
for genre in genres:
    genre_list = get_genre_url(genre)
    description_genre = get_description(genre_list)
    all_genres_descriptions[f'description_{genre}'] = description_genre
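Each DataFrame is then looked up by key instead of by variable name, and the key may contain a hyphen, which a variable name could not. A self-contained sketch that runs without network access: fake_get_description is a hypothetical stub standing in for get_description(get_genre_url(genre)), and its rows are placeholders, not scraped data:

```python
import pandas as pd

def fake_get_description(genre):
    # Hypothetical stand-in for the real scraper: one placeholder row
    # with the same columns that get_description produces
    return pd.DataFrame({
        'author': [f'{genre} author'],
        'title': [f'{genre} title'],
        'description': ['placeholder text'],
    })

all_genres_descriptions = {}
genres = ['fiction', 'mystery-thriller']
for genre in genres:
    all_genres_descriptions[f'description_{genre}'] = fake_get_description(genre)

# Access one genre by its key, much like a variable name
print(all_genres_descriptions['description_mystery-thriller'])

# Or iterate over every genre's DataFrame
for name, df in all_genres_descriptions.items():
    print(name, len(df))
```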
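A couple of things. For testing, you don't need to go through all the years and books; I'm only looking at one year and the first two books. To do what you're looking for, you can use globals(). You might also want to create just one dataframe instead, adding a "genre" column on each iteration and concatenating. In the long run, having all the data in one dataframe will probably be easier.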
genres = ['fiction', 'mystery-thriller']
for genre in genres:
    genre_list = get_genre_url(genre)
    # Hyphens are not valid in variable names, so 'mystery-thriller' becomes 'mystery_thriller'
    globals()[f"{genre.replace('-', '_')}_selected_genre"] = get_description(genre_list)
print(fiction_selected_genre)
author title description
0 Haruki Murakami 1Q84 (1Q84 #1-3) \nThe year is 1984 and the city is Tokyo.A you...
1 Sarah Addison Allen The Peach Keeper \nThe New York Times bestselling author of The...
print(mystery_thriller_selected_genre)
author title description
0 Janet Evanovich | Goodreads Smokin' Seventeen (Stephanie Plum, #17) [[[<p><b><i>Where there’s smoke there’s fire, ...
1 J.D. Robb New York to Dallas (In Death, #33) \nTwelve years ago, Eve Dallas was just a rook...
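The single-DataFrame alternative suggested above could look like the following sketch. It assumes get_description returns one DataFrame per genre, as in the question; to keep it runnable offline, a hypothetical scrape_genre stub returns placeholder rows, and its max_books parameter illustrates one way to limit the work while testing (both names are illustrative, not from the original code):

```python
import pandas as pd

def scrape_genre(genre, max_books=2):
    # Hypothetical stub: in real use this might be
    # get_description(get_genre_url(genre)[:max_books])
    return pd.DataFrame({
        'author': [f'author {i}' for i in range(max_books)],
        'title': [f'{genre} book {i}' for i in range(max_books)],
        'description': ['placeholder'] * max_books,
    })

frames = []
for genre in ['fiction', 'mystery-thriller']:
    df = scrape_genre(genre)
    df['genre'] = genre  # tag every row with the genre it came from
    frames.append(df)

# One DataFrame for everything; ignore_index renumbers rows 0..n-1
all_descriptions = pd.concat(frames, ignore_index=True)

# A single genre is then just a boolean filter
mystery = all_descriptions[all_descriptions['genre'] == 'mystery-thriller']
print(all_descriptions.groupby('genre').size())
```

Keeping the genre as a column rather than in a variable name means filtering, grouping, and saving to a single CSV all work without any dynamic names.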