
How can I extract comments from all the links in a website with Python?

I am trying to extract comments from a few forums on a website. I have a list of links from which I want the comments to be extracted. When I put a single link in place of `{i}` in the code (`f"{i}/index{item}/"`), the code works fine, but with the code below it gives an empty list.

Data

    name                    Link
    a               https://www.f150forum.com/f118/2019-adding-ada...
    b               https://www.f150forum.com/f118/2018-adding-ada...
    c               https://www.f150forum.com/f118/adaptive-cruise...
    d               https://www.f150forum.com/f118/2018-platinum-s...
    e               https://www.f150forum.com/f118/adaptive-cruise...
    f               https://www.f150forum.com/f118/adaptive-cruise...

My code

import requests
from bs4 import BeautifulSoup

link_url = []
username = []
comments = []

for i in df['Link']:
    with requests.Session() as req:
        for page in range(1):
            r = req.get(f"{i}/index{page}/")
            soup = BeautifulSoup(r.text, 'html.parser')
            link_url.append(page)
            # post bodies carry the class "ism-true"
            for post in soup.findAll('div', attrs={"class": "ism-true"}):
                comments.append([post.get_text(strip=True, separator=" ")])
            # poster names carry the class "bigusername"
            for user in soup.findAll('a', attrs={"class": "bigusername"}):
                username.append([user.get_text(strip=True, separator=" ")])


Can you please help me with this? Thank you in advance.

OK, I see your links are in a dataframe; you can loop over them like this:

import pandas as pd
from io import StringIO

data = """
name,Link
a,https://www.f150forum.com/f118/2019-adding-ada...
b,https://www.f150forum.com/f118/2018-adding-ada...
c,https://www.f150forum.com/f118/adaptive-cruise...
d,https://www.f150forum.com/f118/2018-platinum-s...
e,https://www.f150forum.com/f118/adaptive-cruise...
"""
df = pd.read_csv(StringIO(data), sep=',')
for index, row in df.iterrows():
    print(row['Link'])

Result:

https://www.f150forum.com/f118/2019-adding-ada...
https://www.f150forum.com/f118/2018-adding-ada...
https://www.f150forum.com/f118/adaptive-cruise...
https://www.f150forum.com/f118/2018-platinum-s...
https://www.f150forum.com/f118/adaptive-cruise...

Then put that value (the link) inside your request.
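Putting the two pieces together, here is a minimal sketch of that idea. It reuses the URL pattern and the `ism-true`/`bigusername` class names from the question's code; the helper names `parse_page` and `scrape_links` are my own, and `pages=1` mirrors the question's `range(1)`:

```python
import requests
from bs4 import BeautifulSoup

def parse_page(html):
    """Extract comment texts and usernames from one forum page's HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    comments = [d.get_text(strip=True, separator=" ")
                for d in soup.find_all('div', attrs={"class": "ism-true"})]
    users = [a.get_text(strip=True, separator=" ")
             for a in soup.find_all('a', attrs={"class": "bigusername"})]
    return comments, users

def scrape_links(df, pages=1):
    """Loop over every link in the dataframe and collect all results."""
    all_comments, all_users = [], []
    with requests.Session() as session:
        for _, row in df.iterrows():
            for page in range(pages):
                r = session.get(f"{row['Link']}/index{page}/")
                comments, users = parse_page(r.text)
                all_comments.extend(comments)
                all_users.extend(users)
    return all_comments, all_users
```

Splitting the parsing into its own function also makes it easy to check the selectors against a saved HTML snippet before hitting the live site.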
