How do I print tweets from Twitter?

I am trying to scrape tweets from Twitter for a side project.

I am having difficulty with the output.

I am using the latest version of PyCharm.

import urllib.request
from bs4 import BeautifulSoup

theurl = "https://twitter.com/search?q=ghana%20and%20jollof&src=typed_query"
thepage = urllib.request.urlopen(theurl)

soup = BeautifulSoup(thepage, "html.parser")
# number each matching element and print the text of its first <span>
for i, tweet in enumerate(soup.find_all('div', {
    "class": "css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0"
}), start=1):
    print(i)
    print(tweet.find('span').text)
    print(tweet)

I do not receive any errors at all, but there is no output for the tweets.

You should use the requests library. Your request is also missing a user-agent header, which seems to be mandatory for Twitter.

Here is a working example:

import requests
from bs4 import BeautifulSoup

# without this you get strange responses
headers = {
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
}

# the correct way to pass the arguments
params = (
    ('q', 'ghana and jollof'),
    ('src', 'typed_query'),
)

r = requests.get('https://twitter.com/search', headers=headers, params=params)
soup = BeautifulSoup(r.content, 'html.parser')
all_tweet_containers = soup.find_all("div", {"class": "tweet"})

print(len(all_tweet_containers))
# all that remains is to parse the tweets one by one
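
To illustrate that last step, here is a minimal sketch of parsing the containers one by one. It runs against a small inline HTML sample rather than a live response. The `tweet-text` class and the `data-screen-name` attribute are assumptions made for illustration, as is the helper name `extract_tweets`, so check the markup you actually receive and adjust the selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a real response body; the inner
# class name and the data attribute are assumptions, not verified markup.
sample_html = """
<div class="tweet" data-screen-name="alice">
  <p class="tweet-text">Ghana jollof is the best.</p>
</div>
<div class="tweet" data-screen-name="bob">
  <p class="tweet-text">Jollof debate, round two.</p>
</div>
<div class="promo">Not a tweet</div>
"""


def extract_tweets(html):
    """Return a list of {'user', 'text'} dicts, one per tweet container."""
    soup = BeautifulSoup(html, "html.parser")
    tweets = []
    for container in soup.find_all("div", {"class": "tweet"}):
        text_node = container.find("p", {"class": "tweet-text"})
        if text_node is None:
            # Skip containers that do not hold a text paragraph
            continue
        tweets.append({
            "user": container.get("data-screen-name", ""),
            "text": text_node.get_text(strip=True),
        })
    return tweets


for i, tweet in enumerate(extract_tweets(sample_html), start=1):
    print(i, tweet["user"], tweet["text"])
```

With a real response you would pass `r.content` to the helper instead of `sample_html`.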

The problem is that this way you load only 20 tweets per request. To get more, you will need to examine the network tab in your browser's developer tools and see how the browser loads the rest dynamically.

This, however, is very tedious. I strongly recommend using a library that handles the requests to Twitter for you, such as https://github.com/twintproject/twint
