Can't access a tweet id with Beautiful Soup

My goal is to retrieve the ids of tweets in a Twitter search as they are being posted. My code so far looks like this:

import requests
from bs4 import BeautifulSoup

keys = some_key_words + " -filter:retweets AND -filter:replies"
query = "https://twitter.com/search?f=tweets&vertical=default&q=" + keys + "&src=typd&lang=es"
req = requests.get(query).text
soup = BeautifulSoup(req, "lxml")

for tweets in soup.findAll("li",{"class":"js-stream-item stream-item stream-item"}):
    print(tweets)
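The query URL above is assembled by string concatenation, so the spaces and colons in keys go into it unescaped. A minimal sketch that builds the same URL with the standard library's urllib.parse.urlencode, assuming some_key_words is a plain string (the value "python" below is a hypothetical placeholder):

```python
from urllib.parse import urlencode

# Hypothetical placeholder; in the question this comes from some_key_words.
some_key_words = "python"

keys = some_key_words + " -filter:retweets AND -filter:replies"

# urlencode percent-encodes every value (spaces become "+", ":" becomes "%3A"),
# so the search operators reach the server as a single q parameter.
params = {
    "f": "tweets",
    "vertical": "default",
    "q": keys,
    "src": "typd",
    "lang": "es",
}
query = "https://twitter.com/search?" + urlencode(params)
print(query)
```

With requests, passing the same dictionary as requests.get("https://twitter.com/search", params=params) does this encoding for you. Encoding the URL does not by itself guarantee that the response contains the stream markup the question is looking for.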

However, this doesn't return anything. Is there a problem with the code itself, or am I looking in the wrong place in the source code? I understand that the ids should be stored here:

<div class="stream">
  <ol class="stream-items js-navigable-stream" id="stream-items-id">
    <li class="js-stream-item stream-item stream-item" data-item-id="1210306781806833664" id="stream-item-tweet-1210306781806833664" data-item-type="tweet">
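
Applied to that markup, the same lookup finds the <li>, and reading its data-item-id attribute with .get() returns the tweet id:

from bs4 import BeautifulSoup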
data = """
<div class="stream">
    <ol class="stream-items js-navigable-stream" id="stream-items-id">
        <li class="js-stream-item stream-item stream-item" data-item-id="1210306781806833664" id="stream-item-tweet-1210306781806833664" data-item-type="tweet">
        ...
"""


soup = BeautifulSoup(data, 'html.parser')

# Find each tweet <li> and read the tweet id from its data-item-id attribute
for item in soup.find_all("li", {'class': 'js-stream-item stream-item stream-item'}):
    print(item.get("data-item-id"))

Output:

1210306781806833664
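The lookup above matches the class attribute as one exact string, so it stops matching if the page adds, removes, or reorders a class. A sketch of a looser selector, assuming the markup keeps the stream-item class and the data-item-type attribute seen in the question's snippet (the second <li> below is a made-up non-tweet item added for illustration):

```python
from bs4 import BeautifulSoup

# Markup modeled on the question's snippet; the second <li> is hypothetical.
data = """
<ol class="stream-items js-navigable-stream" id="stream-items-id">
  <li class="js-stream-item stream-item stream-item" data-item-id="1210306781806833664"
      id="stream-item-tweet-1210306781806833664" data-item-type="tweet"></li>
  <li class="js-stream-item stream-item" data-item-id="42" data-item-type="user"></li>
</ol>
"""

soup = BeautifulSoup(data, "html.parser")

# "class": "stream-item" matches any <li> whose class list contains that
# single class; data-item-type="tweet" keeps only tweet items.
tweet_ids = [
    li["data-item-id"]
    for li in soup.find_all("li", attrs={"class": "stream-item", "data-item-type": "tweet"})
]
print(tweet_ids)
```

Because Beautiful Soup treats class as a multi-valued attribute, matching one class name is less brittle than matching the full class string.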
