
Get the value of an attribute using Selenium in Python

I'm trying to get the "data-reply-to-users-json" attribute of a tweet, but it doesn't seem to work. Any suggestions? My code and the HTML structure of Twitter are below. P.S.: Twitter search uses JavaScript to load more tweets.

Twitter Structure

Below is what I have already tried in Python:

import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

chrome_path = r"C:\Users\..\Desktop\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://twitter.com/search?q=bakar%20para%20cebong&src=typd")
time.sleep(1)

body = driver.find_element_by_tag_name('body')

for _ in range(5):
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.2)

time.sleep(1)
tweets = driver.find_elements_by_class_name('original-tweet')

for tweet in tweets:
    print(tweet.get_property("data-tweet-id"))

An alternative is to use Tweepy, which is easy to use. You need a Twitter account: create an app request, then get the access key and ID (this might take a while). This is a more legitimate way to do it, and Selenium is slow for scraping data from Twitter.

import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret) # you will get this once you register for the app
auth.set_access_token(access_token, access_token_secret) # you will get this once you register for the app

api = tweepy.API(auth)

public_tweets = api.home_timeline()
for tweet in public_tweets:
    print(tweet.text)

You will be far better off using a library such as BeautifulSoup for this task.

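But if you must use Selenium, you need the get_attribute("attribute name") function. The data-* values are HTML attributes, not JavaScript properties of the element, so get_property() returns None for them.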

import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

chrome_path = r"C:\Users\..\Desktop\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://twitter.com/search?q=bakar%20para%20cebong&src=typd")
time.sleep(1)

body = driver.find_element_by_tag_name('body')

for _ in range(5):
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.2)  # short pause after each PAGE_DOWN so more tweets can load

time.sleep(1)
tweets = driver.find_elements_by_class_name('original-tweet')

for tweet in tweets:
    # data-* values are HTML attributes, so read them with get_attribute()
    tweet_id = tweet.get_attribute("data-tweet-id")
    reply_to_users_json = tweet.get_attribute("data-reply-to-users-json")
    print(tweet_id, reply_to_users_json)
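
For the parsing approach suggested at the start of this answer, here is a minimal sketch of reading the same data-* attributes from HTML that has already been fetched (for example from driver.page_source). To keep it dependency-free it uses html.parser from the Python standard library as a stand-in for BeautifulSoup. The sample markup, the class TweetAttributeParser, and the function extract_tweets are made up for illustration; the real Twitter HTML is not shown in the question and may differ.

```python
import json
from html.parser import HTMLParser

# Made-up sample markup (an assumption, not actual Twitter HTML).
SAMPLE_HTML = """
<div class="tweet original-tweet" data-tweet-id="111"
     data-reply-to-users-json='[{"screen_name": "alice"}]'></div>
<div class="tweet original-tweet" data-tweet-id="222"
     data-reply-to-users-json='[{"screen_name": "bob"}]'></div>
<div class="other" data-tweet-id="333"></div>
"""


class TweetAttributeParser(HTMLParser):
    """Collects data-* attributes from elements with class 'original-tweet'."""

    def __init__(self):
        super().__init__()
        self.tweets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "original-tweet" not in classes:
            return
        raw_json = attributes.get("data-reply-to-users-json")
        self.tweets.append({
            "tweet_id": attributes.get("data-tweet-id"),
            # The attribute holds a JSON string, so decode it into Python objects.
            "reply_to_users": json.loads(raw_json) if raw_json else [],
        })


def extract_tweets(html):
    """Hypothetical helper: returns one dict per matching element."""
    parser = TweetAttributeParser()
    parser.feed(html)
    parser.close()
    return parser.tweets


if __name__ == "__main__":
    for tweet in extract_tweets(SAMPLE_HTML):
        print(tweet["tweet_id"], tweet["reply_to_users"])
```

Because the search page loads tweets with JavaScript, the HTML still has to come from a rendered page; this sketch only covers the attribute extraction step.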

