How to get the substring (photo URL) from snscrape media?

Edit: I have realized that the media can also contain a video URL. My question is: how can I get only the photo URL in the following loop? I want to add an attribute called photourl that holds the full URL from the media.

import snscrape.modules.twitter as sntwitter
import pandas as pd

# Creating list to append tweet data to
attributes_container = []

# Using TwitterSearchScraper to scrape data and append tweets to list
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('sex for grades since:2021-07-05 until:2022-07-06').get_items()):
    if i>150:
        break
    attributes_container.append([tweet.user.username, tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content, tweet.media])
    
# Creating a dataframe to load the list
tweets_df = pd.DataFrame(attributes_container, columns=["User", "Date Created", "Number of Likes", "Source of Tweet", "Tweet","media"])

When I use snscrape to scrape tweets from Twitter, I want to extract the photo URLs from the media. The media object I get looks like this:

media=[Photo(previewUrl='https://pbs.twimg.com/media/FePrYL7WQAQDKEB?format=jpg, fullUrl='https://pbs.twimg.com/media/FePrYL7WQAQDKEB?format=jpg&name=large')]

So how can I get the previewUrl and the fullUrl separately, using Python code?

Thanks

You can change your for loop to:

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('sex for grades since:2021-07-05 until:2022-07-06').get_items()):
    if i>150:
        break
    try:
        # Full URL of the first media item; use .previewUrl for the preview instead
        tweetMedia = tweet.media[0].fullUrl
    except (TypeError, IndexError, AttributeError):
        # No media at all, or the first item has no fullUrl attribute
        tweetMedia = tweet.media # or None or '' or any default value
    attributes_container.append([tweet.user.username, tweet.date, tweet.likeCount, tweet.sourceLabel, tweet.content, tweetMedia])

After that, each tweet row holds the URL of its first media item, if there is one.

If you want it all inside the append statement, you can just change that to:

attributes_container.append([
    tweet.user.username, tweet.date, tweet.likeCount,
    tweet.sourceLabel, tweet.content,
    (tweet.media[0].fullUrl
     if tweet.media and hasattr(tweet.media[0], 'fullUrl')
     else tweet.media)
])

(This replaces the try...except block.)
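
Both variants look only at the first media item. If a tweet can carry several items, or a mix of photos and videos, a small helper may be easier to reason about. The following is a minimal sketch, not snscrape's own API: photo_urls is a hypothetical helper name, and the Photo and Video classes here are stand-ins so the example runs without network access. It assumes that only photo items expose fullUrl, which matches the media object shown in the question.

```python
from collections import namedtuple

# Stand-ins for snscrape's media objects (assumption: real photo items
# expose previewUrl and fullUrl, while video items do not have fullUrl).
Photo = namedtuple("Photo", ["previewUrl", "fullUrl"])
Video = namedtuple("Video", ["thumbnailUrl"])


def photo_urls(media):
    """Return (previewUrl, fullUrl) pairs for every photo in a tweet's media."""
    pairs = []
    for item in media or []:  # media may be None when the tweet has none
        full = getattr(item, "fullUrl", None)
        if full is not None:  # items without fullUrl are skipped
            pairs.append((getattr(item, "previewUrl", None), full))
    return pairs


sample = [
    Video(thumbnailUrl="https://example.com/thumb.jpg"),
    Photo(
        previewUrl="https://example.com/p.jpg?name=small",
        fullUrl="https://example.com/p.jpg?name=large",
    ),
]
print(photo_urls(sample))
print(photo_urls(None))
```

Inside the loop, something like [full for _, full in photo_urls(tweet.media)] could then be appended as the photourl value, giving a list of full photo URLs per tweet (empty when the tweet has no photos).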
