简体   繁体   中英

JSON to R for Data Mining

I am trying to grab tweets using the Topsy Otter api, so I can perform some data mining on it for my dissertation.

So far, I have got:

library(RJSONIO)
library(RCurl)
tweet_data <- getURL("http://otter.topsy.com/search.json?q=PSN&mintime=1301634000&perpage=10&maxtime=1304226000&apikey=xxx")
fromJSON(tweet_data)

Which works fine. Now however, I want to return just a couple details from this file, 'content' and 'trackback_date'. I cannot seem to figure out how - I have tried cobbling a couple of examples together, but unable to extract what I want.

Here is what I've tried so far:

trackback_date <- lapply(tweet_data$result, function(x){x$trackback_date})

content <- lapply(tweet_data$result, function(x){x$content})

Any help would be greatly appreciated, thank you.

edit I have also tried:

library("rjson")
# use rjson

tweet_data <- fromJSON(paste(readLines("http://otter.topsy.com/search.json?q=PSN&mintime=1301634000&perpage=10&maxtime=1304226000&apikey=xxx"), collapse=""))
# get a data from Topsy Otter API
# convert JSON data into R object using fromJSON()

trackback_date <- lapply(tweet_data$result, function(x){x$trackback_date})

content <- lapply(tweet_data$result, function(x){x$content})

Basic processing of Topsy Otter API response:

library(RJSONIO)
library(RCurl)
tweet_data <- getURL("http://otter.topsy.com/search.json?q=PSN&mintime=1301634000&perpage=10&maxtime=1304226000&apikey=xxx")

#
# Addition to your code
#
tweets <- fromJSON(tweet_data)$response$list
content <- sapply(tweets, function(x) x$content)
trackback_date <- sapply(tweets, function(x) x$trackback_date)

EDIT: Processing multiple pages

Function gets 100 items from specified page :

pagetweets <- function(page){
  url <- paste("http://otter.topsy.com/search.json?q=PSN&mintime=1301634000&page=",page,
               "&perpage=100&maxtime=1304226000&apikey=xxx",
               collapse="", sep="")
  tweet_data <- getURL(url)
  fromJSON(tweet_data)$response$list
}

Now we can apply it to multiple pages:

tweets <- unlist(lapply(1:10, pagetweets), recursive=F)

And, voila, this code:

content <- sapply(tweets, function(x) x$content)
trackback_date <- sapply(tweets, function(x) x$trackback_date)

returns you 1000 records.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM