I am new to R and don't want to misunderstand the language and its data structures from the start. :)
My data.frame sample.data contains, besides 'normal' attributes (e.g. author), a nested list of data.frames (files), each of which has attributes such as extension.
How can I filter for authors who have created files with a certain extension? Is there an idiomatic R way of doing that? Maybe something in this direction:
t <- subset(data, data$files[['extension']] > '.R')
Actually I want to avoid for loops.
Here you can find some sample data:
d1 <- data.frame(extension=c('.py', '.py', '.c++')) # and some other attributes
d2 <- data.frame(extension=c('.R', '.py')) # and some other attributes
sample.data <- data.frame(author=c('author_1', 'author_2'), files=I(list(d1, d2)))
The JSON that sample.data comes from looks like:
[
{
"author": "author_1",
"files": [
{
"extension": ".py",
"path": "/a/path/somewhere/"
},
{
"extension": ".c++",
"path": "/a/path/somewhere/else/"
}, ...
]
}, ...
]
There are at least a dozen ways of doing this, but if you want to learn R right, you should learn the standard ways of subsetting data structures, especially atomic vectors, lists and data frames. This is covered in chapter two of this book:
There are other great books, but this is a good one, and it is online and free.
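As a rough illustration of what "standard subsetting" means, here is a minimal sketch in base R. The objects v, l and df are made up for this example and are not part of the question's data.

```r
# Atomic vector: subset by position, by name, or with a logical vector.
v <- c(a = 10, b = 20, c = 30)
v[2]            # by position
v["c"]          # by name
v[v > 15]       # by logical condition

# List: `[` returns a sub-list, `[[` or `$` extracts a single element.
l <- list(author = "author_1", extension = c(".py", ".c++"))
l["author"]     # a list of length 1
l[["author"]]   # the character string itself
l$extension     # same as l[["extension"]]

# Data frame: index rows and columns as df[rows, columns].
df <- data.frame(author = c("author_1", "author_2"),
                 extension = c(".py", ".R"),
                 stringsAsFactors = FALSE)
py_rows <- df[df$extension == ".py", ]    # rows where the condition holds
authors <- df[["author"]]                 # a single column as a vector
```

The same three tools (position, name, logical vector) carry over to nested structures, which is why they are worth learning first.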
UPDATE: Okay, this converts your JSON to a list of data frames.
library("rjson")
s <- paste(c(
'[{' ,
' "author": "author_1",',
' "files": [',
' {',
' "extension": ".py",',
' "path": "/a/path/somewhere/"',
' },',
' {',
' "extension": ".c++",',
' "path": "/a/path/somewhere/else/"',
' }]',
'},',
'{',
'"author": "author_2",',
'"files": [',
' {',
' "extension": ".py",',
' "path": "/b/path/somewhere/"',
' },',
' {',
' "extension": ".c++",',
' "path": "/b/path/somewhere/else/"',
' }]',
'}]'),collapse="")
j <- fromJSON(s)
# Convert one author's entry (a list parsed from JSON) into a data frame
# with one row per file.
todf <- function(x) {
  n <- length(x$files)
  vext <- sapply(x$files, function(y) y$extension)
  vpath <- sapply(x$files, function(y) y$path)
  data.frame(author = rep(x$author, n), ext = vext, path = vpath)
}
listdf <- lapply(j,todf)
listdf
Which yields:
[[1]]
author ext path
1 author_1 .py /a/path/somewhere/
2 author_1 .c++ /a/path/somewhere/else/
[[2]]
author ext path
1 author_2 .py /b/path/somewhere/
2 author_2 .c++ /b/path/somewhere/else/
And to finish the task, merge and subset:
mdf <- do.call("rbind", listdf)
mdf[ mdf$ext==".py", ]
yielding:
author ext path
1 author_1 .py /a/path/somewhere/
3 author_2 .py /b/path/somewhere/
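Once the data is in this flat shape, the original question (which authors have a file with a given extension) reduces to ordinary vector subsetting. A minimal sketch, assuming a table shaped like mdf and rebuilt here by hand so the snippet runs on its own:

```r
# A flat table shaped like `mdf` (values taken from the example JSON).
mdf <- data.frame(
  author = c("author_1", "author_1", "author_2", "author_2"),
  ext    = c(".py", ".c++", ".py", ".c++"),
  stringsAsFactors = FALSE
)

# Authors with at least one file of the wanted extension;
# unique() drops authors that match more than once.
cpp_authors <- unique(mdf$author[mdf$ext == ".c++"])
cpp_authors
```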
Interestingly, not many people use R to model hierarchical databases!
subset(sample.data, sapply(files, function(df) any(df$extension == ".R")))
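The one-liner can be unpacked into two steps to see what it does. This is only a sketch on the question's sample data. The names has_r and r_authors are introduced here for illustration, and vapply() is swapped in for sapply() because it enforces a logical result for every element.

```r
# Sample data from the question: one row per author, with a list column
# `files` holding one data frame per author.
d1 <- data.frame(extension = c(".py", ".py", ".c++"), stringsAsFactors = FALSE)
d2 <- data.frame(extension = c(".R", ".py"), stringsAsFactors = FALSE)
sample.data <- data.frame(author = c("author_1", "author_2"),
                          files = I(list(d1, d2)),
                          stringsAsFactors = FALSE)

# Step 1: for each author's nested data frame, check whether any file
# has the wanted extension. The result is one TRUE/FALSE per author.
has_r <- vapply(sample.data$files,
                function(f) any(f$extension == ".R"),
                logical(1))

# Step 2: use the logical vector to keep only the matching rows.
r_authors <- sample.data[has_r, ]
r_authors$author
```

No explicit for loop is needed: the iteration over the nested data frames happens inside vapply().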
Assuming your data frame df, as a CSV, looks like:
author,path,extension
john,/home/john,txt
mary,/home/mary,png
then the easiest solution is to use the dplyr package:
library(dplyr)
filter(df, author=="john" & extension=="txt")
I guess the grep() function in the base package could be your solution:
files <- data.frame(path = paste0("path", 1:3),
                    extension = c(".R", ".csv", ".R"),
                    creation.date = Sys.Date() + 1:3)
> files
# path extension creation.date
# 1 path1 .R 2015-07-15
# 2 path2 .csv 2015-07-16
# 3 path3 .R 2015-07-17
> files[grep("\\.R$", files$extension),]  # escape the dot: in a regex, "." matches any character
# path extension creation.date
# 1 path1 .R 2015-07-15
# 3 path3 .R 2015-07-17
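One caveat with this approach: grep() treats its pattern as a regular expression, where a dot matches any single character. A small sketch of the difference, using a made-up vector ext that is not part of the data above:

```r
ext <- c(".R", ".csv", ".Rmd", "aR")

# As a regular expression, "." matches any single character,
# so ".R" also matches ".Rmd" and "aR".
loose <- grepl(".R", ext)

# Escaping the dot and anchoring both ends matches the extension exactly.
exact <- grepl("^\\.R$", ext)

# For an exact match no pattern is needed at all.
same <- ext == ".R"
```

For exact comparisons the plain == test is usually the simplest choice. Patterns are more useful when several extensions should match at once.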