
Using a data frame in R like SQL, possibly with sqldf()

I'm not at all familiar with R. I have basic competency with MATLAB, which I like a lot.

My current task involves gathering statistical data and analyzing it. I read my data from JSON into R using fromJSON from the RJSONIO library. Then I replaced the NULL values with NA using

Stats <- lapply(Stats, function(x) {
  x[sapply(x, is.null)] <- NA
  unlist(x)
})

then called

DF <- do.call("rbind", Stats)

to pack the JSON values into a single table. This left me with a matrix of atomic vectors rather than a data frame, so I cleaned it up using

DF<-as.data.frame(DF)
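The steps above can be sketched end to end as follows. This is only an illustration: the `Stats` list is a hypothetical stand-in for whatever fromJSON returns, and the `Goals` field is an assumption, not part of the original data. Because unlist() coerces every field to character, the final type.convert() step is an optional addition that restores numeric columns.

```r
# Hypothetical stand-in for the list returned by fromJSON():
# one named list per player, with NULL where the JSON had null.
Stats <- list(
  list(Name = "Matt Carkner",  Team = "NYI", Opponent = "PHI", Goals = 1),
  list(Name = "Keith Ballard", Team = "MIN", Opponent = "FLA", Goals = NULL)
)

# Replace NULL entries with NA so every record keeps the same fields,
# then flatten each record into a named atomic vector.
Stats <- lapply(Stats, function(x) {
  x[sapply(x, is.null)] <- NA
  unlist(x)
})

# Stack the vectors row-wise (this yields a character matrix) ...
DF <- do.call("rbind", Stats)

# ... and convert the matrix into a data frame.
DF <- as.data.frame(DF, stringsAsFactors = FALSE)

# Optional: let R guess a sensible type for each column again.
DF[] <- lapply(DF, type.convert, as.is = TRUE)

str(DF)
```

Keeping the NA placeholders matters here: if the NULL entries were simply dropped, the records would have different lengths and rbind would recycle values into the wrong columns.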

Now I'm left with a DF on which I'd like to perform query-esque (SQL) calculations, e.g.:

List all people who are on team "NYI"

or

Find total number of goals for team

or, most helpful of all,

Lookup data by name or other key value

My research has led me to a library called sqldf, but it seems to have many dependencies that I can't get running on my machine. Still looking for a solution.

Sample data frame:

     Name             Team  Opponent
167  Matt Carkner     NYI   PHI
168  Keith Ballard    MIN   FLA
169  Willie Mitchell  FLA   MIN
170  Rob Scuderi      PIT   BOS
171  Nate Prosser     MIN   FLA
172  Nick Schultz     PHI   NYI

I suggest the data.table package.

library(data.table)

# toy data
# I presume your first column of data is number of "goals"
df <- data.frame(
  Goals    = 167:172,
  Name     = c("Matt Carkner", "Keith Ballard", "Willie Mitchell",
               "Rob Scuderi", "Nate Prosser", "Nick Schultz"),
  Team     = c("NYI", "MIN", "FLA", "PIT", "MIN", "PHI"),
  Opponent = c("PHI", "FLA", "MIN", "BOS", "FLA", "NYI"),
  stringsAsFactors = TRUE  # keep the columns as factors, as in the original dump
)

Illustrations of a few operations:

setDT(df)  # convert the data frame to a data.table in place
df[Team == "NYI",]   # rows for one team; keying the Team column first makes this faster
# you get
   Goals         Name Team Opponent
1:   167 Matt Carkner  NYI      PHI

df[,list(ttl_goals = sum(Goals)), by=Name]  # to get total goals per player
# you get
              Name ttl_goals
1:    Matt Carkner       167
2:   Keith Ballard       168
3: Willie Mitchell       169
4:     Rob Scuderi       170
5:    Nate Prosser       171
6:    Nick Schultz       172
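The other two requests in the question (total goals per team, and lookup by name) follow the same pattern. The sketch below rebuilds the toy table under the hypothetical name `skaters_dt` and keeps the presumed `Goals` column, so the totals are only as meaningful as that assumption.

```r
library(data.table)

# Toy data matching the sample above; Goals is the same presumed column.
skaters_dt <- data.table(
  Goals    = 167:172,
  Name     = c("Matt Carkner", "Keith Ballard", "Willie Mitchell",
               "Rob Scuderi", "Nate Prosser", "Nick Schultz"),
  Team     = c("NYI", "MIN", "FLA", "PIT", "MIN", "PHI"),
  Opponent = c("PHI", "FLA", "MIN", "BOS", "FLA", "NYI")
)

# Total goals per team
# (roughly: SELECT Team, SUM(Goals) FROM skaters GROUP BY Team)
team_totals <- skaters_dt[, .(ttl_goals = sum(Goals)), by = Team]
print(team_totals)

# Lookup by name: setkey() sorts the table by Name, so a character
# value in i is matched against the key with a binary search.
setkey(skaters_dt, Name)
rob <- skaters_dt["Rob Scuderi"]
print(rob)
```

Setting a key pays off mainly on large tables with repeated lookups; on six rows the plain `Name == "Rob Scuderi"` filter is just as good.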

sqldf turned out to be the PERFECT solution to my type of question.

I simply installed the package from the R console:

install.packages("sqldf")

Since I'm running on an Apple system, some Xcode dependencies needed installing, but Software Update took care of that for me automatically.

Then I simply used the

library(sqldf)

command to begin running SQL queries on my data frames.

Usage is very similar to standard SQL, and the return value is a new data frame with the requested attributes. Wrapping the call in View() makes it a perfect tool.

View(sqldf("SELECT * from skaters WHERE Team='PHI'"))
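For completeness, here is how the three queries from the question might look in sqldf. It is a sketch on a toy `skaters` data frame rather than the real data, and the `Goals` column is an assumption carried over from the data.table answer.

```r
library(sqldf)

# Toy table named `skaters` to match the query above.
# Goals is a presumed column, not part of the original sample.
skaters <- data.frame(
  Goals    = 167:172,
  Name     = c("Matt Carkner", "Keith Ballard", "Willie Mitchell",
               "Rob Scuderi", "Nate Prosser", "Nick Schultz"),
  Team     = c("NYI", "MIN", "FLA", "PIT", "MIN", "PHI"),
  Opponent = c("PHI", "FLA", "MIN", "BOS", "FLA", "NYI"),
  stringsAsFactors = FALSE
)

# List all people who are on team "NYI"
nyi <- sqldf("SELECT Name FROM skaters WHERE Team = 'NYI'")

# Find the total number of goals for each team
totals <- sqldf(
  "SELECT Team, SUM(Goals) AS ttl_goals FROM skaters GROUP BY Team"
)

# Look up a row by name
rob <- sqldf("SELECT * FROM skaters WHERE Name = 'Rob Scuderi'")

print(nyi)
print(totals)
print(rob)
```

Each call returns an ordinary data frame, so the results can be passed to View(), merged, or queried again.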
