Extract all rows from a data frame matching a certain condition

I have a data frame in R in which one of the columns contains state abbreviations such as 'AL', 'MD', etc.

Say I want to extract the data for state = 'AL'. The expression dataframe['AL',] only seems to return one row, whereas there are multiple rows for this state.

Can someone help me understand the error in this approach?

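A character value in the row position of [ is matched against the row names, not against the values in the state column, so dataframe['AL',] returns at most one row. Select rows with a logical condition on the column instead; this should work: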

mydataframe[mydataframe$state == "AL",]
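A small made-up data frame shows the difference (df, state and value are hypothetical names, not from the question). The row names here are the defaults "1", "2", "3", so the string "AL" matches no row name and comes back as a single row of NAs, while the logical index built from the column keeps every matching row:

```r
# Hypothetical toy data: three rows, two of them for "AL"
df <- data.frame(state = c("AL", "MD", "AL"),
                 value = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Character index in the row slot: looked up in row.names(df),
# which are "1", "2", "3", so "AL" matches nothing
by_name <- df["AL", ]
print(by_name)    # a single row of NAs

# Logical index built from the column: TRUE for every "AL" row
by_value <- df[df$state == "AL", ]
print(by_value)   # rows 1 and 3
```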

Or, if you want more than one state:

mydataframe[mydataframe$state %in% c("AL","MD"),]
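The two forms differ when the state column has missing values: == returns NA for those rows, and an NA in the row index adds an all-NA row to the result, whereas %in% returns FALSE for them. A sketch on hypothetical data (the names are illustrative only):

```r
# Hypothetical data with one missing state
df <- data.frame(state = c("AL", NA, "MD", "AL"),
                 value = 1:4,
                 stringsAsFactors = FALSE)

# == gives c(TRUE, NA, FALSE, TRUE); the NA turns into an all-NA row
with_eq <- df[df$state == "AL", ]

# %in% gives c(TRUE, FALSE, FALSE, TRUE); the missing state is dropped
with_in <- df[df$state %in% "AL", ]

# which() keeps only the positions that are TRUE, so the NA is dropped too
with_which <- df[which(df$state == "AL"), ]
```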

In R, there are always multiple ways to do something. We'll illustrate three different techniques that can be used to subset the rows of a data frame based on a logical condition.

We'll use data from the 2012 US Hospital Compare database. We'll check whether the data has already been downloaded to disk and, if not, download and unzip it.

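## download the zip file only if it is not already on disk
if (!file.exists("outcome-of-care-measures.zip")) {
     dlMethod <- "curl"
     if (substr(Sys.getenv("OS"), 1, 7) == "Windows") dlMethod <- "wininet"
     url <- "https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2FProgAssignment3-data.zip"
     download.file(url, destfile = "outcome-of-care-measures.zip",
                   method = dlMethod, mode = "wb")
}
## unzip separately, so the CSV is restored even if only the zip was kept
if (!file.exists("outcome-of-care-measures.csv")) {
     unzip(zipfile = "outcome-of-care-measures.zip")
}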

## read outcome data & keep hospital name, state, and some
## mortality rates. Notice that here we use the extract operator
## to subset columns instead of rows 
theData <- read.csv("outcome-of-care-measures.csv",
                    colClasses = "character")[,c(2,7,11,17,23)]
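Because colClasses = "character" reads every column as text, a condition on one of the mortality-rate columns needs a numeric conversion first: comparing a character column with a number coerces the number to a string, and the comparison becomes alphabetical. A minimal sketch on made-up data; the column name rate and the placeholder text "Not Available" are assumptions, not taken from the real file:

```r
# Hypothetical character data, as read.csv(colClasses = "character") would give
df <- data.frame(state = c("AL", "AL", "MD"),
                 rate  = c("9.5", "12.1", "Not Available"),
                 stringsAsFactors = FALSE)

# 10 is coerced to "10" and compared as text, so neither "9.5"
# nor "12.1" sorts before "10": no rows are returned
lexical <- df[df$rate < 10, ]

# Convert first; non-numeric text becomes NA (the warning is suppressed)
rate_num <- suppressWarnings(as.numeric(df$rate))
numeric_ok <- df[!is.na(rate_num) & rate_num < 10, ]
```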

This first technique matches the one from the other answer, but we illustrate it with both the $ and [[ forms of the extract operator in the subset operation.

# technique 1: extract operator
aSubset <- theData[theData$State == "AL", ]
table(aSubset$State)
## AL 
## 98 

aSubset <- theData[theData[["State"]] == "AL", ]
table(aSubset$State)
## AL 
## 98 
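The [[ form is the one to use when the column name is held in a variable, for example inside a function, because $ would look for a column literally named after the variable. A sketch with a hypothetical helper rows_matching() and toy data:

```r
# Hypothetical helper: keep the rows where column `col` equals `value`
rows_matching <- function(data, col, value) {
  # data[[col]] looks up the column whose name is stored in `col`;
  # data$col would look for a column literally named "col"
  data[data[[col]] == value, ]
}

df <- data.frame(State = c("AL", "MD", "AL"),
                 Hospital = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

al_rows <- rows_matching(df, "State", "AL")
```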

Next, we can subset by using a base R function, subset().

# technique 2: subset() function
aSubset <- subset(theData, State == "AL")
table(aSubset$State)
## AL 
## 98 
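subset() can also choose columns in the same call through its select argument, and it treats rows where the condition evaluates to NA as FALSE, so no all-NA rows appear. A sketch on hypothetical data (the column names are illustrative only):

```r
df <- data.frame(State = c("AL", NA, "MD", "AL"),
                 Hospital = c("A", "B", "C", "D"),
                 Rate = c(9.5, 11.0, 12.1, 14.2),
                 stringsAsFactors = FALSE)

# Rows for either state, keeping only two columns
two_states <- subset(df, State %in% c("AL", "MD"), select = c(Hospital, State))

# Conditions combine with &; the row whose State is NA evaluates to NA
# and is dropped rather than returned as an all-NA row
al_high <- subset(df, State == "AL" & Rate > 10)
```

The subset() help page describes it as a convenience function intended for interactive use; inside functions, the [ form from technique 1 is the safer choice.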

Finally, for the tidyverse fans, we'll use dplyr::filter().

# technique 3: dplyr::filter() (requires the dplyr package)
aSubset <- dplyr::filter(theData, State == "AL")
table(aSubset$State)
## AL 
## 98 
