select rows that vary in only one column - R

I have a dataframe that looks somewhat like the following:

   v1 v2 v3 v4 v5 v6
r1 1  2  2  4  5  9
r2 1  2  2  4  5  10
r3 2  2  2  4  5  9
r4 2  2  2  4  5  10

I would like to select rows r1 and r3 based on the fact that they differ in their values in v1. The numbers in that column range from 1 to 100. Is that possible and if yes, how?

Any help greatly appreciated, thank you!

Update:

Some clarification: The values in v1 are random seeds that go from 1 to 100. Basically, I run a simulation (in NetLogo) that goes through all parameter configurations (v2 - v6) with 100 different random seeds. I would now like to select all rows that belong to the same parameter configuration, i.e. if v2 = 2 and v3 = 5, get me all rows that meet that condition and have different values in v1 (the random seeds). But since I have quite a lot of parameter configurations, I would like to do this generically, so that I don't have to write the conditions out manually. Hence the question whether it is possible to select rows that are the same in a number of columns but differ in one specific column.

Here is one way, using plyr to split the data.frame up into chunks. Each chunk consists of the rows that have the same value in the first column (v1). We simply return the first row from each chunk, like this:

library(plyr)

#  function(x) x[1,] returns the first row of each chunk
ddply( df , .variables = "v1" , .fun = function(x) x[1,] )
#  v1 v2 v3 v4 v5 v6
#1  1  2  2  4  5  9
#2  2  2  2  4  5  9
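
The update in the question asks for something slightly different: group the rows by parameter configuration (v2 to v6) and collect the rows that differ only in the seed v1. The following is a minimal base-R sketch of that idea, not a definitive solution. It assumes the data frame is called df and that v1 is the only non-parameter column; the names param_cols, by_config, n_seeds and varying are made up for illustration.

```r
# Toy data from the question: v1 is the random seed, v2..v6 are parameters.
df <- data.frame(
  v1 = c(1, 1, 2, 2),
  v2 = c(2, 2, 2, 2),
  v3 = c(2, 2, 2, 2),
  v4 = c(4, 4, 4, 4),
  v5 = c(5, 5, 5, 5),
  v6 = c(9, 10, 9, 10),
  row.names = c("r1", "r2", "r3", "r4")
)

# Every column except the seed defines a parameter configuration.
param_cols <- setdiff(names(df), "v1")

# Split the rows into one data frame per configuration that actually occurs.
# Rows in the same piece agree on v2..v6 and can differ only in v1.
by_config <- split(df, df[param_cols], drop = TRUE)

# Optionally keep only rows whose configuration was run with more than
# one distinct seed.
config_id <- interaction(df[param_cols], drop = TRUE)
n_seeds   <- ave(df$v1, config_id, FUN = function(s) length(unique(s)))
varying   <- df[n_seeds > 1, ]
```

For the toy data this gives two configurations (v6 = 9 and v6 = 10), and the piece for v6 = 9 holds rows r1 and r3. Because param_cols is derived from the column names, no configuration has to be written out by hand.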
