Select rows by a specific value

Question

How can I select the rows of a data frame that have a value over 1 in a specific column?

This is what my data looks like:

> dput(head(tbl_comp[,-1]))
structure(list(Meve_mean = c(7774.44229552491, 43374.1166119026, 
585562.72426545, 3866.54724117546, 320338.197537275, 918368.01990607
), Mmor_mean = c(39113.5325249635, 119476.157216344, 1296530.34384725, 
23511.2980313616, 209092.538981888, 577355.581852083), Mtot_mean = c(23443.9874102442, 
81425.1369141232, 941046.53405635, 13688.9226362685, 264715.368259581, 
747861.800879077), tot_meanMe = c(258492586.999527, NA, NA, NA, 
NA, NA), tot_meanMm = c(246665241.110832, NA, NA, NA, NA, NA), 
    tot_sdMe = c(35569170.0311164, NA, NA, NA, NA, NA), tot_sdMm = c(30522099.9189256, 
    NA, NA, NA, NA, NA), Wteve_mean = c(10752.4381084666, 53658.8435672746, 
    715547.921685567, 3422.17220367207, 335384.199178456, 1013708.18845339
    ), Wtmor_mean = c(29254.6414790837, 98804.8007431987, 1001344.20496027, 
    11541.8862121394, 217110.411645861, 571826.157099177), Wttot_mean = c(18681.9538387311, 
    73007.110928385, 838032.04308901, 6902.04963587237, 284695.433093058, 
    824330.175015869), tot_meanwte = c(278901499.672313, NA, 
    NA, NA, NA, NA), tot_meanwtm = c(235415566.775308, NA, NA, 
    NA, NA, NA), tot_sdwte = c(16743477.4011497, NA, NA, NA, 
    NA, NA), tot_sdwtm = c(3922418.43271348, NA, NA, NA, NA, 
    NA), diff_eve = c(0.72303994843767, 0.808331185101342, 0.818341730189196, 
    1.12985174650959, 0.955138012828161, 0.905949098928778), 
    diff_mor = c(1.33700262752933, 1.20921408998001, 1.29478988086689, 
    2.03704122525771, 0.963070068343606, 1.00966976533735), diff_tot = c(1.25490018938172, 
    1.11530419268331, 1.12292428650774, 1.98331269093204, 0.929819510568173, 
    0.907235745512628)), .Names = c("Meve_mean", "Mmor_mean", 
"Mtot_mean", "tot_meanMe", "tot_meanMm", "tot_sdMe", "tot_sdMm", 
"Wteve_mean", "Wtmor_mean", "Wttot_mean", "tot_meanwte", "tot_meanwtm", 
"tot_sdwte", "tot_sdwtm", "diff_eve", "diff_mor", "diff_tot"), row.names = c(NA, 
6L), class = "data.frame")

In this data there are three columns I am mostly interested in:

diff_eve  diff_mor  diff_tot
0.7230399 1.3370026 1.2549002
0.8083312 1.2092141 1.1153042
0.8183417 1.2947899 1.1229243
1.1298517 2.0370412 1.9833127
0.9551380 0.9630701 0.9298195
0.9059491 1.0096698 0.9072357

I would like to create new data frames by filtering on the values in each of those columns. There should be 6 new data frames: for each column, the rows with a value below 1 go into one data frame and the rows with a value above 1 go into another. I would also like to keep all of the columns from the original data (tbl_comp).

Here is an example of one of the new data frames, selecting the rows with values below 1 in the column diff_eve:

>newdata.frame
diff_eve  diff_mor  diff_tot  .. ....... .....  rest of the columns from tbl_comp.
0.7230399 1.3370026 1.2549002
0.8083312 1.2092141 1.1153042 
0.8183417 1.2947899 1.1229243
0.9551380 0.9630701 0.9298195
0.9059491 1.0096698 0.9072357

I hope some of you understand what I want to achieve.

Answer 1

Here is a solution which gives you six dataframes, in the following order:

diff_eve <1
diff_mor <1
diff_tot <1
diff_eve >1
diff_mor >1
diff_tot >1

Code:

cols <- c("diff_eve", "diff_mor", "diff_tot")
# Six data frames: rows below 1 for each column, then rows above 1
c(lapply(cols, function(x) subset(tbl_comp, eval(parse(text = x)) < 1)),
  lapply(cols, function(x) subset(tbl_comp, eval(parse(text = x)) > 1)))

gives you

[[1]]
   Meve_mean  Mmor_mean Mtot_mean tot_meanMe tot_meanMm tot_sdMe tot_sdMm Wteve_mean Wtmor_mean Wttot_mean tot_meanwte tot_meanwtm tot_sdwte tot_sdwtm  diff_eve  diff_mor  diff_tot
1   7774.442   39113.53  23443.99  258492587  246665241 35569170 30522100   10752.44   29254.64   18681.95   278901500   235415567  16743477   3922418 0.7230399 1.3370026 1.2549002
2  43374.117  119476.16  81425.14         NA         NA       NA       NA   53658.84   98804.80   73007.11          NA          NA        NA        NA 0.8083312 1.2092141 1.1153042
3 585562.724 1296530.34 941046.53         NA         NA       NA       NA  715547.92 1001344.20  838032.04          NA          NA        NA        NA 0.8183417 1.2947899 1.1229243
5 320338.198  209092.54 264715.37         NA         NA       NA       NA  335384.20  217110.41  284695.43          NA          NA        NA        NA 0.9551380 0.9630701 0.9298195
6 918368.020  577355.58 747861.80         NA         NA       NA       NA 1013708.19  571826.16  824330.18          NA          NA        NA        NA 0.9059491 1.0096698 0.9072357

[[2]]
  Meve_mean Mmor_mean Mtot_mean tot_meanMe tot_meanMm tot_sdMe tot_sdMm Wteve_mean Wtmor_mean Wttot_mean tot_meanwte tot_meanwtm tot_sdwte tot_sdwtm diff_eve  diff_mor  diff_tot
5  320338.2  209092.5  264715.4         NA         NA       NA       NA   335384.2   217110.4   284695.4          NA          NA        NA        NA 0.955138 0.9630701 0.9298195

[[3]]
  Meve_mean Mmor_mean Mtot_mean tot_meanMe tot_meanMm tot_sdMe tot_sdMm Wteve_mean Wtmor_mean Wttot_mean tot_meanwte tot_meanwtm tot_sdwte tot_sdwtm  diff_eve  diff_mor  diff_tot
5  320338.2  209092.5  264715.4         NA         NA       NA       NA   335384.2   217110.4   284695.4          NA          NA        NA        NA 0.9551380 0.9630701 0.9298195
6  918368.0  577355.6  747861.8         NA         NA       NA       NA  1013708.2   571826.2   824330.2          NA          NA        NA        NA 0.9059491 1.0096698 0.9072357

[[4]]
  Meve_mean Mmor_mean Mtot_mean tot_meanMe tot_meanMm tot_sdMe tot_sdMm Wteve_mean Wtmor_mean Wttot_mean tot_meanwte tot_meanwtm tot_sdwte tot_sdwtm diff_eve diff_mor diff_tot
4  3866.547   23511.3  13688.92         NA         NA       NA       NA   3422.172   11541.89    6902.05          NA          NA        NA        NA 1.129852 2.037041 1.983313

[[5]]
   Meve_mean  Mmor_mean Mtot_mean tot_meanMe tot_meanMm tot_sdMe tot_sdMm  Wteve_mean Wtmor_mean Wttot_mean tot_meanwte tot_meanwtm tot_sdwte tot_sdwtm  diff_eve diff_mor  diff_tot
1   7774.442   39113.53  23443.99  258492587  246665241 35569170 30522100   10752.438   29254.64   18681.95   278901500   235415567  16743477   3922418 0.7230399 1.337003 1.2549002
2  43374.117  119476.16  81425.14         NA         NA       NA       NA   53658.844   98804.80   73007.11          NA          NA        NA        NA 0.8083312 1.209214 1.1153042
3 585562.724 1296530.34 941046.53         NA         NA       NA       NA  715547.922 1001344.20  838032.04          NA          NA        NA        NA 0.8183417 1.294790 1.1229243
4   3866.547   23511.30  13688.92         NA         NA       NA       NA    3422.172   11541.89    6902.05          NA          NA        NA        NA 1.1298517 2.037041 1.9833127
6 918368.020  577355.58 747861.80         NA         NA       NA       NA 1013708.188  571826.16  824330.18          NA          NA        NA        NA 0.9059491 1.009670 0.9072357

[[6]]
   Meve_mean  Mmor_mean Mtot_mean tot_meanMe tot_meanMm tot_sdMe tot_sdMm Wteve_mean Wtmor_mean Wttot_mean tot_meanwte tot_meanwtm tot_sdwte tot_sdwtm  diff_eve diff_mor diff_tot
1   7774.442   39113.53  23443.99  258492587  246665241 35569170 30522100  10752.438   29254.64   18681.95   278901500   235415567  16743477   3922418 0.7230399 1.337003 1.254900
2  43374.117  119476.16  81425.14         NA         NA       NA       NA  53658.844   98804.80   73007.11          NA          NA        NA        NA 0.8083312 1.209214 1.115304
3 585562.724 1296530.34 941046.53         NA         NA       NA       NA 715547.922 1001344.20  838032.04          NA          NA        NA        NA 0.8183417 1.294790 1.122924
4   3866.547   23511.30  13688.92         NA         NA       NA       NA   3422.172   11541.89    6902.05          NA          NA        NA        NA 1.1298517 2.037041 1.983313

If you want the selection columns to come first in each data frame, as in your example, you can use:

tbl_front <- tbl_comp[, c(cols, setdiff(colnames(tbl_comp), cols))]
c(lapply(cols, function(x) subset(tbl_front, eval(parse(text = x)) < 1)),
  lapply(cols, function(x) subset(tbl_front, eval(parse(text = x)) > 1)))
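As a possible alternative, the same six data frames can be built without eval(parse(...)) by looking up each column by name with [[. The sketch below is only an illustration: it rebuilds a small tbl_comp from the three diff_* columns shown in the question, and pick_rows and the "_below"/"_above" list names are hypothetical, not part of the original answer.

```r
# Sample rows copied from the question; only the three diff_* columns are kept.
tbl_comp <- data.frame(
  diff_eve = c(0.7230399, 0.8083312, 0.8183417, 1.1298517, 0.9551380, 0.9059491),
  diff_mor = c(1.3370026, 1.2092141, 1.2947899, 2.0370412, 0.9630701, 1.0096698),
  diff_tot = c(1.2549002, 1.1153042, 1.1229243, 1.9833127, 0.9298195, 0.9072357)
)

cols <- c("diff_eve", "diff_mor", "diff_tot")

# pick_rows() is a hypothetical helper: keep the rows where column `col`
# compares as TRUE against 1. which() drops NA comparisons, as subset() does.
pick_rows <- function(col, cmp) {
  tbl_comp[which(cmp(tbl_comp[[col]], 1)), , drop = FALSE]
}

# Same order as in the answer: the three "< 1" frames, then the three "> 1" frames.
result <- c(lapply(cols, pick_rows, cmp = `<`),
            lapply(cols, pick_rows, cmp = `>`))
names(result) <- c(paste0(cols, "_below"), paste0(cols, "_above"))

# Number of rows in each of the six data frames
sapply(result, nrow)
```

Naming the list elements lets you write result$diff_eve_below instead of remembering that it is element 1.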

Answer 2

I am not sure I understood your problem correctly, but here is a possible solution:

diff_eve_below <- subset(tbl_comp, diff_eve < 1)
diff_eve_above <- subset(tbl_comp, diff_eve > 1)
diff_mor_below <- subset(tbl_comp, diff_mor < 1)
diff_mor_above <- subset(tbl_comp, diff_mor > 1)
diff_tot_below <- subset(tbl_comp, diff_tot < 1)
diff_tot_above <- subset(tbl_comp, diff_tot > 1)
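One thing that may be worth checking with this approach: a row whose value is exactly 1, or NA, satisfies neither < 1 nor > 1, so it ends up in neither data frame. The sketch below counts such rows for diff_eve. It uses a small tbl_comp rebuilt from the three diff_* columns in the question, and n_unassigned is a hypothetical name used only for illustration.

```r
# Sample rows copied from the question; only the three diff_* columns are kept.
tbl_comp <- data.frame(
  diff_eve = c(0.7230399, 0.8083312, 0.8183417, 1.1298517, 0.9551380, 0.9059491),
  diff_mor = c(1.3370026, 1.2092141, 1.2947899, 2.0370412, 0.9630701, 1.0096698),
  diff_tot = c(1.2549002, 1.1153042, 1.1229243, 1.9833127, 0.9298195, 0.9072357)
)

diff_eve_below <- subset(tbl_comp, diff_eve < 1)
diff_eve_above <- subset(tbl_comp, diff_eve > 1)

# Rows where diff_eve is exactly 1 or NA appear in neither frame,
# so the two row counts need not add up to nrow(tbl_comp) in general.
n_unassigned <- nrow(tbl_comp) - nrow(diff_eve_below) - nrow(diff_eve_above)
n_unassigned
```

For the six sample rows nothing is left over, but on the full data a non-zero count would indicate rows that belong to neither group.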

2 answers

solution1
3 ACCEPTED 2013-11-07 15:18:39

solution2
1 2013-11-07 14:46:44
