select() selects more columns than I told it to. Why? - R

Question

For example, when I use dplyr's select():

mtcars %>% select(., cyl, disp)

it correctly selects cyl and disp. But when I do the same thing on the data frame I'm working with (suppose it is iris):

iris %>% select(., Sepal.Length, Sepal.Width)

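it selects Sepal.Length, Sepal.Width, and Petal.Length, even though I never told select about Petal.Length. This is very frustrating, because I can't find any explanation in the documentation, on Stack Overflow, or on Google.

Ultimately, I'd like to know: when does select() pick columns I didn't ask for? Any suggestions?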

Edit - data:

structure(list(codigo_estacion = 11545000L, institucion = "DGA", 
    fuente = "dga_web", nombre = "Rio Baker Bajo ÑAdis", altura = 45L, 
    latitud = -47.5, longitud = -72.9749984741211, codigo_cuenca = 115L, 
    nombre_sub_cuenca = "Rio Baker Entre Arriba Rio De La Colonia Y Desemb.", 
    cantidad_observaciones = 4736L, fecha = structure(15624, class = "Date"), 
    caudal = 692, gauge_id = 11545000L, gauge_name = "Rio Baker Bajo ÑAdis", 
    precip_promedio = 0.454545468091965, temp_max_promedio = 17.0166664123535, 
    estacion_ano = "Primavera", caudal_extremo = 0, temp_extremo = 0, 
    precip_extremo = 0), class = c("grouped_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -1L), groups = structure(list(
    codigo_estacion = 11545000L, estacion_ano = "Primavera", 
    .rows = list(1L)), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE))

The code I'm using:

df %>% dplyr::select(codigo_estacion, caudal_extremo)

But it returns the columns estacion_ano, codigo_estacion, and caudal_extremo.

Answer 1

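The data you provided is a grouped data frame: its groups attribute lists codigo_estacion and estacion_ano as grouping variables. When you use select on a grouped data frame, the grouping variables are automatically added to the result, which is why estacion_ano shows up even though you did not ask for it. You probably want to ungroup before using select: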

df %>% 
  dplyr::ungroup() %>% 
  dplyr::select(codigo_estacion, caudal_extremo)

# A tibble: 1 x 2
# codigo_estacion caudal_extremo
#           <int>          <dbl>
# 1      11545000              0
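
To check whether this is what is happening with your own data, you can reproduce the behaviour on a built-in data set. The sketch below is only an illustration: grouping mtcars by cyl is an assumption made for the example, and the names grouped, kept, and dropped are hypothetical. It uses group_vars() to list the grouping columns, then compares select() before and after ungroup().

```r
library(dplyr)

# Group a built-in data set to mimic the grouped_df in the question.
# (Grouping by cyl is an arbitrary choice for this illustration.)
grouped <- mtcars %>% group_by(cyl)

# group_vars() lists the grouping columns that select() will retain.
group_vars(grouped)

# Only mpg is requested, but the grouping column cyl is kept as well.
kept <- grouped %>% select(mpg)
names(kept)

# After ungroup(), select() returns only the requested column.
dropped <- grouped %>% ungroup() %>% select(mpg)
names(dropped)
```

If group_vars() returns anything for your data frame, those columns will keep coming back from select() until you ungroup. dplyr normally also prints a message about adding missing grouping variables when this happens, so it is worth watching the console output.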

Solution 1
0 2019-10-17 22:43:58
