Remove rows with NA values in a specific column
I have a huge dataset of about 1.6 million rows, and the variable (column) I need to focus on is 'temperature'. The temperature column has many NA values, and the other variable columns have NA values throughout as well. I want to remove only the rows with NA values in the temperature column; I don't particularly care about the NA values in the other columns. How can I do this? If I end up needing to remove rows with NA values in more than just the temperature column (e.g. the depth column as well), how can I select two columns?

This is my code:
otn <- tidync(filename, row.names=TRUE) %>% activate('D0')
glider_table <- hyper_tibble(otn)
attach(glider_table)
summary(temperature)
na.omit(glider_table)
na.omit() removes all rows with NA values regardless of which column they're in, so I need something more selective.
You can use the drop_na() function from the tidyr package. The first argument is the dataset, and the remaining arguments are optional: they name the specific columns to check, so a row is dropped only when it has an NA in one of those columns. Like this: drop_na(dataset, column). To check two columns, list both: drop_na(dataset, column1, column2).
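As a rough illustration of how this applies to the question, here is a minimal sketch. The data frame `glider_demo` and its values are made up as a small stand-in for `glider_table`, and the `salinity` column is an assumption added only to show that NAs in other columns are left alone. The last two lines show a base R alternative for cases where loading tidyr is not wanted.

```r
library(tidyr)  # provides drop_na()

# Hypothetical stand-in for glider_table; values are invented for illustration.
glider_demo <- data.frame(
  temperature = c(10.2, NA, 11.5, NA, 12.1),
  depth       = c(5, 10, NA, 20, 25),
  salinity    = c(NA, 35.1, 35.0, NA, 34.8)
)

# Drop rows where temperature is NA; NAs in depth or salinity are ignored.
temp_only <- drop_na(glider_demo, temperature)

# Drop rows where temperature OR depth is NA.
temp_and_depth <- drop_na(glider_demo, temperature, depth)

# For comparison: na.omit() drops a row if ANY column is NA.
all_complete <- na.omit(glider_demo)

# Base R equivalents that need no extra packages.
base_temp_only <- glider_demo[!is.na(glider_demo$temperature), ]
base_temp_and_depth <- glider_demo[
  complete.cases(glider_demo[, c("temperature", "depth")]), ]

nrow(temp_only)       # 3 rows kept
nrow(temp_and_depth)  # 2 rows kept
nrow(all_complete)    # 1 row kept
```

Note that none of these functions modify the data in place; the result has to be assigned back (for example `glider_table <- drop_na(glider_table, temperature)`) for the rows to actually be removed.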