Remove rows with NA values in a specific column

I have a huge dataset of about 1.6 million rows, and the variable (column) I need to focus on is 'temperature'. The temperature column has many NA values, and the other columns have NA values throughout as well. I want to remove only the rows with NA values in the temperature column; I don't particularly care about NA values in the other columns. How can I do this? And if I end up needing to remove rows with NA values in more than just the temperature column (e.g. the depth column as well), how can I select two columns? This is my code:

otn <- tidync(filename, row.names=TRUE) %>% activate('D0')
glider_table <- hyper_tibble(otn)
attach(glider_table)
summary(temperature)
na.omit(glider_table)

na.omit() removes every row that has an NA value in any column, so I need something more selective.

You can use the drop_na() function from the tidyr package. The first argument is the dataset, and the arguments after it are optional: they name the specific columns to check, so only rows with NA in those columns are removed. For example: drop_na(dataset, column). To check more than one column, list them all: drop_na(dataset, column1, column2).
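As a minimal sketch of that suggestion, the snippet below applies drop_na() to a small made-up table. The toy glider_table and its salinity column are assumptions for illustration only; the real table would come from hyper_tibble(otn) as in the question. A base R equivalent is included for comparison.

```r
library(tidyr)   # provides drop_na()
library(tibble)  # provides tibble()

# Hypothetical stand-in for glider_table; the real data comes from hyper_tibble(otn).
glider_table <- tibble(
  temperature = c(4.1, NA, 5.3, 6.0, NA),
  depth       = c(10,  20, NA,  40,  50),
  salinity    = c(NA,  35, 34,  NA,  33)
)

# Drop rows where temperature is NA; NA values in other columns are kept.
temp_only <- drop_na(glider_table, temperature)

# Drop rows where temperature or depth (or both) is NA.
temp_depth <- drop_na(glider_table, temperature, depth)

# Base R equivalent of the first call, with no extra packages needed.
temp_only_base <- glider_table[!is.na(glider_table$temperature), ]
```

On this toy data the first call keeps 3 of the 5 rows and the second keeps 2. Note that drop_na(), like na.omit(), returns a new table rather than modifying glider_table in place, so the result has to be assigned to a name to be kept.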
