
Removing N/A values from particular column in dataset

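Apologies if this is a duplicate; I have searched high and low and couldn't find anything that works for me. I'm also very new to R, so hopefully I explain myself properly.

I have a dataset of video game data that looks like this: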

Game          Platform    Year of Release   Genre

Wii Sports    Wii          2006             Sports
Wii Sports    Wii          N/A              Sports
Wii Sports    Wii          2006             Sports
Wii Sports    Wii          N/A              Sports
Wii Sports    Wii          2006             Sports
Wii Sports    Wii          2006             Sports
Wii Sports    Wii          N/A              Sports
Wii Sports    Wii          2006             Sports

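I am trying to remove the rows that have N/A values in the "Year of Release" column. For a while I was trying things like na.omit(), but that did not work. I then realised that the N/A values are baked into the original .csv, which could make R treat them as strings. I then tried the grepl function, which did not work either. Does anyone have any ideas?

Please let me know if you need any further information.

Thanks!

EDIT: The actual dataset is freely available here; I probably should have linked it originally: https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings?select=Video_Games_Sales_as_at_22_Dec_2016.Z628CB5675FF524F3F3E719B7AA2E8

If anyone is able to remove the rows with N/A values in the Year_of_Release column of that dataset and explain how they did it, I would really appreciate it. Thanks again!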

Using the toy dataset created below as an example, I'll show you how to subset data to particular rows/observations with a logical condition.

df <- data.frame(name=c("a","b","c"), 
             year = c(2014, "N/A", 2015))

We can test whether each element of the variable year is not "N/A" using the not-equal logical operator, !=

df$year != "N/A"

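This logical vector can be used to decide which rows of the data to return. It goes inside the square brackets as the i of df[i, j], where i determines the order or subset of observations in the data.frame df, and j is the variables/columns to return (leaving it blank returns all variables).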

df[df$year != "N/A", ]

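Note that if your variable name has spaces in it (Year of Release), you will need to wrap it in backticks (`). If you have actual NA values, use the is.na() function together with the ! operator to invert the logic.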
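As a minimal sketch of both points, assuming a made-up data frame called `games` (the names and values are for illustration only, not taken from the real dataset):

```r
# Hypothetical toy data; check.names = FALSE keeps the spaces in the column name
games <- data.frame(
  Game = c("Wii Sports", "Mario Kart Wii", "Wii Play"),
  `Year of Release` = c("2006", "N/A", "2006"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# A column name containing spaces has to be wrapped in backticks after `$`
kept_strings <- games[games$`Year of Release` != "N/A", ]

# If the column held real NA values instead, negate is.na()
games_na <- games
games_na$`Year of Release`[games_na$`Year of Release` == "N/A"] <- NA
kept_na <- games_na[!is.na(games_na$`Year of Release`), ]
```

Both `kept_strings` and `kept_na` end up with the same two rows; which form you need depends on whether the column stores the text "N/A" or a real NA.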

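If you use the tidyverse, which provides the drop_na function, this is an elegant solution that is very easy to read. I have just changed your variable name to Year_of_Release.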

library(tidyverse)

data= data.frame("Game" = c("Wii Sports ", "Wii Sports ","Wii Sports ","Wii Sports ","Wii Sports ","Wii Sports ","Wii Sports ","Wii Sports "),
                  "Platform" =c("Wii","Wii","Wii","Wii","Wii","Wii","Wii", "Wii"),
                  "Year_of_Release" = c(2006, NA, 2006, NA, 2006, 2006, NA, 2006),
                  "Genre"=c("Sports","Sports","Sports","Sports","Sports","Sports","Sports","Sports"))

data %>% drop_na("Year_of_Release")

           Game Platform Year_of_Release  Genre
1 Wii Sports       Wii            2006 Sports
2 Wii Sports       Wii            2006 Sports
3 Wii Sports       Wii            2006 Sports
4 Wii Sports       Wii            2006 Sports
5 Wii Sports       Wii            2006 Sports 

Or

data %>% drop_na(Year_of_Release) # if Year_of_Release is the column name

Important: in R, "N/A" is not treated as NA, as Lukasz pointed out. It should be converted to NA before trying this.
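To see why this matters, here is a small base R sketch (the `years` vector is made up for illustration). The text "N/A" is an ordinary string to R, so is.na() and na.omit() ignore it until it has been converted:

```r
# Made-up vector mimicking the column as it arrives from the CSV
years <- c("2006", "N/A", "2006")

# The text "N/A" is not a missing value, so nothing is flagged or dropped
flagged_before <- is.na(years)
kept_before <- na.omit(years)

# Convert the text to a real NA, after which the usual tools work
years[years == "N/A"] <- NA
flagged_after <- is.na(years)
kept_after <- as.vector(na.omit(years))
```

This is most likely why na.omit() appeared to do nothing in the question: there were no real NA values for it to find.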

When reading the data with read.table/read.csv, use na.strings to specify which strings should be read as NA.

 data<-read.csv('file.csv', na.strings = "N/A")

You can have R treat "N/A" as an actual NA:

mydata<-read.csv('file.csv', na.strings = c("N/A"))

Then use na.omit() as usual.
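One thing to watch: na.omit() drops every row that has an NA in any column, so if other columns of the real file also contain missing values, more rows may disappear than intended. A sketch using inline text in place of the file (the `csv_text` contents and the `Score` column are invented for illustration):

```r
# Inline stand-in for the CSV file; the contents are made up for illustration
csv_text <- "Game,Platform,Year_of_Release,Score
Wii Sports,Wii,2006,76
Mario Kart Wii,Wii,N/A,82
Wii Play,Wii,2006,N/A
Wii Fit,Wii,2007,80"

# na.strings turns the literal text N/A into a real NA while reading
mydata <- read.csv(text = csv_text, na.strings = "N/A")

# na.omit() removes rows with an NA in any column
all_complete <- na.omit(mydata)

# Restricting the check to one column keeps rows that are only missing elsewhere
year_known <- mydata[!is.na(mydata$Year_of_Release), ]
```

Here `all_complete` keeps two rows while `year_known` keeps three, because the Wii Play row is missing only its score. A side benefit of na.strings is that Year_of_Release is read as a number rather than as text.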

I would first replace this "NA string" with an actual NA using na_if, and then filter with !is.na(variable).

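library(dplyr)  # na_if() is applied to the column, since recent dplyr versions expect a vector
df %>% mutate(Year_of_Release = na_if(Year_of_Release, "N/A")) %>% filter(!is.na(Year_of_Release))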

         Game Platform Year_of_Release  Genre
1 Wii Sports       Wii            2006 Sports
2 Wii Sports       Wii            2006 Sports
3 Wii Sports       Wii            2006 Sports
4 Wii Sports       Wii            2006 Sports
5 Wii Sports       Wii            2006 Sports

Data

df <- structure(list(Game = c("Wii Sports ", "Wii Sports ", "Wii Sports ", 
"Wii Sports ", "Wii Sports ", "Wii Sports ", "Wii Sports ", "Wii Sports "
), Platform = c("Wii", "Wii", "Wii", "Wii", "Wii", "Wii", "Wii", 
"Wii"), Year_of_Release = c("2006", "N/A", "2006", "N/A", "2006", 
"2006", "N/A", "2006"), Genre = c("Sports", "Sports", "Sports", 
"Sports", "Sports", "Sports", "Sports", "Sports")), class = "data.frame", row.names = c(NA, 
-8L))
