Removing N/A values from particular column in dataset
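Apologies if this is a duplicate; I have searched high and low and could not find anything that works for me. I am also very new to R, so hopefully I can explain myself properly.
I have a dataset of video game data that looks like this: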
Game        Platform  Year of Release  Genre
Wii Sports  Wii       2006             Sports
Wii Sports  Wii       N/A              Sports
Wii Sports  Wii       2006             Sports
Wii Sports  Wii       N/A              Sports
Wii Sports  Wii       2006             Sports
Wii Sports  Wii       2006             Sports
Wii Sports  Wii       N/A              Sports
Wii Sports  Wii       2006             Sports
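I am trying to remove the rows that have N/A values in the "Year of Release" column. For a while I tried things like na.omit(), but that did not work. Then I realised the N/A values are baked into the original .csv, so R may be treating them as strings. I then tried the grepl function, which did not work either. Does anyone have any ideas?
If you need any further information, please let me know.
Thanks!
Edit: the actual dataset is freely available here; I probably should have linked it in the first place: https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings?select=Video_Games_Sales_as_at_22_Dec_2016.Z628CB5675FF524F3F3E719B7AA2E8
If anyone is able to remove the rows with N/A values from the Year_of_Release column of that dataset and explain how they did it, I would be very grateful. Thanks again!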
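Using the toy dataset created below as an example, I will show you how to subset data to specific rows/observations based on a logical condition.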
df <- data.frame(name=c("a","b","c"),
year = c(2014, "N/A", 2015))
We can use the not-equal logical operator != to test whether each element of the variable year is not "N/A":
df$year != "N/A"
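This logical vector can be used to determine which rows of the data are returned. It goes in the i position inside the square brackets of df[i, j], where i determines the order or subset of observations in the data.frame df, and j is the variables/columns to return (leaving it blank returns all variables).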
df[df$year != "N/A", ]
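Note that if your variable name has spaces in it (Year of Release), you need to wrap it in backticks (`). If you have actual NA values, use the is.na() function together with the ! operator to invert the logic.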
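Both cases from the note above can be sketched as follows. The data frame `games` and its contents are made up for illustration, and `check.names = FALSE` is only there so that the space in the column name survives:

```r
# Hypothetical toy data; check.names = FALSE keeps the space in the column name
games <- data.frame(
  Game = c("A", "B", "C"),
  `Year of Release` = c("2006", "N/A", "2008"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Case 1: "N/A" is stored as a string; backticks are needed after $
kept_string <- games[games$`Year of Release` != "N/A", ]

# Case 2: real NA values; test with is.na() and negate it with !
games$`Year of Release`[games$`Year of Release` == "N/A"] <- NA
kept_na <- games[!is.na(games$`Year of Release`), ]
```

In both cases the row for game "B" is dropped; only the test used to build the logical vector differs.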
If you use the tidyverse, which provides the drop_na function, here is an elegant solution that is very easy to read. I have just changed your variable name to Year_of_Release.
library(tidyverse)
data= data.frame("Game" = c("Wii Sports ", "Wii Sports ","Wii Sports ","Wii Sports ","Wii Sports ","Wii Sports ","Wii Sports ","Wii Sports "),
"Platform" =c("Wii","Wii","Wii","Wii","Wii","Wii","Wii", "Wii"),
"Year_of_Release" = c(2006, NA, 2006, NA, 2006, 2006, NA, 2006),
"Genre"=c("Sports","Sports","Sports","Sports","Sports","Sports","Sports","Sports"))
data %>% drop_na("Year_of_Release")
Game Platform Year_of_Release Genre
1 Wii Sports Wii 2006 Sports
2 Wii Sports Wii 2006 Sports
3 Wii Sports Wii 2006 Sports
4 Wii Sports Wii 2006 Sports
5 Wii Sports Wii 2006 Sports
Or:
data %>% drop_na(Year_of_Release) # if Year_of_Release is the column name
Important: in R, the string N/A is not treated as NA, as Lukasz suggested. It should be converted to NA before trying this.
When reading the data with read.table/read.csv, use na.strings to specify which strings should be read as NA.
data<-read.csv('file.csv', na.strings = "N/A")
You can make R treat "N/A" like an actual NA:
mydata<-read.csv('file.csv', na.strings = c("N/A"))
Then use na.omit() as usual.
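To see the effect of na.strings without downloading the file, here is a minimal sketch that reads a few CSV lines through the text argument of read.csv. The rows are invented for illustration and are not taken from the Kaggle file:

```r
# Invented CSV content standing in for the real file
csv_text <- "Name,Platform,Year_of_Release,Genre
Wii Sports,Wii,2006,Sports
Some Game,Wii,N/A,Sports
Other Game,DS,2008,Racing"

# Treat the literal string "N/A" as a missing value while reading
games <- read.csv(text = csv_text, na.strings = "N/A")

# Year_of_Release is now numeric with real NAs, so is.na() works on it
clean <- games[!is.na(games$Year_of_Release), ]

# na.omit() also works, but it drops rows with an NA in ANY column
clean_all <- na.omit(games)
```

Filtering on the single column is the safer choice when other columns may contain missing values that you want to keep.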
I would first use na_if to replace this "NA string" with an actual NA, and then filter with !is.na(variable):
df %>% na_if("N/A") %>% filter(!is.na(Year_of_Release))
Game Platform Year_of_Release Genre
1 Wii Sports Wii 2006 Sports
2 Wii Sports Wii 2006 Sports
3 Wii Sports Wii 2006 Sports
4 Wii Sports Wii 2006 Sports
5 Wii Sports Wii 2006 Sports
Data
structure(list(Game = c("Wii Sports ", "Wii Sports ", "Wii Sports ",
"Wii Sports ", "Wii Sports ", "Wii Sports ", "Wii Sports ", "Wii Sports "
), Platform = c("Wii", "Wii", "Wii", "Wii", "Wii", "Wii", "Wii",
"Wii"), Year_of_Release = c("2006", "N/A", "2006", "N/A", "2006",
"2006", "N/A", "2006"), Genre = c("Sports", "Sports", "Sports",
"Sports", "Sports", "Sports", "Sports", "Sports")), class = "data.frame", row.names = c(NA,
-8L))
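One caveat, stated with some hedging: as far as I know, recent dplyr releases (1.1.0 onward, I believe) expect na_if() to receive a vector rather than a whole data frame, so the na_if pipeline above may raise an error there. A sketch that applies na_if() to the single column instead, using a shortened copy of the data:

```r
library(dplyr)

# Shortened copy of the data: years stored as strings, with "N/A" entries
df <- data.frame(
  Game = rep("Wii Sports", 4),
  Year_of_Release = c("2006", "N/A", "2006", "N/A"),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(Year_of_Release = na_if(Year_of_Release, "N/A")) %>%  # "N/A" -> NA
  filter(!is.na(Year_of_Release)) %>%                           # drop missing years
  mutate(Year_of_Release = as.integer(Year_of_Release))         # optional: numeric years
```

The final as.integer() step is optional; it is only there because a column read with "N/A" strings stays character until it is converted.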