

How to remove $ from all values in a data frame column in a character vector?

I have a data frame in R that has information about NBA players, including salary information. All the data in the salary column have a "$" before the value, and I want to convert the character data to numeric for the purpose of analysis. So I need to remove the "$" in this column. However, I am unable to subset or parse any of the values in this column. It seems that each value is a character vector of length 1. I've included below the structure of the data and what I have tried in my attempt at removing the "$".

> str(combined)

'data.frame':   588 obs. of  9 variables:
$ Player: chr  "Aaron Brooks" "Aaron Gordon" "Aaron Gray" "Aaron Harrison" ...
$ Tm    : Factor w/ 30 levels "ATL","BOS","BRK",..: 4 22 9 5 9 18 1 5 25 30 ...
$ Pos   : Factor w/ 5 levels "C","PF","PG",..: 3 2 NA 5 NA 2 1 1 4 5 ...
$ Age   : num  31 20 NA 21 NA 24 29 31 25 33 ...
$ G     : num  69 78 NA 21 NA 52 82 47 82 13 ...
$ MP    : num  1108 1863 NA 93 NA ...
$ PER   : num  11.8 17 NA 4.3 NA 5.6 19.4 18.2 12.7 9.2 ...
$ WS    : num  0.9 5.4 NA 0 NA -0.5 9.4 2.8 4 0.3 ...
$ Salary: chr  "$2000000" "$4171680" "$452059" "$525093" ...

combined[, "Salary"] <- gsub("$", "", combined[, "Salary"])

The last line of code above runs without error, but it doesn't change the "Salary" column.

I can successfully modify it by running the code listed below, but I need a way to automate the replacement for the whole data set instead of doing it row by row.

combined[, "Salary"] <- gsub("$2000000", "2000000", combined[, "Salary"])

How can I subset the character vectors in this column to remove the "$"? Apologies in advance for any formatting faux pas; this is my first time asking a question. Cheers,

The $ is a regex metacharacter that matches the end of the string. Because the pattern "$" matches only the empty position at the end of each value, replacing it with "" leaves every string unchanged. So we need to either escape it ("\\$"), place it in square brackets ("[$]"), or use fixed = TRUE in sub. We don't need gsub, as there seems to be only a single $ character in each string.
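A small sketch of the point above, using four salary strings copied from the str() output (the vector name `salary` is just for illustration). It shows that the unescaped pattern is a no-op and that the three suggested fixes all remove a literal "$".

```r
# A few salary strings taken from the str() output in the question
salary <- c("$2000000", "$4171680", "$452059", "$525093")

# Unescaped "$" is the end-of-string anchor: it matches the empty string at the
# end of each value, so replacing it with "" changes nothing.
unchanged <- gsub("$", "", salary)

# Three equivalent ways to match a literal dollar sign
escaped   <- sub("\\$", "", salary)              # escape the metacharacter
bracketed <- sub("[$]", "", salary)              # "$" is literal inside a bracket expression
literal   <- sub("$", "", salary, fixed = TRUE)  # no regex interpretation at all

print(unchanged)
print(escaped)
```

Note that "\\$" is written with two backslashes because the R string literal itself consumes one, leaving the regex engine with \$.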

 combined[, "Salary"] <- as.numeric(sub("$", "", combined[, "Salary"], fixed=TRUE))
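To try the line above in isolation, here is a minimal sketch on a hypothetical four-row stand-in for `combined`. The player names and salaries are copied from the question's str() output; the other columns are left out. The final check is an assumption about what is worth verifying, not part of the original answer.

```r
# Hypothetical four-row stand-in for `combined`; names and salaries are copied
# from the str() output in the question, the other columns are omitted.
combined <- data.frame(
  Player = c("Aaron Brooks", "Aaron Gordon", "Aaron Gray", "Aaron Harrison"),
  Salary = c("$2000000", "$4171680", "$452059", "$525093"),
  stringsAsFactors = FALSE
)

# Strip the literal "$" and convert the whole column in one vectorized call
combined[, "Salary"] <- as.numeric(sub("$", "", combined[, "Salary"], fixed = TRUE))

str(combined$Salary)

# Any value that still cannot be parsed (e.g. one containing commas) becomes NA,
# so counting NAs is a quick sanity check after the conversion.
sum(is.na(combined$Salary))
```

Because sub() is vectorized over its input, a single call handles every row, which is what replaces the value-by-value gsub() calls from the question.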

Or, as @gung mentioned in the comments, using substr would be faster (this assumes every value begins with a single $):

combined$Salary <- as.numeric(substr(combined$Salary, 2, nchar(combined$Salary)))
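A short sketch of the trade-off behind the substr() approach: it drops the first character by position rather than by matching "$". The `mixed` vector is a hypothetical example of a value that lacks the "$" prefix; nothing in the question says such values exist.

```r
salary <- c("$2000000", "$4171680", "$452059", "$525093")

# substr() simply drops the first character, so it assumes every value starts
# with exactly one "$"; it does no pattern matching at all.
by_position <- as.numeric(substr(salary, 2, nchar(salary)))

# Hypothetical vector where one value has no "$" prefix
mixed <- c("$2000000", "452059")

# The pattern-based version removes only a literal "$" and leaves other values intact
by_pattern <- as.numeric(sub("$", "", mixed, fixed = TRUE))

# The position-based version silently loses the leading digit of "452059"
by_position_mixed <- as.numeric(substr(mixed, 2, nchar(mixed)))
```

If the column is known to be clean, substr() is fine; if not, the fixed = TRUE version is the safer choice.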
