Converting characters to numeric values in R

Question

I have a file that I read into R as a data frame (called CA1) with the following structure:

   Station_ID Guage_Type   Lat   Long     Date Time_Zone Time_Frame H0 H1 H2 H3 H4 H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23
 1    4457700         HI 41.52 124.03 19480701         8        LST  0  0  0  0  0  0   0   0   0   0   0   0 MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS
 2    4457700         HI 41.52 124.03 19480705         8        LST  0  1  1  1  1  1   2   2   2   4   5   5   4   7   1   1   0   0  10  13   5   1   1   3
 3    4457700         HI 41.52 124.03 19480706         8        LST  1  1  1  0  1  1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 4    4457700         HI 41.52 124.03 19480727         8        LST  3  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 5    4457700         HI 41.52 124.03 19480801         8        LST  0  0  0  0  0  0   0   0   0   0   0   0 MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS MIS
 6    4457700         HI 41.52 124.03 19480817         8        LST  0  0  0  0  0  0 ACC ACC ACC ACC ACC ACC   6   1   0   0   0   0   0   0   0   0   0   0

H0 through H23 are read in as character() because some values are not numeric: they can be MIS, ACC, or DEL.

My question: is there a way to convert the values in each column H0 through H23 to numeric, with the character values (MIS, ACC, DEL) becoming NA or NaN, which I can then test for (with is.nan or is.na), so that I can run some numeric models on the data? Or would it be better to change the character values to an identifier such as -9999?

I have tried many ways. I found a few on this site, but none of them work. For example:

 for (i in 8:31)
 {
     CA1[6,i] <- as.numeric(as.character(CA1[6,i]))
 }

which of course gives warnings, but when I test two specific values (CA1[6,8] and CA1[6,19]) with is.numeric(), I get FALSE for both. I don't understand why for the first, but I do for the second, since it is a "". However, I can test that one with is.na(CA1[6,19]), which returns TRUE; that is fine for me, because it tells me the value is not numeric.

A second way I tried is:

 for (i in 8:31)
 {
     CA1[6,i] <- as.numeric(levels(CA1[6,i]))[CA1[6,i]]
 }

which gives me the same results as before.

Is there an efficient way of doing what I am trying to do? Your help is greatly appreciated. Thank you.

Answer 1

When you read in the data, you can typically specify what the column types are. For example, read.table / read.csv have a colClasses argument.

# Something like this
read.table('foo.txt', header=TRUE, colClasses=c('integer', 'factor', 'numeric', 'numeric', 'Date'))

See ?read.table for more information.
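
As a minimal sketch of colClasses, assuming a small inline sample in place of the real file (the `sample_text` name and the reduced set of columns are made up for illustration): a named colClasses vector sets the type of selected columns only. Declaring a column as numeric makes read.table stop with an error if that column contains strings such as MIS, so the hourly columns are read as character here and left for a later conversion.

```r
# Hypothetical inline sample standing in for the real file
sample_text <- "Station_ID Guage_Type Lat Long H0 H1
4457700 HI 41.52 124.03 0 MIS
4457700 HI 41.52 124.03 1 ACC"

# A named colClasses vector sets the type of the named columns only;
# the remaining columns are left to read.table's usual type guessing.
CA1 <- read.table(text = sample_text, header = TRUE,
                  colClasses = c(Station_ID = "character",
                                 Guage_Type = "factor",
                                 H0 = "character",
                                 H1 = "character"))

str(CA1)
```

Naming the entries avoids having to spell out a class for every one of the 31 columns in the right order.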

Answer 2

The immediate problem is that each column of a data frame can only contain values of one type. The 6 in CA1[6,i] in your code means that only a single value is converted in each column, so when it is inserted back after conversion, it has to be coerced back to a string to match the rest of the column.

You can solve this by converting the whole column in one go, so that the column is entirely replaced, i.e. remove the 6:

 for (i in 8:31)
 {
     CA1[,i] <- as.numeric(as.character(CA1[,i]))
 }
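
The same whole-column conversion can also be written without an explicit loop. This is only a sketch on a toy data frame (`toy`, `hour_cols` and the values are made up for illustration, not taken from the real CA1); it selects the hourly columns by name instead of by position, and uses suppressWarnings() to hide the expected "NAs introduced by coercion" warning for MIS, ACC and DEL.

```r
# Toy data frame standing in for CA1 (hypothetical values)
toy <- data.frame(Station_ID = c(4457700, 4457700),
                  H0 = c("0", "MIS"),
                  H1 = c("1", "ACC"),
                  H2 = c("DEL", "3"),
                  stringsAsFactors = FALSE)

# Select the hourly columns by name rather than by position
hour_cols <- grep("^H[0-9]+$", names(toy), value = TRUE)

# Convert each whole column at once; non-numeric strings become NA.
# as.character() keeps this safe if the columns were read as factors.
toy[hour_cols] <- lapply(toy[hour_cols], function(x) {
  suppressWarnings(as.numeric(as.character(x)))
})

sapply(toy[hour_cols], is.numeric)  # TRUE for H0, H1 and H2
is.na(toy$H0)                       # FALSE TRUE
```

Because each column is replaced as a whole, its type changes to numeric instead of being coerced back to character.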

Answer 3

Following on from Tommy's answer, you could potentially deal with this issue when reading in the data. If "MIS", "ACC" and "DEL" always denote missing values, you could use the na.strings argument of read.table.

read.table('foo.txt', header=TRUE, na.strings = c("MIS", "ACC", "DEL"))

If there are other character strings that always denote missing values, then you could add them to the above vector.

However, if, for example, "MIS" appears in the column Time_Frame and means something other than a missing value, then DO NOT TAKE THIS APPROACH!
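
A minimal sketch of na.strings on an inline sample (the sample text is made up for illustration and has only three columns). Since na.strings applies to every column, the last line also shows the caveat: a "MIS" that appears in Time_Frame is turned into NA as well.

```r
# Hypothetical inline sample; the second row has "MIS" in Time_Frame
sample_text <- "Time_Frame H0 H1
LST 0 MIS
MIS 1 ACC
LST 2 5"

CA1 <- read.table(text = sample_text, header = TRUE,
                  na.strings = c("MIS", "ACC", "DEL"))

sapply(CA1[c("H0", "H1")], is.numeric)  # both TRUE
is.na(CA1$H1)                           # TRUE TRUE FALSE
is.na(CA1$Time_Frame)                   # FALSE TRUE FALSE
```

If a code such as MIS is meaningful in some non-hourly column, reading the hourly columns as character and converting only those columns afterwards avoids losing it.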

Answer 1
6 2012-05-04 09:13:26

Answer 2
6 (accepted) 2012-05-04 09:18:11

Answer 3
2 2012-05-04 10:27:01
