

as.numeric returns NA for some of the values in a column, for no apparent reason

While trying to convert a character column (strings of numbers, e.g. "0.1234") to numeric with as.numeric, some of the values come back as NA with the warning "NAs introduced by coercion". The strings that become NA don't look any different from the ones that convert correctly. Does anyone know what the problem might be?

I already looked for non-numeric characters (such as ',') that might be hiding inside some of the values. I did find strings containing '-' (e.g. "-0.123") that really did turn into NAs, but they account for only some of the NAs. I also looked for spaces inside the strings; that doesn't seem to be the problem either.

data$y
 [1] "0.833250539"  "0.820323535"  "0.462284612"  "0.792943985"  "0.860587952"  "0.729665177"  "0.461503956"  "0.625871118" 
 [9] "0.740999346"  "0.962727964"  "0.971089266"  "0.869004848"  "0.828651766"  "0.900648732"  "0.970326033"  "0.898123286" 
[17] "0.911640765"  "0.902442126"  "0.843392097"  "0.763421844"  "0.892426243"  "0.380433624"  "0.925017633"  "0.725470821" 
[25] "0.699924767"  "0.689061225"  "0.907462936"  "0.888064239"  "0.913547115"  "-‬0.625103904‭" "0.897385961"  "0.889727462" 
[33] "0.90127339"   "0.947012474"  "0.948883588"  "0.845845512"  "0.97866966"   "0.796247738"  "0.864627056"  "0.266656189‭" 
[41] "0.894915463"  "0.969690678"  "0.771365656‭"  "0.88304436"   "0.954039006"  "0.836952199"  "0.731558669‭"  "0.907224294" 
[49] "0.622059127"  "0.887742343"  "0.917550343"  "0.97240334‭"   "0.902841957"  "0.617403052"  "0.82926708"   "0.674903846" 
[57] "0.947132958"  "0.929213613‭"  "-‬0.297844476" "0.871767367"

y = as.numeric(data$y)

Warning message:
NAs introduced by coercion

y
 [1] 0.8332505 0.8203235 0.4622846 0.7929440 0.8605880 0.7296652 0.4615040 0.6258711 0.7409993 0.9627280 0.9710893 0.8690048 0.8286518
[14] 0.9006487 0.9703260 0.8981233 0.9116408 0.9024421 0.8433921 0.7634218 0.8924262 0.3804336 0.9250176 0.7254708 0.6999248 0.6890612
[27] 0.9074629 0.8880642 0.9135471        NA 0.8973860 0.8897275 0.9012734 0.9470125 0.9488836 0.8458455 0.9786697 0.7962477 0.8646271
[40]        NA 0.8949155 0.9696907        NA 0.8830444 0.9540390 0.8369522        NA 0.9072243 0.6220591 0.8877423 0.9175503        NA
[53] 0.9028420 0.6174031 0.8292671 0.6749038 0.9471330        NA        NA 0.8717674
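A quick way to see what is hiding in the strings that fail is to list their non-ASCII code points. The sketch below uses a small hypothetical sample (`y_chr`) modelled on the values above, and a helper `hidden_codepoints()` that is defined here purely for illustration; on the real data you would apply it to `data$y[is.na(y)]`.

```r
# Hypothetical sample modelled on the question's data: "\u202c" and "\u202d"
# are invisible Unicode directional formatting marks.
y_chr <- c("0.833250539", "-\u202c0.625103904\u202d", "0.266656189\u202d")

# Hypothetical helper: return the non-ASCII code points of one string as "U+XXXX".
hidden_codepoints <- function(s) {
  cp <- utf8ToInt(s)              # integer code points of a UTF-8 string
  sprintf("U+%04X", cp[cp > 127L])
}

# Which strings fail to parse? suppressWarnings() only silences the coercion warning.
bad <- is.na(suppressWarnings(as.numeric(y_chr)))

# nchar() exposes the extra characters even though they print as nothing.
for (s in y_chr[bad]) {
  cat(sprintf("nchar = %d, hidden: %s\n", nchar(s),
              paste(hidden_codepoints(s), collapse = " ")))
}
```

For the second sample value this reports U+202C and U+202D, characters that are invisible when the vector is printed.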

Your strings contain some invisible non-ASCII characters (they appear to be Unicode directional formatting marks such as U+202C and U+202D), which as.numeric cannot parse. If you are certain that it is safe to remove them, use

as.numeric(iconv(data$y, 'utf-8', 'ascii', sub=''))

Reference on the conversion
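To check what the `iconv()` call does, here is a small sketch on a hypothetical sample (`y_chr`) with the same kind of invisible marks. Converting to ASCII with `sub = ""` replaces each byte that has no ASCII equivalent with an empty string, so the marks simply disappear before `as.numeric()` sees them.

```r
# Hypothetical sample with invisible marks like the ones in the question's data.
y_chr <- c("0.833250539", "-\u202c0.625103904\u202d", "0.266656189\u202d")

# Every non-convertible byte is replaced by sub = "", i.e. deleted.
cleaned <- iconv(y_chr, from = "UTF-8", to = "ASCII", sub = "")
y_num <- as.numeric(cleaned)   # no "NAs introduced by coercion" warning now
print(y_num)
```

The "if you are certain" caveat matters: this drops every non-ASCII character, not only invisible ones. If a value used the Unicode minus sign U+2212 ("−") instead of the ASCII hyphen, it would be removed too and the number would silently become positive.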

Copying and pasting your characters gives me "-,0.297844476" (for the last NA, as an example). There is something wrong with the encoding. You can work around it by using

as.numeric(gsub(",","",data$y))

Edit: this answer does not work for all of your NAs. I don't really know what is going on with your data; please provide a dput if possible.
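A more targeted alternative to deleting one specific visible character such as ',' (a different technique from both answers here) is to strip only Unicode format characters, the general category `Cf`, which includes directional marks, zero-width spaces and byte-order marks, and leave everything else alone. A sketch, again on a hypothetical sample `y_chr`:

```r
# Hypothetical sample with invisible marks like the ones in the question's data.
y_chr <- c("0.833250539", "-\u202c0.625103904\u202d", "0.266656189\u202d")

# \p{Cf} is a PCRE Unicode property class matching format characters;
# perl = TRUE is required for it in R's regular-expression functions.
cleaned <- gsub("\\p{Cf}", "", y_chr, perl = TRUE)
y_num <- as.numeric(cleaned)

# Anything still NA contains some other unexpected character worth inspecting.
still_bad <- y_chr[is.na(y_num)]
```

Unlike the ASCII conversion, this keeps other non-ASCII characters such as a Unicode minus sign in place, so such values stay NA and show up in `still_bad` instead of silently changing sign.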

Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.