How do you convert all numeric strings in a dataframe to numeric in R?
How do I test for numeric values in a dataframe of characters, and convert those to numeric?
I have a data frame similar to the following:
> theDF
ID Ticker INDUSTRY_SECTOR VAR CVAR
1 1 USD CASH 0 0
12 2 ZAR CASH -181412.82055904 -301731.22832191
23 3 BAT SJ EQUITY Financial 61711.951234826 102641.162795691
34 4 HCI SJ EQUITY Financial 1095.16002541256 1821.50290513369
45 5 PSG SJ EQUITY Financial 16498.2192382422 27440.331617902
We can see that these are all character columns:
> apply(theDF, 2, mode)
ID Ticker INDUSTRY_SECTOR VAR CVAR
"character" "character" "character" "character" "character"
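I want something that converts only the numeric-looking vectors to numeric. Basically, if a column "looks like" a number, make it numeric; otherwise leave it alone. I couldn't find anything on StackOverflow that doesn't require knowing the names or positions of the columns you want to convert. This DF won't always be in the same order or have the same columns, so I need some dynamic way to check whether a column "looks like" numbers and make those columns numeric.
This (obviously) gives me a bunch of NAs for the character columns: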
> apply(theDF, 2, as.numeric)
ID Ticker INDUSTRY_SECTOR VAR CVAR
[1,] 1 NA NA 0.00 0.000
[2,] 2 NA NA -181412.82 -301731.228
[3,] 3 NA NA 61711.95 102641.163
[4,] 4 NA NA 1095.16 1821.503
[5,] 5 NA NA 16498.22 27440.332
I tried something like this, but not only does it not work, it also seems really inefficient:
> apply(theDF, 2, function(x) tryCatch(as.numeric(x),error=function(e) e, warning=function(w) x))
ID Ticker INDUSTRY_SECTOR VAR CVAR
[1,] "1" "USD CASH" "" "0" "0"
[2,] "2" "ZAR CASH" "" "-181412.82055904" "-301731.22832191"
[3,] "3" "BAT SJ EQUITY" "Financial" "61711.951234826" "102641.162795691"
[4,] "4" "HCI SJ EQUITY" "Financial" "1095.16002541256" "1821.50290513369"
[5,] "5" "PSG SJ EQUITY" "Financial" "16498.2192382422" "27440.331617902"
Is there a better way?
EDIT: People kept asking for this, so here it is...
> apply(theDF, 2, mode)
ID Ticker INDUSTRY_SECTOR VAR CVAR
"character" "character" "character" "character" "character"
> sapply(theDF, mode)
ID Ticker INDUSTRY_SECTOR VAR CVAR
"character" "character" "character" "character" "character"
> apply(theDF, 2, class)
ID Ticker INDUSTRY_SECTOR VAR CVAR
"character" "character" "character" "character" "character"
> sapply(theDF, class)
ID Ticker INDUSTRY_SECTOR VAR CVAR
"character" "character" "character" "character" "character"
This looks like a job for type.convert().
theDF[] <- lapply(theDF, type.convert, as.is = TRUE)
## check the result
sapply(theDF, class)
# ID Ticker INDUSTRY_SECTOR VAR CVAR
# "integer" "character" "character" "numeric" "numeric"
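type.convert() converts a vector to the "most appropriate" type. Setting as.is = TRUE lets us keep character vectors as they are; otherwise they would be coerced to factors.
Update: For columns that are not character, you need to coerce them to character first.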
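To see how type.convert() chooses a type, here is a minimal sketch on standalone vectors. The variable names and sample values are made up for illustration and are not the question's data.

```r
# Hypothetical sample vectors (not the question's data)
ints  <- type.convert(c("1", "2", "3"), as.is = TRUE)
nums  <- type.convert(c("0", "-181412.82", "61711.95"), as.is = TRUE)
chars <- type.convert(c("USD CASH", "ZAR CASH"), as.is = TRUE)
mixed <- type.convert(c("1", "abc"), as.is = TRUE)

class(ints)   # "integer"
class(nums)   # "numeric"
class(chars)  # "character"
class(mixed)  # "character": one non-numeric entry keeps the whole vector character
```

A vector is only converted when every element parses as a number, which is the "looks like a number" test the question asks for.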
theDF[] <- lapply(theDF, function(x) type.convert(as.character(x), as.is = TRUE))
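As a rough illustration of the coerce-first variant, the sketch below uses a small hypothetical data frame (the name df and its values are assumptions, not the asker's data) in which one column is a factor holding numeric-looking labels.

```r
# Hypothetical data frame: VAR is a factor, Ticker is character
df <- data.frame(
  VAR    = factor(c("0", "-181412.82", "61711.95")),
  Ticker = c("USD CASH", "ZAR CASH", "BAT SJ EQUITY"),
  stringsAsFactors = FALSE
)

# as.character() turns the factor into its labels before type.convert() sees it
df[] <- lapply(df, function(x) type.convert(as.character(x), as.is = TRUE))

sapply(df, class)
#         VAR      Ticker
#   "numeric" "character"
```

Assigning into df[] rather than df keeps the result a data frame with its original column names, instead of replacing it with the plain list that lapply() returns.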