
Recode monetary string values into new variable as numeric

First off: I'm a newbie with R, so bear with me. I'm trying to recode string values as numeric. My problem is that I have two different string patterns in my values: "M" and "B", for 'million' and 'billion' respectively.

df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"))

I've successfully knocked off the dollar sign and now have:

df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"),
                 fundsR = c("1.76M", "2B", "57M", "9.87B"))

How can I recode these as numeric while retaining their respective monetary values? I've tried various if statements and for loops, with and without str_detect, pipe operators, case_when, mutate, etc., to isolate the values with "M" and those with "B", convert them to numeric, and multiply to get the corresponding numeric value, all in a new column. This seemingly simple task turned out to be harder than I imagined, which I attribute to being a novice. At this point I'd like to start from scratch and see if anyone has any fresh ideas. My RStudio is a MESS.

Something like this would be nice:

df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"),
                 fundsR = c("1.76M", "2B", "57M", "9.87B"),
                 fundsFinal = c(1760000, 2000000000, 57000000, 9870000000))

I'd really appreciate your input.

You could create a helper function f, and then apply it to the funds column:


library(dplyr)
library(stringr)

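f <- function(x) {
  # Multiplier for each suffix: M = million, B = billion
  curr <- c("M" = 1e6, "B" = 1e9)
  # Drop the leading dollar sign
  val <- str_remove(x, "\\$")
  # Numeric part times the multiplier looked up by suffix
  as.numeric(str_remove_all(val, "B|M")) * curr[str_extract(val, "B|M")]
}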

df %>% mutate(fundsFinal = f(funds))

Output:

   funds fundsFinal
1 $1.76M   1.76e+06
2    $2B   2.00e+09
3   $57M   5.70e+07
4 $9.87B   9.87e+09

Input:

df = structure(list(funds = c("$1.76M", "$2B", "$57M", "$9.87B")), class = "data.frame", row.names = c(NA, 
-4L))
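
For readers who would rather avoid extra packages, the same lookup idea can be sketched in base R. This is only an illustration, not part of the original answer: the name parse_funds is hypothetical, and the sketch assumes every value ends in either "M" or "B" (anything else comes back as NA, with a coercion warning if the remaining text is not a number).

```r
# Hypothetical helper (not from the original answers): base R only.
parse_funds <- function(x) {
  # Multiplier for each suffix: M = million, B = billion
  multipliers <- c(M = 1e6, B = 1e9)
  # Capture the trailing "M" or "B"; values without one are left unchanged
  suffix <- sub("^.*([MB])$", "\\1", x)
  # Strip the dollar sign and the suffix, then convert what is left
  number <- as.numeric(gsub("[$MB]", "", x))
  # Index the named vector by suffix; unknown suffixes give NA
  unname(number * multipliers[suffix])
}

funds <- c("$1.76M", "$2B", "$57M", "$9.87B")
fundsFinal <- parse_funds(funds)
```

Indexing a named vector by a character vector returns one multiplier per element, which is what lets the whole column be converted without a loop or an if statement.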

This works, but I'm sure better solutions exist. Assuming funds is a character vector:

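library(tidyverse)
options(scipen = 999)  # print full numbers instead of scientific notation
df <- data.frame(funds = c('$1.76M', '$2B', '$57M', '$9.87B'))

# Strip the "$" and the trailing suffix, then scale by the suffix:
# "M" means millions; anything else is treated as billions.
df <- df %>%
  mutate(fundsFinal = ifelse(str_sub(funds, -1) == 'M',
                             as.numeric(substr(funds, 2, nchar(funds) - 1)) * 10^6,
                             as.numeric(substr(funds, 2, nchar(funds) - 1)) * 10^9))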
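
One thing to keep in mind with the ifelse approach is that any value not ending in "M" is scaled as billions. A possible variant, sketched here with case_when and str_ends (the temporary column name amount is made up for this example), tests each suffix explicitly and returns NA for anything unexpected. It assumes dplyr and a version of stringr that provides str_ends.

```r
library(dplyr)
library(stringr)

df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"))

df <- df %>%
  mutate(
    # Numeric part with "$" and the suffix removed ("amount" is a temporary, made-up name)
    amount = as.numeric(str_remove_all(funds, "[$MB]")),
    # Scale by suffix; unknown suffixes become NA instead of silently counting as billions
    fundsFinal = case_when(
      str_ends(funds, "M") ~ amount * 1e6,
      str_ends(funds, "B") ~ amount * 1e9,
      TRUE ~ NA_real_
    )
  ) %>%
  select(-amount)
```

Further suffixes could be handled by adding one more line per suffix to the case_when call and to the character class in str_remove_all.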
