How to only multiply certain values in R column?

I am trying to make a column easier to read. Right now, the data looks like this (this is my first time using stackoverflow.com, so please excuse any formatting mistakes):

**Money**
16k 
42.3k
15
8900

This is currently being read as character values and not numeric. I want to have all these values standardized, so I want to get rid of the "k". I thought I would do this:

library(stringr)  # provides str_replace() and re-exports %>%

data$money <- data$money %>% 
  str_replace('k', '') %>% 
  as.numeric()

But now my issue is getting the values to show up correctly. For example, 16k is actually 16,000 and 42.3k is actually 42,300, but I cannot do:

data$money <- data$money * 1000

because it would be inaccurate since 15 and 8900 should be kept as 15 and 8900, and not 15,000 and 8,900,000. Any tips on what to do? Thanks so much!

A better option, which eliminates the need for eval(parse(...)):

Use gsub to replace the k with e3, the scientific-notation exponent for multiplying by 10^3. R's normal number conversion recognizes that notation and converts the string to a number properly:

x = c('16k','42.3k','15','8900')

as.numeric(gsub('k', 'e3', x))
[1] 16000 42300    15  8900

You can deal with other suffixes by nesting further gsub calls. For example, to also handle M for million:

as.numeric(gsub('k', 'e3', gsub('M', 'e6', x)))
[1] 16000 42300    15  8900
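The example vector above contains no "M" values, so the nested call returns the same result as before. As a minimal sketch, assuming a hypothetical input x2 that does contain an "M" suffix, and assuming an upper-case "K" should also mean thousands, the same idea looks like this:

```r
# Hypothetical input that actually contains an "M" suffix.
x2 <- c("16k", "2.5M", "15", "8900")

# Rewrite each suffix as a power-of-ten exponent, then convert.
converted <- as.numeric(gsub("k", "e3", gsub("M", "e6", x2)))
converted
# values: 16000, 2500000, 15, 8900

# Assumption: an upper-case "K" should be treated like "k".
thousands <- as.numeric(gsub("k", "e3", "7K", ignore.case = TRUE))
thousands
# value: 7000
```

Note that ignore.case = TRUE is only safe for suffixes that mean the same thing in both cases; it would not be appropriate if "m" and "M" had different meanings in your data.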

I'm not sure if this is the best option, but it's pretty simple and handles the decimals too. We use a regex to replace k with *1000 (the operation: times 1000), then evaluate the strings to complete the multiplication:

x = c('16k','42.3k','15','8900')

sapply(gsub('k', '*1000', x), function(x) eval(parse(text=x)))

  16*1000 42.3*1000        15      8900 
    16000     42300        15      8900 

You can deal with other suffixes by nesting further gsub calls, for example to convert M to *1000000 before evaluating.

Be careful using this in production code: eval(parse(...)) runs whatever text it is given, so a user who can write to data$money could get malicious code executed. But for your purposes, this should be fine.
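One way to reduce that risk, sketched here as an assumption rather than as part of the original answer, is to check every string against a strict pattern before evaluating anything. The pattern below (digits, an optional decimal part, an optional trailing k) is a hypothetical choice and would need adjusting for other suffixes:

```r
x <- c("16k", "42.3k", "15", "8900")

# Accept only plain numbers with an optional decimal part and an optional "k".
ok <- grepl("^[0-9]+(\\.[0-9]+)?k?$", x)
if (!all(ok)) {
  stop("Unexpected value(s): ", paste(x[!ok], collapse = ", "))
}

# Only validated strings reach eval(parse(...)).
expressions <- gsub("k", "*1000", x, fixed = TRUE)
vals <- vapply(
  expressions,
  function(s) eval(parse(text = s)),
  numeric(1),
  USE.NAMES = FALSE
)
vals
# values: 16000, 42300, 15, 8900
```

Anything that does not look like a number, such as a function call, stops the script before it can be evaluated.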

It all depends on how rigorous you want to be. If you have several symbols (like "k" for 1000), you might consider a mapping.

Solution

Start by defining a mapping from each (textual) symbol to its (numeric) conversion scale.

# Define a mapping from symbols like "k" to conversion scales like 1000.
scale_mapping <- data.frame(
  symbol = c("k"),
  scale  = c(1000)
)

Then simply apply this workflow in the tidyverse:

# Load the 'tidyverse'.
library(tidyverse)

data %>%
  # Split the 'money' column into one column for the number and another for the symbol.
  extract(
    money,
    c("money", NA, "money_symbol"),

    # A rigorous regex: match only a number with an optional decimal, followed by an
    # optional alphabetic symbol; no spaces are permitted (but you can adjust that).
    "(\\d+(\\.\\d+)?)([A-Za-z]+)?"
  ) %>%
  
  # Map any symbols to their conversion scale.
  left_join(
    scale_mapping,
    by = c("money_symbol" = "symbol")
  ) %>%
  
  # Convert to the appropriate scale.
  mutate(
    # First interpret the 'money' text as numbers.
    money = as.numeric(money),

    # Multiply by a scale if available.
    money = if_else(is.na(scale), money, money * scale)
  ) %>%
  
  # Discard the helper columns.
  select(!c(money_symbol, scale))

Results

Given a dataset like your data, reproduced here

data <- structure(
  list(
    money = c("16k", "42.3k", "15", "8900")
  ),
  row.names = c(NA, -4L),
  class = "data.frame"
)

this solution should yield the following results:

# A tibble: 4 x 1
  money
  <dbl>
1 16000
2 42300
3    15
4  8900

Warning

Be sure to use consistent case for your symbols. As it stands, "K" will not match "k", though both can represent a scale of 1000. If you have mixed cases in data, then consider either

  1. standardizing money_symbol with (say) %>% mutate(money_symbol = tolower(money_symbol)) right after extract(); or
  2. using fuzzyjoin::regex_left_join(..., ignore_case = TRUE) in place of the left_join(...).
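To illustrate the first option without any packages, here is a rough base R sketch of the same mapping idea with the symbols lower-cased before lookup. The named vector scales, the extra "m" entry, and the sample values are assumptions made for illustration, not part of the original solution:

```r
# Hypothetical lookup table: lower-case suffix -> multiplier.
scales <- c(k = 1e3, m = 1e6)

# Sample input with mixed-case suffixes (assumed for this sketch).
money <- c("16k", "42.3K", "15", "8900", "2M")

# Split each string into its numeric part and its (possibly empty) suffix.
number <- as.numeric(sub("[A-Za-z]+$", "", money))
symbol <- tolower(sub("^[0-9.]+", "", money))

# Look up the multiplier; strings without a suffix are multiplied by 1.
# A suffix missing from 'scales' yields NA, which flags it for review.
multiplier <- ifelse(symbol == "", 1, scales[symbol])

result <- number * multiplier
result
# values: 16000, 42300, 15, 8900, 2000000
```

Lower-casing treats "m" and "M" as the same symbol, so this only works if your data never uses the two cases for different scales.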
