How do I replace “k” and “m” with thousands and millions?

I have a dataframe parsed from Coursera. One of the columns is the number of students enrolled in the course. It looks like this:

df <- data.frame(uni = c("Yale", "Toronto", "NYU"), students = c("16m", "240k", "7.5k"))

      uni students
1    Yale     "16m"
2 Toronto     "240k"
3     NYU     "7.5k"

What I need to get is:

      uni students
1    Yale     16000000
2 Toronto     240000
3     NYU     7500

The main difficulty for me is that the values are of class character, and I do not know of a function that replaces the k's and m's and converts the column to numeric.

Please help me!

E.g.:

df$students <- dplyr::case_when(
  stringr::str_detect(df$students, 'm') ~ readr::parse_number(df$students) * 1e6,
  stringr::str_detect(df$students, 'k') ~ readr::parse_number(df$students) * 1e3,
  TRUE ~ readr::parse_number(df$students)
)

An option with base R:

# Strip "$", "k" and "m" once (case-insensitive), then scale by the suffix.
num <- as.numeric(gsub("[$km]", "", df$students, ignore.case = TRUE))
df$students <- ifelse(grepl("m", df$students, ignore.case = TRUE), num * 10^6, num * 10^3)

# uni students
# 1    Yale 16000000
# 2 Toronto   240000
# 3     NYU     7500

Using stringr and dplyr from tidyverse:

library(tidyverse)
df %>%
  mutate(students = case_when(
    str_detect(students, "m") ~ as.numeric(str_extract(students, "[\\d\\.]+")) * 1000000,
    str_detect(students, "k") ~ as.numeric(str_extract(students, "[\\d\\.]+")) * 1000,
  ))
# A tibble: 3 x 2
  uni     students
  <chr>      <dbl>
1 Yale    16000000
2 Toronto   240000
3 NYU         7500

Here's an approach with separate that works for any number of modifiers: simply keep defining them in the case_when statement.

library(dplyr)
library(tidyr)
df %>%
  separate(students,into = c("value","modifier"),
           sep = "(?<=[\\d])(?=[^\\d.])") %>%
  mutate(modifier = case_when(modifier == "b" ~ 1000000000,
                              modifier == "m" ~ 1000000,
                              modifier == "k" ~ 1000,
                              TRUE ~ 1),
         result = as.numeric(value) * modifier)
      uni value modifier  result
1    Yale    16    1e+06 1.6e+07
2 Toronto   240    1e+03 2.4e+05
3     NYU   7.5    1e+03 7.5e+03
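The result column above prints in scientific notation, although the values are the ordinary numbers the question asks for. As a minimal sketch (the vector `result` below is just the three values copied from that output, and `shown` is an illustrative name), base R's `format()` can display them in fixed notation:

```r
# The three values from the output above, copied by hand for illustration
result <- c(1.6e7, 2.4e5, 7.5e3)

# Display only: format() returns character strings, so keep the numeric
# column for any further calculations
shown <- format(result, scientific = FALSE, trim = TRUE)
shown
```

This only changes how the numbers are shown; the underlying column stays numeric.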

One can write a function that does the conversion, for example:

f <- function(s) {
  l <- nchar(s)
  x <- as.numeric(substr(s, 1, l-1))
  u <- substr(s, l, l)
  x * 10^(3 * match(u, c("k", "M", "G")))
}

f("2M")
f("200k")

Edit: or a little bit more generic:

f <- function(s) {
  x <- as.numeric(gsub("[kMG]", "", s))
  u <- gsub("[0-9.]", "", s)
  if (nchar(u))  x <- x * 10^(3 * match(u, c("k", "M", "G")))
  x
}

f("20")
f("2M")
f("200k")
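Both versions above take one string at a time (`if (nchar(u))` expects a single value) and use an upper-case `M`, whereas the column in the question is a vector with lower-case suffixes. The following is a minimal sketch of a vectorised variant, not part of the original answer: the name `parse_suffixed` and the `multipliers` lookup table are assumptions made for illustration.

```r
# Hypothetical helper: vectorised conversion of strings such as "16m" or "7.5k".
# The suffix table is an assumption; extend it as needed.
parse_suffixed <- function(s, multipliers = c(k = 1e3, m = 1e6, b = 1e9)) {
  s <- trimws(tolower(s))
  # Numeric part: the string without a trailing letter
  value <- as.numeric(sub("[a-z]$", "", s))
  # Suffix: whatever follows the leading digits and dots ("" if nothing)
  suffix <- sub("^[0-9.]+", "", s)
  # Look up the multiplier; no suffix means 1, an unknown suffix gives NA
  factor <- ifelse(suffix == "", 1, multipliers[suffix])
  value * factor
}

parse_suffixed(c("16m", "240k", "7.5k", "20"))
# values: 16000000, 240000, 7500, 20
```

Applied to the data frame from the question, this would be `df$students <- parse_suffixed(df$students)`.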

Using gsub and dplyr:

df %>% mutate(
  unit=gsub("[0-9]+\\.*[0-9]*","",students), #selecting unit
  value=as.numeric(gsub("[^0-9.]", "", students)), #numeric part (also handles single digits such as "5k")
  students=ifelse(unit=="k",1e3*value,
                  ifelse(unit=="m",1e6*value,
                         ifelse(unit=="b",1e9*value,value)))) %>%
  select(-c(unit,value))
