How do I replace “k” and “m” with thousands and millions?
I have a dataframe parsed from Coursera. One of the columns is the number of students enrolled in each course. It looks like this:
df <- data.frame(uni = c("Yale", "Toronto", "NYU"), students = c("16m", "240k", "7.5k"))
uni students
1 Yale "16m"
2 Toronto "240k"
3 NYU "7.5k"
What I need to get is:
uni students
1 Yale 16000000
2 Toronto 240000
3 NYU 7500
So the main difficulty for me is that the values are of class character, and I do not know a function for replacing the k's and m's and converting the column to numeric.
Please help me!
E.g.:
df$students <- dplyr::case_when(
  stringr::str_detect(df$students, 'm') ~ readr::parse_number(df$students) * 1e6,
  stringr::str_detect(df$students, 'k') ~ readr::parse_number(df$students) * 1e3,
  TRUE ~ readr::parse_number(df$students)
)
An option with base R:
df$students <- ifelse(grepl('m', ignore.case = TRUE, df$students), as.numeric(gsub("[$m]", "", df$students)) * 10^6,
as.numeric(gsub("[$k]", "", df$students)) * 10^3)
# uni students
# 1 Yale 16000000
# 2 Toronto 240000
# 3 NYU 7500
Using stringr and dplyr from the tidyverse:
library(tidyverse)
df %>%
mutate(students = case_when(
str_detect(students, "m") ~ as.numeric(str_extract(students, "[\\d\\.]+")) * 1000000,
str_detect(students, "k") ~ as.numeric(str_extract(students, "[\\d\\.]+")) * 1000,
))
# A tibble: 3 x 2
uni students
<chr> <dbl>
1 Yale 16000000
2 Toronto 240000
3 NYU 7500
Here's an approach with separate that works for an arbitrary number of modifiers; simply keep defining them in the case_when statement.
library(dplyr)
library(tidyr)
df %>%
separate(students,into = c("value","modifier"),
sep = "(?<=[\\d])(?=[^\\d.])") %>%
mutate(modifier = case_when(modifier == "b" ~ 1000000000,
modifier == "m" ~ 1000000,
modifier == "k" ~ 1000,
TRUE ~ 1),
result = as.numeric(value) * modifier)
uni value modifier result
1 Yale 16 1e+06 1.6e+07
2 Toronto 240 1e+03 2.4e+05
3 NYU 7.5 1e+03 7.5e+03
One can write a function that does the conversion, for example:
f <- function(s) {
l <- nchar(s)
x <- as.numeric(substr(s, 1, l-1))
u <- substr(s, l, l)
x * 10^(3 * match(u, c("k", "M", "G")))
}
f("2M")
f("200k")
Edit: or a little bit more generic:
f <- function(s) {
x <- as.numeric(gsub("[kMG]", "", s))
u <- gsub("[0-9.]", "", s)
if (nchar(u)) x <- x * 10^(3 * match(u, c("k", "M", "G")))
x
}
f("20")
f("2M")
f("200k")
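The generic version above handles one string at a time, because `if (nchar(u))` expects a single value. Below is a minimal vectorized sketch in base R that can be applied to a whole column. The helper name `parse_suffix` and the multiplier table are assumptions made for illustration, not part of the answer above; suffixes are assumed to be single letters and are matched case-insensitively.

```r
# Hypothetical helper: convert strings like "16m", "240k", "7.5k", "20" to numbers.
parse_suffix <- function(s) {
  # Assumed multiplier table; extend it with further suffixes as needed.
  mult <- c(k = 1e3, m = 1e6, b = 1e9, g = 1e9)
  s <- trimws(as.character(s))
  # Numeric part: everything except letters.
  x <- as.numeric(gsub("[A-Za-z]", "", s))
  # Suffix: letters only, lower-cased so "M" and "m" are treated alike.
  u <- tolower(gsub("[^A-Za-z]", "", s))
  # Look up the factor; strings without a suffix get a factor of 1,
  # unknown suffixes yield NA.
  f <- ifelse(u == "", 1, mult[u])
  unname(x * f)
}

df <- data.frame(uni = c("Yale", "Toronto", "NYU"),
                 students = c("16m", "240k", "7.5k"))
df$students <- parse_suffix(df$students)
```

An unknown suffix gives NA rather than a silently wrong number, which seems the safer default for scraped data. Note that values written in scientific notation such as "1e3" would not be handled by this sketch.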
Using gsub and dplyr:
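df %>% mutate(
  unit = gsub("[0-9]+\\.*[0-9]*", "", students), # keep only the unit suffix
  # keep only digits and the decimal point, so that single-digit values
  # such as "5k" and values without a unit such as "240" also parse correctly
  value = as.numeric(gsub("[^0-9.]", "", students)),
  students = ifelse(unit == "k", 1e3 * value,
             ifelse(unit == "m", 1e6 * value,
             ifelse(unit == "b", 1e9 * value, value)))) %>%
  select(-c(unit, value))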