R: Extract the first two characters from a column in a data frame

I have a dataset with multiple columns, and I would like to extract the first two characters from the sr column. These characters will then be stored in a new column.

Basically, I want a new column, permit_type, that holds the first two characters of each sr value, i.e. AP, SP, and MP.

How can I do this?

Sample data

df <- structure(list(date_received = c("11/30/2021  ", "11/30/2021  ", 
"11/30/2021  ", "11/30/2021  ", "11/30/2021  ", "11/17/2021  ", 
"12/3/2021  ", "12/3/2021  ", "12/13/2021  "), date_approved = c("11/30/2021", 
"11/30/2021", "11/30/2021", "11/30/2021", "11/30/2021", "11/17/2021", 
"12/3/2021", "12/3/2021", "12/3/2021"), sr = c("AP-21-080", "SP-21-081", 
"AP-21-082", "SP-21-083", "MP-21-084", "AP-21-085", "AP-21-086", 
"MP-21-087", "SP-21-088"), permit = c("AP1766856 Classroom C", 
"AP1766858 Classroom A", "AP1766862 Landscape Area", "AP1766864 Classroom B", 
"AO1766867", "06-SE-2420566", "06-E-2425187", "", "06-SM-2424110"
)), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"
))

Method 1

library(tidyverse)
df$permit_type= df%>% str_split_fixed(df$sr, "-", 2)

# Error
Error in str_split_fixed(., df$sr, "-", 2) : 
  unused argument (2)

Method 2

df$permit_type = df%>% str_extract(sr, "^.{2}")

# Error
Error in str_extract(., sr, "^.{2}") : unused argument ("^.{2}")

Method 3

df = df %>%  mutate(permit_type = str_extract_all(sr, "\\b[a-z]{2}")) 

# Returns permit_type with `character(0)` values

For the last option, the pattern should match uppercase characters ([A-Z]) instead of lowercase ([a-z]), because the sr column contains only uppercase letters. In addition, str_extract_all is meant for cases where the pattern occurs multiple times, and it returns a list (simplify = FALSE by default). Here each value contains a single occurrence, so str_extract is more useful, as it returns a vector.

library(dplyr)
library(stringr)
df %>% 
   mutate(permit_type = str_extract(sr, "\\b[A-Z]{2}"))
# A tibble: 9 × 5
  date_received  date_approved sr        permit                     permit_type
  <chr>          <chr>         <chr>     <chr>                      <chr>      
1 "11/30/2021  " 11/30/2021    AP-21-080 "AP1766856 Classroom C"    AP         
2 "11/30/2021  " 11/30/2021    SP-21-081 "AP1766858 Classroom A"    SP         
3 "11/30/2021  " 11/30/2021    AP-21-082 "AP1766862 Landscape Area" AP         
4 "11/30/2021  " 11/30/2021    SP-21-083 "AP1766864 Classroom B"    SP         
5 "11/30/2021  " 11/30/2021    MP-21-084 "AO1766867"                MP         
6 "11/17/2021  " 11/17/2021    AP-21-085 "06-SE-2420566"            AP         
7 "12/3/2021  "  12/3/2021     AP-21-086 "06-E-2425187"             AP         
8 "12/3/2021  "  12/3/2021     MP-21-087 ""                         MP         
9 "12/13/2021  " 12/3/2021     SP-21-088 "06-SM-2424110"            SP         
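To make the difference between the two functions concrete, here is a minimal sketch. It uses a three-value subset of the sr column rather than the full data frame, and the variable names (as_list, as_vector, no_match) are only illustrative.

```r
library(stringr)

# A small subset of the sr values from the sample data
sr <- c("AP-21-080", "SP-21-081", "MP-21-084")

# str_extract_all() returns a list with one element per input string
as_list <- str_extract_all(sr, "\\b[A-Z]{2}")

# str_extract() returns a character vector holding the first match per string
as_vector <- str_extract(sr, "\\b[A-Z]{2}")

# The lowercase pattern from Method 3 matches nothing, so every list
# element is an empty character vector, i.e. character(0)
no_match <- str_extract_all(sr, "\\b[a-z]{2}")
```

A list column is rarely what you want for a single value per row, which is why str_extract fits better inside mutate here.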

When str_split_fixed is applied directly to the data through the pipe, wrap the call in {} so that the data frame is not inserted as the first argument:

df%>% 
   {str_split_fixed(.$sr, "-", 2)[,1]} 
[1] "AP" "SP" "AP" "SP" "MP" "AP" "AP" "MP" "SP"

The second case has the same issue:

df%>% 
  {str_extract(.$sr, "^.{2}")}
[1] "AP" "SP" "AP" "SP" "MP" "AP" "AP" "MP" "SP"
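The errors in Methods 1 and 2 arise because %>% passes its left-hand side as the first argument, so df %>% str_extract(sr, "^.{2}") is evaluated as str_extract(df, sr, "^.{2}"), which has one argument too many. The {} calls above return a plain vector; if the goal is a new column, one possible approach is to do the same extraction inside mutate. The sketch below assumes a reduced data frame (df_small) and illustrative column names (type_split, type_regex) that are not part of the original question.

```r
library(dplyr)
library(stringr)

# Reduced stand-in for the sample data, keeping only the sr column
df_small <- tibble(sr = c("AP-21-080", "SP-21-081", "MP-21-084"))

result <- df_small %>%
  mutate(
    # Split on the first "-" and keep the piece before it
    type_split = str_split_fixed(sr, "-", 2)[, 1],
    # Take the first two characters, whatever they are
    type_regex = str_extract(sr, "^.{2}")
  )
```

Inside mutate, sr refers to the column, so neither .$sr nor the braces are needed.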

In base R, you could use:

transform(df, permit_type = substr(sr,1,2))

  date_received date_approved        sr                   permit permit_type
1  11/30/2021      11/30/2021 AP-21-080    AP1766856 Classroom C          AP
2  11/30/2021      11/30/2021 SP-21-081    AP1766858 Classroom A          SP
3  11/30/2021      11/30/2021 AP-21-082 AP1766862 Landscape Area          AP
4  11/30/2021      11/30/2021 SP-21-083    AP1766864 Classroom B          SP
5  11/30/2021      11/30/2021 MP-21-084                AO1766867          MP
6  11/17/2021      11/17/2021 AP-21-085            06-SE-2420566          AP
7   12/3/2021       12/3/2021 AP-21-086             06-E-2425187          AP
8   12/3/2021       12/3/2021 MP-21-087                                   MP
9  12/13/2021       12/3/2021 SP-21-088            06-SM-2424110          SP
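As a rough illustration of how the position-based approach behaves on less regular input, the sketch below compares substr with a prefix-based sub call. The values "ABC-21-001" and "A" and the NA are hypothetical and do not appear in the sample data; they are only there to show the edge cases.

```r
# First value is from the sample data; the rest are hypothetical edge cases
sr <- c("AP-21-080", "ABC-21-001", "A", NA)

# Fixed positions: always characters 1 to 2 (or fewer if the string is shorter)
by_position <- substr(sr, 1, 2)

# Prefix-based: drop everything from the first "-" onward
by_prefix <- sub("-.*$", "", sr)
```

With the data as posted, every prefix is exactly two characters long, so both give the same result; the choice only matters if longer or shorter prefixes can occur.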
