Extract the last 2 characters from a column in a data.frame

Question

I am new to R programming and have searched Stack Overflow for many hours. I would appreciate your help.

I have a data frame with 3 columns (Date, Description, Debit):

      Date         Description   Debit
2014-01-01      "abcdef    VA"      15
2014-01-01      "ghijkl    NY"      56

I am trying to extract the last 2 characters of the second (Description) column, i.e. the two-letter state abbreviation. I am not very comfortable with apply-type functions.

I have tried:

 l <- lapply(a$Description, function(x) {substr(x, nchar(x)-2+1, nchar(x))})

but I get the following error message:

Error in nchar(x) : invalid multibyte string, element 1

I have tried several other approaches, but they all give the same error.
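This error usually points to an encoding mismatch rather than to the extraction logic: the string contains bytes that are not valid in the R session's encoding, so nchar() cannot count its characters. The sketch below assumes, hypothetically, that the Description text was stored as Latin-1; the data frame a and the byte \xe9 are illustrative, not the asker's actual data.

```r
# Hypothetical data: "\xe9" is the Latin-1 byte for "é", which is not valid
# UTF-8 on its own and can trigger "invalid multibyte string" in nchar().
a <- data.frame(
  Date        = as.Date(c("2014-01-01", "2014-01-01")),
  Description = c("caf\xe9    VA", "ghijkl    NY"),
  Debit       = c(15, 56),
  stringsAsFactors = FALSE
)

# Assumption: the source text is Latin-1. Re-encode it to UTF-8 so that
# nchar() and substr() can count characters correctly.
a$Description <- iconv(a$Description, from = "latin1", to = "UTF-8")

# substr() and nchar() are vectorised, so no lapply() is needed.
a$State <- substr(a$Description, nchar(a$Description) - 1, nchar(a$Description))
a$State
#> [1] "VA" "NY"
```

If the data come from a file, specifying the encoding when reading it (for example, the fileEncoding argument of read.csv()) should avoid the problem at the source.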

I am quite sure I am missing something very basic, so I would appreciate your help.

Thanks

Answer 1

library(stringr)
# Negative positions count from the end of the string: -2 to -1 is the last two characters
str_sub(a$Description, -2, -1)
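For readers who prefer base R, the same last-n-characters extraction can be sketched with substr() and nchar(). Both are already vectorised, so the lapply() in the question is unnecessary (and lapply() would return a list rather than a character vector). The helper name str_right() and its n argument are hypothetical, not part of stringr or base R.

```r
# Hypothetical helper: last n characters of each element of a vector.
str_right <- function(x, n = 2) {
  x <- as.character(x)           # also accepts factors
  len <- nchar(x)
  # pmax() keeps the start at 1, so strings shorter than n are returned whole
  substr(x, pmax(len - n + 1, 1), len)
}

desc <- c("abcdef    VA", "ghijkl    NY", "X")
str_right(desc)
#> [1] "VA" "NY" "X"
```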

Answer 2

df <- data.frame(date = c("2015-01-01", "2015-02-01", "2015-01-15"),
                 jumble = c("12345 VA", "123 FL", "12354567732 GA"),
                 debit = c(15, 36, 20))

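# In R < 4.0.0, data.frame() converted strings to factors by default; make sure the column is character
df$jumble <- as.character(df$jumble)

# Take the characters from position nchar - 1 through nchar (the last two)
df$state <- substr(df$jumble, nchar(df$jumble)-1, nchar(df$jumble))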

df
        date         jumble debit state
1 2015-01-01       12345 VA    15    VA
2 2015-02-01         123 FL    36    FL
3 2015-01-15 12354567732 GA    20    GA

Answer 3

Here's a regex version, using Brandon S's sample data. The regex captures everything after the last whitespace character up to the end of the string.

df <- data.frame(date = c("2015-01-01", "2015-02-01", "2015-01-15"),
                 jumble = c("12345 VA", "123 FL", "12354567732 GA"),
                 debit = c(15, 36, 20))

# ".+\\s" greedily consumes up to the last whitespace; "(.+)$" captures what follows
df$state <- gsub(".+\\s(.+)$", "\\1", df$jumble)

df

        date         jumble debit state
1 2015-01-01       12345 VA    15    VA
2 2015-02-01         123 FL    36    FL
3 2015-01-15 12354567732 GA    20    GA
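A variation on the regex idea, sketched here as an assumption rather than as part of the original answer: match the state code by pattern instead of by position. The pattern below assumes the code is exactly two upper-case letters at the end of the string, optionally followed by trailing spaces, and returns NA when no such code is present. The sample strings are hypothetical.

```r
# Hypothetical sample: trailing spaces in the second string, no code in the third
jumble <- c("12345 VA", "123 FL  ", "no state here")

# Greedy ".*" backtracks to the last two upper-case letters before optional
# trailing whitespace and the end of the string.
pattern <- ".*([[:upper:]]{2})\\s*$"
state <- ifelse(grepl(pattern, jumble), sub(pattern, "\\1", jumble), NA_character_)
state
#> [1] "VA" "FL" NA
```

By contrast, the last-token regex above would return "here" for the third string.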

Answer 4

We can use sub() from base R:

# Delete everything up to and including the last run of whitespace
df$State <- sub(".*\\s+", "", df[,2])
df$State
#[1] "VA" "FL" "GA"

Answer 5

In Python with pandas (not R), a more concise way is:

df['Description'].str[-2:]

This assumes your Description column is of string (object) dtype.

Answer summary (5 answers)

Answer 1: score 5, accepted, 2016-05-02 23:19:49
Answer 2: score 0, 2016-05-02 23:20:12
Answer 3: score 0, 2016-05-03 01:53:23
Answer 4: score 0, 2016-05-03 02:09:22
Answer 5: score 0, 2018-11-11 21:26:24
