Remove everything after a string in a data frame column with missing values

Question

I have a data frame resembling the extract below:

Observation Identifier   Value
Obs001      ABC_2001     54
Obs002      ABC_2002     -2
Obs003                   1
Obs004                   1 
Obs005      Def_2001/05

I would like to transform this data frame into one where the portion of each string from the "_" sign onward is removed, as illustrated below:

Observation Identifier_NoTime   Value
Obs001      ABC                 54
Obs002      ABC                 -2
Obs003                          1
Obs004                          1 
Obs005      Def

I tried experimenting with strsplit, gsub and sub as discussed here, but could not get those commands to work. I have to account for the fact that:

The column has missing values, and I want to leave them where they are
The "_" character appears at different positions within the values
I also want to leave the rest of the data frame as it is

Answer 1

You could try the sub command below, which removes the _ symbol together with all the non-space characters that follow it.

sub("_\\S*", "", string)

Explanation:

_ Matches a literal _ symbol.
\\S* Matches zero or more non-space characters (the backslash is doubled because the pattern is written as an R string literal).
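As a quick illustration of the pattern above, here is a minimal sketch on a small character vector. The vector name `ids` and the value containing a space are made up for this example and are not from the question; they only show that the match stops at the first whitespace character.

```r
# Hypothetical sample values; the third one contains a space to show
# where the \\S* match stops.
ids <- c("ABC_2001", "Def_2001/05", "ABC_2001 extra")

# sub() replaces only the first match in each element
cleaned <- sub("_\\S*", "", ids)
print(cleaned)
```

Anything after the first space is kept, which may or may not be what you want, depending on the data.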

OR

This would remove everything from the _ symbol to the end of the string:

sub("_.*", "", string)

Explanation:

_ Matches a literal _ symbol.
.* Matches any character zero or more times.
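Applied to the data frame from the question, this could look like the sketch below. The data frame `df` is reconstructed from the extract, and it is an assumption that the missing identifiers are stored as NA or as empty strings; sub() leaves both untouched, because NA stays NA and a string without "_" has nothing to match.

```r
# Reconstruction of the question's extract (assumed representation of
# the missing values: NA for one row, "" for another).
df <- data.frame(
  Observation = c("Obs001", "Obs002", "Obs003", "Obs004", "Obs005"),
  Identifier  = c("ABC_2001", "ABC_2002", NA, "", "Def_2001/05"),
  Value       = c(54, -2, 1, 1, NA),
  stringsAsFactors = FALSE
)

# sub() is vectorised over the column; only Identifier is modified
df$Identifier <- sub("_.*", "", df$Identifier)

# Rename the column to match the desired output
names(df)[names(df) == "Identifier"] <- "Identifier_NoTime"
print(df)
```

If the column is a factor rather than a character vector, sub() returns a character vector, so the column becomes character after the assignment.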

Solution 1
9 Accepted 2014-10-28 15:23:48
