Remove everything after a string in a data frame column with missing values

I have a data frame resembling the extract below:

Observation Identifier   Value
Obs001      ABC_2001     54
Obs002      ABC_2002     -2
Obs003                   1
Obs004                   1 
Obs005      Def_2001/05  

I would like to transform it into a data frame in which the part of each string from the "_" sign onward is removed, as illustrated below:

Observation Identifier_NoTime   Value
Obs001      ABC                 54
Obs002      ABC                 -2
Obs003                          1
Obs004                          1 
Obs005      Def  

I tried experimenting with strsplit, gsub and sub as discussed here, but could not get those commands to work. I have to account for the fact that:

  1. The column has missing values, and I want to leave them where they are
  2. The "_" string appears at different positions within the values
  3. I also want to leave the rest of the data frame the way it is

You could try the sub command below to remove the _ symbol and all the non-space characters that follow it.

sub("_\\S*", "", string)

Explanation:

  • _ matches a literal _ symbol.
  • \\S* matches zero or more non-space characters.

OR

This would remove the _ symbol and every character after it, spaces included:

sub("_.*", "", string)

Explanation:

  • _ matches a literal _ symbol.
  • .* matches any character zero or more times.
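
To apply this to the data frame from the question, something along these lines should work. It is only a sketch: the data frame df is rebuilt by hand here, and it assumes the missing identifiers are stored as NA (if they are empty strings instead, sub leaves them unchanged as well, since there is no _ to match).

```r
# Hypothetical reconstruction of the data frame shown in the question;
# missing entries are assumed to be NA.
df <- data.frame(
  Observation = c("Obs001", "Obs002", "Obs003", "Obs004", "Obs005"),
  Identifier  = c("ABC_2001", "ABC_2002", NA, NA, "Def_2001/05"),
  Value       = c(54, -2, 1, 1, NA),
  stringsAsFactors = FALSE
)

# sub() is vectorised and returns NA for NA input,
# so the missing values stay where they are.
df$Identifier_NoTime <- sub("_.*", "", df$Identifier)

# Keep the remaining columns as they were, replacing only the identifier column.
df <- df[, c("Observation", "Identifier_NoTime", "Value")]
print(df)

# The two patterns differ only when whitespace follows the "_" part:
with_space <- "ABC_2001 extra"           # made-up value for illustration
stops_at_space <- sub("_\\S*", "", with_space)
to_end_of_string <- sub("_.*", "", with_space)
```

For the identifiers in the extract both patterns give the same result, because none of them contains whitespace after the _ symbol.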


 