How to remove a specific character from all rows in one column of a data frame

Question

I have a dataframe with two columns and a few hundred rows. Let's call it df. It looks like this:

Name                 Chemical_Formula
PALMITYL-COA         C37H62N7O17P3S1
CPD0-888             C34H52N7O24P2
3-OXOPALMITOYL-COA   C37H60N7O18P3S1
OH-MYRISTOYL         C43H75N3O20P2
CPD-19171            C39H64N7O18P3S1
CPD-15253            C52H99N3O13P2
CPD-12122            C75H112O2
CPD0-937             C149H260N2O78P4
....                 .....
....                 .....

Now, if the Chemical_Formula for a compound ends in 1, I want to remove that 1 from the chemical formula. For example, the chemical formula for the first compound, PALMITYL-COA, is C37H62N7O17P3S1, which ends in 1. So in my new dataframe I want the chemical formula for this compound to be C37H62N7O17P3S.

So my new dataframe should look like this:

Name                 Chemical_Formula
PALMITYL-COA         C37H62N7O17P3S
CPD0-888             C34H52N7O24P2
3-OXOPALMITOYL-COA   C37H60N7O18P3S
OH-MYRISTOYL         C43H75N3O20P2
CPD-19171            C39H64N7O18P3S
CPD-15253            C52H99N3O13P2
CPD-12122            C75H112O2
CPD0-937             C149H260N2O78P4
....                 .....
....                 .....

I want to keep every chemical formula as it is if it doesn't end in the number 1. For the ones that end in 1, I just want to remove that 1 and keep the rest of the formula unchanged.

I was looking for ways to do this using the gsub, sub, grepl or subset functions, but I'm not sure what regular expression pattern to use. Please help!

Answer 1

Here's how:

df$Chemical_Formula <- gsub("1$", "", df$Chemical_Formula)

The dollar sign after the 1 anchors the match to the end of the string, so a 1 is removed only if it is the last character.
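To see the anchor in action, here is a small self-contained sketch. It rebuilds four of the rows from the question (the real df has a few hundred) and applies the same gsub call:

```r
# Four sample rows copied from the question; the real df is larger
df <- data.frame(
  Name = c("PALMITYL-COA", "CPD0-888", "3-OXOPALMITOYL-COA", "CPD-12122"),
  Chemical_Formula = c("C37H62N7O17P3S1", "C34H52N7O24P2",
                       "C37H60N7O18P3S1", "C75H112O2"),
  stringsAsFactors = FALSE
)

# "1$" matches a 1 only when it is the last character of the string,
# so the 1s inside "C75H112O2" are left alone
df$Chemical_Formula <- gsub("1$", "", df$Chemical_Formula)

print(df)
```

Only the two formulas that ended in 1 change. The others come back exactly as they went in.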

Answer 2

The following may help. It uses base R's sub (substitute) function to replace a 1 at the end of an element with an empty string, which removes it.

sub("1$","",df$Chemical_Formula)

To save the output back into the same column, put df$Chemical_Formula <- in front of the code above.
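A short sketch of why the assignment matters, using a two-row stand-in for df (the variable name cleaned is just for illustration): sub returns a new character vector and leaves the data frame untouched until you assign the result back.

```r
# Two-row stand-in for the df in the question
df <- data.frame(
  Name = c("PALMITYL-COA", "CPD0-888"),
  Chemical_Formula = c("C37H62N7O17P3S1", "C34H52N7O24P2"),
  stringsAsFactors = FALSE
)

# sub() returns the modified strings; df itself is not changed yet
cleaned <- sub("1$", "", df$Chemical_Formula)
before <- df$Chemical_Formula   # still holds the original formulas

# Assigning the result back is what actually updates the column
df$Chemical_Formula <- cleaned
after <- df$Chemical_Formula

print(before)
print(after)
```

Because the pattern is anchored with $, it can match at most once per string, so sub and gsub give the same result here.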

Explanation of code:

sub : sub is a base R function that is called as sub(regex_needs_to_be_used_to_replace_present_content,"with_new_content",variable), i.e. the regular expression to match, then the replacement text, then the variable to work on.

"1$" : Tells sub to act only on those values of df's Chemical_Formula column that end in 1 (the column itself is explained further below).

"" : If a value matches, its trailing 1 is replaced with an empty string, i.e. removed, as per the OP's request.

df$Chemical_Formula : The column named Chemical_Formula of the data frame named df.
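One caveat, as a hedged aside: "1$" removes any trailing 1, including one that is the last digit of a larger count. None of the formulas shown in the question end that way, but if the full data contained something like C10H21 (a made-up example, not from the question), it would become C10H2. A possible guard, assuming element symbols consist only of letters, is to require a letter directly before the final 1 and put that letter back with a backreference:

```r
# Hypothetical test values: the second one is NOT from the question
formulas <- c("C37H62N7O17P3S1", "C10H21", "C75H112O2")

# Plain pattern: strips the last character whenever it is a 1
plain <- sub("1$", "", formulas)

# Guarded pattern: the 1 must directly follow a letter (an element symbol);
# "\\1" puts the captured letter back, so only the digit is dropped
guarded <- sub("([A-Za-z])1$", "\\1", formulas)

print(plain)
print(guarded)
```

With the guarded pattern, C10H21 is left as it is, while the S1 ending is still shortened to S.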

Answer 1
3 2018-07-28 18:32:19

Answer 2
2 2018-07-28 18:30:23
