How to remove a specific character from all the rows in one column of a dataframe
I have a dataframe with two columns and a few hundred rows, let's call it df, which looks like this:
Name Chemical_Formula
PALMITYL-COA C37H62N7O17P3S1
CPD0-888 C34H52N7O24P2
3-OXOPALMITOYL-COA C37H60N7O18P3S1
OH-MYRISTOYL C43H75N3O20P2
CPD-19171 C39H64N7O18P3S1
CPD-15253 C52H99N3O13P2
CPD-12122 C75H112O2
CPD0-937 C149H260N2O78P4
.... .....
.... .....
Now, if the Chemical_Formula for some of the compounds ends in 1, I want to remove that 1 from the chemical formula. For example, for the first compound, PALMITYL-COA, the chemical formula is C37H62N7O17P3S1, which ends in 1. So in my new dataframe I want the chemical formula for this first compound to be C37H62N7O17P3S.
So, my new dataframe should look like this:
Name Chemical_Formula
PALMITYL-COA C37H62N7O17P3S
CPD0-888 C34H52N7O24P2
3-OXOPALMITOYL-COA C37H60N7O18P3S
OH-MYRISTOYL C43H75N3O20P2
CPD-19171 C39H64N7O18P3S
CPD-15253 C52H99N3O13P2
CPD-12122 C75H112O2
CPD0-937 C149H260N2O78P4
.... .....
.... .....
I want to keep all the chemical formulas as they are if they don't end in the number 1. For the ones that end in 1, I just want to remove that 1, keeping the rest of the formula as it is.
I was looking for ways to do this using the gsub, sub, grepl or subset functions, but I'm not quite sure what pattern to give using the regular expression rules. Please help!
Here's how:
df$Chemical_Formula <- gsub("1$", "", df$Chemical_Formula)
The dollar sign after the 1 anchors the match to the end of the string, so a 1 is removed only if it is the last character.
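To see the anchored pattern at work, here is a minimal sketch on a few rows copied from the question. The small data frame is rebuilt by hand only for illustration; the real df has a few hundred rows.

```r
# Sample rows copied from the question. stringsAsFactors = FALSE keeps the
# formulas as character vectors on older R versions.
df <- data.frame(
  Name = c("PALMITYL-COA", "CPD0-888", "3-OXOPALMITOYL-COA", "CPD-12122"),
  Chemical_Formula = c("C37H62N7O17P3S1", "C34H52N7O24P2",
                       "C37H60N7O18P3S1", "C75H112O2"),
  stringsAsFactors = FALSE
)

# "1$" matches a literal 1 only at the very end of each string,
# so the 1s inside "C75H112O2" are left alone.
df$Chemical_Formula <- gsub("1$", "", df$Chemical_Formula)

print(df)
```

Only the first and third formulas change; the other two do not end in 1, so they are returned as they were.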
The following may help. Here I use sub, base R's substitution function, to remove a 1 at the end of an element by replacing it with an empty string.
sub("1$","",df$Chemical_Formula)
To save this output into the same column, put df$Chemical_Formula <- in front of the code above as well.
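Since the question asks for a new dataframe, one option is to assign the result into a copy instead of overwriting df. This is only a sketch: the name df_new is hypothetical and the two sample rows are copied from the question.

```r
# Two sample rows copied from the question.
df <- data.frame(
  Name = c("PALMITYL-COA", "CPD0-888"),
  Chemical_Formula = c("C37H62N7O17P3S1", "C34H52N7O24P2"),
  stringsAsFactors = FALSE
)

# df_new is a hypothetical name; copying first leaves df itself untouched.
df_new <- df
df_new$Chemical_Formula <- sub("1$", "", df$Chemical_Formula)

print(df_new)
```

After this, df still holds the original formulas and df_new holds the trimmed ones.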
Explanation of code:
sub: sub is a base R function that works as sub(regex_needs_to_be_used_to_replace_present_content,"with_new_content",variable)
"1$": tells sub to act only on those values in df's column named Chemical_Formula that end with 1 (the column itself is explained further down in this post)
"": if the match above is found in a value, that value's trailing 1 is replaced with an empty string, as per the OP's request.
df$Chemical_Formula: the column named Chemical_Formula of the data frame named df
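As a rough check before replacing anything, grepl can show which values the pattern would touch. The sketch below also includes a stricter variant, which is an assumption on my part and not something the question asks for: it drops the 1 only when a letter comes right before it. The third formula, C10H21, is made up to show the difference, and the variable names are hypothetical.

```r
# The first two formulas are from the question; "C10H21" is a made-up example.
formulas <- c("C37H62N7O17P3S1", "C34H52N7O24P2", "C10H21")

# grepl() returns TRUE for each element that ends in 1.
ends_in_one <- grepl("1$", formulas)
print(ends_in_one)

# The plain pattern also trims the count 21 down to 2 in the made-up formula.
trimmed_simple <- sub("1$", "", formulas)
print(trimmed_simple)

# Assumed variant: match a letter followed by a final 1, and put the letter
# back with the backreference \\1, so multi-digit counts are left alone.
trimmed_strict <- sub("([A-Za-z])1$", "\\1", formulas)
print(trimmed_strict)
```

Whether the stricter pattern is wanted depends on the data: if no formula in the column ends in a multi-digit count such as 11 or 21, both patterns give the same result.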