How to remove a specific character from all the rows in one column of a dataframe

I have a dataframe with two columns and a few hundred rows. Let's call it df. It looks like this:

Name                 Chemical_Formula
PALMITYL-COA         C37H62N7O17P3S1
CPD0-888             C34H52N7O24P2
3-OXOPALMITOYL-COA   C37H60N7O18P3S1
OH-MYRISTOYL         C43H75N3O20P2
CPD-19171            C39H64N7O18P3S1
CPD-15253            C52H99N3O13P2
CPD-12122            C75H112O2
CPD0-937             C149H260N2O78P4
....                 .....
....                 .....

Now, if the Chemical_Formula for some of the compounds ends in 1, I want to remove that 1 from the chemical formula. For example, for the first compound, PALMITYL-COA, the chemical formula is C37H62N7O17P3S1, which ends in 1. So in my new dataframe I want the chemical formula for this first compound to be C37H62N7O17P3S.

So my new dataframe should look like this:

Name                 Chemical_Formula
PALMITYL-COA         C37H62N7O17P3S
CPD0-888             C34H52N7O24P2
3-OXOPALMITOYL-COA   C37H60N7O18P3S
OH-MYRISTOYL         C43H75N3O20P2
CPD-19171            C39H64N7O18P3S
CPD-15253            C52H99N3O13P2
CPD-12122            C75H112O2
CPD0-937             C149H260N2O78P4
....                 .....
....                 .....

I want to keep every chemical formula as it is if it doesn't end in the number 1. For the ones that end in 1, I just want to remove that 1 and keep the rest of the formula as it is.

I was looking for ways to do this using the gsub, sub, grepl or subset functions, but I'm not quite sure what pattern to give using the regular expression rules. Please help!

Here's how:

df$Chemical_Formula <- gsub("1$", "", df$Chemical_Formula)

The dollar sign after the 1 anchors the match to the end of the string, so a 1 is removed only if it is the last character.
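To try the pattern before running it on the real data, here is a minimal sketch. It rebuilds a small data frame from four of the rows shown in the question (the real df has a few hundred rows), and stringsAsFactors = FALSE is added on the assumption that an older R version might otherwise create factors.

```r
# Four sample rows copied from the question's table (not the full data set)
df <- data.frame(
  Name = c("PALMITYL-COA", "CPD0-888", "3-OXOPALMITOYL-COA", "CPD-12122"),
  Chemical_Formula = c("C37H62N7O17P3S1", "C34H52N7O24P2",
                       "C37H60N7O18P3S1", "C75H112O2"),
  stringsAsFactors = FALSE  # keep the columns as character vectors
)

# "1$" matches a literal 1 only when it is the last character of the string
df$Chemical_Formula <- gsub("1$", "", df$Chemical_Formula)

print(df)
```

The two formulas ending in S1 lose the trailing 1. The 1s inside C75H112O2 are not at the end of the string, so that formula is left unchanged.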

The following may help. Here I use base R's sub (substitute) function to remove a 1 when it is at the end of an element, replacing it with an empty string. Because the pattern is anchored with $, it can match at most once per string, so sub and gsub give the same result here.

sub("1$","",df$Chemical_Formula)

To save this output into the same column, put df$Chemical_Formula <- in front of the code above.

Explanation of the code:

sub : a base R function, called as sub(pattern, replacement, x), i.e. sub(regex_for_the_content_to_replace, "new_content", variable).

"1$" : tells sub to act only on those values of df's Chemical_Formula column that end with 1.

"" : if a value matches, its trailing 1 is replaced with an empty string (not NULL), as the OP requested.

df$Chemical_Formula : the column named Chemical_Formula in the data frame named df.
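One edge case may be worth checking. The pattern "1$" strips any trailing 1, including one that is the last digit of a larger count such as 21. None of the rows shown in the question end that way, so this may never matter for the OP's data. The sketch below uses a hypothetical extra value, C10H21, which is not from the question, to compare the plain pattern with a stricter one that removes the 1 only when a letter comes directly before it.

```r
# The first two values are from the question; "C10H21" is a hypothetical
# value added only to illustrate the edge case
formulas <- c("C37H62N7O17P3S1", "C34H52N7O24P2", "C10H21")

# Plain pattern: removes any trailing 1, so "C10H21" becomes "C10H2"
plain <- sub("1$", "", formulas)

# Stricter pattern: ([A-Za-z]) captures the letter just before the final 1,
# and "\\1" in the replacement puts that letter back
strict <- sub("([A-Za-z])1$", "\\1", formulas)

print(plain)
print(strict)
```

If every trailing 1 in the real column is a count of exactly one atom, both patterns give the same result and the simpler "1$" is enough.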

