Extracting string between characters in a dataframe in R
Hi,
I have to extract everything that's between the "|" characters in a dataframe.
I don't think reproducible data is needed, but here are the first rows of the dataframe as an example:
Accession FASTA
<chr> <chr>
1 tr|A0A1G4NSV4|A0A1G4NSV4_9FLOR MLNIRPDEISNIIRQQIEKYDQKVQVANVGTVLQVGDGIARVYGLDDVMAGELLEFEDKTIGVALNLESDNVGVVLMGNGRDILEGSSVRATGKIAQIPVGEKFLGRVVNPLAEPIDGKGEINTSDNRLIESSAPGIIGRQSVCEPLQTGITAIDSMIPIGRGQRELIIGDRQTGKTAVALDTIINQKGQDVICV~
2 tr|A0A1C9CHB7|A0A1C9CHB7_PALPL MGNTKVSRRFRAMSELVQDKNYNYTEAIELLRRSSSAKFVETAEAHIVLGLDPKYADQQLRSTVILPKGTGKLAKVAVITKGEKITEALSAGADLVGAEDVIEQILQGNIDFDKLIATPDIMPLIAKLGRVLGPRGLMPSPKAGTVTIDVGQAVQEFKLGKLEYRLDKTGIVHIPFGKVNFSKEDLAANLLAIKE~
3 tr|A0A1C9CHD7|A0A1C9CHD7_PALPL MPHFTLKVLWLENNIAIAIDQIVGKGTSPLTSYFFWPRNDAWEHLKSELESKPWILEIDRINLLNQATEVINYWQEEGKNNSITKAQLKFPDFLFSGSH
4 tr|A0A6C0W2A1|A0A6C0W2A1_PALDE MALYNKKLSPIKKTEVLDYKDIDLLRKFITEQGKILPRRSTGLTSKQQKKLTKAIKQARILALLPFLNKD
5 tr|R7QB42|R7QB42_CHOCR MAFISFPSTFIGTNVKAASFSRRSRSAVRTTPIASAVPRNANLKKLQAGYLFPEIGRRRRAYLEQNPGADIISLGVGDTTMPIPEHICSGLVGGASKLGTEEGYSGYGAEQGMGPLREKIAQVLYKGTVKSDEVFVSDGAKCDISRLQQVFGATATVAVQDPSYPVYVDTSVMMGQTGLYDESKGQFEGIQYMQC~
6 tr|A0A3G1I907|A0A3G1I907_9FLOR MIKKGDVVKITRKESYWYQENGTVIKVESEIKYPVLVRFEKEAYNGVNSNNFAEDEVVVLK
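How do I do that?
Assuming each row has a | in it:
lapply(strsplit(df$Accession, "|", fixed = TRUE), "[[", 2)
Note that fixed = TRUE is needed here: by default strsplit treats its split argument as a regular expression, in which | means alternation rather than a literal pipe.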
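As a minimal sketch of applying the split to a whole column: the data frame `df` below is a hypothetical stand-in built from two accessions in the question, and the new column name `ID` is an assumption. Here `vapply` is used instead of `lapply` so that the result is a plain character vector rather than a list.

```r
# Hypothetical stand-in for the asker's data frame (only the Accession column)
df <- data.frame(
  Accession = c("tr|A0A1G4NSV4|A0A1G4NSV4_9FLOR",
                "tr|R7QB42|R7QB42_CHOCR"),
  stringsAsFactors = FALSE
)

# fixed = TRUE makes strsplit treat "|" as a literal character,
# not as the regex alternation operator
parts <- strsplit(df$Accession, "|", fixed = TRUE)

# Take the second piece of every split; character(1) declares that each
# result must be a single string, so malformed rows raise an error
df$ID <- vapply(parts, `[[`, character(1), 2)

print(df$ID)
```

A row with no | at all would make `[[` fail with a subscript error, which is why the assumption about every row containing the delimiter matters.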
This might also help you. I only used a single string, assuming you know how to apply the code to your data set:
(?<=\\|)
positive look-behind, meaning the desired string should be preceded by a literal |
(?=\\|)
positive look-ahead, meaning the desired string should be followed by a literal |
Neither of these delimiters is included in the match. Between them:
[^|]*
any character other than a literal |, zero or more times.
vec <- c("tr|A0A1G4NSV4|A0A1G4NSV4_9FLOR")
regmatches(vec, regexpr("(?<=\\|)[^|]*(?=\\|)", vec, perl = TRUE))
[1] "A0A1G4NSV4"
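To carry the same pattern over to a whole column, something like the following might work. It is only a sketch: the data frame `df`, the third row without any delimiter, and the column name `ID` are all assumptions made for illustration. Because `regmatches()` drops elements that have no match, the result is written into a pre-filled `NA` vector so that it stays aligned with the rows.

```r
# Hypothetical stand-in data; the last row deliberately has no "|"
df <- data.frame(
  Accession = c("tr|A0A1G4NSV4|A0A1G4NSV4_9FLOR",
                "tr|R7QB42|R7QB42_CHOCR",
                "no_delimiter_here"),
  stringsAsFactors = FALSE
)

# Same pattern as above: text preceded and followed by a literal |
pattern <- "(?<=\\|)[^|]*(?=\\|)"

# regexpr returns the position of the first match per element, or -1 if none
m <- regexpr(pattern, df$Accession, perl = TRUE)

# Start from NA and fill in only the rows that actually matched
ids <- rep(NA_character_, nrow(df))
ids[m != -1] <- regmatches(df$Accession, m)
df$ID <- ids

print(df$ID)
```

Rows that do not contain two pipes end up as `NA` instead of silently shifting the remaining values.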