
Translating a gsub command from R into Python

I want to split the string mystring = "0G15^GAC0T60T4^AA0C0" and get the following output in Python:

['0','G','15','^GAC','T','60','T','4','^AA','C']

It can be done with this command in R:

mystring <- "0G15^GAC0T60T4^AA0C0"
gsub("([\\^]*[ACGT]+)[0]*", " \\1 ", mystring)

How can I translate the R script into Python?

Thanks

You can reuse your existing regular expression with Python's re module:

import re

mystring = "0G15^GAC0T60T4^AA0C0"
l = re.sub("([\\^]*[ACGT]+)[0]*", " \\1 ", mystring).split()

l is then:

['0', 'G', '15', '^GAC', 'T', '60', 'T', '4', '^AA', 'C']
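A minimal sketch of the same idea with raw strings, which avoids the doubled backslashes. The function name split_tokens is made up for illustration, and the pattern assumes the letters are limited to A, C, G and T, as in the question.

```python
import re

def split_tokens(s):
    # (\^*[ACGT]+) captures an optional run of '^' followed by one or more letters;
    # the trailing 0* consumes any zeros right after it, so they are discarded.
    spaced = re.sub(r"(\^*[ACGT]+)0*", r" \1 ", s)
    # str.split() with no argument collapses the repeated spaces.
    return spaced.split()

print(split_tokens("0G15^GAC0T60T4^AA0C0"))
```

For the string in the question this should print the same list as above.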

You can try this:

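import re

mystring = "0G15^GAC0T60T4^AA0C0"
# Raw string so that \^, \d and \w reach the regex engine unchanged
new_data = re.findall(r'(?<!\^[GAC])\d+|(?<!\^)\w|\^[a-zA-Z]+', mystring)
# Drop each '0' that directly follows a '^...' token, then drop the last element (the trailing '0' for this input)
final_data = [a for i, a in enumerate(new_data) if a != '0' or not new_data[i-1].startswith("^")][:-1]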

Output:

['0', 'G', '15', '^GAC', 'T', '60', 'T', '4', '^AA', 'C']
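If the post-filtering step feels fragile, one possible alternative is re.split with a capturing group, which keeps the captured letters in the result and drops the zeros that follow them. This is only a sketch: the name tokenize is hypothetical, and it reuses the pattern from the R command rather than the lookbehind pattern above.

```python
import re

def tokenize(s):
    # re.split includes the text of capturing groups in its result,
    # so the letter runs survive while the zeros after them are consumed.
    parts = re.split(r"(\^*[ACGT]+)0*", s)
    # Adjacent matches leave empty strings behind; filter them out.
    return [p for p in parts if p]

print(tokenize("0G15^GAC0T60T4^AA0C0"))
```

This avoids both the index lookup on the previous element and the hard-coded [:-1] slice.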
