
How to get specific parts from a text in Python?

I have a string like this.

'hsa:578\tup:Q16611\nhsa:578\tup:A0A0S2Z391\nhsa:9373\tup:Q9Y263\nhsa:9344\tup:Q9UL54\nhsa:5894\tup:P04049\nhsa:5894\tup:L7RRS6\nhsa:673\tup:P15056\n'

I want to get only the values that begin with "up:", like this:

  • up:A0A0S2Z391
  • up:Q9Y263
  • up:Q9UL54

How can I do that in Python?

You can use the re module for regular expressions.

import re

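# \t and \n in this literal are interpreted as a real tab and a real newline.
text = 'hsa:578\tup:Q16611\nhsa:578\tup:A0A0S2Z391\nhsa:9373\tup:Q9Y263\nhsa:9344\tup:Q9UL54\nhsa:5894\tup:P04049\nhsa:5894\tup:L7RRS6\nhsa:673\tup:P15056\n'
# "." does not match a newline by default, so each match ends at the end of its line.
pattern = r'up:.*'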
values = re.findall(pattern, text)
print(values)

Output:

['up:Q16611', 'up:A0A0S2Z391', 'up:Q9Y263', 'up:Q9UL54', 'up:P04049', 'up:L7RRS6', 'up:P15056']
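The pattern above works because every "up:" value is the last field on its line. If that might not hold for your data, a slightly stricter variant is sketched below. It assumes the values never contain whitespace, and it uses a shortened sample string for brevity:

```python
import re

# Shortened sample in the same format as the string in the question.
text = 'hsa:578\tup:Q16611\nhsa:578\tup:A0A0S2Z391\nhsa:9373\tup:Q9Y263\n'

# \S+ matches a run of non-whitespace characters, so each match stops at
# the next tab, space, or newline instead of running to the end of the line.
values = re.findall(r'up:\S+', text)
print(values)  # ['up:Q16611', 'up:A0A0S2Z391', 'up:Q9Y263']

# With a capturing group, findall returns only the part inside the parentheses.
ids = re.findall(r'up:(\S+)', text)
print(ids)  # ['Q16611', 'A0A0S2Z391', 'Q9Y263']
```

The second call is useful if you want the bare identifiers without the "up:" prefix.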

You could use the split() method for that.

Here is a link to the documentation: https://docs.python.org/3/library/stdtypes.html?#str.split

Something like this could work for the string you posted:

s = 'hsa:578\tup:Q16611\nhsa:578\tup:A0A0S2Z391\nhsa:9373\tup:Q9Y263\nhsa:9344\tup:Q9UL54\nhsa:5894\tup:P04049\nhsa:5894\tup:L7RRS6\nhsa:673\tup:P15056\n'
res = []
for i in s.split('up')[1:]:
    res.append('up' + i.split()[0])
print(res)

Output:

['up:Q16611', 'up:A0A0S2Z391', 'up:Q9Y263', 'up:Q9UL54', 'up:P04049', 'up:L7RRS6', 'up:P15056']
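Note that splitting on 'up' cuts the string at every occurrence of those two letters, so it would misbehave if "up" ever appeared inside another field. A line-oriented variant avoids that. This is only a sketch, assuming each line holds exactly two tab-separated fields as in the string you posted (shortened here):

```python
# Shortened sample in the same format as the string in the question.
s = 'hsa:578\tup:Q16611\nhsa:578\tup:A0A0S2Z391\nhsa:9373\tup:Q9Y263\n'

res = []
for line in s.splitlines():
    # Split each line into its tab-separated fields.
    fields = line.split('\t')
    # Keep the second field only if the line has the expected shape.
    if len(fields) == 2 and fields[1].startswith('up:'):
        res.append(fields[1])
print(res)  # ['up:Q16611', 'up:A0A0S2Z391', 'up:Q9Y263']
```

Lines that do not match the expected shape are skipped silently, which may or may not be what you want for real data.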
