
Regex: matching from a specified word to just before a specified word

The following are my strings:

  1. NM_001002.4 Homo sapiens ribosomal protein lateral stalk subunit P0 (RPLP0) transcript variant 1, mRNA
  2. NM_181528.3 Homo sapiens N-alpha-acetyltransferase 20, NatB catalytic subunit (NAA20), transcript variant 3, mRNA

I am using the following regex in Python: (Homo sapiens [^,]*)

Actual outcome:

Homo sapiens ribosomal protein lateral stalk subunit P0 (RPLP0) transcript variant 1

Homo sapiens N-alpha-acetyltransferase 20

Expected outcomes:

Homo sapiens ribosomal protein lateral stalk subunit P0 (RPLP0) transcript variant 1

Homo sapiens N-alpha-acetyltransferase 20, NatB catalytic subunit (NAA20), transcript variant 3

Kindly help me. Thanks in advance!

  1. If 'transcript variant' is always present in the data:

     (Homo sapiens.* transcript variant [^,]*)
  2. If ', mRNA' is always present in the data:

     (Homo sapiens.*)(?:, mRNA)

     Then take group 1 of the match.
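
Both options can be tried on the two sample strings from the question with a short Python sketch. The helper name `extract` and the variable names are made up for illustration, and the sketch assumes each record is matched as its own string. The original pattern stops early on the second string because `[^,]*` ends at the first comma, which comes right after "20".

```python
import re

# The two sample strings from the question.
lines = [
    "NM_001002.4 Homo sapiens ribosomal protein lateral stalk subunit P0 (RPLP0) transcript variant 1, mRNA",
    "NM_181528.3 Homo sapiens N-alpha-acetyltransferase 20, NatB catalytic subunit (NAA20), transcript variant 3, mRNA",
]

# Option 1: assumes ' transcript variant ' is always present.
VARIANT_RE = re.compile(r"(Homo sapiens.* transcript variant [^,]*)")

# Option 2: assumes ', mRNA' is always present; group 1 excludes it.
MRNA_RE = re.compile(r"(Homo sapiens.*)(?:, mRNA)")


def extract(pattern, text):
    """Return group 1 of the first match, or None if nothing matches (hypothetical helper)."""
    match = pattern.search(text)
    return match.group(1) if match else None


by_variant = [extract(VARIANT_RE, s) for s in lines]
by_mrna = [extract(MRNA_RE, s) for s in lines]

for result in by_variant:
    print(result)
```

Which pattern to use depends on which marker is guaranteed in the data. If a record has neither, `extract` returns None rather than raising an error.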
