
Regex match optional set of characters not working

I'm trying to capture the username from these lines:

title="user1 is online now"><b><font color="#2568BA"><b>user1</b></font></b></a>
title="user2 is online now"><b>user2</b></a>

With this as the pattern:

title=".{1,16} is \w{5,8}? now"><b>(?:<font color="#\w{6}">)<b>(?<text>.+?)</b>(?:</font>)</b></a>?

But it only captures user1. The "font color" tag needs to be treated as optional: sometimes it's there, sometimes it's not.

I've been struggling with this for hours now. What am I missing?

The following might work.

  • Assume that the username follows title=" and is followed by " is online" or " is offline".
  • Capture that instance in capturing group 1.
  • Use a backreference to find the last occurrence of the username in the line.
  • Capture that occurrence in the named capturing group UserName.

title="(\S+)(?= is (?:on|off)line).*(?<UserName>\k<1>)

If you want, you could also capture the online/offline status.
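As a minimal sketch of the steps above, here is the same idea in Python. Python is an assumption on my part: the pattern above uses .NET-style syntax, so here the named group is written (?P<UserName>...) and the backreference is written \1. The status group and the helper name find_user are hypothetical, added only for illustration.

```python
import re

# Group 1 is the username taken from the title attribute; \1 then
# re-finds it later in the line. The greedy .* makes the engine settle
# on the last occurrence. "status" is an illustrative extra group.
PATTERN = re.compile(
    r'title="(\S+) is (?P<status>on|off)line.*(?P<UserName>\1)'
)


def find_user(line):
    """Return (username, status), or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group("UserName"), m.group("status") + "line"


if __name__ == "__main__":
    samples = [
        'title="user1 is online now"><b><font color="#2568BA"><b>user1</b></font></b></a>',
        'title="user2 is online now"><b>user2</b></a>',
    ]
    for sample in samples:
        print(find_user(sample))
```

Note that \S+ assumes usernames contain no whitespace; a name with spaces would need a different expression for group 1.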

This should work for these examples:

title="\S+\sis\s(?:on|off)line\snow">(?:<b><font[^>]+>)?<b>(.*?)</b>
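The key difference from the pattern in the question is the ? after the non-capturing group: in the question, (?:<font color="#\w{6}">) and (?:</font>) have no quantifier, so they are required, and the pattern also demands two <b> tags, which the second line does not have. A minimal Python sketch of the pattern above, assuming Python's re module (the helper name extract_username is hypothetical):

```python
import re

PATTERN = re.compile(
    r'title="\S+\sis\s(?:on|off)line\snow">'
    r'(?:<b><font[^>]+>)?'  # optional <b><font ...> prefix
    r'<b>(.*?)</b>'         # group 1: the username inside <b>...</b>
)


def extract_username(line):
    """Return the username from group 1, or None if nothing matches."""
    m = PATTERN.search(line)
    return m.group(1) if m else None


if __name__ == "__main__":
    samples = [
        'title="user1 is online now"><b><font color="#2568BA"><b>user1</b></font></b></a>',
        'title="user2 is online now"><b>user2</b></a>',
    ]
    for sample in samples:
        print(extract_username(sample))
```

The pattern stops at the first </b>, so it does not need to account for the trailing </font></b></a> at all.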

You can use the following regular expression:

<[^>]*>(user\d+)<[^>]*>
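A short Python sketch of this last pattern, assuming Python's re module (the helper name usernames_between_tags is hypothetical). It ignores the title attribute entirely and looks only for a name sitting directly between two tags:

```python
import re

# Matches "user" followed by digits when it sits directly between two tags.
TAGGED_USER = re.compile(r'<[^>]*>(user\d+)<[^>]*>')


def usernames_between_tags(text):
    """Return every user\\d+ name found directly between two tags."""
    return TAGGED_USER.findall(text)


if __name__ == "__main__":
    html = (
        'title="user1 is online now"><b><font color="#2568BA"><b>user1</b></font></b></a>\n'
        'title="user2 is online now"><b>user2</b></a>'
    )
    print(usernames_between_tags(html))
```

This only works when every username has the form user followed by digits, as in the two sample lines; a name such as alice would not be found.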
