
regex return everything up to the first space after nth character

I have a list of product names and I want to shorten them (Short Name). I need a regex that will return the first word if it is more than 5 characters and the first two words if it is 5 characters or less.

Product Name            Short Name
BABY WIPES MIS /ALOE    BABY WIPES
PKU GEL PAK             PKU GEL
CA ASCORBATE TAB 500MG  CA ASCORBATE
SOD SUL/SULF CRE 10-2%  SOD SUL/SULF
ASPIRIN TAB 81MG EC     ASPIRIN
IRON TAB 325MG          IRON TAB
PEDA                    PEDA

I initially used:

^([^ \t]+).*

but it only returns the first word, so BABY WIPES MIS /ALOE becomes BABY. I then tried:

.....([^ \t]+)

But this does not appear to work for names shorter than 5 characters. Any help would be greatly appreciated.

Brief

Your attempt is close. However, because the character class [^ \t] excludes spaces and tabs, the capture group cannot match past the first word.
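To make the failure modes concrete, here is a minimal sketch that runs the two patterns from the question against names from the question's own table. It assumes Python's built-in re module; the variable names first_try and second_try are only illustrative.

```python
import re

# The two patterns from the question, unchanged.
first_try = re.compile(r"^([^ \t]+).*")
second_try = re.compile(r".....([^ \t]+)")

# [^ \t]+ stops at the first space, so group 1 holds only the first word.
print(first_try.match("BABY WIPES MIS /ALOE").group(1))  # BABY

# Five dots plus [^ \t]+ require at least six characters,
# so a four-character name cannot match at all.
print(second_try.match("PEDA"))  # None
```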


Code

See code in use here

^(\S{1,5}[ \t]*?\S+).*$

Note: The link uses the following shortened regex. \h may not work in your flavour of regex, which is why the code above is posted as well.

^(\S{1,5}\h*?\S+).*$

Simplified further, it becomes ^\S{1,5}\h*?\S+ (without the capture group and the trailing .*$ that the OP initially used).


Results

Input

BABY WIPES MIS /ALOE
PKU GEL PAK
CA ASCORBATE TAB 500MG
SOD SUL/SULF CRE 10-2%
ASPIRIN TAB 81MG EC
IRON TAB
PEDA

Output

BABY WIPES
PKU GEL
CA ASCORBATE
SOD SUL/SULF
ASPIRIN
IRON TAB
PEDA

Explanation

  • ^ Assert position at the start of a line
  • (\S{1,5}[ \t]*?\S+) Capture group doing the following
    • \S{1,5} Match any non-whitespace character between 1 and 5 times
    • [ \t]*? Match space or tab characters any number of times, but as few as possible (note that in PCRE regex, this can be replaced with \h*? to make it shorter)
    • \S+ Match any non-whitespace character between one and unlimited times
  • .* Match any character any number of times (except newline characters, assuming the s modifier is off, which it should be for this problem)
  • $ Assert position at the end of a line
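As a rough illustration of the pattern explained above, the following sketch applies it to the product names from the question. It assumes Python's built-in re module, which does not support \h, so the [ \t] form is used. The helper name short_name and the fallback to the unchanged input are assumptions made for this example, not part of the answer.

```python
import re

# [ \t] form of the answer's pattern; Python's re module has no \h escape.
SHORT_NAME = re.compile(r"^(\S{1,5}[ \t]*?\S+).*$")

def short_name(product_name: str) -> str:
    """Return capture group 1, or the input unchanged if the pattern fails.

    The pattern needs at least two non-whitespace characters, so a
    one-character name would not match; the fallback covers that case.
    """
    match = SHORT_NAME.match(product_name)
    return match.group(1) if match else product_name

names = [
    "BABY WIPES MIS /ALOE",
    "PKU GEL PAK",
    "CA ASCORBATE TAB 500MG",
    "SOD SUL/SULF CRE 10-2%",
    "ASPIRIN TAB 81MG EC",
    "IRON TAB 325MG",
    "PEDA",
]

for name in names:
    print(f"{name} -> {short_name(name)}")
```

Each name is matched on its own here, so no multiline flag is needed. To process one multi-line string instead, ^ and $ would need multiline mode (re.MULTILINE in Python).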

You can use a regex like this:

^\S{1,5} \S+|^\S+
or
^\S{1,5} ?\S*

Working demo


By the way, if you want to replace a full line with the shortened version, then you can use this regex instead:

(^\S{1,5} \S+|^\S+).*
or
(^\S{1,5} ?\S*).*

With the replacement string $1 or \1 depending on your regex engine.
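A minimal sketch of the full-line replacement, assuming Python's re.sub, which uses the \1 form (not $1) for backreferences in the replacement string. The sample text is a subset of the names from the question, and re.MULTILINE is added here so that ^ matches at the start of every line.

```python
import re

# Three of the question's product names, one per line.
text = "BABY WIPES MIS /ALOE\nASPIRIN TAB 81MG EC\nPEDA"

# Group 1 keeps the short name; the trailing .* consumes the rest of the line.
shortened = re.sub(r"(^\S{1,5} ?\S*).*", r"\1", text, flags=re.MULTILINE)

print(shortened)
# BABY WIPES
# ASPIRIN
# PEDA
```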

Working demo
