
re.sub with brackets, removing Japanese ruby characters

How can I change

a = "[ruby(空,ruby=そら)]は[ruby(青,ruby=あお)]い。"

into

"空は青い。" ?

I tried

re.sub(r"\[ruby\(.,ruby=.\)\]",".",a)

but it does not work at all.

Given:

a = "[ruby(空,ruby=そら)]は[ruby(青,ruby=あお)]い。"
desired="空は青い。"

You can use alternation to remove the substrings:

>>> import re
>>> s=re.sub(r'\[ruby\(|,ruby=[^)]+\)\]','',a)
>>> s
'空は青い。'
>>> s==desired
True
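For context, here is a minimal sketch of why the attempt in the question leaves the string unchanged. Each `.` in the pattern matches exactly one character, while the readings (そら, あお) are two characters long, so the pattern never matches. The `"."` in the replacement argument is also a literal period, not a wildcard. The variable names `unchanged` and `fixed` are only illustrative, and the repaired pattern is one possible fix rather than the only one.

```python
import re

a = "[ruby(空,ruby=そら)]は[ruby(青,ruby=あお)]い。"

# Original attempt: "." matches a single character, but each reading
# is two characters long, so no match is found and nothing is replaced.
unchanged = re.sub(r"\[ruby\(.,ruby=.\)\]", ".", a)

# One possible repair: capture the base text in a group, let the reading
# be any length (lazy ".+?"), and substitute the captured group back in.
fixed = re.sub(r"\[ruby\((.+?),ruby=.+?\)\]", r"\1", a)

print(unchanged == a)
print(fixed)
```

The lazy quantifiers keep each match inside a single `[ruby(...)]` tag, so the two tags in the sample are handled separately.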

You can use

a = re.sub(r'\[ruby\(([^(),]*),[^()]*\)]', r'\1', a)

See the regex demo. Details:

  • \[ruby\( - a [ruby( text
  • ([^(),]*) - Group 1: zero or more characters other than (, ) and a comma
  • , - a comma
  • [^()]* - zero or more chars other than ( and )
  • \)] - a )] text.

See a Python demo:

import re
a = "[ruby(空,ruby=そら)]は[ruby(青,ruby=あお)]い。"
print(re.sub(r'\[ruby\(([^(),]*),[^()]*\)]', r'\1', a))
# => 空は青い。
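If the substitution is needed in several places, the same pattern could be compiled once with `re.VERBOSE`, so that the explanation of each part sits next to the pattern itself. This is only a sketch: the names `RUBY_TAG` and `strip_ruby` are hypothetical, and it assumes that the base text never contains parentheses or commas, as in the sample input.

```python
import re

# Hypothetical helper names; the pattern is the one from the answer above,
# written in verbose mode so each part can carry a comment.
RUBY_TAG = re.compile(
    r"""
    \[ruby\(        # opening of the tag: [ruby(
    ([^(),]*)       # group 1: base text, no parentheses or commas
    ,               # comma between the base text and the reading
    [^()]*          # the reading part (ruby=...), up to the closing parenthesis
    \)\]            # closing of the tag: )]
    """,
    re.VERBOSE,
)


def strip_ruby(text):
    """Replace every [ruby(base,ruby=reading)] tag with its base text."""
    return RUBY_TAG.sub(r"\1", text)


a = "[ruby(空,ruby=そら)]は[ruby(青,ruby=あお)]い。"
print(strip_ruby(a))
```

Text without any ruby tags passes through unchanged, because `sub` only rewrites the spans that match.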
