
Extracting words inside tags in Python

Hello, I'd like to extract the content of tags like this one:

<Sentiment int=6>Deep injustice</Sentiment>

in many sentences of text (the sample data was linked in the original post).

df['text'].str.extractall(r'^<(?P<Sentiments>\w+).*[int]?.*(?P<Intensite>\d?\d)>(?P<Expression>[a-zA-Z]*?.*[a-zA-Z]*)<')

My code extracts only a few of the tags. Why doesn't it extract the others?

                  Sentiments Intensite               Expression
      match                                                    
405   0         Disagreement         3    Bizarre contradiction
921   0         Satisfaction         5           La plus simple
2549  0      Dissatisfaction         3     Ne me contentant pas
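To see why only some rows match, here is a small sketch that runs the question's pattern through Python's standard `re` module (pandas string methods use the same regex engine) on three made-up rows. The rows are hypothetical, not the asker's data. Two problems show up: `^` anchors the match at the start of each string, so a tag in the middle of a sentence is never found, and the greedy `.*` can run across tag boundaries when a row holds more than one tag.

```python
import re

# Hypothetical sample rows (not the asker's data).
rows = [
    "<Sentiment int=6>Deep injustice</Sentiment>",                     # tag at the very start
    "I felt a <Sentiment int=6>Deep injustice</Sentiment> here",       # tag mid-sentence
    "<Anger int=3>Total outrage</Anger> but <Joy int=5>Relief</Joy>",  # two tags in one row
]

# The pattern from the question, unchanged (split over two raw strings for width).
question_pattern = re.compile(
    r'^<(?P<Sentiments>\w+).*[int]?.*(?P<Intensite>\d?\d)>'
    r'(?P<Expression>[a-zA-Z]*?.*[a-zA-Z]*)<'
)

results = []
for row in rows:
    # "^" without re.MULTILINE only matches at position 0 of the string.
    m = question_pattern.search(row)
    results.append(m.groupdict() if m else None)
    print(results[-1])
```

The first row is extracted correctly, the second returns `None` because it does not start with `<`, and the third returns a single mixed-up match: the tag name `Anger` paired with the intensity `5` and the text `Relief` from the second tag, because both `.*` runs consume as much as they can before backtracking.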

You may use

df['text'].str.extractall(r'<(?P<Sentiments>\w+)\s+int=(?P<Intensite>\d+)>(?P<Expression>[^<]*)')

See the regex demo (linked in the original answer).
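A minimal sketch of what the answer's pattern finds, using the standard-library `re` module so it runs without pandas. `finditer` yields every non-overlapping match in a string, which is what `str.extractall` turns into one output row per match under a `match` index level. The sample rows are made up for illustration.

```python
import re

# The pattern from the answer: no "^" anchor, and the text group stops at the next "<".
answer_pattern = re.compile(
    r'<(?P<Sentiments>\w+)\s+int=(?P<Intensite>\d+)>(?P<Expression>[^<]*)'
)

# Hypothetical sample rows (not the asker's data).
rows = [
    "<Sentiment int=6>Deep injustice</Sentiment>",
    "I felt a <Sentiment int=6>Deep injustice</Sentiment> here",
    "<Anger int=3>Total outrage</Anger> but <Joy int=5>Relief</Joy>",
]

# Collect (row index, match number, groups), mirroring the
# row / "match" index levels that str.extractall produces.
records = []
for row_id, row in enumerate(rows):
    for match_no, m in enumerate(answer_pattern.finditer(row)):
        records.append((row_id, match_no, m['Sentiments'], m['Intensite'], m['Expression']))

for rec in records:
    print(rec)
```

Closing tags such as `</Sentiment>` are skipped because `/` is not a `\w` character. With pandas, `pd.Series(rows).str.extractall(answer_pattern.pattern)` should give the same four matches, indexed by `(row, match)`.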

Details

  • < - a < char
  • (?P<Sentiments>\w+) - Group "Sentiments": 1 or more letters, digits or underscores
  • \s+ - 1 or more whitespace characters
  • int= - the literal substring int=
  • (?P<Intensite>\d+) - Group "Intensite": 1 or more digits
  • > - a > char
  • (?P<Expression>[^<]*) - Group "Expression": 0 or more chars other than <
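The breakdown above can be written directly into the pattern with `re.VERBOSE`, which ignores unescaped whitespace and treats `#` as a comment. The sketch also compares `[^<]*` against a greedy `.*` on a hypothetical row with two tags, to show why the text group is limited to "chars other than <".

```python
import re

# The answer's pattern written with re.VERBOSE, one token per line.
verbose_pattern = re.compile(r"""
    <                        # a literal "<"
    (?P<Sentiments>\w+)      # tag name: 1+ letters, digits or underscores
    \s+                      # 1+ whitespace characters
    int=                     # the literal substring "int="
    (?P<Intensite>\d+)       # intensity: 1+ digits
    >                        # a literal ">"
    (?P<Expression>[^<]*)    # tag text: 0+ characters other than "<"
""", re.VERBOSE)

# For comparison: the same pattern with a greedy ".*" instead of "[^<]*".
greedy_pattern = re.compile(
    r'<(?P<Sentiments>\w+)\s+int=(?P<Intensite>\d+)>(?P<Expression>.*)'
)

text = "<Anger int=3>Total outrage</Anger> but <Joy int=5>Relief</Joy>"  # hypothetical row

verbose_hits = [m['Expression'] for m in verbose_pattern.finditer(text)]
greedy_hits = [m['Expression'] for m in greedy_pattern.finditer(text)]
print(verbose_hits)
print(greedy_hits)
```

With `[^<]*` each tag yields its own text, while the greedy `.*` runs to the end of the line, swallows the second tag, and leaves only one match.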

