Hello, I'd like to extract the content of this tag
<Sentiment int=6>Deep injustice</Sentiment>
from many sentences of text.
df['text'].str.extractall(r'^<(?P<Sentiments>\w+).*[int]?.*(?P<Intensite>\d?\d)>(?P<Expression>[a-zA-Z]*?.*[a-zA-Z]*)<')
My code extracts only a few of the tags. Why doesn't it extract the others?
                  Sentiments  Intensite             Expression
      match
405   0         Disagreement          3  Bizarre contradiction
921   0         Satisfaction          5         La plus simple
2549  0      Dissatisfaction          3   Ne me contentant pas
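Your pattern begins with ^, which anchors the match to the start of each string, so a tag that appears later in a sentence is never found. You may use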
df['text'].str.extractall(r'<(?P<Sentiments>\w+)\s+int=(?P<Intensite>\d+)>(?P<Expression>[^<]*)')
See the regex demo.
Details
- < - a < char
- (?P<Sentiments>\w+) - Group "Sentiments": 1 or more letters, digits, or underscores
- \s+ - 1+ whitespace characters
- int= - a literal substring
- (?P<Intensite>\d+) - Group "Intensite": 1+ digits
- > - a > char
- (?P<Expression>[^<]*) - Group "Expression": 0 or more chars other than <
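To compare the two patterns side by side, here is a minimal sketch that uses Python's re module on a few made-up strings. The sample rows, the extract helper, and the variable names are assumptions for illustration only; they are not taken from the original data.

```python
import re

# Hypothetical sample rows; the real data lives in df['text'].
rows = [
    "<Sentiment int=6>Deep injustice</Sentiment>",
    "He felt a <Sentiment int=6>Deep injustice</Sentiment> that day.",
    "<Joy int=2>Nice</Joy> and <Anger int=10>Too much</Anger>",
]

# Pattern from the question: anchored to the start of the string with ^.
anchored = re.compile(
    r'^<(?P<Sentiments>\w+).*[int]?.*(?P<Intensite>\d?\d)>'
    r'(?P<Expression>[a-zA-Z]*?.*[a-zA-Z]*)<'
)

# Pattern from the answer: no anchor, a literal "int=", and a negated class.
unanchored = re.compile(
    r'<(?P<Sentiments>\w+)\s+int=(?P<Intensite>\d+)>(?P<Expression>[^<]*)'
)

def extract(pattern, text):
    """Return one dict of named groups per match found in text."""
    return [m.groupdict() for m in pattern.finditer(text)]

anchored_hits = [extract(anchored, row) for row in rows]
unanchored_hits = [extract(unanchored, row) for row in rows]

for row, old, new in zip(rows, anchored_hits, unanchored_hits):
    print(row)
    print("  anchored:  ", len(old), "match(es)")
    print("  unanchored:", len(new), "match(es)")
```

The second row starts with ordinary text, so the anchored pattern finds nothing there, while the unanchored pattern finds the tag. The third row holds two tags, and only the unanchored pattern returns both. Passing the same unanchored pattern to df['text'].str.extractall(...) should yield one output row per match, indexed by the original row label and a match number.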