I have strings that look like this:
sentences = "@en:The dog went for a walk@es:El perro fue de paseo"
Desired output:
splitted = ['The dog went for a walk', 'El perro fue de paseo']
Current code:
splitted = re.split("^@:$", sentences)
So, I'd like to split the sentences on substrings that begin with an at symbol @ and end with a colon :, since this is how all languages are encoded, e.g. (@en:, @es:, @fr:, @nl:, etc.)
You can split on everything from @ to : without matching either of those characters in between, using a negated character class.
There might be empty entries in the result, which you can filter out.
@[^@:]*:
import re
sentences = "@en:The dog went for a walk@es:El perro fue de paseo"
splitted = [s for s in re.split("@[^@:]*:", sentences) if s]
print(splitted)
Output
['The dog went for a walk', 'El perro fue de paseo']
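To see where the empty entry mentioned above comes from, here is a small sketch that prints the unfiltered result first. The variable names raw and filtered are only illustrative; the pattern and input are the ones from the answer.

```python
import re

sentences = "@en:The dog went for a walk@es:El perro fue de paseo"

# The string starts with a delimiter, so re.split returns an empty
# string for the text that comes before the first match.
raw = re.split("@[^@:]*:", sentences)
print(raw)  # ['', 'The dog went for a walk', 'El perro fue de paseo']

# Keeping only truthy (non-empty) strings leaves just the sentences.
filtered = [s for s in raw if s]
print(filtered)  # ['The dog went for a walk', 'El perro fue de paseo']
```

Filtering with "if s" also removes empty pieces that would appear if two tags were directly adjacent.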
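Try this code:
import re
sentences = "@en:The dog went for a walk@es:El perro fue de paseo"
# [a-zA-Z]+ matches one or more ASCII letters (the language code)
# the string starts with a tag, so drop the empty first piece
splitted = [s for s in re.split(r"@[a-zA-Z]+:", sentences) if s]
print(splitted)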
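One thing to watch when typing a letter range by hand: A-z (with a lowercase z) is wider than A-Z plus a-z, because six punctuation characters sit between Z and a in ASCII. The sketch below illustrates this; the tag "@e_n:" is a made-up input, not something from the question.

```python
import re

# The six ASCII characters between 'Z' and 'a'.
between = "[\\]^_`"

# The range A-z (lowercase z) accepts all of them.
wide = [c for c in between if re.fullmatch(r"[A-z]", c)]
print(wide)

# A-Z combined with a-z accepts none of them.
strict = [c for c in between if re.fullmatch(r"[a-zA-Z]", c)]
print(strict)  # []

# Hypothetical tag containing an underscore: only the wider class
# treats it as a delimiter.
wide_split = re.split(r"@[a-zA-z]+:", "@e_n:text")
strict_split = re.split(r"@[a-zA-Z]+:", "@e_n:text")
print(wide_split)    # ['', 'text']
print(strict_split)  # ['@e_n:text']
```

For two-letter language codes like the ones in the question, both classes behave the same, so this only matters if the tags can contain other characters.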
You need this regex: @[^@:]+:
first, @ matches a literal @
next, [^@:]+ matches one or more characters that are neither @ nor :
finally, : matches a literal :
import re
sentences = "@en:The dog went for a walk@es:El perro fue de paseo"
splitted = re.split("@[^@:]+:", sentences)
print(splitted[1:])
output:
['The dog went for a walk', 'El perro fue de paseo']
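As a quick check of what the pattern treats as a delimiter, and of when slicing with [1:] is safe, here is a short sketch. The second input string, no_leading_tag, is a made-up example and assumes some text could appear before the first tag.

```python
import re

sentences = "@en:The dog went for a walk@es:El perro fue de paseo"
pattern = "@[^@:]+:"

# findall lists the substrings the pattern matches, i.e. the delimiters.
tags = re.findall(pattern, sentences)
print(tags)  # ['@en:', '@es:']

# Hypothetical input that does not start with a tag: the first piece
# is real text, so [1:] would throw it away.
no_leading_tag = "Intro@en:Hello"
parts = re.split(pattern, no_leading_tag)
print(parts)      # ['Intro', 'Hello']
print(parts[1:])  # ['Hello']
```

So [1:] fits inputs that always begin with a language tag, as in the question; if that is not guaranteed, filtering out empty strings is the more forgiving choice.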