Python: split a string into runs of same-language characters

Question

I want to split strings like "hiسلامaliعلی" into ["hi", "سلام", "ali", "علی"].

The input string contains only English and Persian characters (with or without spaces), and I want to split it into runs of consecutive same-language characters.

Is there an easy way to extract the runs of consecutive English characters from the string and split the remaining characters around them?

Answer 1

You can split on runs of ASCII letters with re.split(), wrapping the pattern in a capturing group so the Latin runs are kept in the result:

re.split(r'([a-zA-Z]+)', inputstring)

Demo with Python 3:

>>> import re
>>> inputstring = "hiسلامaliعلی"
>>> re.split(r'([a-zA-Z]+)', inputstring)
['', 'hi', 'سلام', 'ali', 'علی']
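To see why the parentheses in the pattern matter: re.split() only includes the matched separators in its output when the pattern contains a capturing group. A quick comparison on the same input:

```python
import re

inputstring = "hiسلامaliعلی"

# With a capturing group, the matched Latin runs are kept in the output
with_group = re.split(r'([a-zA-Z]+)', inputstring)

# Without a group, the Latin runs act as plain separators and are discarded
without_group = re.split(r'[a-zA-Z]+', inputstring)

print(with_group)     # ['', 'hi', 'سلام', 'ali', 'علی']
print(without_group)  # ['', 'سلام', 'علی']
```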

Extending this to also cover the accented letters of Latin-1 (U+00C0 to U+00FF; note that this range also contains the symbols × and ÷):

re.split(r'([a-zA-Z\xC0-\xFF]+)', inputstring)
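If hard-coding character ranges feels brittle, one alternative (not part of the original answer) is to group consecutive characters by the script prefix of their Unicode name using itertools.groupby. This is a sketch that assumes every letter of interest has a Unicode name starting with LATIN or ARABIC (Persian letters are encoded in the Arabic block); the helper names script_of and split_by_script are hypothetical, made up for illustration.

```python
import unicodedata
from itertools import groupby


def script_of(ch):
    """Hypothetical helper: return 'LATIN' or 'ARABIC' for letters, None otherwise."""
    name = unicodedata.name(ch, '')  # '' for characters without a name
    for script in ('LATIN', 'ARABIC'):
        if name.startswith(script):
            return script
    return None


def split_by_script(text):
    """Join consecutive characters sharing a script; drop spaces, symbols, etc."""
    return [''.join(chars)
            for script, chars in groupby(text, key=script_of)
            if script is not None]


print(split_by_script("hiسلامaliعلی"))      # ['hi', 'سلام', 'ali', 'علی']
print(split_by_script("caf\u00e9 سلام"))    # ['café', 'سلام']
```

Unlike the \xC0-\xFF range, this does not treat × or ÷ as letters. The assumption has limits, though: characters such as the zero-width non-joiner used inside some Persian words, or Persian digits, do not have names starting with ARABIC and would split or be dropped.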

For Python 2, make sure you use unicode strings, and prefix the regular expression with u as well (the combined ur prefix is Python 2 only; it is a syntax error in Python 3):

re.split(ur'([a-zA-Z\xC0-\xFF]+)', inputstring)

In all cases, if the Latin text is at the start or end of the input, re.split() returns an empty string at that end of the result; you can filter these out with:

result = [s for s in re.split(r'([a-zA-Z\xC0-\xFF]+)', inputstring) if s]
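Since the question allows spaces, note that with this pattern any whitespace stays attached to the neighbouring Persian piece, so `if s` alone keeps strings like ' سلام '. A small variation, sketched here under that assumption and not taken from the original answer, strips each piece and drops the ones that are empty afterwards (split_latin_runs is a hypothetical name):

```python
import re


def split_latin_runs(inputstring):
    """Hypothetical helper: split on Latin runs (kept) and trim surrounding whitespace."""
    pieces = re.split(r'([a-zA-Z\xC0-\xFF]+)', inputstring)
    # strip() removes spaces around each piece; pieces that were only whitespace,
    # and the empty strings produced at either end, are dropped
    return [s.strip() for s in pieces if s.strip()]


print(re.split(r'([a-zA-Z\xC0-\xFF]+)', "hi سلام ali"))  # ['', 'hi', ' سلام ', 'ali', '']
print(split_latin_runs("hi سلام ali"))                    # ['hi', 'سلام', 'ali']
```

Because splitting only happens on Latin runs, several Persian words separated by spaces (for example "سلام علی") still come back as a single piece.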

solution1
5 ACCEPTED 2014-08-06 08:12:37