I'd like to read some values from a sample.cfg file and parse them. The code looks like this:
from pyparsing import *

key = Word(alphanums)('key')
equals = Suppress('=')
value = Word(alphanums)('value')
kvexpression = key + equals + value

with open('sample.cfg') as config_in:
    config_data = config_in.read()

for match in kvexpression.scanString(config_data):
    result = match[0]
    print("{0} is {1}".format(result.key, result.value))
With ASCII-only input it works fine, for example:
sample.cfg
city=Atlanta
state=Georgia
population=5522942
But if the input file contains some Unicode characters, it doesn't work as expected.
sample.cfg (with unicode letters)
şehir=İzmir
ülke=Türkiye
nüfus=4279677
If you run the program on this file, the output is:
lke is T
fus is 4279677
As you can see, it drops the non-ASCII characters.
Update:
I altered the code as suggested. It now looks like this:
from pyparsing import *

key = Word(alphanums + alphas8bit)('key')
equals = Suppress('=')
value = Word(alphanums + alphas8bit)('value')
kvexpression = key + equals + value

with open('şehir.cfg') as config_in:
    config_data = config_in.read()

for match in kvexpression.scanString(config_data):
    result = match[0]
    print("{0} is {1}".format(result.key, result.value))
I also made minor changes to the data file:
sample.cfg
şehir=İzmir
ülke=Türkiye
nüfus=4279677
alfabe=AaBbCcÇçDdEeFfGgĞğHhIiİiJjKkLlMmNnOoÖöPpRrSsŞşTtUuÜüVvYyZz
When I run the program, its output is:
ülke is Türkiye
nüfus is 4279677
alfabe is AaBbCcÇçDdEeFfGg
As you can see, the first line, which starts with 'ş', is not displayed, and the alfabe line is cut off before 'Ğ'. I have noticed this behavior before.
Almost there, but not quite.
I am on a Linux box.
Replace alphanums with alphanums+alphas8bit in two places in your code, as in this line:
key = Word(alphanums+alphas8bit)('key')
The problem is that alphanums matches only the unaccented Latin alphabet (plus the digits). alphas8bit matches the additional 8-bit characters in Latin-1.
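To see which Turkish letters this covers, one can check which of them exist in Latin-1 at all. The sketch below uses only the standard library and treats "encodable as Latin-1" as a stand-in for what alphas8bit can match; that equivalence is an assumption for illustration, not pyparsing's exact definition, and the helper name in_latin1 is made up here.

```python
import unicodedata

# Which Turkish-specific letters exist in Latin-1 (ISO-8859-1)?
# "Encodable as Latin-1" stands in for what alphas8bit can match;
# this is an assumption made for illustration.

def in_latin1(ch):
    """Return True if the character has a Latin-1 code point."""
    try:
        ch.encode('latin-1')
        return True
    except UnicodeEncodeError:
        return False

turkish_letters = u'çğıöşüÇĞİÖŞÜ'
inside = [ch for ch in turkish_letters if in_latin1(ch)]
outside = [ch for ch in turkish_letters if not in_latin1(ch)]

# Report each letter by code point and Unicode name.
for ch in turkish_letters:
    status = "Latin-1" if in_latin1(ch) else "outside Latin-1"
    print("U+{0:04X} {1}: {2}".format(ord(ch), unicodedata.name(ch), status))
```

The letters reported as outside Latin-1 (ğ, ı, ş, Ğ, İ, Ş) would still stop a Word built from alphanums+alphas8bit. That is consistent with the update in the question, where the şehir line is dropped and alfabe is cut off before Ğ.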
When I run the altered code against this input,
sehir=Izmir
ülke=Türkiye
nüfus=4279677
AaBbCcÇçDdEeFfGgGgHhIiIiJjKkLlMmNnOoÖöPpRrSsSsTtUuÜüVvYyZz = 5
where the entire Turkish alphabet appears in the last line, the result is,
sehir is Izmir
ülke is Türkiye
nüfus is 4279677
AaBbCcÇçDdEeFfGgGgHhIiIiJjKkLlMmNnOoÖöPpRrSsSsTtUuÜüVvYyZz is 5
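A related point worth checking is how the file is decoded before it reaches the parser. If the script runs under Python 2 (the question does not say), open(...).read() returns undecoded bytes, so a two-byte UTF-8 letter is not a single character to the parser. Below is a minimal sketch, assuming the file is saved as UTF-8; the temporary file is created only to keep the example self-contained.

```python
import io
import os
import tempfile

# Write a small UTF-8 sample file so the sketch is self-contained.
sample = u"şehir=İzmir\nülke=Türkiye\nnüfus=4279677\n"
path = os.path.join(tempfile.mkdtemp(), "sample.cfg")
with io.open(path, "w", encoding="utf-8") as config_out:
    config_out.write(sample)

# Raw bytes: 'ş' and 'İ' each occupy two bytes in UTF-8.
with io.open(path, "rb") as config_in:
    raw = config_in.read()

# Decoded text: every letter is a single character.
# The encoding is an assumption; use whatever the file was saved with.
with io.open(path, encoding="utf-8") as config_in:
    config_data = config_in.read()

first_raw = raw.split(b"\n")[0]
first_text = config_data.split(u"\n")[0]
print(len(first_raw), len(first_text))  # 13 bytes vs. 11 characters
```

Passing the decoded config_data to scanString lets the grammar compare characters rather than individual bytes.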
I've found a solution myself. I don't know whether it is the proper way to achieve this, but it looks fine to me.
from pyparsing import *

alphanums_tr = u'abcçdefgğhiijklmnoöprsştuüvyzABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ0123456789'
key = Word(alphanums_tr)('key')
equals = Suppress('=')
value = Word(alphanums_tr)('value')
kvexpression = key + equals + value

with open('şehir.cfg') as config_in:
    config_data = config_in.read()

for match in kvexpression.scanString(config_data):
    result = match[0]
    print("{0} is {1}".format(result.key, result.value))
The program's output is:
şehir is İzmir
ülke is Türkiye
nüfus is 4279677
alfabe is AaBbCcÇçDdEeFfGgĞğHhIiİiJjKkLlMmNnOoÖöPpRrSsŞşTtUuÜüVvYyZz
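Hard-coding the alphabet works, but it has to be repeated for every language. A possible alternative, sketched here with only the standard library, is to generate the character string from a code point range and pass it to Word in the same way as alphanums_tr. The name unicode_alphanums and the upper bound of the range are assumptions for this illustration.

```python
# Collect every alphanumeric character in U+0000..U+024F
# (Basic Latin through Latin Extended-B). The upper bound is an
# assumption chosen to cover Turkish; widen it for other scripts.
unicode_alphanums = u''.join(
    chr(cp) for cp in range(0x0250) if chr(cp).isalnum()
)

# The hand-written alphabet from the code above, for comparison.
alphanums_tr = u'abcçdefgğhiijklmnoöprsştuüvyzABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ0123456789'

# Every character of the hand-written set should also be generated.
missing = [ch for ch in alphanums_tr if ch not in unicode_alphanums]
print("missing:", missing)  # missing: []
```

The generated string is broader than alphanums_tr: it also contains other accented Latin letters and a few numeric symbols, so it suits cases where accepting more than the Turkish alphabet is acceptable.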