简体   繁体   中英

Python Dictionary Mapping Not Working as Expected

So I have this text file that contains the following.

<lang:Foreign> <lang:foreign>
</lang:Foreign> </lang:foreign>
<lang: Foreign> <lang:foreign>
</lang: Foreign> </lang:foreign>

What my program do is it maps the first text in the line to the second. So it would look like this in the dictionary.

{<lang:Foreign> : <lang:foreign>}
flist = [line.split() for line in f]
for k, v in flist:
    fdict.update({k: v})

My mapping code is above. But the problem is the last two lines of entries

<lang: Foreign> <lang:foreign>
</lang: Foreign> </lang:foreign>

The first entries have space between them and my code splits lang: and Foreign. But I want to specify that the first entry contains a space. I have tried doing the following

<lang:\sForeign> <lang:foreign>
</lang:\sForeign> </lang:foreign>

Any idea how I can tell my program to accept this space and map it properly? Thanks!

Just use different split argument. This should work for you:

line.split(' <')

I would suggest using regex. Using the following pattern matching will give you a list of matching patterns enclosed in '<>' for each line.

    import re

    pattern = re.compile(r'<.*?>')
    flist = pattern.findall(line) # sample output of flist = ['<lang:Foreign>', '<lang:foreign>']
    if len(flist) == 2:
        fdict.update({flist[0]: flist[1]})

I would suggest that you split on "> <" and then add the ">" and "<" back to the first and second elements of the array. Something like this ...

arr = line.split('> <')
arr[0] = arr[0] + '>'
arr[1] = '<' + arr[1]

Using regular expressions probably makes the most sense here.

import re

pattern = re.compile(r'(<.*?>)\s*(<.*?>)')

flist = [pattern.findall(line) for line in f]
for k, v in flist:
    fdict.update({k: v})

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM