So I have this text file that contains the following.
<lang:Foreign> <lang:foreign>
</lang:Foreign> </lang:foreign>
<lang: Foreign> <lang:foreign>
</lang: Foreign> </lang:foreign>
What my program does is map the first tag on each line to the second, so an entry in the dictionary would look like this:
{<lang:Foreign> : <lang:foreign>}
flist = [line.split() for line in f]
for k, v in flist:
    fdict.update({k: v})
My mapping code is above. The problem is the last two entries:
<lang: Foreign> <lang:foreign>
</lang: Foreign> </lang:foreign>
The first tag on each of these lines contains a space, so my code splits it at lang: and Foreign. I want to specify that the first entry contains a space. I have tried the following:
<lang:\sForeign> <lang:foreign>
</lang:\sForeign> </lang:foreign>
Any idea how I can tell my program to accept this space and map it properly? Thanks!
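Just use a different split argument. Splitting on ' <' leaves the space inside the first tag alone, but it also consumes the '<' of the second tag, so add it back:
k, v = line.strip().split(' <')
v = '<' + v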
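To see why the separator matters, here is a small sketch comparing the default split() with split(' <') on one of the problematic lines. The sample string is copied from the question; the variable names are only illustrative.

```python
line = "<lang: Foreign> <lang:foreign>\n"

# The default split() breaks on every run of whitespace,
# including the space inside the first tag.
default_parts = line.split()
print(default_parts)  # ['<lang:', 'Foreign>', '<lang:foreign>']

# Splitting on ' <' only breaks where a space is followed by '<',
# which is the boundary between the two tags. The separator is
# consumed, so the second piece loses its leading '<'.
key, rest = line.strip().split(' <')
value = '<' + rest
print(key, value)
```

Note that this assumes the two tags are separated by exactly one space and that no tag contains the sequence ' <' itself.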
I would suggest using a regex. The following pattern gives you a list of the substrings enclosed in '<>' on each line.
import re
pattern = re.compile(r'<.*?>')
flist = pattern.findall(line) # sample output of flist = ['<lang:Foreign>', '<lang:foreign>']
if len(flist) == 2:
    fdict.update({flist[0]: flist[1]})
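For completeness, here is a minimal end-to-end sketch of this approach. It assumes the file contents shown in the question and uses io.StringIO as a stand-in for the open file f; in real code you would open your own file instead.

```python
import io
import re

# Stand-in for the open text file; contents copied from the question.
f = io.StringIO(
    "<lang:Foreign> <lang:foreign>\n"
    "</lang:Foreign> </lang:foreign>\n"
    "<lang: Foreign> <lang:foreign>\n"
    "</lang: Foreign> </lang:foreign>\n"
)

# Non-greedy match: each hit stops at the first '>' after a '<',
# so spaces inside a tag are kept.
pattern = re.compile(r'<.*?>')

fdict = {}
for line in f:
    tags = pattern.findall(line)
    if len(tags) == 2:  # skip blank or malformed lines
        fdict[tags[0]] = tags[1]

for k, v in fdict.items():
    print(repr(k), '->', repr(v))
```

Because the pattern looks for the angle brackets rather than for whitespace, it does not matter how many spaces separate the two tags or whether a tag contains spaces.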
I would suggest that you split on "> <" and then add the ">" and "<" back to the first and second elements of the list. Something like this ...
arr = line.strip().split('> <')  # strip() drops the trailing newline
arr[0] = arr[0] + '>'
arr[1] = '<' + arr[1]
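Wrapped in a small function, the same idea might look like the sketch below. The name parse_pair is hypothetical, and the length check is an added assumption so that a line without exactly one "> <" boundary fails loudly instead of producing a wrong mapping.

```python
def parse_pair(line):
    """Split one 'TAG TAG' line into (key, value). Hypothetical helper."""
    arr = line.strip().split('> <')
    if len(arr) != 2:
        raise ValueError(f"expected two tags, got: {line!r}")
    # The separator '> <' was consumed by split(), so restore the brackets.
    return arr[0] + '>', '<' + arr[1]

print(parse_pair("</lang: Foreign> </lang:foreign>\n"))
# ('</lang: Foreign>', '</lang:foreign>')
```

This only works when the tags are separated by exactly one space; for variable whitespace a regex is the safer choice.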
Using regular expressions probably makes the most sense here.
import re
pattern = re.compile(r'(<.*?>)\s*(<.*?>)')
for line in f:
    match = pattern.search(line)
    if match:  # skip lines that do not contain two tags
        k, v = match.groups()
        fdict[k] = v