I have the following code:
import re

result = dict()
for i in ['ABC', 'DEF']:
    result[i] = re.findall(r'{0}(.*?)(\d+\.*\d*-*\d+\.*\d*.*?)'.format(i), 'ABC costs 40000-50000 dollars; the price of car DEF is 45600-80000, HIJ only needs 30000USD')
It returns:
{'ABC': [(' costs ', '40000-50000')], 'DEF': [(' is ', '45600-80000')]}
However, I want the following:
{'ABC': ['40000-50000'], 'DEF': ['45600-80000'], 'OTHERS' : ['30000']}
Note that keywords other than ABC and DEF should be grouped under OTHERS. How can I solve this problem?
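This approach makes a single re.findall pass over the input to collect every three-letter abbreviation along with the numeric value or range that follows it. It then uses list comprehensions combined with zip and dict to pair each abbreviation with its value. Note that each value is keyed by its own abbreviation (for example HIJ) rather than being grouped under OTHERS.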
import re

inp = "ABC costs 40000-50000 dollars; the price of car DEF is 45600-80000, HIJ only needs 30000USD"
# Each match is either a three-letter keyword or a number / number range
matches = re.findall(r'\b[A-Z]{3}\b|\d+(?:-\d+)?', inp)
print(matches)
# Even indices hold keywords, odd indices hold their values
map_out = dict(zip([matches[i] for i in range(0, len(matches), 2)],
                   [matches[i] for i in range(1, len(matches), 2)]))
print(map_out)
This prints:
['ABC', '40000-50000', 'DEF', '45600-80000', 'HIJ', '30000']
{'HIJ': '30000', 'ABC': '40000-50000', 'DEF': '45600-80000'}
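The output above keeps HIJ as its own key and stores plain strings, whereas the question asks for list values and an OTHERS bucket. A minimal sketch of one way to get that shape follows. The names group_prices and KNOWN are made up for illustration, and the pattern assumes every keyword is exactly three uppercase letters and is followed by its own number before the next keyword appears. If a keyword has no number of its own, it would be paired with the next number in the text.

```python
import re
from collections import defaultdict

# Hypothetical helper names; KNOWN lists the keywords that keep their own key.
KNOWN = {"ABC", "DEF"}

def group_prices(text, known=KNOWN):
    # A three-letter uppercase keyword, then the shortest run of non-digits,
    # then a number or a number range (optional decimal parts allowed).
    pattern = re.compile(r'\b([A-Z]{3})\b\D*?(\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?)')
    result = defaultdict(list)
    for key, value in pattern.findall(text):
        # Any keyword outside `known` is collected under "OTHERS".
        result[key if key in known else "OTHERS"].append(value)
    return dict(result)

inp = "ABC costs 40000-50000 dollars; the price of car DEF is 45600-80000, HIJ only needs 30000USD"
print(group_prices(inp))
```

For the sample sentence this should yield one list per known keyword plus an OTHERS list holding '30000'. The trailing USD is not picked up as a keyword because there is no word boundary between the digits and the letters.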