Suppose I have the following string:
data = """
Pakistan[country]
Karachi
lahore
islamabad
UAE[country]
dubai
sharjah
India[country]
goa
chennai
"""
How can I use itertools.groupby here to build a dict with the countries as keys and their corresponding cities as values? The closest I have come is:
from itertools import groupby
filtered = (line for line in data.split("\n") if line)
for key, values in groupby(filtered, lambda line: line.endswith('[country]')):
    print(key)
    print(list(values))
How do I group the result properly? I am not interested in other solutions (I have already written a generator function myself); I want to explicitly use and understand itertools.groupby.
{'Pakistan': ['Karachi', 'lahore', 'islamabad']}
{'UAE': ['dubai', 'sharjah']}
{'India': ['goa', 'chennai']}
Which yields
{'Pakistan': ['Karachi', 'lahore', 'islamabad']} {'UAE': ['dubai', 'sharjah']} {'India': ['goa', 'chennai']}
I think groupby is the wrong tool for this, because it collects all consecutive items that produce the same result when the key function is applied to them. From the problem description, however, it seems you want to "split" your list every time the function returns True.
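To make this concrete, here is what groupby actually produces for the question's data with the `endswith('[country]')` key: one group per run of equal key values, so country groups and city groups alternate rather than coming out already paired.

```python
from itertools import groupby

data = """
Pakistan[country]
Karachi
lahore
islamabad
UAE[country]
dubai
sharjah
India[country]
goa
chennai
"""

filtered = (line for line in data.split("\n") if line)

# groupby starts a new group whenever the key's return value changes,
# so True (country) and False (city) runs alternate.
groups = [(is_country, list(lines))
          for is_country, lines in groupby(filtered, lambda line: line.endswith("[country]"))]

for is_country, lines in groups:
    print(is_country, lines)
```

Each `(True, [...])` group holds a single country line and the `(False, [...])` group right after it holds that country's cities; pairing them up is left to you.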
However, if you really want (or have) to do it with groupby, there are conceptually two approaches:
One approach is to collect pairs from the groupby result: the group for which the key returned True (the country) together with the following group for which it returned False (the cities):
>>> filtered = (line for line in data.split("\n") if line)
>>> l = [list(g) for _, g in groupby(filtered, lambda line: line.endswith('[country]'))]
>>> d = {l[i*2][0].split('[')[0]: l[i*2+1] for i in range(len(l) // 2)}
>>> d
{'Pakistan': ['Karachi', 'lahore', 'islamabad'],
'UAE': ['dubai', 'sharjah'],
'India': ['goa', 'chennai']}
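The index arithmetic can also be replaced by the common `zip(it, it)` idiom, which takes consecutive groups two at a time. This is a sketch under the same assumptions as above (the input starts with a country and every country has at least one city); `SUFFIX` is just a name introduced here.

```python
from itertools import groupby

data = """
Pakistan[country]
Karachi
lahore
islamabad
UAE[country]
dubai
sharjah
India[country]
goa
chennai
"""

SUFFIX = "[country]"

filtered = (line for line in data.split("\n") if line)
# One list per run of equal keys: [country], [cities], [country], [cities], ...
groups = (list(g) for _, g in groupby(filtered, lambda line: line.endswith(SUFFIX)))

# zip(it, it) draws two consecutive items from the same iterator,
# pairing each country group with the city group that follows it.
it = iter(groups)
d = {country[0][:-len(SUFFIX)]: cities for country, cities in zip(it, it)}
print(d)
```

Because `groups` is consumed lazily, this never builds the intermediate list `l`; a trailing country without cities is silently dropped, just as with `len(l) // 2`.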
Alternatively, use a stateful callable as the key function, one that remembers what the "current country" is:
class KeepCountry:
    def __call__(self, item):
        if item.endswith('[country]'):
            self._last = item.split('[country]')[0]
        return self._last
>>> filtered = (line for line in data.split("\n") if line)
>>> {k: list(g)[1:] for k, g in groupby(filtered, KeepCountry())}
{'Pakistan': ['Karachi', 'lahore', 'islamabad'],
'UAE': ['dubai', 'sharjah'],
'India': ['goa', 'chennai']}
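The same stateful idea can be written with a closure instead of a class. This is only a stylistic alternative (the name `keep_country` is illustrative) and it keeps the assumption that the first line is a country: any lines before the first country would be grouped under `None`.

```python
from itertools import groupby

data = """
Pakistan[country]
Karachi
lahore
islamabad
UAE[country]
dubai
sharjah
India[country]
goa
chennai
"""

def keep_country():
    """Return a key function that remembers the most recent country name."""
    current = None

    def key(line):
        nonlocal current
        if line.endswith("[country]"):
            current = line[:-len("[country]")]
        return current

    return key

filtered = (line for line in data.split("\n") if line)
# Every group starts with the country line itself, so [1:] drops it.
result = {k: list(g)[1:] for k, g in groupby(filtered, keep_country())}
print(result)
```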
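Both solutions make assumptions about the input, so keep them in mind if you want to use either one: both expect the first line to be a country, and the pairing approach additionally requires every country to be followed by at least one city.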
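A small illustration of why these assumptions matter, using made-up input in which "Nepal" has no cities: the pairing approach silently attaches the wrong cities, whereas the stateful key function copes (it would, however, raise AttributeError if the first line were not a country).

```python
from itertools import groupby

SUFFIX = "[country]"

# Made-up input: "Nepal" has no cities listed.
lines = ["Nepal[country]", "Pakistan[country]", "Karachi", "UAE[country]", "dubai"]

def is_country(line):
    return line.endswith(SUFFIX)

# Pairing approach: the two consecutive country lines land in ONE group,
# so Pakistan disappears and its city is attributed to Nepal.
l = [list(g) for _, g in groupby(lines, is_country)]
paired = {l[i * 2][0].split("[")[0]: l[i * 2 + 1] for i in range(len(l) // 2)}
print(paired)

# Stateful key approach: each country line starts its own group.
class KeepCountry:
    def __call__(self, item):
        if item.endswith(SUFFIX):
            self._last = item.split(SUFFIX)[0]
        return self._last

stateful = {k: list(g)[1:] for k, g in groupby(lines, KeepCountry())}
print(stateful)
```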
If a third-party package is acceptable, you could use iteration_utilities (my library), which provides a split function for iterables:
>>> from iteration_utilities import Iterable
>>> (Iterable(data.split('\n'))
... .filter(bool) # Removes empty lines
... # Split by countries while keeping them
... .split(lambda l: l.endswith('[country]'), keep_after=True)[1:]
... # Convert to a tuple containing the country as first and the cities as second element
... .map(lambda l: (l[0][:-9], l[1:]))
... .as_dict())
{'Pakistan': ['Karachi', 'lahore', 'islamabad'],
'UAE': ['dubai', 'sharjah'],
'India': ['goa', 'chennai']}
Not sure about itertools, but why not this:
from collections import defaultdict
data = """
Pakistan[country]
Karachi
lahore
islamabad
UAE[country]
dubai
sharjah
India[country]
goa
chennai
"""
dct = defaultdict(list)
country = ''
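for x in data.split('\n')[1:-1]:  # skip the empty first and last lines
    if '[country]' in x:
        country = x.replace('[country]', '')
    else:
        dct[country].append(x)
print(dict(dct))  # convert to a plain dict; printing dct directly shows "defaultdict(<class 'list'>, ...)"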
# {'Pakistan': ['Karachi', 'lahore', 'islamabad'], 'UAE': ['dubai', 'sharjah'], 'India': ['goa', 'chennai']}
itertools.groupby() returns an alternating sequence of country groups and city groups. When it yields a country group, save the country; when it yields a city group, add an entry to the dictionary under the saved country.
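import itertools

filtered = (line for line in data.split("\n") if line)
result = {}
for is_country, values in itertools.groupby(filtered, key=lambda line: line.endswith("[country]")):
    if is_country:
        # strip the "[country]" suffix so the key is just the country name
        country = next(values)[:-len("[country]")]
    else:
        result[country] = list(values)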
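One caveat with this loop: if two country lines follow each other, they form a single True group and `next(values)` only reads the first one, so the second country is lost. A sketch of a variant that iterates over the whole country group instead (the helper name `parse_countries` and the "Nepal" test line are illustrative, not from the question):

```python
from itertools import groupby

SUFFIX = "[country]"

def parse_countries(lines):
    """Group city lines under the most recent country line (illustrative helper)."""
    result = {}
    country = None
    for is_country, values in groupby(lines, key=lambda line: line.endswith(SUFFIX)):
        if is_country:
            # A run may hold several countries in a row; each gets an entry,
            # and only the last one receives the cities that follow.
            for line in values:
                country = line[:-len(SUFFIX)]
                result[country] = []
        else:
            # Assumes a country line precedes the first city (KeyError otherwise).
            result[country].extend(values)
    return result

lines = ["Pakistan[country]", "Karachi", "lahore", "Nepal[country]", "UAE[country]", "dubai"]
print(parse_countries(lines))
```

Here "Nepal" ends up with an empty list instead of disappearing, which is usually what you want when a country has no cities listed.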