
How to convert CSV data into a dictionary using itertools.groupby

I have a text file, job.txt, with the following contents:

job,salary
Developer,29000
Developer,28000
Tester,27000
Tester,26000

My current code is:

with open(r'C:\Users\job.txt') as f:
    file_content = f.readlines()
data = {}
for i, line in enumerate(file_content):
    if i == 0:
        continue
    job, salary = line.split(",")
    job = job.strip()
    salary = int(salary.strip())
    if job not in data:
        data[job] = []
    data[job].append(salary)
print("data =", data)

My expected result is:

data = {'Developer': [29000, 28000], 'Tester': [27000, 26000]}

How can I convert my code to use itertools.groupby?

Here is the code that will generate the dictionary you wanted.

from itertools import groupby

data = [
    ["Developer",29000],
    ["Developer",28000],
    ["Tester",27000],
    ["Tester",26000]
]

def keyfunc(e):
    return e[0]

unique_keys = {}
data = sorted(data, key=keyfunc)

for k, g in groupby(data, keyfunc):
    unique_keys[k] = [i[1] for i in g]


>>> print(unique_keys)
{'Developer': [29000, 28000], 'Tester': [27000, 26000]}

PS: I would suggest using the csv module to read the file instead of doing it yourself.
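
As a rough sketch of that suggestion, the rows could be read with csv.reader and then passed to groupby. The sample data is inlined through io.StringIO only so that the snippet runs on its own; with a real file you would pass the object returned by open("job.txt", newline="") instead. The names SAMPLE and group_salaries are made up for illustration and are not part of the original answer.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Inline copy of job.txt so the example is self-contained (an assumption;
# in practice pass open("job.txt", newline="") to group_salaries).
SAMPLE = "job,salary\nDeveloper,29000\nDeveloper,28000\nTester,27000\nTester,26000\n"


def group_salaries(f):
    """Hypothetical helper: map each job to its list of salaries."""
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    # Drop empty rows, then sort so that equal jobs are adjacent for groupby.
    rows = sorted((row for row in reader if row), key=itemgetter(0))
    return {
        job: [int(row[1]) for row in group]
        for job, group in groupby(rows, key=itemgetter(0))
    }


data = group_salaries(io.StringIO(SAMPLE))
print("data =", data)
```

Because sorted is stable, the salaries within each job keep the order in which they appear in the file.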

Try this if pandas is an option:

from collections import defaultdict
import pandas as pd

d = pd.read_csv('job.txt').to_numpy().tolist()
res = defaultdict(list)
for job, salary in d:
    res[job].append(salary)
d = dict(res)

d
# {'Developer': [29000, 28000], 'Tester': [27000, 26000]}

groupby only merges consecutive rows that share a key, so you can rely on it without sorting only if your data is already chunked into categories, as it is here.

from itertools import groupby

with open("job.txt") as f:
    rows = [x.split(",") for x in f.readlines()[1:]]

data = {
    k.strip(): [int(y[1]) for y in v]
    for k, v in groupby(rows, key=lambda x: x[0])
}
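
To illustrate why the chunking matters, here is a small sketch with made-up rows in which the categories are interleaved. The rows list is hypothetical and not taken from job.txt.

```python
from itertools import groupby

# Hypothetical rows where the categories are interleaved.
rows = [("Developer", 29000), ("Tester", 27000), ("Developer", 28000)]

# Without sorting, groupby starts a new group every time the key changes,
# so "Developer" shows up twice and the dict keeps only its last group.
unsorted_result = {
    job: [salary for _, salary in group]
    for job, group in groupby(rows, key=lambda r: r[0])
}

# Sorting by the key first makes equal keys adjacent.
sorted_rows = sorted(rows, key=lambda r: r[0])
sorted_result = {
    job: [salary for _, salary in group]
    for job, group in groupby(sorted_rows, key=lambda r: r[0])
}

print(unsorted_result)  # {'Developer': [28000], 'Tester': [27000]}
print(sorted_result)    # {'Developer': [29000, 28000], 'Tester': [27000]}
```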

With that in mind, I think a defaultdict is more appropriate here. It does not depend on the order of the rows, and it's just less clever. Additionally, there's no need to read the whole file into memory or to sort it when it is unordered. Call dict(data) at the end if you don't want to keep the defaultdict subclass.

from collections import defaultdict

data = defaultdict(list)

with open("job.txt") as f:
    for i, line in enumerate(f):
        if i:
            job, salary = [x.strip() for x in line.split(",")]
            data[job].append(int(salary))

As mentioned in the accepted answer, do prefer the csv module if your actual data is at all more complicated than your example. CSVs can be difficult to parse (quoted fields and embedded commas, for instance) and there's no reason to reinvent the wheel.
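
A minimal sketch of what that might look like, combining csv.DictReader with the defaultdict approach. It assumes the header row is exactly job,salary as in the question. The data is inlined via io.StringIO so the snippet is runnable as-is, and the names SAMPLE and salaries_by_job are invented for this example.

```python
import csv
import io
from collections import defaultdict

# Inline copy of job.txt so the example is self-contained (an assumption;
# in practice pass open("job.txt", newline="") to salaries_by_job).
SAMPLE = "job,salary\nDeveloper,29000\nDeveloper,28000\nTester,27000\nTester,26000\n"


def salaries_by_job(f):
    """Hypothetical helper: group salaries by job without sorting."""
    data = defaultdict(list)
    # DictReader uses the header row as keys, so columns are read by name.
    for row in csv.DictReader(f):
        data[row["job"].strip()].append(int(row["salary"]))
    return dict(data)  # plain dict, for those who dislike the subclass


data = salaries_by_job(io.StringIO(SAMPLE))
print("data =", data)
```

The file is still processed one row at a time, so nothing needs to be sorted or held in memory beyond the result.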
