Group items from a list based on certain items from that list

Question

I have a list in which each row has 2 elements: a company id and a group number. I want to split those companies into separate lists by group number so that I can run some regressions on each group separately. The list I have:

59872004    0
74202004    0
1491772004  1
1476392004  1
309452004   1
1171452004  1
150842004   2
143592004   2
76202004    2
119232004   2
80492004    2
291732004   2

My current code is the following:

import csv

list_of_variables = []
with open(str(csv_path) + "2004-297-100.csv", 'r') as csvFile:
    reader = csv.reader(csvFile)
    for row in reader:
        list_of_variables.append(row)
    # discard the first row of the file
    del list_of_variables[0]

list_of_lists = []
counter = 0
counter_list = 0
one_cluster = []
variable = []
for line in list_of_variables:
    print('counter: ', counter_list)
    # for testing purposes
    if counter_list == 20:
        break
    # print("cluster: ", cluster)
    # append the first line from the list to the intermediary list
    if counter_list == 0:
        one_cluster.append(line)
    if counter_list >= 1:
        if line[1] == variable[1]:
            one_cluster.append(line)
    print("one cluster : ", one_cluster)
    variable = one_cluster[counter-1]
    # print('line : ', line[1])
    # print('variable : ', variable[1])
    counter += 1
    # if the grouped number changed put the list into the final list
    # clear the intermediary list and append the current element which was not part of the previous group
    if line[1] != variable[1]:
        list_of_lists.append(one_cluster.copy())
        # print("here", list_of_lists)
        one_cluster.clear()
        one_cluster.append(line)
        counter = 0
    # print('variable', variable)
    # print('one_cluster ', one_cluster)
    counter_list += 1


print(list_of_lists)

The output from that code is the following:

[[['59872004', '0'], ['74202004', '0']], [['1491772004', '1'], ['309452004', '1'], ['1171452004', '1']], [['150842004', '2'], ['76202004', '2'], ['119232004', '2'], ['80492004', '2'], ['291732004', '2']]]

Expected output from the code:

[[['59872004', '0'], ['74202004', '0']], [['1491772004', '1'], ['1476392004', '1'], ['309452004', '1'], ['1171452004', '1']], [['150842004', '2'], ['143592004', '2'], ['76202004', '2'], ['119232004', '2'], ['80492004', '2'], ['291732004', '2']]]

If you look closely, group zero is handled correctly, but all the other groups have missing companies. For example, group 1 is supposed to have 4 elements, but my code only outputs 3, and the same happens with the other lists. I have looked around but did not find anything that does this more easily. If you know how to fix this or can point me in the right direction I would be very grateful.

Thank you for your time and patience!

UPDATE: I've changed the list from a picture to something that can be copied, and added an expected output.

Answer 1

You're overcomplicating your code. If your goal is to group all those companies based on the second column of the csv file, just add the following code after reading the file:

from collections import defaultdict

grouping = defaultdict(list)

for line in list_of_variables:
    grouping[line[1]].append(line[0])

Now, if you want to use a group of elements, let's say group 1, just run through it:

for company in grouping['1']:  # keys are strings, because csv.reader yields strings
    print(company)
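If you need the nested list shown as the expected output in the question rather than a dictionary, a minimal sketch could look like the one below. It assumes the rows are already in memory as lists of strings (the `rows` sample here is copied from the question and stands in for `list_of_variables`), and it stores the whole row instead of only the company id.

```python
from collections import defaultdict

# Sample rows as csv.reader would yield them (every field is a string).
# This stands in for list_of_variables from the question.
rows = [
    ['59872004', '0'], ['74202004', '0'],
    ['1491772004', '1'], ['1476392004', '1'],
    ['309452004', '1'], ['1171452004', '1'],
    ['150842004', '2'], ['143592004', '2'], ['76202004', '2'],
    ['119232004', '2'], ['80492004', '2'], ['291732004', '2'],
]

grouping = defaultdict(list)
for row in rows:
    # Keep the whole row so the result has the shape of the expected output.
    grouping[row[1]].append(row)

# Dicts preserve insertion order (Python 3.7+), so the groups come out
# in the order in which each group number first appears.
list_of_lists = list(grouping.values())
print(list_of_lists)
```

Because the grouping is keyed on the group number, this also works when rows of the same group are not next to each other in the file.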

Answer 2

I found an answer to my problem. If I cut the line

variable = one_cluster[counter-1]

and put it before

if counter_list >= 1:
    if line[1] == variable[1]:
        one_cluster.append(line)

I get the following code for the for loop:

for line in list_of_variables:
    print('counter: ', counter_list)
    if counter_list == 50:
        break
    # print("cluster: ", cluster)
    if counter_list == 0:
        one_cluster.append(line)
    variable = one_cluster[counter - 1]
    if counter_list >= 1:
        if line[1] == variable[1]:
            one_cluster.append(line)
    print("one cluster : ", one_cluster)

    # print('line : ', line[1])
    # print('variable : ', variable[1])
    counter += 1
    if line[1] != variable[1]:
        list_of_lists.append(one_cluster.copy())
        # print("here", list_of_lists)
        one_cluster.clear()
        one_cluster.append(line)
        counter = 0
    # print('variable', variable)
    # print('one_cluster ', one_cluster)
    counter_list += 1

Then everything works as expected. I was struggling with this for quite some time and then the idea just came to me. However, if anyone has an easier way to do this, I am open to suggestions.
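One possibly easier route, sketched here as an alternative and not as the approach used above, is `itertools.groupby` from the standard library. It groups consecutive rows that share a key, so there are no counters to maintain and the last group does not depend on a following row to be emitted. The `rows` sample and the `group_key` helper are illustrative names, with the data copied from the question.

```python
from itertools import groupby

# Sample rows copied from the question; every field is a string,
# as csv.reader would produce them.
rows = [
    ['59872004', '0'], ['74202004', '0'],
    ['1491772004', '1'], ['1476392004', '1'],
    ['309452004', '1'], ['1171452004', '1'],
    ['150842004', '2'], ['143592004', '2'], ['76202004', '2'],
    ['119232004', '2'], ['80492004', '2'], ['291732004', '2'],
]


def group_key(row):
    """Return the group number as an int so that '10' sorts after '2'."""
    return int(row[1])


# groupby only merges *consecutive* rows with the same key, so sort first.
# sorted() is stable, which keeps the original order inside each group.
rows_sorted = sorted(rows, key=group_key)

list_of_lists = [list(group) for _, group in groupby(rows_sorted, key=group_key)]
print(list_of_lists)
```

If the file is guaranteed to be ordered by group number already, the `sorted` call can be dropped.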

2 answers

solution1
1 ACCEPTED 2018-11-24 14:00:49

solution2
0 2018-11-24 14:00:23
