Group items from a list based on certain items from that list

I have a list of rows with two elements each: a company ID and a group number. I want to group the companies by group number into separate lists so that I can run regressions on each group separately. The list I have:

59872004    0
74202004    0
1491772004  1
1476392004  1
309452004   1
1171452004  1
150842004   2
143592004   2
76202004    2
119232004   2
80492004    2
291732004   2

My current code is the following:

import csv

list_of_variables = []
with open(str(csv_path) + "2004-297-100.csv", 'r') as csvFile:
    reader = csv.reader(csvFile)
    for row in reader:
        list_of_variables.append(row)
    del list_of_variables[0]  # drop the CSV header row

list_of_lists = []
counter = 0
counter_list = 0
one_cluster = []
variable = []
for line in list_of_variables:
    print('counter: ', counter_list)
    # for testing purposes
    if counter_list == 20:
        break
    # print("cluster: ", cluster)
    # append the first line from the list to the intermediary list
    if counter_list == 0:
        one_cluster.append(line)
    if counter_list >= 1:
        if line[1] == variable[1]:
            one_cluster.append(line)
    print("one cluster : ", one_cluster)
    variable = one_cluster[counter-1]
    # print('line : ', line[1])
    # print('variable : ', variable[1])
    counter += 1
    # if the grouped number changed put the list into the final list
    # clear the intermediary list and append the current element which was not part of the previous group
    if line[1] != variable[1]:
        list_of_lists.append(one_cluster.copy())
        # print("here", list_of_lists)
        one_cluster.clear()
        one_cluster.append(line)
        counter = 0
    # print('variable', variable)
    # print('one_cluster ', one_cluster)
    counter_list += 1


print(list_of_lists)

The output from that code is the following:

[[['59872004', '0'], ['74202004', '0']], [['1491772004', '1'], ['309452004', '1'], ['1171452004', '1']], [['150842004', '2'], ['76202004', '2'], ['119232004', '2'], ['80492004', '2'], ['291732004', '2']]]

Expected output from the code:

[[['59872004', '0'], ['74202004', '0']], [['1491772004', '1'], ['1476392004', '1'], ['309452004', '1'], ['1171452004', '1']], [['150842004', '2'], ['143592004', '2'], ['76202004', '2'], ['119232004', '2'], ['80492004', '2'], ['291732004', '2']]]

If you look closely, group 0 is correct, but every other group is missing companies. For example, group 1 should have 4 elements, but my code outputs only 3, and the other lists have the same problem. I have looked around but did not find an easier way to do this. If you know how to fix this or can point me in the right direction, I would be very grateful.
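A self-contained reproduction of the symptom, offered as a sketch: it assumes the twelve sample rows above stand in for the CSV (header already removed), drops the debug prints and the counter_list == 20 test break, and uses the hypothetical names rows and original_grouping. The comments mark where the loop goes wrong: variable is assigned during the previous iteration, so right after a group change it still holds a row of the old group, and the second row of each new group fails the comparison and is skipped.

```python
# Twelve sample rows from the question (assumption: CSV header already dropped).
rows = [
    ["59872004", "0"], ["74202004", "0"],
    ["1491772004", "1"], ["1476392004", "1"],
    ["309452004", "1"], ["1171452004", "1"],
    ["150842004", "2"], ["143592004", "2"], ["76202004", "2"],
    ["119232004", "2"], ["80492004", "2"], ["291732004", "2"],
]


def original_grouping(list_of_variables):
    """The question's loop, minus debug prints and the test-only break."""
    list_of_lists = []
    counter = 0
    counter_list = 0
    one_cluster = []
    variable = []
    for line in list_of_variables:
        if counter_list == 0:
            one_cluster.append(line)
        if counter_list >= 1:
            # `variable` still holds the value from the previous iteration.
            # Right after a group change it is a row of the OLD group, so the
            # second row of the new group fails this test and is lost.
            if line[1] == variable[1]:
                one_cluster.append(line)
        variable = one_cluster[counter - 1]
        counter += 1
        if line[1] != variable[1]:
            list_of_lists.append(one_cluster.copy())
            one_cluster.clear()
            one_cluster.append(line)
            counter = 0
        counter_list += 1
    # Nothing appends the last cluster once the loop ends.
    return list_of_lists


print(original_grouping(rows))
```

On these twelve rows, group 1 comes back with 3 of its 4 companies (1476392004 is missing), and group 2 never reaches list_of_lists at all, because a cluster is only appended when a later group starts. The output in the question presumably includes group 2 because the real CSV continues past these rows.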

Thank you for your time and patience!

UPDATE: I've replaced the picture of the list with text that can be copied, and added the expected output.

You're overcomplicating your code. If your goal is to group all those companies based on the second column of the CSV file, just add the following code after reading the file:

from collections import defaultdict

grouping = defaultdict(list)

for line in list_of_variables:
    grouping[line[1]].append(line[0])
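The snippet above keeps only the company IDs. If you want the exact shape of the expected output (whole [id, group] rows, one inner list per group), a small variant stores the entire row instead. This is a sketch: the inline rows list stands in for the parsed CSV, and it relies on dicts (including defaultdict) preserving insertion order, which Python guarantees from 3.7 onward, so the groups come out in the order they first appear. The keys are the strings '0', '1', '2' produced by csv.reader, not integers.

```python
from collections import defaultdict

# Sample rows from the question (assumption: stands in for the parsed CSV).
rows = [
    ["59872004", "0"], ["74202004", "0"],
    ["1491772004", "1"], ["1476392004", "1"],
    ["309452004", "1"], ["1171452004", "1"],
    ["150842004", "2"], ["143592004", "2"], ["76202004", "2"],
    ["119232004", "2"], ["80492004", "2"], ["291732004", "2"],
]

grouping = defaultdict(list)
for line in rows:
    # Key on the group column (a string), keep the whole [id, group] row.
    grouping[line[1]].append(line)

# Insertion order is preserved, so groups appear in the order first seen.
list_of_lists = list(grouping.values())
print(list_of_lists)
```

Unlike the counter loop, this does not depend on the rows being sorted by group: a row for group 1 that appeared at the end of the file would still land in the group 1 list.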

Now, if you want to use one group of elements, say group 1, just iterate over it. Note that csv.reader returns strings, so the key is '1' rather than the integer 1:

for company in grouping['1']:
    print(company)  # do something with each company ID in group 1
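Because the rows in the question are already sorted by group number, the standard library's itertools.groupby is another option (a swapped-in technique, not the one in this answer). It groups consecutive rows that share a key, which is what the hand-written counter loop was trying to do. This sketch assumes the inline rows list is sorted by the second column; if the real CSV is not, sort it first or use the defaultdict approach.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question, already sorted by group (assumption).
rows = [
    ["59872004", "0"], ["74202004", "0"],
    ["1491772004", "1"], ["1476392004", "1"],
    ["309452004", "1"], ["1171452004", "1"],
    ["150842004", "2"], ["143592004", "2"], ["76202004", "2"],
    ["119232004", "2"], ["80492004", "2"], ["291732004", "2"],
]

# groupby yields (key, iterator) pairs, one per run of equal group numbers;
# each run is materialised as a list of [id, group] rows.
list_of_lists = [list(group) for _, group in groupby(rows, key=itemgetter(1))]
print(list_of_lists)
```

On the sample rows this produces exactly the expected output shown in the question.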

I found an answer to my problem. If I cut the line

variable = one_cluster[counter-1]

and put it before

if counter_list >= 1:
    if line[1] == variable[1]:
        one_cluster.append(line)

so that the for loop contains the following code:

for line in list_of_variables:
    print('counter: ', counter_list)
    if counter_list == 50:
        break
    # print("cluster: ", cluster)
    if counter_list == 0:
        one_cluster.append(line)
    variable = one_cluster[counter - 1]
    if counter_list >= 1:
        if line[1] == variable[1]:
            one_cluster.append(line)
    print("one cluster : ", one_cluster)

    # print('line : ', line[1])
    # print('variable : ', variable[1])
    counter += 1
    if line[1] != variable[1]:
        list_of_lists.append(one_cluster.copy())
        # print("here", list_of_lists)
        one_cluster.clear()
        one_cluster.append(line)
        counter = 0
    # print('variable', variable)
    # print('one_cluster ', one_cluster)
    counter_list += 1

Then everything works as expected. I struggled with this for quite some time before the idea came to me. However, if anyone has an easier way to do this, I am open to suggestions.
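For reference, here is the reordered loop from this answer as a self-contained function run on the twelve sample rows. The inline rows list, the hypothetical name fixed_grouping, and the removal of the debug prints and the counter_list == 50 break are assumptions of this sketch. One change is labeled in the code: an append after the loop, since otherwise the last group is never added to list_of_lists.

```python
# Sample rows from the question (assumption: CSV header already dropped).
rows = [
    ["59872004", "0"], ["74202004", "0"],
    ["1491772004", "1"], ["1476392004", "1"],
    ["309452004", "1"], ["1171452004", "1"],
    ["150842004", "2"], ["143592004", "2"], ["76202004", "2"],
    ["119232004", "2"], ["80492004", "2"], ["291732004", "2"],
]


def fixed_grouping(list_of_variables):
    """The reordered loop: `variable` is refreshed before the comparison."""
    list_of_lists = []
    counter = 0
    counter_list = 0
    one_cluster = []
    for line in list_of_variables:
        if counter_list == 0:
            one_cluster.append(line)
        # Read from the current cluster BEFORE comparing, so `variable` always
        # belongs to the group being collected (when counter is 0, index -1
        # picks the single row of a freshly started cluster).
        variable = one_cluster[counter - 1]
        if counter_list >= 1:
            if line[1] == variable[1]:
                one_cluster.append(line)
        counter += 1
        if line[1] != variable[1]:
            list_of_lists.append(one_cluster.copy())
            one_cluster.clear()
            one_cluster.append(line)
            counter = 0
        counter_list += 1
    # Addition (not in the answer above): keep the final group as well.
    if one_cluster:
        list_of_lists.append(one_cluster)
    return list_of_lists


result = fixed_grouping(rows)
print([len(group) for group in result])
```

With the final append, all three groups come back complete (2, 4 and 6 rows); without it, the last group in the file would still be missing.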

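Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.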

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM