Group items from a list based on certain items from that list
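I have a list in which each row has two elements: a company ID and a group number. I want to split these companies into separate lists by group number, so that I can run a regression analysis on each group individually. My list: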
59872004 0
74202004 0
1491772004 1
1476392004 1
309452004 1
1171452004 1
150842004 2
143592004 2
76202004 2
119232004 2
80492004 2
291732004 2
My current code is as follows:
list_of_variables = []
with open(str(csv_path) + "2004-297-100.csv", 'r') as csvFile:
    reader = csv.reader(csvFile)
    for row in reader:
        list_of_variables.append(row)
del list_of_variables[0]
list_of_lists = []
counter = 0
counter_list = 0
one_cluster = []
variable = []
for line in list_of_variables:
    print('counter: ', counter_list)
    # for testing purposes
    if counter_list == 20:
        break
    # print("cluster: ", cluster)
    # append the first line from the list to the intermediary list
    if counter_list == 0:
        one_cluster.append(line)
    if counter_list >= 1:
        if line[1] == variable[1]:
            one_cluster.append(line)
            print("one cluster : ", one_cluster)
    variable = one_cluster[counter-1]
    # print('line : ', line[1])
    # print('variable : ', variable[1])
    counter += 1
    # if the grouped number changed put the list into the final list
    # clear the intermediary list and append the current element which was not part of the previous group
    if line[1] != variable[1]:
        list_of_lists.append(one_cluster.copy())
        # print("here", list_of_lists)
        one_cluster.clear()
        one_cluster.append(line)
        counter = 0
        # print('variable', variable)
        # print('one_cluster ', one_cluster)
    counter_list += 1
print(list_of_lists)
The output of this code is:
[[['59872004', '0'], ['74202004', '0']], [['1491772004', '1'], ['309452004', '1'], ['1171452004', '1']], [['150842004', '2'], ['76202004', '2'], ['119232004', '2'], ['80492004', '2'], ['291732004', '2']]]
The expected output is:
[[['59872004', '0'], ['74202004', '0']], [['1491772004', '1'], ['1476392004', '1'], ['309452004', '1'], ['1171452004', '1']], [['150842004', '2'], ['143592004', '2'], ['76202004', '2'], ['119232004', '2'], ['80492004', '2'], ['291732004', '2']]]
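If you look closely, group 0 is handled correctly, but all the other groups have companies missing. For example, group 1 should have 4 elements, but my code outputs only 3, and so on. I looked around but did not find anything that would make this easier. If you know how to solve this or can point me in the right direction, I would really appreciate it.
Thank you for your time and patience!
Update: I have changed the list from an image to text that can be copied, and added the expected output.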
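You are overcomplicating your code. If your goal is to group all these companies by the second column of the csv file, just add the following code after reading the file: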
from collections import defaultdict
grouping = defaultdict(list)
for line in list_of_variables:
    grouping[line[1]].append(line[0])
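Now, if you want to work with the elements of one group, say group 1, just iterate over it. Note that the key is the string '1', because csv.reader yields every field as a string:
for company in grouping['1']:
    # process each company in the group here
    print(company)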
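For reference, here is a minimal self-contained sketch of the same defaultdict idea, using the sample rows from the question in place of the CSV file. The `rows` list is an illustrative stand-in for `list_of_variables` after the header has been removed, and rebuilding `list_of_lists` at the end is an assumption about the shape the asker wants, not part of the original answer.

```python
from collections import defaultdict

# Illustrative stand-in for the rows csv.reader would yield:
# [company_id, group_number], every field a string.
rows = [
    ["59872004", "0"], ["74202004", "0"],
    ["1491772004", "1"], ["1476392004", "1"],
    ["309452004", "1"], ["1171452004", "1"],
    ["150842004", "2"], ["143592004", "2"], ["76202004", "2"],
    ["119232004", "2"], ["80492004", "2"], ["291732004", "2"],
]

# Map each group number (a string key) to the company IDs in that group.
grouping = defaultdict(list)
for company_id, group in rows:
    grouping[group].append(company_id)

# Optionally rebuild the nested list-of-lists shape from the question.
# Dicts keep insertion order (Python 3.7+), so groups appear in file order.
list_of_lists = [
    [[company_id, group] for company_id in company_ids]
    for group, company_ids in grouping.items()
]
print(list_of_lists)
```

Unlike the manual loop, this does not require rows of the same group to be adjacent in the file.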
I found a solution to my problem. If I cut the line
variable = one_cluster[counter-1]
and put it before
if counter_list >= 1:
    if line[1] == variable[1]:
        one_cluster.append(line)
then the for loop becomes the following code:
for line in list_of_variables:
    print('counter: ', counter_list)
    if counter_list == 50:
        break
    # print("cluster: ", cluster)
    if counter_list == 0:
        one_cluster.append(line)
    variable = one_cluster[counter - 1]
    if counter_list >= 1:
        if line[1] == variable[1]:
            one_cluster.append(line)
            print("one cluster : ", one_cluster)
    # print('line : ', line[1])
    # print('variable : ', variable[1])
    counter += 1
    if line[1] != variable[1]:
        list_of_lists.append(one_cluster.copy())
        # print("here", list_of_lists)
        one_cluster.clear()
        one_cluster.append(line)
        counter = 0
        # print('variable', variable)
        # print('one_cluster ', one_cluster)
    counter_list += 1
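Then everything works as expected. I struggled with this for a long time before this idea came to me... However, if anyone has a simpler way to do this, I welcome suggestions.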
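One possibly simpler alternative to the manual loop is `itertools.groupby` from the standard library, which collects consecutive rows that share a key. The sketch below is not from the original thread; the `rows` list and the `group_key` helper are illustrative names, and the sample data stands in for the CSV contents. It also emits the last group, which the loop above only appends once a row from the next group shows up.

```python
from itertools import groupby

# Illustrative stand-in for the rows read from the CSV (header removed).
rows = [
    ["59872004", "0"], ["74202004", "0"],
    ["1491772004", "1"], ["1476392004", "1"],
    ["309452004", "1"], ["1171452004", "1"],
    ["150842004", "2"], ["143592004", "2"], ["76202004", "2"],
    ["119232004", "2"], ["80492004", "2"], ["291732004", "2"],
]


def group_key(row):
    """Hypothetical helper: the group number as an int, so '10' sorts after '2'."""
    return int(row[1])


# groupby only merges *consecutive* rows with equal keys, so sort first in
# case the file is not ordered by group. sorted() is stable, so companies
# keep their original relative order inside each group.
rows_sorted = sorted(rows, key=group_key)
list_of_lists = [list(cluster) for _, cluster in groupby(rows_sorted, key=group_key)]
print(list_of_lists)
```

If the file is guaranteed to be ordered by group already, the `sorted` call can be dropped and `groupby` applied to `rows` directly.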