Efficiently Compare / Consolidate Python Lists
I have a list like this:
list[0][0]="CatA"
list[0][1]="SubCatA"
list[0][2]="3,4"
list[1][0]="CatB"
list[1][1]="SubCatA"
list[1][2]="1,2"
list[2][0]="CatA"
list[2][1]="SubCatA"
list[2][2]="5,9"
list[3][0]="CatA"
list[3][1]="SubCatB"
list[3][2]="4,7"
Concatenate the field list[x][2] for all entries whose list[x][0] and list[x][1] are both equal. The result should look like this:
list[0][0]="CatA"
list[0][1]="SubCatA"
list[0][2]="3,4,5,9"
list[1][0]="CatB"
list[1][1]="SubCatA"
list[1][2]="1,2"
list[3][0]="CatA"
list[3][1]="SubCatB"
list[3][2]="4,7"
My code looks like this:
for y in range(len(arr)):
    print(y)
    print(arr[y])
    for z in range(len(arr)):
        print("{}.{}".format(y, z))
        if (y != z) and (arr[y][0] != -1) and (arr[y][0] == arr[z][0]) and (arr[y][1] == arr[z][1]):
            arr[y][2] = "{},{}".format(arr[y][2], arr[z][2])
            # arr.pop(z)  # first approach, but it fails because you cannot delete while iterating
            arr[z][0] = -1  # mark the merged entry so it is skipped later
print(arr)
res = []
for y in range(len(arr)):
    if arr[y][0] == -1:
        print("nothing")
    else:
        res.append(arr[y])
print(res)
Problem: this is very inefficient on a large arr. My arr lists have more than 2000 entries, so the nested loop runs roughly 2000*2000 loop bodies (quadratic in the list length).
Does anyone have a better approach to do the job?
Use a dict or a dict-like class for efficient lookup:
>>> import collections
>>>
>>> result = []
>>>
>>> def extend_result():
...     result.append([*record[:2], []])
...     return result[-1][2]
...
>>> uniquizer = collections.defaultdict(extend_result)
>>>
>>> for record in arr:
...     uniquizer[tuple(record[:2])].append(record[2])
...
>>> for record in result:
...     record[2] = ','.join(record[2])
...
>>> result
[['CatA', 'SubCatA', '3,4,5,9'], ['CatB', 'SubCatA', '1,2'], ['CatA', 'SubCatB', '4,7']]
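The factory function above works because `record` is looked up as a global at the moment the defaultdict calls it for a missing key. A minimal sketch of the same dict-based grouping without that dependency, assuming each row is a `[category, subcategory, values]` list as in the question; the function name `consolidate` is hypothetical and not part of the original answer:

```python
from collections import defaultdict


def consolidate(records):
    """Merge rows sharing (category, subcategory), joining their third field."""
    grouped = defaultdict(list)  # (cat, subcat) -> list of value strings
    for cat, subcat, values in records:
        grouped[(cat, subcat)].append(values)
    # dicts keep insertion order (Python 3.7+), so groups come out in first-seen order
    return [[cat, subcat, ",".join(parts)] for (cat, subcat), parts in grouped.items()]


arr = [
    ["CatA", "SubCatA", "3,4"],
    ["CatB", "SubCatA", "1,2"],
    ["CatA", "SubCatA", "5,9"],
    ["CatA", "SubCatB", "4,7"],
]
print(consolidate(arr))
```

Each row is visited once and each dict lookup takes constant time on average, so the work grows linearly with the length of the list rather than quadratically.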
You can try the manual approach with just one loop:
con_list={}
data_=[['CatA', 'SubCatA', '3,4'], ['CatB', 'SubCatA', '1,2'], ['CatA', 'SubCatA', '5,9'], ['CatA', 'SubCatB', '4,7']]
for i in data_:
    if (i[0],i[1]) not in con_list:
        con_list[(i[0],i[1])]=i
    else:
        con_list[(i[0],i[1])]=[i[0],i[1]]+["".join([con_list[(i[0],i[1])][-1]]+[',']+[i[-1]])]
print(list(con_list.values()))
output:
[['CatA', 'SubCatB', '4,7'], ['CatA', 'SubCatA', '3,4,5,9'], ['CatB', 'SubCatA', '1,2']]
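The order of the groups in this output differs from the input order, which likely reflects a Python version where dict ordering was not guaranteed. From Python 3.7 on, dicts keep insertion order, so the same one-loop idea yields groups in first-seen order. A small sketch under that assumption; `merge_rows` is a hypothetical name, and it builds fresh rows so the input list is left unchanged:

```python
def merge_rows(rows):
    """One-pass merge keyed on (category, subcategory)."""
    merged = {}
    for cat, subcat, values in rows:
        key = (cat, subcat)
        if key in merged:
            # key seen before: append this row's values to the stored string
            merged[key][2] += "," + values
        else:
            # first occurrence: store a fresh list so the input row is not aliased
            merged[key] = [cat, subcat, values]
    return list(merged.values())


data_ = [
    ["CatA", "SubCatA", "3,4"],
    ["CatB", "SubCatA", "1,2"],
    ["CatA", "SubCatA", "5,9"],
    ["CatA", "SubCatB", "4,7"],
]
print(merge_rows(data_))
```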