
Sorting a list into another list

I am working on a project in which I want to sort a list of lists containing data points (clones) and IDs into a different list. To be clear, the desired format is clusters[id, data[]], where each data entry is a list with 8 data points. The lists I have now are clusterData[cloneId, ...], which holds each clone's ID followed by its data points, and clusterResultData[cloneId, clusterId], which assigns each clone to a cluster.

The sorting process is done as follows:

for i in range(len(clusterResultData)):
    clusterId = int(clusterResultData[i][1])
    clusters[clusterId].append(clusterData[i])

The output of this is a list with format clusters[clusterId, data[cloneId, ...]]. However, every cluster ends up containing ALL of the ~1000 data points, while the points should be divided over the clusters.

If it helps, here is the full code:

clusterResultData = []
clusterData = []
clusterIdList = []


with open("Voorbeeld_clusterresult.txt", "r") as resultFile:
    i = 0
    for line in resultFile: #iterates over all lines in the file
        if len(line) != 0:
            clusterResultData.append(line.split()) #builds a list [cloneId, clusterId] for each clone
            clusterIdList.append(clusterResultData[i][1])
        i += 1
    amOfClusters = len(set(clusterIdList)) #number of unique clusterIds
    clusters = amOfClusters * [['']]
with open("Voorbeeld_clusterdata.txt", "r") as resultFile:
    i = 0
    for line in resultFile:
        if len(line) != 0:
            clusterData.append(line.split()) #builds a list [cloneId, 8 data points] for each clone
            #print clusterData[i], clusterResultData[i]
        i += 1
for i in range(len(clusterResultData)):
    clusterId = int(clusterResultData[i][1])
    clusters[clusterId].append(clusterData[i])

for i in range(amOfClusters):
    print i, clusters[i][1] #test, every cluster is exactly identical

And this is the structure of the two data files:

Voorbeeld_clusterdata.txt:

846160  0.388  0.329  0.69  0.9  0.626  0.621  0.399  0.37
820434  -0.296  -0.503  -0.454  -0.868  -0.721  -0.918  -0.486  -0.582
849103  -0.246  -0.935  -0.277  -0.175  -0.278  -0.075  -0.236  -0.417
...

Voorbeeld_clusterresult.txt:

846160   1
820434   5
849103   4
...
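A minimal sketch (Python 3) of parsing the two formats above, using the three sample rows shown as inline data via io.StringIO instead of the real files. The helper names parse_data and parse_result are hypothetical, and whitespace-separated columns are assumed. It skips blank lines by testing line.split() rather than len(line), since a blank line read from a file is "\n" and so never has length 0. It also keys everything by cloneId, so the two files would not need to list clones in the same order.

```python
import io

# Sample rows copied from the two files shown above (assumed whitespace-separated).
DATA_TXT = """846160  0.388  0.329  0.69  0.9  0.626  0.621  0.399  0.37
820434  -0.296  -0.503  -0.454  -0.868  -0.721  -0.918  -0.486  -0.582
849103  -0.246  -0.935  -0.277  -0.175  -0.278  -0.075  -0.236  -0.417
"""

RESULT_TXT = """846160   1
820434   5
849103   4
"""


def parse_data(f):
    """Return {cloneId: [8 floats]} (hypothetical helper)."""
    data = {}
    for line in f:
        fields = line.split()
        if not fields:  # blank line: split() gives [], while len("\n") == 1
            continue
        data[fields[0]] = [float(x) for x in fields[1:]]
    return data


def parse_result(f):
    """Return {cloneId: clusterId} (hypothetical helper)."""
    result = {}
    for line in f:
        fields = line.split()
        if not fields:
            continue
        result[fields[0]] = int(fields[1])
    return result


data = parse_data(io.StringIO(DATA_TXT))
result = parse_result(io.StringIO(RESULT_TXT))
print(result)  # {'846160': 1, '820434': 5, '849103': 4}
```

With the real files, the same helpers would take an open file object, e.g. parse_data(f) inside with open("Voorbeeld_clusterdata.txt") as f.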

The problem is the line where you create clusters:

clusters = amOfClusters * [['']]

This creates a list containing amOfClusters references to the same sublist, so when you append an item through any index, the change shows up at every index:

>>> clusters = [['']] * 4
>>> clusters
[[''], [''], [''], ['']]
>>> clusters[0].append('x')
>>> clusters
[['', 'x'], ['', 'x'], ['', 'x'], ['', 'x']]

To fix this, you need to create a new list for every index. A list comprehension does this easily:

>>> clusters = [[''] for _ in range(4)]
>>> clusters[0].append('x')
>>> clusters
[['', 'x'], [''], [''], ['']]
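To see the difference between the two constructions directly, the identity operator is can be used: it compares objects, not values. This small sketch (Python 3; the variable names are illustrative) also shows why multiplying a list of immutable values such as ints is harmless: assigning to an index rebinds that slot instead of mutating a shared object.

```python
shared = [['']] * 4                   # four references to ONE inner list
separate = [[''] for _ in range(4)]   # four distinct inner lists

# 'is' tests object identity, so it reveals the aliasing directly.
print(all(sub is shared[0] for sub in shared))            # True
print(any(sub is separate[0] for sub in separate[1:]))    # False

# Multiplying a list of ints is fine: += on an int rebinds counts[0]
# to a new int object, so the other slots are unaffected.
counts = [0] * 4
counts[0] += 1
print(counts)  # [1, 0, 0, 0]
```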

If you change that line in your code to the following, you should get the expected behavior:

clusters = [[''] for _ in range(amOfClusters)]
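As an alternative sketch (not part of the original answer), collections.defaultdict(list) creates a fresh list the first time each cluster ID is used, so nothing has to be pre-sized and the IDs do not have to run from 0 to amOfClusters - 1. That may matter here: the sample result file uses IDs such as 1, 4 and 5, and if the IDs run from 1 upward, the pre-sized list would raise an IndexError on the highest ID. The sketch also matches rows by cloneId instead of by line position. The inline rows below are stand-ins for the lists the question's loops build; clusterOf is a hypothetical name.

```python
from collections import defaultdict

# Stand-in rows in the same shape as the question's lists:
# clusterResultData rows are [cloneId, clusterId]; clusterData rows are [cloneId, 8 values].
clusterResultData = [["846160", "1"], ["820434", "5"], ["849103", "4"]]
clusterData = [
    ["846160", "0.388", "0.329", "0.69", "0.9", "0.626", "0.621", "0.399", "0.37"],
    ["820434", "-0.296", "-0.503", "-0.454", "-0.868", "-0.721", "-0.918", "-0.486", "-0.582"],
    ["849103", "-0.246", "-0.935", "-0.277", "-0.175", "-0.278", "-0.075", "-0.236", "-0.417"],
]

# Map each cloneId to its clusterId so the two files need not share line order.
clusterOf = {cloneId: int(clusterId) for cloneId, clusterId in clusterResultData}

# defaultdict(list) makes a new, independent list the first time a key is used.
clusters = defaultdict(list)
for row in clusterData:
    clusters[clusterOf[row[0]]].append(row)

for clusterId in sorted(clusters):
    print(clusterId, [row[0] for row in clusters[clusterId]])
# 1 ['846160']
# 4 ['849103']
# 5 ['820434']
```

With the real files, clusterResultData and clusterData would be the lists your existing loops already build; only the grouping step changes, and there is no '' placeholder at the start of each cluster.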
