I am working on a project in which I want to sort a list of lists with data points (clones) and IDs into a different list. To be clear, the desired format is clusters[id, data[]]. The data is a list with 8 data points. The lists I currently have are clusterData[cloneId, ...], and a list of clone IDs and cluster IDs, clusterResultData[cloneId, clusterId].
The sorting process is done as follows:
for i in range(len(clusterResultData)):
    clusterId = int(clusterResultData[i][1])
    clusters[clusterId].append(clusterData[i])
The output of this is a list with format clusters[clusterId, data[cloneId, ...]]. However, each of the clusters is filled with ALL of the ~1000 data points, while the points should be divided among the clusters.
If it helps, here is the full code:
clusterResultData = []
clusterData = []
clusterIdList = []
with open("Voorbeeld_clusterresult.txt", "r") as resultFile:
    i = 0
    for line in resultFile:  # iterate over all lines in the file
        if len(line) != 0:
            clusterResultData.append(line.split())  # builds a list [cloneId, clusterId] for each clone
            clusterIdList.append(clusterResultData[i][1])
            i += 1
amOfClusters = len(set(clusterIdList))  # number of unique clusterIds
clusters = amOfClusters * [['']]
with open("Voorbeeld_clusterdata.txt", "r") as resultFile:
    i = 0
    for line in resultFile:
        if len(line) != 0:
            clusterData.append(line.split())  # builds a list [cloneId, data values...] for each clone
            #print clusterData[i], clusterResultData[i]
            i += 1
for i in range(len(clusterResultData)):
    clusterId = int(clusterResultData[i][1])
    clusters[clusterId].append(clusterData[i])
for i in range(amOfClusters):
    print i, clusters[i][1]  # test, every cluster is exactly identical
These are the structures of the two text files with data:
Voorbeeld_clusterdata.txt:
846160 0.388 0.329 0.69 0.9 0.626 0.621 0.399 0.37
820434 -0.296 -0.503 -0.454 -0.868 -0.721 -0.918 -0.486 -0.582
849103 -0.246 -0.935 -0.277 -0.175 -0.278 -0.075 -0.236 -0.417
...
Voorbeeld_clusterresult.txt:
846160 1
820434 5
849103 4
...
The problem is on the line where you generate clusters:
clusters = amOfClusters * [['']]
This creates a list that holds amOfClusters references to the same sublist. When you append an item to that sublist through any index, the change shows up at every index:
>>> clusters = [['']] * 4
>>> clusters
[[''], [''], [''], ['']]
>>> clusters[0].append('x')
>>> clusters
[['', 'x'], ['', 'x'], ['', 'x'], ['', 'x']]
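As a small sketch of how to confirm this, you can compare the sublists with `is` and count their distinct `id()` values. The variable names `same_object` and `distinct_ids` are only illustrative and do not come from the question's code.

```python
clusters = [['']] * 4

# Multiplying the outer list copies references, not the inner list itself,
# so every slot points at one and the same object.
same_object = all(sub is clusters[0] for sub in clusters)
distinct_ids = len(set(id(sub) for sub in clusters))

print(same_object)   # True
print(distinct_ids)  # 1
```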
To fix this, you need to create a new list for every index. You can do so easily with a list comprehension:
>>> clusters = [[''] for _ in range(4)]
>>> clusters[0].append('x')
>>> clusters
[['', 'x'], [''], [''], ['']]
If you change that line in your code to the following, you should get the expected behavior:
clusters = [[''] for _ in range(amOfClusters)]
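As an alternative sketch (a different technique from the one-line fix, not what the question's code does), you could group the rows in a dictionary keyed by cluster ID with `collections.defaultdict`. This assumes, as the original loop does, that row i of the result file describes row i of the data file. The helper name `group_by_cluster` and the variables `result_rows` and `data_rows` are hypothetical; the sample rows are the three lines shown in the question.

```python
from collections import defaultdict

def group_by_cluster(cluster_result_data, cluster_data):
    """Group data rows by cluster ID (hypothetical helper, not from the question).

    Assumes row i of cluster_result_data ([cloneId, clusterId]) describes
    row i of cluster_data ([cloneId, value, ...]), as the original loop does.
    """
    clusters = defaultdict(list)  # each missing key gets its own new list
    for (clone_id, cluster_id), row in zip(cluster_result_data, cluster_data):
        clusters[int(cluster_id)].append(row)
    return dict(clusters)

# Sample rows taken from the two files shown in the question.
result_rows = [["846160", "1"], ["820434", "5"], ["849103", "4"]]
data_rows = [
    "846160 0.388 0.329 0.69 0.9 0.626 0.621 0.399 0.37".split(),
    "820434 -0.296 -0.503 -0.454 -0.868 -0.721 -0.918 -0.486 -0.582".split(),
    "849103 -0.246 -0.935 -0.277 -0.175 -0.278 -0.075 -0.236 -0.417".split(),
]

grouped = group_by_cluster(result_rows, data_rows)
print(sorted(grouped))   # [1, 4, 5]
print(grouped[5][0][0])  # 820434
```

Because the dictionary is keyed by the cluster ID itself, this also works when the IDs are not exactly 0 to amOfClusters - 1, a case in which indexing a pre-sized list with clusters[clusterId] could raise an IndexError.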