Merge sorting a 2d array
I'm stuck again trying to make this merge sort work. I have a 2d array whose first column is a Unix timestamp (fig 1), and I am merge sorting it with the code in fig 2. The idea is to compare the first value of each row, i.e. array[x][0], and then move the whole row depending on that value. However, the merge sort creates duplicates of some rows and deletes others (fig 3). What am I doing wrong? I know the problem is in the merge sort, but I can't see the fix.
fig 1
[[1422403200 100]
[1462834800 150]
[1458000000 25]
[1540681200 150]
[1498863600 300]
[1540771200 100]
[1540771200 100]
[1540771200 100]
[1540771200 100]
[1540771200 100]]
fig 2
import numpy as np

def sort(data):
    if len(data) > 1:
        Mid = len(data) // 2
        l = data[:Mid]
        r = data[Mid:]
        sort(l)
        sort(r)
        z = 0
        x = 0
        c = 0
        while z < len(l) and x < len(r):
            if l[z][0] < r[x][0]:
                data[c] = l[z]
                z += 1
            else:
                data[c] = r[x]
                x += 1
            c += 1
        while z < len(l):
            data[c] = l[z]
            z += 1
            c += 1
        while x < len(r):
            data[c] = r[x]
            x += 1
            c += 1
        print(data, 'done')

unixdate = [1422403200, 1462834800, 1458000000, 1540681200, 1498863600, 1540771200, 1540771200, 1540771200, 1540771200, 1540771200]
price = [100, 150, 25, 150, 300, 100, 100, 100, 100, 100]
array = np.column_stack((unixdate, price))
sort(array)
print(array, 'sorted')
fig 3
[[1422403200 100]
[1458000000 25]
[1458000000 25]
[1498863600 300]
[1498863600 300]
[1540771200 100]
[1540771200 100]
[1540771200 100]
[1540771200 100]
[1540771200 100]]
I couldn't spot any mistake in your code.
I have tried your code, and the problem does not happen, at least with regular Python lists: the function doesn't change the number of occurrences of any element in the list.
data = [
    [1422403200, 100],
    [1462834800, 150],
    [1458000000, 25],
    [1540681200, 150],
    [1498863600, 300],
    [1540771200, 100],
    [1540771200, 100],
    [1540771200, 100],
    [1540771200, 100],
    [1540771200, 100],
]
sort(data)
from pprint import pprint
pprint(data)
Output:
[[1422403200, 100],
[1458000000, 25],
[1462834800, 150],
[1498863600, 300],
[1540681200, 150],
[1540771200, 100],
[1540771200, 100],
[1540771200, 100],
[1540771200, 100],
[1540771200, 100]]
Edit, taking into account the numpy context and the use of np.column_stack.
I expect that what happens there is that np.column_stack actually creates a view mapping over the two arrays. To get a real array rather than a link to your existing arrays, you should copy that array:
array = np.column_stack((unixdate, price)).copy()
Edit 2, taking into account the numpy context
This behavior actually has nothing to do with np.column_stack; np.column_stack already performs a copy.
The reason your code doesn't work is that slicing behaves differently with numpy than with plain python. Slicing a python list creates a new list, whereas slicing a numpy array creates a view of the array, which maps indexes onto the same memory.
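The difference can be checked directly. The snippet below is only a small sketch, assuming NumPy is installed; the variable names (py_data, np_data and so on) are made up for the illustration and do not come from the question.

```python
import numpy as np

# Slicing a Python list builds a new list (a shallow copy of that range).
py_data = [3, 1, 2]
py_left = py_data[:2]
py_data[0] = 99
print(py_left)   # the slice does not see the later write to py_data

# Slicing a NumPy array builds a view onto the same memory.
np_data = np.array([3, 1, 2])
np_left = np_data[:2]
np_data[0] = 99
print(np_left)   # the view does see the write made through np_data

# np.shares_memory reports whether two arrays overlap in memory.
print(np.shares_memory(np_data, np_left))         # view
print(np.shares_memory(np_data, np_left.copy()))  # explicit copy
```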
The erroneous lines are:
l = data[:Mid]
r = data[Mid:]
Since l and r just map to two pieces of the memory held by data, they change whenever data is written to. This is why the lines data[c] = l[z] and data[c] = r[x] overwrite values that have not been merged yet and leave duplicates behind when moving values.
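The overwrite is easiest to see on the smallest possible input. The following is a reduced sketch of just the merge step from the question, applied to 1-D data; the helper name merge_halves is hypothetical and the code assumes NumPy is installed.

```python
import numpy as np

def merge_halves(data):
    """Merge the two halves of `data` back into `data`.

    The halves are sliced WITHOUT copying, as in the question's code.
    """
    mid = len(data) // 2
    l = data[:mid]
    r = data[mid:]
    z = x = c = 0
    while z < len(l) and x < len(r):
        if l[z] < r[x]:
            data[c] = l[z]
            z += 1
        else:
            data[c] = r[x]   # with an ndarray this may also overwrite l[...]
            x += 1
        c += 1
    while z < len(l):
        data[c] = l[z]
        z += 1
        c += 1
    while x < len(r):
        data[c] = r[x]
        x += 1
        c += 1
    return data

as_list = merge_halves([3, 1])             # slices are independent lists
as_array = merge_halves(np.array([3, 1]))  # slices are views into the array

print(as_list)   # [1, 3]
print(as_array)  # [1 1]
```

With the array, writing r[0] into data[0] also changes l[0], because l is a view of data[0:1]. The original 3 is gone before it can be copied, so the 1 is written twice.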
If data is a numpy array, we want l and r to be copies of the data, not just views. This can be achieved using the copy method.
l = data[:Mid]
r = data[Mid:]
if isinstance(data, np.ndarray):
    l = l.copy()
    r = r.copy()
I tested it this way, and the sort works.
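Put together, the whole function could look like the sketch below. It is one possible version, not the only fix: the name merge_sort_rows and the renamed variables are mine, and I compare with <= instead of < so that rows with equal timestamps keep their original order (a small change from the question's code). It assumes NumPy is installed.

```python
import numpy as np

def merge_sort_rows(data):
    """Sort the rows of `data` in place by their first element."""
    if len(data) <= 1:
        return data
    mid = len(data) // 2
    left = data[:mid]
    right = data[mid:]
    # NumPy slices are views, so take real copies before overwriting `data`.
    if isinstance(data, np.ndarray):
        left = left.copy()
        right = right.copy()
    merge_sort_rows(left)
    merge_sort_rows(right)
    i = j = k = 0
    # Merge the two sorted halves back into `data`.
    while i < len(left) and j < len(right):
        if left[i][0] <= right[j][0]:
            data[k] = left[i]
            i += 1
        else:
            data[k] = right[j]
            j += 1
        k += 1
    # Copy whatever is left over in either half.
    while i < len(left):
        data[k] = left[i]
        i += 1
        k += 1
    while j < len(right):
        data[k] = right[j]
        j += 1
        k += 1
    return data

unixdate = [1422403200, 1462834800, 1458000000, 1540681200, 1498863600,
            1540771200, 1540771200, 1540771200, 1540771200, 1540771200]
price = [100, 150, 25, 150, 300, 100, 100, 100, 100, 100]
array = np.column_stack((unixdate, price))
merge_sort_rows(array)
print(array)
```

The same function also accepts a plain list of rows, since list slices are already copies and the isinstance branch is simply skipped.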
Note
If you wanted to sort the data using python lists rather than numpy arrays, the equivalent of np.column_stack in vanilla python is zip:
z = zip([10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000])
z
# <zip at 0x7f6ef80ce8c8>
# `zip` creates an iterator, which is ready to give us our entries.
# Iterators can only be walked once, which is not the case of lists.
list(z)
# [(10, 100, 1000), (20, 200, 2000), (30, 300, 3000), (40, 400, 4000)]
The entries are (immutable) tuples. If you need the entries to be editable, map list over them:
z = zip([10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000])
li = list(map(list, z))
# [[10, 100, 1000], [20, 200, 2000], [30, 300, 3000], [40, 400, 4000]]
To transpose a matrix, use zip(*matrix):
def transpose(matrix):
    return list(map(list, zip(*matrix)))

transpose(li)
# [[10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
You can also sort a python list li in place using li.sort(), or get a new sorted list from any iterable (lists are iterables) using sorted(li).
Here, I would use (tested):
sorted(zip(unixdate, price))
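One thing to be aware of: sorted(zip(unixdate, price)) compares whole tuples, so rows with the same timestamp are further ordered by price. If only the timestamp should decide the order, a key function can be passed. The sketch below uses operator.itemgetter from the standard library; the names pairs and by_date are just for illustration.

```python
from operator import itemgetter

unixdate = [1422403200, 1462834800, 1458000000, 1540681200, 1498863600,
            1540771200, 1540771200, 1540771200, 1540771200, 1540771200]
price = [100, 150, 25, 150, 300, 100, 100, 100, 100, 100]

pairs = list(zip(unixdate, price))   # list of (date, price) tuples

# sorted() returns a new list and leaves `pairs` untouched.
# key=itemgetter(0) compares only the first field (the timestamp).
by_date = sorted(pairs, key=itemgetter(0))

# list.sort() reorders the list in place and returns None.
pairs.sort(key=itemgetter(0))

print(by_date[:3])
# [(1422403200, 100), (1458000000, 25), (1462834800, 150)]
```

Both sorted and list.sort are stable, so rows with equal timestamps stay in their original relative order.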