Numpy - create a summary df from array
I have a 2D array of shape (20, 10) with values ranging from 0 to 12 (created from a DataFrame).
arr = np.random.choice(np.arange(0, 13), size=(20,10))
array([[0, 9, 9, 7, 6, 2, 6, 4, 4, 3],
[0, 2, 1, 7, 1, 0, 2, 6, 6, 2],
[7, 3, 9, 8, 9, 7, 1, 10, 4, 2],
[0, 7, 0, 1, 4, 5, 8, 4, 2, 2],
[5, 2, 12, 3, 12, 2, 7, 12, 4, 12],
[0, 11, 0, 10, 7, 4, 12, 11, 11, 4],
[0, 9, 9, 8, 5, 11, 7, 6, 10, 7],
[0, 9, 0, 10, 11, 1, 5, 10, 8, 10],
[3, 11, 4, 7, 7, 8, 10, 11, 5, 12],
[0, 5, 0, 8, 1, 5, 1, 11, 9, 1],
[0, 8, 6, 12, 11, 1, 4, 11, 4, 1],
[2, 10, 5, 5, 7, 9, 11, 6, 12, 10],
[9, 8, 11, 4, 10, 1, 10, 12, 0, 3],
[0, 7, 10, 8, 2, 10, 5, 7, 9, 6],
[0, 9, 6, 9, 1, 12, 4, 1, 8, 2],
[8, 12, 10, 12, 8, 2, 3, 0, 11, 4],
[6, 7, 11, 12, 8, 7, 1, 9, 9, 8],
[0, 4, 0, 8, 9, 7, 1, 1, 3, 5],
[0, 8, 1, 11, 2, 12, 6, 11, 12, 10],
[0, 7, 3, 8, 3, 3, 7, 1, 9, 9]])
The desired output is a DataFrame whose rows and columns both run from 0 to 12. Each cell should hold the number of times one value (the row label) is immediately followed by another value (the column label), counted within each row and summed over all rows of the array.
0 1 2 3 4 5 6 7 8 9 10 11 12
0 25 20 30
1
2
3
4 2 2 5 4
5
6
7
8
9
10
11
12
(Not the true output)
For example, in this array the change from 0 to 9 occurs 4 times, and the change from 10 to 12 occurs 2 times.
If you use a Counter from the collections library, you can solve it like this:
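To make the counting rule concrete, here is a minimal sketch that counts one specific transition with a boolean mask over horizontally adjacent pairs. The three-row `sample` is just the first, seventh and eighth rows of the array above, and `count_transition` is a hypothetical helper name used only for illustration.

```python
import numpy as np

# First, seventh and eighth rows of the array shown above (illustrative excerpt).
sample = np.array([
    [0, 9, 9, 7, 6, 2, 6, 4, 4, 3],
    [0, 9, 9, 8, 5, 11, 7, 6, 10, 7],
    [0, 9, 0, 10, 11, 1, 5, 10, 8, 10],
])

def count_transition(a, src, dst):
    """Count how often `src` is immediately followed by `dst` within a row."""
    left = a[:, :-1]   # every element except the last one in each row
    right = a[:, 1:]   # the right-hand neighbour of each element in `left`
    return int(((left == src) & (right == dst)).sum())

print(count_transition(sample, 0, 9))  # 3: one 0 -> 9 step in each row
print(count_transition(sample, 9, 9))  # 2
```

Because the slices are taken per row, the last element of one row is never paired with the first element of the next.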
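import numpy as np
from collections import Counter

max_number = 12
array = np.random.choice(np.arange(0, max_number + 1), size=(20, 10))
# Pair each element with its right-hand neighbour, row by row,
# so that no transition is counted across a row boundary.
pairs = zip(array[:, :-1].ravel().tolist(), array[:, 1:].ravel().tolist())
counter = Counter(pairs)
# Values run from 0 to max_number inclusive, so the table needs max_number + 1 rows and columns.
result = np.zeros(shape=(max_number + 1, max_number + 1), dtype=int)
for i in range(max_number + 1):
    for j in range(max_number + 1):
        result[i, j] = counter[(i, j)]
result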
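This is my solution. Can it be improved?
import numpy as np
import pandas as pd

# arr is the (20, 10) array shown above
values = np.arange(arr.min(), arr.max() + 1)
# Start from an all-zero integer table: rows are the "from" value, columns the "to" value.
df = pd.DataFrame(0, index=values, columns=values)
for row in arr:
    for i in range(len(row) - 1):
        # .loc avoids chained indexing, so no final transpose is needed
        df.loc[row[i], row[i + 1]] += 1
df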
>>
    0  1  2  3  4  5  6  7  8  9  10  11  12
0   0  1  2  1  1  1  0  3  4  4   2   2   0
1   1  1  0  1  2  2  0  1  1  2   2   2   1
2   0  1  1  1  0  0  2  1  0  0   2   0   2
3   1  0  0  1  0  1  0  1  1  1   0   1   1
4   1  2  2  1  1  1  0  1  0  0   1   1   2
5   1  1  1  0  0  1  0  2  1  0   1   1   1
6   0  0  2  0  1  0  1  1  0  1   1   1   2
7   1  5  0  2  1  0  2  1  1  2   1   1   1
8   0  2  3  1  1  1  1  1  0  2   2   1   1
9   1  2  0  0  0  0  2  3  4  4   0   1   0
10  0  1  0  0  1  2  0  2  2  0   0   2   2
11  1  2  1  0  5  1  1  1  0  1   0   1   2
12  1  0  1  1  2  0  1  0  2  0   3   2   0
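As for whether this can be improved: one option is to replace the nested Python loops with a single vectorized accumulation. The sketch below is a possible approach, not the only one. It assumes the values are non-negative integers starting at 0, so they can be used directly as array indices. The function name `transition_table`, its `n_values` parameter, and the seed are made up for this example.

```python
import numpy as np
import pandas as pd

def transition_table(a, n_values):
    """Return an (n_values x n_values) DataFrame of row-wise transition counts."""
    src = a[:, :-1].ravel()   # "from" values
    dst = a[:, 1:].ravel()    # "to" values, aligned element by element with src
    counts = np.zeros((n_values, n_values), dtype=int)
    # np.add.at is unbuffered, so repeated (src, dst) pairs are all accumulated
    np.add.at(counts, (src, dst), 1)
    labels = np.arange(n_values)
    return pd.DataFrame(counts, index=labels, columns=labels)

rng = np.random.default_rng(0)            # arbitrary seed, only for reproducibility
arr = rng.integers(0, 13, size=(20, 10))  # same shape and value range as in the question
table = transition_table(arr, 13)
print(table)
```

Rows are the "from" value and columns the "to" value, matching the layout of the output above. A plain `counts[src, dst] += 1` would not work here, because fancy-indexed in-place addition only applies once per distinct index pair.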