Reshaping a 3D array to a 2D array to produce a DataFrame: keep track of indices to produce column names

The following code generates a pandas.DataFrame from a 3D array over the first axis. I manually create the column names (defining cols): is there a more built-in way to do this (to avoid potential errors, e.g. regarding C-order)?

--> I am looking for a way to guarantee that the order of the indices is respected after the reshape operation (here it relies on the correct order of the iterations over range(nrow) and range(ncol)).

import numpy as np
import pandas as pd

nt = 6 ; nrow = 4 ; ncol = 3 ; shp = (nt, nrow, ncol)

np.random.seed(0)
a = np.array(np.random.randint(0, 1000, nt*nrow*ncol)).reshape(shp)

# This is the line I think should be improved --> any numpy function or so?
cols = [str(i) + '-' + str(j) for i in range(nrow) for j in range(ncol)]

adf = pd.DataFrame(a.reshape(nt, -1), columns = cols)

print(adf)

   0-0  0-1  0-2  1-0  1-1  1-2  2-0  2-1  2-2  3-0  3-1  3-2
0  684  559  629  192  835  763  707  359    9  723  277  754
1  804  599   70  472  600  396  314  705  486  551   87  174
2  600  849  677  537  845   72  777  916  115  976  755  709
3  847  431  448  850   99  984  177  755  797  659  147  910
4  423  288  961  265  697  639  544  543  714  244  151  675
5  510  459  882  183   28  802  128  128  932   53  901  550

EDIT

Illustrating why I don't like my solution: it is just too easy to write code that technically works but produces a wrong result (inverting i and j, or nrow and ncol):

wrongcols1 = [str(i) + '-' + str(j) for i in range(ncol) for j in range(nrow)]
adf2 = pd.DataFrame(a.reshape(nt, -1), columns=wrongcols1)
print(adf2)
   0-0  0-1  0-2  0-3  1-0  1-1  1-2  1-3  2-0  2-1  2-2  2-3
0  684  559  629  192  835  763  707  359    9  723  277  754
1  804  599   70  472  600  396  314  705  486  551   87  174
2  600  849  677  537  845   72  777  916  115  976  755  709
3  847  431  448  850   99  984  177  755  797  659  147  910
4  423  288  961  265  697  639  544  543  714  244  151  675
5  510  459  882  183   28  802  128  128  932   53  901  550

wrongcols2 = [str(j) + '-' + str(i) for i in range(nrow) for j in range(ncol)]
adf3 = pd.DataFrame(a.reshape(nt, -1), columns=wrongcols2)
print(adf3)
   0-0  1-0  2-0  0-1  1-1  2-1  0-2  1-2  2-2  0-3  1-3  2-3
0  684  559  629  192  835  763  707  359    9  723  277  754
1  804  599   70  472  600  396  314  705  486  551   87  174
2  600  849  677  537  845   72  777  916  115  976  755  709
3  847  431  448  850   99  984  177  755  797  659  147  910
4  423  288  961  265  697  639  544  543  714  244  151  675
5  510  459  882  183   28  802  128  128  932   53  901  550
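
One way to catch the mistakes shown above is to check each label against the array it came from. The following is a minimal sketch, assuming labels of the form 'i-j'; the helper name labels_match is hypothetical and not part of numpy or pandas.

```python
import numpy as np
import pandas as pd

nt, nrow, ncol = 6, 4, 3
np.random.seed(0)
a = np.random.randint(0, 1000, nt * nrow * ncol).reshape(nt, nrow, ncol)

cols = [f"{i}-{j}" for i in range(nrow) for j in range(ncol)]
adf = pd.DataFrame(a.reshape(nt, -1), columns=cols)


def labels_match(df, arr):
    """Return True if every column 'i-j' of df equals arr[:, i, j].

    Hypothetical helper for illustration only.
    """
    for label in df.columns:
        i, j = (int(part) for part in label.split("-"))
        # An out-of-range index means the label cannot belong to arr.
        if i >= arr.shape[1] or j >= arr.shape[2]:
            return False
        if not np.array_equal(df[label].to_numpy(), arr[:, i, j]):
            return False
    return True


print(labels_match(adf, a))  # True
```

Running the same check on adf2 or adf3 should return False, because their labels point at positions that do not hold the data stored in the column.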

Try this and see if it fits your use case:

Generate the columns via a combination of np.indices, np.dstack and np.vstack:

columns = np.vstack(np.dstack(np.indices((nrow, ncol))))

array([[0, 0],
       [0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2],
       [2, 0],
       [2, 1],
       [2, 2],
       [3, 0],
       [3, 1],
       [3, 2]])

Now convert to strings via a combination of map, join and a list comprehension:

columns = ["-".join(map(str, entry)) for entry in columns]
['0-0',
 '0-1',
 '0-2',
 '1-0',
 '1-1',
 '1-2',
 '2-0',
 '2-1',
 '2-2',
 '3-0',
 '3-1',
 '3-2']
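
As a possible alternative to the stacking approach above, np.unravel_index can map each flat column position back to its (i, j) pair, with the order stated explicitly. This is a sketch rather than the answer's own method; the variable names flat, rows and cols_idx are illustrative.

```python
import numpy as np
import pandas as pd

nt, nrow, ncol = 6, 4, 3
np.random.seed(0)
a = np.random.randint(0, 1000, nt * nrow * ncol).reshape(nt, nrow, ncol)

# Flat positions 0 .. nrow*ncol-1 of the flattened trailing axes.
flat = np.arange(nrow * ncol)

# Map each flat position back to (i, j); order="C" matches reshape's default.
rows, cols_idx = np.unravel_index(flat, a.shape[1:], order="C")
columns = [f"{i}-{j}" for i, j in zip(rows, cols_idx)]

adf = pd.DataFrame(a.reshape(nt, -1), columns=columns)
print(columns[:4])  # ['0-0', '0-1', '0-2', '1-0']
```

Because the labels are derived from a.shape and the same order argument that reshape uses, there is no hand-written loop whose nesting could be inverted.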

Let me know how it goes.

You could try using pd.MultiIndex to construct your hierarchy.

First redefine your cols as a list of tuples:

cols = [(i, j) for i in range(nrow) for j in range(ncol)]

Then construct the MultiIndex from cols:

multi_cols = pd.MultiIndex.from_tuples(cols)

And build the DataFrame:

adf = pd.DataFrame(a.reshape(nt, -1), columns=multi_cols)

Result:

              0           1           2           3
      0   1   2   0   1   2   0   1   2   0   1   2
0   684 559 629 192 835 763 707 359   9 723 277 754
1   804 599  70 472 600 396 314 705 486 551  87 174
2   600 849 677 537 845  72 777 916 115 976 755 709
3   847 431 448 850  99 984 177 755 797 659 147 910
4   423 288 961 265 697 639 544 543 714 244 151 675
5   510 459 882 183  28 802 128 128 932  53 901 550

Accessing elements:

print(adf[1][2][0])
763
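
The list of tuples could also be left to pandas: pd.MultiIndex.from_product enumerates the pairs with the last level varying fastest, which is the same order a C-order reshape uses. A minimal sketch, in which the level names "row" and "col" are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

nt, nrow, ncol = 6, 4, 3
np.random.seed(0)
a = np.random.randint(0, 1000, nt * nrow * ncol).reshape(nt, nrow, ncol)

# from_product yields (0, 0), (0, 1), (0, 2), (1, 0), ...:
# the last level varies fastest, like the trailing axes in a C-order reshape.
multi_cols = pd.MultiIndex.from_product(
    [range(nrow), range(ncol)],
    names=["row", "col"],  # illustrative level names, not from the original post
)
adf = pd.DataFrame(a.reshape(nt, -1), columns=multi_cols)

# Look up DataFrame row 0 of column (row=1, col=2) in a single step.
value = adf.loc[0, (1, 2)]
print(value == a[0, 1, 2])  # True
```

Using .loc with a tuple does the lookup in one step instead of chaining three [] operations.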
