Reading data from multiple Excel files and normalizing it into arrays

Question

Suppose I have 3 different Excel files (file1.xlsx, file2.xlsx, file3.xlsx) containing the following data:

1.4
1.3
1.42
1.3

1.4
1.33
1.4
1.13

1.4
1.23
1.14
1.3

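I want to write the normalized values to an array in Python. Is there a way to combine pd.read_excel('file1.xlsx') with normalizing the data to 1? Right now I read each file, normalize each one, and finally write the results to an array.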

data1 = pd.read_excel('file1.xlsx')
data2 = pd.read_excel('file2.xlsx')
data3 = pd.read_excel('file3.xlsx')

After normalizing:

x = np.array([data1,data2,data3])
x = x.reshape(x.shape[0], -1)

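I am sure there is a simpler way that just reads the data from a folder full of Excel files, where the data of interest sits in the same cells in every file. Does anyone know how to do this?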

Answer 1

You need to read each file into a list of DataFrames with a list comprehension, then use concat and a transpose to get the expected output:

import glob
import pandas as pd

#path is a glob pattern that matches the Excel files
data = [pd.read_excel(f) for f in glob.glob(path)]

#one row per file
x = pd.concat(data, axis=1).T.to_numpy()

Or your solution:

x = np.array(data)
x = x.reshape(x.shape[0], -1)
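
To see why the reshape is needed: each single-column DataFrame has shape (n, 1), so stacking three of them gives a 3-D array. Here is a small sketch, with the question's values typed in by hand as in-memory stand-ins for the Excel files:

```python
import numpy as np
import pandas as pd

# Stand-ins for the three files: one column of four values each.
data = [pd.DataFrame([1.4, 1.3, 1.42, 1.3]),
        pd.DataFrame([1.4, 1.33, 1.4, 1.13]),
        pd.DataFrame([1.4, 1.23, 1.14, 1.3])]

stacked = np.array(data)                       # files x rows x columns
flat = stacked.reshape(stacked.shape[0], -1)   # one row per file

# The concat-and-transpose route builds the same 2-D array.
via_concat = pd.concat(data, axis=1).T.to_numpy()

print(stacked.shape, flat.shape, np.array_equal(flat, via_concat))
```

Both routes end up with one row per file, so which one to use is mostly a matter of taste.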

Edit:

#your settings for reading the data
data = [pd.read_excel(f, skiprows=14, usecols='C', nrows=30, engine='openpyxl')
        for f in glob.glob('C:/Users/user/Desktop/JupyterNB/folder/*.xlsx')]
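
The reading step could also be wrapped in a small helper so the per-file options live in one place. This is only a sketch: `load_columns` and its `reader` parameter are hypothetical names, not part of pandas, and the helper assumes every file yields the same number of values. The `reader` hook is there so that something other than `pd.read_excel` can be swapped in.

```python
import glob

import numpy as np
import pandas as pd


def load_columns(pattern, reader=pd.read_excel, **read_kwargs):
    """Return a 2-D array with one row per file matching `pattern`.

    `load_columns` and `reader` are illustrative names, not pandas API.
    Extra keyword arguments (skiprows, usecols, nrows, ...) are passed
    straight through to `reader`.
    """
    rows = []
    # sorted() keeps the row order stable; glob itself promises no order.
    for path in sorted(glob.glob(pattern)):
        frame = reader(path, **read_kwargs)
        rows.append(frame.to_numpy(dtype=float).ravel())
    if not rows:
        raise FileNotFoundError(f"no files match {pattern!r}")
    return np.vstack(rows)
```

Called as load_columns('C:/Users/user/Desktop/JupyterNB/folder/*.xlsx', skiprows=14, usecols='C', nrows=30, engine='openpyxl'), it should read the same cells as the list comprehension above.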

Testing the solution:

import numpy as np
import pandas as pd

#sample data
data = [pd.DataFrame([0,4,5,9]),
        pd.DataFrame([4,55,9,12]),
        pd.DataFrame([10,104,5,9])]

#normalize each file separately and then join by concat
L = [(x-np.min(x))/(np.max(x)-np.min(x)) for x in data]
out = pd.concat(L, axis=1).T.to_numpy()
print (out)
[[0.         0.44444444 0.55555556 1.        ]
 [0.         1.         0.09803922 0.15686275]
 [0.05050505 1.         0.         0.04040404]]

#concat first, then normalize; x.min() and x.max() work per column (per file)
x = pd.concat(data, axis=1)
out = ((x - x.min()) / (x.max() - x.min())).T.to_numpy()
print (out) 
[[0.         0.44444444 0.55555556 1.        ]
 [0.         1.         0.09803922 0.15686275]
 [0.05050505 1.         0.         0.04040404]]
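
Once the values are in a 2-D array, the same min-max scaling can also be done directly in NumPy. Below is a minimal sketch, assuming one row per file. `minmax_rows` is a made-up name, and mapping constant rows to 0 is an arbitrary choice that does not come from the answer above.

```python
import numpy as np


def minmax_rows(arr):
    """Scale each row of a 2-D array to the range [0, 1]."""
    arr = np.asarray(arr, dtype=float)
    lo = arr.min(axis=1, keepdims=True)
    span = arr.max(axis=1, keepdims=True) - lo
    # Avoid dividing by zero when a row is constant; such rows become all 0.
    safe_span = np.where(span == 0, 1.0, span)
    return (arr - lo) / safe_span


# Same sample values as above, one row per file.
x = np.array([[0, 4, 5, 9],
              [4, 55, 9, 12],
              [10, 104, 5, 9]])
out = minmax_rows(x)
print(out)
```

If "normalize to 1" only means that the largest value in each file should become 1, then arr / arr.max(axis=1, keepdims=True) would do that instead.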

Solution 1
0 2023-01-09 08:19:16
