Data import (reshaping, numpy, pandas)

Question

I have multiple directories, each containing files (the index), and each directory corresponds to a state. I want to loop over all files in a directory, create a 2D histogram for each file, and collect everything in one object that lets me select rows by state.

For example (with a 3x3 2D histogram):

"Filename"  , "State", "X_1", "X_2", "X_3", "X_4", "X_5", "X_6", "X_7", "X_8", "X_9"

"File_1.csv", "FOO",0,0,1,2,3,0,0,0,0
"File_2.csv", "FOO",0,0,1,2,3,1,1,0,0
"File_3.csv", "FOO",0,0,4,5,3,0,0,0,0
"File_4.csv", "BAR",0,0,1,2,3,0,0,0,0
"File_5.csv", "BAR",0,0,1,2,3,1,1,0,0
"File_6.csv", "BAR",0,0,4,5,3,0,0,0,0

What I have so far:

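import os
import numpy as np
from pandas import DataFrame

def read(path, b, State):
    HistList = []   # flattened histograms, one per file
    NameList = []   # matching file names
    files = os.listdir(path)

    for i in range(0, len(files)):
        ....
        # density=True is the current spelling of the older normed=True argument
        hist, xe, ye = np.histogram2d(X, Y, bins=b, density=True)
        HistList.append(hist.flatten())
        NameList.append(files[i])

    return DataFrame( ??? )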

Answer 1

Why not use a dictionary?

You can create an empty dictionary, Final_Dict, and pass it to the function as an argument; the function then fills it in step by step, one folder at a time. The top-level keys are the folder names ( Final_Dict[folder_name] ), the sub-keys under each folder are the names of the files in that folder ( Final_Dict[folder_name][file_name] ), and the value stored under each sub-key is that file's histogram.

Just to be clear, the following line extracts the folder name from the path:

current_folder = os.path.basename(os.path.normpath(path))
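A small sketch of why the normpath call matters here: without it, a path that ends in a separator yields an empty folder name. The path "data/FOO/" and the helper name folder_name are made up for illustration, and the example assumes POSIX-style separators.

```python
import os.path

def folder_name(path):
    # Strip any trailing separator first, then take the last path component.
    return os.path.basename(os.path.normpath(path))

# basename alone returns an empty string when the path ends with "/"
with_slash = os.path.basename("data/FOO/")

# normalising first gives the folder name either way
normalized = folder_name("data/FOO/")
no_slash = folder_name("data/FOO")

print(repr(with_slash), repr(normalized), repr(no_slash))
```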

Code (not tested):

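import os
import numpy as np

def read(Final_Dict, path, b, para):
    # Use the folder name as the top-level key for this directory
    current_folder = os.path.basename(os.path.normpath(path))
    Final_Dict[current_folder] = {}

    files = os.listdir(path)
    for i in range(0, len(files)):
        ....
        # density=True is the current spelling of the older normed=True argument
        hist, xe, ye = np.histogram2d(X, Y, bins=b, density=True)
        # Store the flattened histogram under the file name
        Final_Dict[current_folder][files[i]] = hist.flatten()

    return Final_Dict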

Final_Dict = {}
b = ... 
para = ...
for folder_path in folder_path_list:
      Final_Dict = read(Final_Dict, folder_path, b, para)
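To make the stored values concrete, here is a minimal sketch of what the histogram call produces for bins=3. The random x and y arrays are stand-ins for whatever the elided file-reading step yields; they are assumptions, not part of the original code.

```python
import numpy as np

# Stand-in data; in the real function X and Y come from each file.
rng = np.random.default_rng(0)
x = rng.random(100)
y = rng.random(100)

# A 3x3 histogram; density=True scales the counts so the histogram integrates to 1.
hist, xe, ye = np.histogram2d(x, y, bins=3, density=True)

# Flattening row by row gives the nine values X_1 .. X_9 for one file.
flat = hist.flatten()

# Bin areas, used to check the normalisation.
area = np.outer(np.diff(xe), np.diff(ye))
total = (hist * area).sum()

print(hist.shape, flat.shape, len(xe), len(ye))
```

With bins=3 each file contributes a vector of 9 numbers, and xe and ye each hold 4 bin edges.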

After that, you can convert Final_Dict to a DataFrame:

Final_Dataframe = pd.DataFrame.from_dict(Final_Dict, orient='index', dtype=None)

A quick example of the conversion:

import numpy as np
import pandas as pd

# Outer keys are the states (folders), inner keys are the file names
Final_Dict = {}
Final_Dict['state1'] = {}
Final_Dict['state2'] = {}

Final_Dict['state1']['file1'] = [1,2,3]
Final_Dict['state1']['file2'] = [9,9,9]
Final_Dict['state2']['file1'] = [3,3,3]
Final_Dict['state2']['file2'] = [7,6,5]

# orient='index' turns the outer keys into the row labels
Final_Dataframe = pd.DataFrame.from_dict(Final_Dict, orient='index', dtype=None)

print("whole dataframe:")
print(Final_Dataframe)

print("\n\n\nSelecting folder 2: ")
print(Final_Dataframe.loc['state2'])

Result (column order may differ depending on the Python and pandas versions):

whole dataframe:
            file2      file1
state1  [9, 9, 9]  [1, 2, 3]
state2  [7, 6, 5]  [3, 3, 3]



Selecting folder 2: 
file2    [7, 6, 5]
file1    [3, 3, 3]
Name: state2, dtype: object
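If you would rather have the flat layout from the question (one row per file, with "Filename", "State" and one column per bin), the nested dictionary can be unrolled into rows first. This is only a sketch under assumptions: the helper name histograms_to_frame is made up, and the sample values are copied from the example table in the question rather than computed from real files.

```python
import numpy as np
import pandas as pd

def histograms_to_frame(final_dict):
    """Turn {state: {filename: flat_histogram}} into one row per file."""
    rows = []
    for state, files in final_dict.items():
        for filename, hist in files.items():
            row = {"Filename": filename, "State": state}
            # One column per histogram bin: X_1, X_2, ...
            for k, value in enumerate(np.asarray(hist).ravel(), start=1):
                row["X_%d" % k] = value
            rows.append(row)
    return pd.DataFrame(rows)

# Sample data taken from the table in the question
final_dict = {
    "FOO": {"File_1.csv": [0, 0, 1, 2, 3, 0, 0, 0, 0],
            "File_2.csv": [0, 0, 1, 2, 3, 1, 1, 0, 0]},
    "BAR": {"File_4.csv": [0, 0, 1, 2, 3, 0, 0, 0, 0]},
}

df = histograms_to_frame(final_dict)

# Select all rows that belong to one state
foo_rows = df[df["State"] == "FOO"]
print(foo_rows)
```

Keeping each bin in its own numeric column, instead of a list per cell, means the usual pandas filtering and arithmetic work directly on the histogram values.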

Solution 1
0, accepted, 2016-10-22 08:05:25
