Data import (reshape, numpy, pandas)

I have multiple directories containing files (the index), and each directory corresponds to a state. I want to loop over all files in a directory, create a 2D histogram for each file, and combine everything into one object that lets me select rows by state.

For example (with a 3x3 2D histogram):

"Filename"  , "State", "X_1", "X_2", "X_3", "X_4", "X_5", "X_6", "X_7", "X_8","X_9"

"File_1.csv", "FOO",0,0,1,2,3,0,0,0,0
"File_2.csv", "FOO",0,0,1,2,3,1,1,0,0
"File_3.csv", "FOO",0,0,4,5,3,0,0,0,0
"File_4.csv", "BAR",0,0,1,2,3,0,0,0,0
"File_5.csv", "BAR",0,0,1,2,3,1,1,0,0
"File_6.csv", "BAR",0,0,4,5,3,0,0,0,0

What I have so far:

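import os
import numpy as np
from pandas import DataFrame

def read(path, b, State):
    HistList = []   # one flattened histogram per file
    HistName = []   # matching file names
    files = os.listdir(path)

    for i in range(0, len(files)):
        ....
        # density=True replaces the old normed=True, which newer NumPy no longer accepts
        hist, xe, ye = np.histogram2d(X, Y, bins=b, density=True)
        HistList.append(hist.flatten())
        HistName.append(files[i])

    return DataFrame( ??? )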

Why not use a dictionary?

You can create a Final_Dict = {} and pass it to the function as an argument; the function then fills the dictionary folder by folder, file by file. In this dictionary the main keys represent the folders ( Final_Dict[folder_name] ). The sub-keys of each main key are the file names in that folder ( Final_Dict[folder_name][file_name] ), and the value of each sub-key is the histogram.

Just to be clear, the following line extracts the folder name from the path:

current_folder = os.path.basename(os.path.normpath(path)) 
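As a small illustration of why normpath is applied first, the sketch below compares the two calls on a path with a trailing separator. The helper name folder_name and the path "data/FOO/" are hypothetical and only serve the example.

```python
import os

def folder_name(path):
    # normpath removes a trailing separator, so basename sees "data/FOO"
    # instead of "data/FOO/" and returns the last component.
    return os.path.basename(os.path.normpath(path))

print(folder_name("data/FOO/"))        # last path component
print(os.path.basename("data/FOO/"))   # empty string without normpath
```

Without normpath, a trailing slash would give an empty folder name, and every folder would end up under the same dictionary key.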

Code (not tested):

import os
import numpy as np

def read(Final_Dict, path, b, para):
    # The folder name doubles as the state label (top-level key)
    current_folder = os.path.basename(os.path.normpath(path))
    Final_Dict[current_folder] = {}

    files = os.listdir(path)
    for i in range(0, len(files)):
        ....
        # density=True replaces the old normed=True, which newer NumPy no longer accepts
        hist, xe, ye = np.histogram2d(X, Y, bins=b, density=True)
        # File name is the sub-key, flattened histogram is the value
        Final_Dict[current_folder][files[i]] = hist.flatten()

    return Final_Dict

Final_Dict = {}
b = ...
para = ...
for folder_path in folder_path_list:
    Final_Dict = read(Final_Dict, folder_path, b, para)
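To make the hist.flatten() step concrete, here is a minimal sketch of how a 3x3 histogram turns into nine values. The sample points are made up, and passing a fixed range is an assumption that is not in the original code; it is used here so that every file would share the same bin edges.

```python
import numpy as np

# Hypothetical sample points; the real X and Y come from each file.
X = np.array([0.5, 0.5, 1.5, 2.5])
Y = np.array([0.5, 1.5, 1.5, 2.5])
limits = [[0, 3], [0, 3]]  # assumed common range for all files

# Raw counts per bin, shape (3, 3): first axis is the X bin, second the Y bin.
counts, xe, ye = np.histogram2d(X, Y, bins=3, range=limits)

# flatten() is row-major, so position k corresponds to x_bin * 3 + y_bin.
flat = counts.flatten()
print(flat)

# With density=True the values integrate to 1 over the binned area.
density, _, _ = np.histogram2d(X, Y, bins=3, range=limits, density=True)
bin_area = np.outer(np.diff(xe), np.diff(ye))
total = (density * bin_area).sum()
print(total)
```

The nine entries of flat line up with the X_1 to X_9 columns in the question, provided the same bins are used for every file.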

After that, you can convert Final_Dict to a data frame:

Final_Dataframe = pd.DataFrame.from_dict(Final_Dict, orient='index', dtype=None)

A quick example of the conversion:

import numpy as np
import pandas as pd

Final_Dict= {}
Final_Dict['state1'] = {}
Final_Dict['state2'] = {}

Final_Dict['state1']['file1'] = [1,2,3]
Final_Dict['state1']['file2'] = [9,9,9]
Final_Dict['state2']['file1'] = [3,3,3]
Final_Dict['state2']['file2'] = [7,6,5]

Final_Dataframe = pd.DataFrame.from_dict(Final_Dict, orient='index', dtype=None)

print("whole dataframe:")
print(Final_Dataframe)

print("\n\n\nSelecting folder 2: ")
print(Final_Dataframe.loc['state2'])

Result (the column order may differ depending on the Python version):

whole dataframe:
            file2      file1
state1  [9, 9, 9]  [1, 2, 3]
state2  [7, 6, 5]  [3, 3, 3]



Selecting folder 2: 
file2    [7, 6, 5]
file1    [3, 3, 3]
Name: state2, dtype: object
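The frame above keeps each histogram as a list inside a single cell. If the goal is the layout from the question (one row per file, a State column, and X_1 to X_9), one possible approach is to flatten the nested dictionary into rows first. This is only a sketch: the dictionary below is a hand-written stand-in for what read() would produce, using values from the question's example table, and the names rows, wide and foo_rows are hypothetical.

```python
import pandas as pd

# Stand-in for the dictionary built by read():
# {state: {file_name: flattened histogram}}
Final_Dict = {
    "FOO": {"File_1.csv": [0, 0, 1, 2, 3, 0, 0, 0, 0],
            "File_2.csv": [0, 0, 1, 2, 3, 1, 1, 0, 0]},
    "BAR": {"File_4.csv": [0, 0, 1, 2, 3, 0, 0, 0, 0]},
}

# Build one flat record per file, with one column per histogram bin.
rows = []
for state, files in Final_Dict.items():
    for file_name, hist in files.items():
        row = {"Filename": file_name, "State": state}
        row.update({"X_%d" % (k + 1): v for k, v in enumerate(hist)})
        rows.append(row)

wide = pd.DataFrame(rows)

# Select rows by state with a boolean mask.
foo_rows = wide[wide["State"] == "FOO"]
print(foo_rows)
```

Keeping the state as an ordinary column means the selection is a plain boolean filter, and the X_ columns can be passed on as a numeric matrix with wide.filter(like="X_").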
