Data import (reshape, numpy, pandas)
I have multiple directories with files inside (the index); each directory corresponds to a state. I want to loop over all files in a directory, create a 2D histogram for each one, and bring everything together in one object that lets me select rows based on the state.
For example (with a 3x3 2D histogram):
"Filename", "State", "X_1", "X_2", "X_3", "X_4", "X_5", "X_6", "X_7", "X_8", "X_9"
"File_1.csv", "FOO",0,0,1,2,3,0,0,0,0
"File_2.csv", "FOO",0,0,1,2,3,1,1,0,0
"File_3.csv", "FOO",0,0,4,5,3,0,0,0,0
"File_4.csv", "BAR",0,0,1,2,3,0,0,0,0
"File_5.csv", "BAR",0,0,1,2,3,1,1,0,0
"File_6.csv", "BAR",0,0,4,5,3,0,0,0,0
What I have so far:
def read(path, b, State):
    HistList = []
    NameList = []
    files = os.listdir(path)
    for i in range(0, len(files)):
        ....
        # density=True replaces the older normed=True keyword
        hist, xe, ye = np.histogram2d(X, Y, bins=b, density=True)
        HistList.append(hist.flatten())
        NameList.append(files[i])
    return DataFrame( ??? )
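To make the link between the histogram and the table columns concrete, here is a minimal sketch of how hist.flatten() turns one 3x3 histogram into the nine values X_1 to X_9 of a row. The sample points and the explicit range are made up for illustration; they are not taken from the original files.

```python
import numpy as np

# Hypothetical sample points (not from the original data).
x = np.array([0.5, 1.5, 1.5, 2.5])
y = np.array([0.5, 1.5, 1.5, 0.5])

# 3x3 grid over [0, 3] x [0, 3]; hist[i, j] counts points in x-bin i, y-bin j.
hist, x_edges, y_edges = np.histogram2d(x, y, bins=3, range=[[0, 3], [0, 3]])

# Row-major flattening: X_1..X_3 come from the first x-bin,
# X_4..X_6 from the second, X_7..X_9 from the third.
row = hist.flatten()
print(row.shape)  # (9,)
print(row)
```

Because the flattening is row-major, the position of a value in the row tells you which bin it came from: index k maps back to bin (k // 3, k % 3).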
Why not use a dictionary?
You can create a Final_Dict = {} and pass it to the function as an argument; the function then fills it in step by step, one folder and its files at a time. In this dictionary the main keys represent the folders (Final_Dict[folder_name]). The sub-keys under each main key are the file names in that folder (Final_Dict[folder_name][file_name]), and the value of each sub-key is the histogram.
Just to be clear, the following line extracts the folder name from the path:
current_folder = os.path.basename(os.path.normpath(path))
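As a small illustration of why normpath is applied first: a path with a trailing separator would otherwise give an empty basename. The helper name folder_name and the example paths are hypothetical.

```python
import os

def folder_name(path):
    # normpath drops a trailing separator, so basename returns the last
    # directory name instead of an empty string.
    return os.path.basename(os.path.normpath(path))

print(folder_name("data/FOO/"))        # FOO
print(folder_name("data/BAR"))         # BAR
print(repr(os.path.basename("data/FOO/")))  # '' without normpath
```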
Code (not tested):
import os
import numpy as np

def read(Final_Dict, path, b, para):
    # The folder name (the state) becomes the top-level key.
    current_folder = os.path.basename(os.path.normpath(path))
    Final_Dict[current_folder] = {}
    files = os.listdir(path)
    for i in range(0, len(files)):
        ....
        # density=True replaces the older normed=True keyword
        hist, xe, ye = np.histogram2d(X, Y, bins=b, density=True)
        # One flattened histogram per file name.
        Final_Dict[current_folder][files[i]] = hist.flatten()
    return Final_Dict

Final_Dict = {}
b = ...
para = ...
for folder_path in folder_path_list:
    Final_Dict = read(Final_Dict, folder_path, b, para)
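Since the loading step is elided above, here is one possible runnable sketch of the same idea. It assumes that each file is a headerless two-column CSV of x and y values, which the original post does not state; the function name read_folder and the temporary demo files are also hypothetical.

```python
import os
import tempfile
import numpy as np

def read_folder(final_dict, path, bins):
    """Add one flattened 2D histogram per file in `path` to `final_dict`."""
    folder = os.path.basename(os.path.normpath(path))
    final_dict[folder] = {}
    for name in sorted(os.listdir(path)):
        # ASSUMPTION: two comma-separated columns (x, y), no header row.
        data = np.loadtxt(os.path.join(path, name), delimiter=",", ndmin=2)
        hist, _, _ = np.histogram2d(data[:, 0], data[:, 1],
                                    bins=bins, density=True)
        final_dict[folder][name] = hist.flatten()
    return final_dict

# Demo with made-up random data in a temporary directory tree.
rng = np.random.default_rng(0)
result = {}
with tempfile.TemporaryDirectory() as root:
    for state in ("FOO", "BAR"):
        os.mkdir(os.path.join(root, state))
        for k in (1, 2):
            np.savetxt(os.path.join(root, state, f"File_{k}.csv"),
                       rng.random((50, 2)), delimiter=",")
    for state in ("FOO", "BAR"):
        read_folder(result, os.path.join(root, state), bins=3)

print(sorted(result))                      # folder names used as keys
print(result["FOO"]["File_1.csv"].shape)   # (9,)
```

The file names are sorted only to make the iteration order reproducible; os.listdir itself returns entries in arbitrary order.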
After that you can convert Final_Dict to a data frame:
Final_Dataframe = pd.DataFrame.from_dict(Final_Dict, orient='index', dtype=None)
A quick example of the conversion:
import numpy as np
import pandas as pd

Final_Dict = {}
Final_Dict['state1'] = {}
Final_Dict['state2'] = {}
Final_Dict['state1']['file1'] = [1, 2, 3]
Final_Dict['state1']['file2'] = [9, 9, 9]
Final_Dict['state2']['file1'] = [3, 3, 3]
Final_Dict['state2']['file2'] = [7, 6, 5]

# Outer keys (states) become the index, inner keys (files) the columns.
Final_Dataframe = pd.DataFrame.from_dict(Final_Dict, orient='index', dtype=None)
print("whole dataframe:")
print(Final_Dataframe)
print("\n\n\nSelecting folder 2: ")
print(Final_Dataframe.loc['state2'])
Result:
whole dataframe:
            file2      file1
state1  [9, 9, 9]  [1, 2, 3]
state2  [7, 6, 5]  [3, 3, 3]
Selecting folder 2:
file2    [7, 6, 5]
file1    [3, 3, 3]
Name: state2, dtype: object
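If you would rather have the layout from the question (one row per file, a State column, and one X_i column per histogram bin), the nested dictionary can be unrolled into a list of rows first. This is only a sketch of one way to do it, reusing the toy values from the example above; the names nested, rows, and table are made up for illustration.

```python
import pandas as pd

# Same toy data as in the example above.
nested = {
    "state1": {"file1": [1, 2, 3], "file2": [9, 9, 9]},
    "state2": {"file1": [3, 3, 3], "file2": [7, 6, 5]},
}

rows = []
for state, files in nested.items():
    for filename, values in files.items():
        row = {"Filename": filename, "State": state}
        # One column per histogram bin: X_1, X_2, ...
        row.update({f"X_{i}": v for i, v in enumerate(values, start=1)})
        rows.append(row)

table = pd.DataFrame(rows)

# Select all rows that belong to one state with a boolean mask.
state2_rows = table[table["State"] == "state2"]
print(state2_rows)
```

With this layout every bin is its own numeric column, so the usual pandas operations (filtering, groupby on State, and so on) work directly on the histogram values.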