
Data import (reshape, numpy, pandas)

I have multiple directories containing files (the files form the index), and each directory corresponds to a state. I want to loop over all files in a directory, create a 2D histogram for each file, and bring everything together in one object from which I can select rows based on the state.

For example (with a 3x3 2D histogram, flattened into the nine columns X_1 to X_9):

"Filename"  , "State", "X_1", "X_2", "X_3", "X_4", "X_5", "X_6", "X_7", "X_8", "X_9"

"File_1.csv", "FOO",0,0,1,2,3,0,0,0,0
"File_2.csv", "FOO",0,0,1,2,3,1,1,0,0
"File_3.csv", "FOO",0,0,4,5,3,0,0,0,0
"File_4.csv", "BAR",0,0,1,2,3,0,0,0,0
"File_5.csv", "BAR",0,0,1,2,3,1,1,0,0
"File_6.csv", "BAR",0,0,4,5,3,0,0,0,0

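To see how a 3x3 histogram maps onto the nine columns X_1 to X_9, here is a minimal sketch with made-up random points (the data and seed are hypothetical). np.histogram2d returns a (3, 3) array whose first axis follows the x bins and whose second axis follows the y bins; flatten() reads it row by row (C order), so X_1 to X_3 are the three cells of the first x bin.

```python
import numpy as np

# Hypothetical sample points standing in for one file's X and Y columns.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=20)
y = rng.uniform(0.0, 1.0, size=20)

# 3 bins per axis over a fixed [0, 1] x [0, 1] range -> a 3x3 count grid.
hist, xedges, yedges = np.histogram2d(x, y, bins=3, range=[[0.0, 1.0], [0.0, 1.0]])
print(hist.shape)  # (3, 3)

# Row-major flattening: x bin i becomes columns X_{3i+1} to X_{3i+3}.
row = hist.flatten()
print(row.shape)  # (9,)
print(row.sum())  # 20.0, since every point falls in exactly one bin
```

Column labels matching the question's header can then be generated with [f"X_{k}" for k in range(1, 10)].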
What I have so far:

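import os
import numpy as np
from pandas import DataFrame

def read(path, b, State):
    HistList = []
    HistName = []
    files = os.listdir(path)

    for i in range(0, len(files)):
        ....
        # One flattened b x b histogram per file
        # (density=True normalizes it; older NumPy called this normed=True).
        hist, xe, ye = np.histogram2d(X, Y, bins=b, density=True)
        HistList.append(hist.flatten())
        HistName.append(files[i])

    return DataFrame( ??? )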
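One possible way to fill in the DataFrame call from the question, as a sketch: HistList already holds one flattened histogram per file and HistName the file names, so they can become the rows and the index of a DataFrame, with the directory's state added as a column. The helper name hist_frame is hypothetical, and the sample values are copied from the table above. Frames from several directories can then be stacked with pd.concat and filtered on State.

```python
import numpy as np
import pandas as pd

def hist_frame(hist_list, hist_name, state):
    """Hypothetical helper: one row per file, a State column, then X_1..X_n."""
    n = len(hist_list[0])
    df = pd.DataFrame(
        np.vstack(hist_list),                       # one flattened histogram per row
        index=pd.Index(hist_name, name="Filename"),
        columns=[f"X_{k}" for k in range(1, n + 1)],
    )
    df.insert(0, "State", state)  # same state for every file of this directory
    return df

# Flattened 3x3 histograms taken from the example table.
foo = hist_frame([np.array([0, 0, 1, 2, 3, 0, 0, 0, 0]),
                  np.array([0, 0, 1, 2, 3, 1, 1, 0, 0])],
                 ["File_1.csv", "File_2.csv"], "FOO")
bar = hist_frame([np.array([0, 0, 4, 5, 3, 0, 0, 0, 0])],
                 ["File_6.csv"], "BAR")

table = pd.concat([foo, bar])
print(table[table["State"] == "FOO"])  # select rows by state
```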

Why not use a dictionary?

You can create an empty dictionary, Final_Dict = {}, pass it to the function as an argument, and let the function fill it in folder by folder. The top-level keys are the folder names (Final_Dict[folder_name]); the sub-keys under each folder are the names of the files in that folder (Final_Dict[folder_name][file_name]); and the value stored under each sub-key is that file's flattened histogram.

Just to be clear, the following line extracts the folder name from the path:

current_folder = os.path.basename(os.path.normpath(path)) 
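The normpath call matters when the path ends in a separator: the basename of a path with a trailing slash is an empty string. A quick check with a made-up path:

```python
import os

path = "/data/FOO/"  # hypothetical directory path with a trailing slash
print(repr(os.path.basename(path)))              # '' (nothing after the last slash)
print(os.path.basename(os.path.normpath(path)))  # FOO (trailing slash removed first)
```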

Code (not tested):

import os
import numpy as np

def read(Final_Dict, path, b, para):
    # Use the folder name (i.e. the state) as the top-level key.
    current_folder = os.path.basename(os.path.normpath(path))
    Final_Dict[current_folder] = {}

    files = os.listdir(path)
    for i in range(0, len(files)):
        ....
        # One flattened b x b histogram per file, keyed by file name.
        hist, xe, ye = np.histogram2d(X, Y, bins=b, density=True)
        Final_Dict[current_folder][files[i]] = hist.flatten()

    return Final_Dict

Final_Dict = {}
b = ...
para = ...
for folder_path in folder_path_list:
    Final_Dict = read(Final_Dict, folder_path, b, para)
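The code above leaves the file reading elided, so here is a runnable end-to-end sketch under explicit assumptions: each file is a headerless CSV with two numeric columns (x, y), the folder name is the state, and a hypothetical hist_range argument (not in the original answer, used here in place of the unused para) fixes the bin edges. Without a fixed range, np.histogram2d derives the edges from each file's own min/max, so X_k would refer to different bins in different files. The temporary folders and random data are made up.

```python
import os
import shutil
import tempfile
import numpy as np

def read(final_dict, path, b, hist_range):
    """Fill final_dict[folder][file] with a flattened b x b histogram.

    Assumes every file is a headerless CSV with two numeric columns (x, y);
    hist_range (hypothetical) fixes the bin edges for all files.
    """
    current_folder = os.path.basename(os.path.normpath(path))
    final_dict[current_folder] = {}
    for file_name in sorted(os.listdir(path)):
        data = np.loadtxt(os.path.join(path, file_name), delimiter=",", ndmin=2)
        hist, _, _ = np.histogram2d(data[:, 0], data[:, 1], bins=b,
                                    range=hist_range, density=True)
        final_dict[current_folder][file_name] = hist.flatten()
    return final_dict

# Build two throwaway state folders filled with made-up points.
root = tempfile.mkdtemp()
rng = np.random.default_rng(1)
folder_path_list = []
for state in ("FOO", "BAR"):
    folder = os.path.join(root, state)
    os.makedirs(folder)
    for k in range(2):
        np.savetxt(os.path.join(folder, f"File_{k}.csv"),
                   rng.uniform(0, 1, size=(50, 2)), delimiter=",")
    folder_path_list.append(folder)

Final_Dict = {}
for folder_path in folder_path_list:
    Final_Dict = read(Final_Dict, folder_path, 3, [[0, 1], [0, 1]])
print(sorted(Final_Dict), sorted(Final_Dict["FOO"]))  # ['BAR', 'FOO'] ['File_0.csv', 'File_1.csv']

shutil.rmtree(root)  # remove the temporary folders
```

With density=True each histogram integrates to 1 over the fixed range, so its nine values sum to 1 divided by the bin area (here 9).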

After that, you can convert Final_Dict to a DataFrame:

Final_Dataframe = pd.DataFrame.from_dict(Final_Dict, orient='index', dtype=None)
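With orient='index', each folder becomes a row and each file name a column, so every cell holds a whole array. If you would rather have the layout from the question (one row per file, columns X_1 to X_n, selectable by state), one option is to flatten the nested dict into a (State, Filename) MultiIndex. A sketch with a small made-up Final_Dict, using 2x2 histograms (four values) for brevity:

```python
import numpy as np
import pandas as pd

# Made-up nested dict: Final_Dict[state][file] -> flattened histogram.
Final_Dict = {
    "FOO": {"File_1.csv": np.array([0, 0, 1, 2]), "File_2.csv": np.array([0, 1, 1, 0])},
    "BAR": {"File_4.csv": np.array([3, 0, 0, 1])},
}

# Collect one (state, file) key and one histogram row per file.
keys, rows = [], []
for state, files in Final_Dict.items():
    for file_name, hist in files.items():
        keys.append((state, file_name))
        rows.append(hist)

df = pd.DataFrame(
    np.vstack(rows),
    index=pd.MultiIndex.from_tuples(keys, names=["State", "Filename"]),
    columns=[f"X_{k}" for k in range(1, len(rows[0]) + 1)],
).sort_index()  # sorted index keeps MultiIndex lookups efficient

print(df.loc["FOO"])                # all files of state FOO
print(df.xs("BAR", level="State"))  # same kind of selection via xs
```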

A quick example of the conversion:

import pandas as pd

Final_Dict = {}
Final_Dict['state1'] = {}
Final_Dict['state2'] = {}

Final_Dict['state1']['file1'] = [1, 2, 3]
Final_Dict['state1']['file2'] = [9, 9, 9]
Final_Dict['state2']['file1'] = [3, 3, 3]
Final_Dict['state2']['file2'] = [7, 6, 5]

# Outer keys (folders) become the row index, inner keys (files) the columns.
Final_Dataframe = pd.DataFrame.from_dict(Final_Dict, orient='index', dtype=None)

print("whole dataframe:")
print(Final_Dataframe)

print("\n\n\nSelecting folder 2: ")
print(Final_Dataframe.loc['state2'])

Result:

whole dataframe:
            file1      file2
state1  [1, 2, 3]  [9, 9, 9]
state2  [3, 3, 3]  [7, 6, 5]



Selecting folder 2: 
file1    [3, 3, 3]
file2    [7, 6, 5]
Name: state2, dtype: object
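Since each stored value is a flattened histogram, np.reshape turns it back into the b x b grid. A sketch, assuming the histogram was flattened in the default C (row-major) order and b = 3; the values are one row of the question's table:

```python
import numpy as np

b = 3  # number of bins per axis (assumed)
flat = np.array([0, 0, 1, 2, 3, 0, 0, 0, 0])  # File_1.csv row from the table
grid = flat.reshape(b, b)  # undoes hist.flatten(); both use row-major order
print(grid)
```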

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM