
Extracting data from pandas data frame to a new data frame

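I have a data structure in my code that is a dictionary of dictionaries. In each nested dictionary, every key maps to a pandas data frame. Basically, I have multiple Excel files with multiple tabs and columns, so I created this data structure because I want to do some further modelling on the data. Now I want to extract two columns from one specific tab of each Excel file (if the tab exists in that file) and put them into a new master data frame. I tried a routine but could not get the expected result. Please find below the code I tried for this problem.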

def text_extraction_to_dataframe(dict1, process_key):
    '''This routine is used to extract any required column from the data into a new dataframe with the file name as new
    column attached to it'''        
    
    #Initializing new data frame
    df = pd.DataFrame()
    df['ExcelFile'] = ''
    
    #Running nested for-loops to get into our data structure(dictionary of dictionaries)
    for key, value in dict1.items():
                    
        for key1, value1 in value.items():

            #Checking if the required tab matches to the key
            if key1 == process_key:
                    
                df = pd.DataFrame(value1)   #Extracting all the data from the tab to the new dataframe

                df['ExcelFile'] = key.split('.')[0]  #Appending the data frame with new column as the filename
        
    #Removing unnecessary columns from the data frame and only keeping column3 and column4
    df = df.drop(columns = ['column1', 'column2']) 
    return df

text_extraction_to_dataframe(dictionary, 'tab_name')

This routine does not extract all the data from all the columns of each Excel file.
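A likely reason, judging from the routine above: `df` is reassigned on every matching tab, so each file replaces the previous one and only the last matching file is returned. The sketch below reproduces that pattern in isolation. The file names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the matching tab of two Excel files
frames = {
    "file_a.xlsx": pd.DataFrame({"column3": ["a", "b"]}),
    "file_b.xlsx": pd.DataFrame({"column3": ["c"]}),
}

df = pd.DataFrame()
for name, frame in frames.items():
    # Rebinding df discards whatever the previous iteration stored in it
    df = pd.DataFrame(frame)
    df["ExcelFile"] = name.split(".")[0]

# Only the rows of the last file remain
print(df)
```

Collecting each frame in a list and combining them after the loop avoids the overwrite.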

Also, I would like the last column of the master data frame to be the Excel file name.

Basically, the structure of the master df would be [column3, column4, excelfilename].
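To make the target concrete, here is a sketch of what the input structure and the expected master data frame might look like. All file names, tab names, and values are invented for illustration and are not taken from the real workbooks.

```python
import pandas as pd

# Hypothetical input: outer keys are file names, inner keys are tab names,
# and the values are pandas data frames
dictionary = {
    "file_a.xlsx": {
        "tab_name": pd.DataFrame(
            {"column1": [1, 2], "column2": [3, 4],
             "column3": ["a", "b"], "column4": ["c", "d"]}
        ),
        "other_tab": pd.DataFrame({"x": [0]}),
    },
    "file_b.xlsx": {
        "tab_name": pd.DataFrame(
            {"column1": [5], "column2": [6],
             "column3": ["e"], "column4": ["f"]}
        ),
    },
    # This file has no "tab_name" tab, so it should contribute no rows
    "file_c.xlsx": {"other_tab": pd.DataFrame({"x": [1]})},
}

# Expected master data frame for the tab "tab_name"
expected = pd.DataFrame(
    {
        "column3": ["a", "b", "e"],
        "column4": ["c", "d", "f"],
        "ExcelFile": ["file_a", "file_a", "file_b"],
    }
)
print(expected)
```

If the inner dictionaries come straight from Excel, `pd.read_excel(path, sheet_name=None)` returns this kind of tab-name-to-data-frame mapping for one file.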

Let me know if you need anything else. Any help would be appreciated.

I solved this by appending all the data frames to a list and then concatenating them. Please find the code below.

import pandas as pd

def text_extraction_to_dataframe(dictionary, process_key):
    '''This routine is used to extract any required column from the data into a 
     new data frame with the file name as a new column attached to it'''        
    
    #List to collect one data frame per excel file
    frames = []
    
    #Running nested for-loops to get into our data structure(dictionary of dictionaries)
    for key, value in dictionary.items():
            
        for key1, value1 in value.items():
                
            #Checking if the required tab matches to the key
            if key1 == process_key:
                #Copying so the data frame stored in the dictionary is not modified
                df = pd.DataFrame(value1).copy()

                #Adding the excel file name as the last column in each data frame
                df['ExcelFile'] = key.split('.')[0]
                
                #Appending the data frame to the list
                frames.append(df)
    
    #pd.concat raises a ValueError on an empty list, so handle the no-match case
    if not frames:
        return pd.DataFrame(columns=['ExcelFile'])
    
    #Concatenating all the data frames in the master data frame            
    master_df1 = pd.concat(frames, ignore_index=True)
    
    #Dropping unnecessary columns
    master_df1 = master_df1.drop(columns=['column1', 'column2'])
    
    return master_df1 
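
As a more compact variant of the same list-then-concat idea, the inner loop can be replaced by a direct key lookup, and the wanted columns can be selected by name instead of dropping the unwanted ones. This is only a sketch: the function name `extract_tab`, its `columns` parameter, and the sample data are hypothetical and not part of the original code.

```python
import pandas as pd


def extract_tab(dictionary, process_key, columns):
    """Collect `columns` from tab `process_key` of every file that has it."""
    frames = [
        # assign() returns a new frame, so the stored frames stay untouched
        tabs[process_key][columns].assign(ExcelFile=file_name.split(".")[0])
        for file_name, tabs in dictionary.items()
        if process_key in tabs
    ]
    # pd.concat raises a ValueError on an empty list
    if not frames:
        return pd.DataFrame(columns=[*columns, "ExcelFile"])
    return pd.concat(frames, ignore_index=True)


# Made-up sample data standing in for the real workbooks
sample = {
    "file_a.xlsx": {
        "tab_name": pd.DataFrame(
            {"column1": [1, 2], "column2": [3, 4],
             "column3": ["a", "b"], "column4": ["c", "d"]}
        )
    },
    "file_b.xlsx": {
        "tab_name": pd.DataFrame(
            {"column1": [5], "column2": [6],
             "column3": ["e"], "column4": ["f"]}
        )
    },
    "file_c.xlsx": {"other_tab": pd.DataFrame({"x": [0]})},
}

master = extract_tab(sample, "tab_name", ["column3", "column4"])
print(master)
```

Selecting by name keeps the result stable if a workbook gains extra columns, but it raises a `KeyError` when a requested column is missing from a tab.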
