
Loading multiple .txt files to pandas dataframe with columns

I am trying to read all the .txt files in the format shown below and concatenate them into a single pandas DataFrame.

sample1.txt

ID                                    a123
Delivery_person_ID             VADRES03DEL01
Delivery_person_Age                    24.00
Delivery_person_Ratings                 4.30
Name: 1, dtype: object

sample2.txt

ID                                    b123
Delivery_person_ID             VADRES03DEL02
Delivery_person_Age                    22.00
Delivery_person_Ratings                 4.10
Name: 2, dtype: object

Below is the code:

import glob
import pandas as pd

folder_path = '/drive/My Drive/dataset/train'
file_list = glob.glob(folder_path + "/*.txt")
main_dataframe = pd.read_fwf(file_list[0], header=None)

for i in range(1, len(file_list)):
    df = pd.read_fwf(file_list[i], header=None)
    main_dataframe = pd.concat([main_dataframe, df], axis=0)

print(main_dataframe.head(30))

Output:

                              0               1
0                            ID          a123
1            Delivery_person_ID  VADRES03DEL01
2           Delivery_person_Age       24.00
3       Delivery_person_Ratings        4.30
4       Name: 1, dtype: object             NaN
0                            ID          b123
1            Delivery_person_ID  VADRES03DEL02
2           Delivery_person_Age       22.00
3       Delivery_person_Ratings        4.10
4       Name: 2, dtype: object            NaN

But I need the DataFrame to have one row per person. For example, I want the format below:

     ID Delivery_person_ID Delivery_person_Age Delivery_person_Ratings
0  a123      VADRES03DEL01               24.00                    4.30
1  b123      VADRES03DEL02               22.00                    4.10

The input text file has an unusual layout (it looks like a printed pandas Series). This code should deal with that:

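import pandas as pd

# Read in the text file; the first line ("ID  a123") is taken as the header row
df = pd.read_fwf("./test.txt")
# Remove the "Name: 1, dtype: object" line, which is now at position 3
df = df.drop(df.index[3])
# Transpose it
df = df.T
# Use the first row (the field names) as the column names
df.columns = df.iloc[0]
# Remove the field-name row from the data
df = df.drop(df.index[0])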

An input text file that looks like this:

ID                                    a123
Delivery_person_ID             VADRES03DEL01
Delivery_person_Age                    24.00
Delivery_person_Ratings                 4.30
Name: 1, dtype: object

would be converted to this:

ID   Delivery_person_ID Delivery_person_Age Delivery_person_Ratings
a123      VADRES03DEL01               24.00                    4.30

From here, you can do the same for each text file and then use pd.concat() to merge each new DataFrame into the main one; from your code I can see that you already know how to do this.
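As an alternative to read_fwf, here is a minimal sketch that parses each file line by line and builds one row per file. It is not the method from the answer above. The helper names parse_series_dump and load_records are made up for illustration, and the sketch assumes that field labels contain no whitespace and that the footer line always starts with "Name:".

```python
from pathlib import Path

import pandas as pd


def parse_series_dump(path) -> dict:
    """Parse one file of 'label   value' lines into a dict (hypothetical helper)."""
    record = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines and the "Name: 1, dtype: object" footer
        if not line or line.startswith("Name:"):
            continue
        # Split on the first run of whitespace: label, then value
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # assumption: lines without a value are ignored
        key, value = parts
        record[key] = value.strip()
    return record


def load_records(paths) -> pd.DataFrame:
    """Build a DataFrame with one row per file (hypothetical helper)."""
    return pd.DataFrame([parse_series_dump(p) for p in paths])
```

All values come back as strings here, so numeric columns such as Delivery_person_Age would still need converting, for example with pd.to_numeric.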

After reading each text file into a pandas DataFrame, transpose it:

import glob
import pandas as pd

folder_path = '/drive/My Drive/dataset/train'
file_list = glob.glob(folder_path + "/*.txt")
# .T turns each file's label/value rows into columns
main_dataframe = pd.read_fwf(file_list[0], header=None).T

for i in range(1, len(file_list)):
    df = pd.read_fwf(file_list[i], header=None).T
    main_dataframe = pd.concat([main_dataframe, df], axis=0)

print(main_dataframe.head(30))

Edit

import pandas as pd
import glob

folder_path = '/drive/My Drive/dataset/train'
file_list = glob.glob(folder_path + "/*.txt")


def read_clean_df(file_name) -> pd.DataFrame:
    df = pd.read_fwf(file_name, header=None).T
    df.pop(4)
    df.columns = df.iloc[0]
    df = df[1:]
    df.reset_index(drop=True, inplace=True)
    return df


main_dataframe = read_clean_df(file_list[0])

for file_name in file_list[1:]:
    df = read_clean_df(file_name)
    main_dataframe = pd.concat([main_dataframe, df], axis=0)

main_dataframe.reset_index(drop=True, inplace=True)
print(main_dataframe.head(30))
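Calling pd.concat inside the loop copies the accumulated data on every iteration. The sketch below is one possible variation: it collects the per-file frames in a list, concatenates once, and then converts the two numeric columns with pd.to_numeric. The function load_folder is a hypothetical name, and the code assumes every file has the same four fields followed by the "Name: ..." footer.

```python
import glob
import os

import pandas as pd


def read_clean_df(file_name: str) -> pd.DataFrame:
    """Read one file and return a single-row DataFrame with named columns."""
    df = pd.read_fwf(file_name, header=None).T
    # Column 4 holds the "Name: ..., dtype: object" footer (assumed position)
    df = df.drop(columns=4)
    # The first row holds the field names; promote it to the header
    df.columns = df.iloc[0].tolist()
    return df.iloc[1:]


def load_folder(folder_path: str) -> pd.DataFrame:
    """Combine all .txt files in a folder (hypothetical helper)."""
    file_list = sorted(glob.glob(os.path.join(folder_path, "*.txt")))
    frames = [read_clean_df(f) for f in file_list]
    # A single concat; ignore_index=True renumbers the rows 0..n-1
    combined = pd.concat(frames, ignore_index=True)
    # read_fwf returns these values as strings because the column is mixed
    for col in ("Delivery_person_Age", "Delivery_person_Ratings"):
        combined[col] = pd.to_numeric(combined[col])
    return combined
```

Note that pd.concat raises a ValueError when given an empty list, so a folder with no matching files would need to be handled separately.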

