Loading multiple .txt files to pandas dataframe with columns
I am trying to read all the .txt files in the format shown below and concatenate them into a single pandas DataFrame.
ID a123
Delivery_person_ID VADRES03DEL01
Delivery_person_Age 24.00
Delivery_person_Ratings 4.30
Name: 1, dtype: object
ID b123
Delivery_person_ID VADRES03DEL02
Delivery_person_Age 22.00
Delivery_person_Ratings 4.10
Name: 2, dtype: object
Below is the code:
import glob
import pandas as pd

folder_path = '/drive/My Drive/dataset/train'
file_list = glob.glob(folder_path + "/*.txt")
main_dataframe = pd.read_fwf(file_list[0], header=None)
for i in range(1, len(file_list)):
    df = pd.read_fwf(file_list[i], header=None)
    main_dataframe = pd.concat([main_dataframe, df], axis=0)
print(main_dataframe.head(30))
Output:
                         0              1
0                       ID           a123
1       Delivery_person_ID  VADRES03DEL01
2      Delivery_person_Age          24.00
3  Delivery_person_Ratings           4.30
4   Name: 1, dtype: object            NaN
0                       ID           b123
1       Delivery_person_ID  VADRES03DEL02
2      Delivery_person_Age          22.00
3  Delivery_person_Ratings           4.10
4   Name: 2, dtype: object            NaN
But I need the dataframe to contain one row per person. For example, I want the following format:
     ID Delivery_person_ID  Delivery_person_Age  Delivery_person_Ratings
0  a123      VADRES03DEL01                24.00                     4.30
1  b123      VADRES03DEL02                22.00                     4.10
The input text file has an unusual layout, but this code should deal with it:
import pandas as pd

# Read in the text file; the first line ("ID a123") is taken as the header
df = pd.read_fwf("./test.txt")
# Remove the "Name: 1, dtype: object" line, which is now at position 3
df = df.drop(df.index[3])
# Transpose it
df = df.T
# Use the first row as the column names
df.columns = df.iloc[0]
# Remove the column names from the data
df = df.drop(df.index[0])
An input text file that looks like this:
ID a123
Delivery_person_ID VADRES03DEL01
Delivery_person_Age 24.00
Delivery_person_Ratings 4.30
Name: 1, dtype: object
would be converted to this:
ID   Delivery_person_ID Delivery_person_Age Delivery_person_Ratings
a123      VADRES03DEL01               24.00                    4.30
From here, you can do the same for each text file and then use pd.concat() to merge each new dataframe into the main dataframe. From your code I can see that you already know how to do this.
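The per-file steps and the final pd.concat() could be combined roughly as follows. This is only a sketch: it assumes every file keeps the aligned two-column layout that read_fwf splits into columns 0 and 1 (as in the question's output), the helper names read_one_file and load_folder are hypothetical, and the "Name: ..." line is removed by dropping rows with a missing value rather than by position.

```python
import glob
import os

import pandas as pd


def read_one_file(path):
    """Turn one key/value text file into a one-row DataFrame (hypothetical helper)."""
    # header=None keeps the first line ("ID ...") as data instead of a header.
    raw = pd.read_fwf(path, header=None)
    # Assumption: the trailing "Name: ..., dtype: object" line is the only
    # row with nothing in column 1, so dropping missing values removes it.
    raw = raw.dropna(subset=[1])
    # Field names become the index; the transpose then makes them columns.
    row = raw.set_index(0).T
    row.columns.name = None
    return row


def load_folder(folder_path):
    """Read every .txt file in folder_path and stack the one-row frames."""
    paths = sorted(glob.glob(os.path.join(folder_path, "*.txt")))
    frames = [read_one_file(p) for p in paths]
    # One concat over the whole list; ignore_index renumbers the rows 0..n-1.
    return pd.concat(frames, ignore_index=True)
```

Calling load_folder('/drive/My Drive/dataset/train') would then give one row per file. Collecting the frames in a list and concatenating once avoids copying the growing dataframe on every loop iteration.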
After reading each text file into a pandas DataFrame, transpose it:
folder_path = '/drive/My Drive/dataset/train'
file_list = glob.glob(folder_path + "/*.txt")
main_dataframe = pd.read_fwf(file_list[0], header=None).T
for i in range(1, len(file_list)):
    df = pd.read_fwf(file_list[i], header=None).T
    main_dataframe = pd.concat([main_dataframe, df], axis=0)
print(main_dataframe.head(30))
Edit
import pandas as pd
import glob

folder_path = '/drive/My Drive/dataset/train'
file_list = glob.glob(folder_path + "/*.txt")

def read_clean_df(file_name) -> pd.DataFrame:
    # Read the file without a header and transpose so fields become columns
    df = pd.read_fwf(file_name, header=None).T
    # Drop the column holding the "Name: ..., dtype: object" line
    df.pop(4)
    # Use the first row (the field names) as the column names
    df.columns = df.iloc[0]
    df = df[1:]
    df.reset_index(drop=True, inplace=True)
    return df

main_dataframe = read_clean_df(file_list[0])
for file_name in file_list[1:]:
    # Read the current file (not file_list[0] again)
    df = read_clean_df(file_name)
    main_dataframe = pd.concat([main_dataframe, df], axis=0)
main_dataframe.reset_index(drop=True, inplace=True)
print(main_dataframe.head(30))
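If the files are not reliably aligned in fixed-width columns, a plain-Python parser may be an option instead of read_fwf. The sketch below is an alternative technique, not the approach from the answers above. It assumes field names contain no whitespace, skips any line starting with "Name:", and converts the age and ratings columns to numbers. The names parse_record and load_records are hypothetical.

```python
import glob
import os

import pandas as pd


def parse_record(path):
    """Parse one file into a {field: value} dict without read_fwf (hypothetical helper)."""
    record = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and the "Name: ..., dtype: object" footer.
            if not line or line.startswith("Name:"):
                continue
            # Assumption: the field name has no whitespace, so the first
            # run of whitespace separates the name from the value.
            parts = line.split(None, 1)
            if len(parts) == 2:
                record[parts[0]] = parts[1].strip()
    return record


def load_records(folder_path):
    """Build one DataFrame with a row per .txt file in folder_path."""
    paths = sorted(glob.glob(os.path.join(folder_path, "*.txt")))
    frame = pd.DataFrame([parse_record(p) for p in paths])
    # Assumption: these two columns should be numeric; bad values become NaN.
    for column in ("Delivery_person_Age", "Delivery_person_Ratings"):
        if column in frame.columns:
            frame[column] = pd.to_numeric(frame[column], errors="coerce")
    return frame
```

Building the dataframe once from a list of dicts also sidesteps the transpose and header clean-up steps, at the cost of hand-written parsing.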