
File names (from multiple files) as column names in one data frame

I have many text files, each holding a single column of data. The dtypes differ (float64, date) and the files have no header row.
I'm trying to write code which will:
- get all file names without extension -> create a list (this works!)
- read all files in one directory and concatenate them into one data frame with a single numbered index, one column per file.

My code:

filelist = os.listdir(path)                             #Make a file list
file_names=[os.path.splitext(x)[0] for x in filelist]   #Remove file extension

Tried this (first option):

df_list = [pd.read_table(file) for file in filelist]
df = pd.concat(df_list,ignore_index=True)

...but I got 3 columns from 6 files, with the data completely scrambled.

Also tried this (second option):

df=pd.DataFrame(columns=file_names)

for file in filelist:
    frame=pd.read_csv(file)
    df=df.append(frame, ignore_index=True)

...this also doesn't work.

Any advice would be appreciated.

Input
The Q*.txt files start with only zeros (about 100 values); the nonzero numbers appear after that.

Q1.txt   Q2.txt   T21.txt   T22.txt
  0        0       51.06     77.46
  0        0       50.32     77.33
  0        0       50.90     77.45

When I run the first option, I get:

 filelist
 >>>['Q1.txt', 'Q2.txt','T21.txt', 'T22.txt']     
 file_names
 >>>['Q1', 'Q2','T21', 'T22']
 df.dtypes
 >>>0        object
 >>>51.06    object
 >>>77.46    object
 >>>dtype: object

Output file

    0  51.06 77.46
 0  0       
 1  0       
 2  0       

It looks like the first 2 files (those with zeros at the beginning) end up in one column. The second and third column labels are the first values of files T21 and T22.
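A minimal sketch that reproduces this symptom, assuming four tiny in-memory stand-ins for the files (the `files` dict and its contents are illustrative, not the real data). Without header=None, pd.read_table consumes the first line of each file as the column label, and pd.concat along the rows then aligns the frames by those labels:

```python
import io
import pandas as pd

# Hypothetical stand-ins for the one-column files (contents are illustrative).
files = {
    "Q1": "0\n0\n0\n",
    "Q2": "0\n0\n0\n",
    "T21": "51.06\n50.32\n50.90\n",
    "T22": "77.46\n77.33\n77.45\n",
}

# No header=None here: the first line of each file becomes the column label.
frames = [pd.read_table(io.StringIO(text)) for text in files.values()]

# Row-wise concat aligns on column labels, so both Q files share the label "0".
stacked = pd.concat(frames, ignore_index=True)

labels = [str(c) for c in stacked.columns]
print(labels)
print(stacked.shape)
```

The two Q files share the label "0" and therefore land in the same column, while each T file contributes a column named after its own first value.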

Thanks to @Viktor Kerkez, I've added header=None to pd.read_table, and now all files end up in one column with dtype=object.
How can I split the files into separate columns, one per file?

You can do the following:

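import os
import pandas as pd

# `path` is the directory that holds the text files (as in the question)
file_names = []
data_frames = []
for filename in os.listdir(path):
    name = os.path.splitext(filename)[0]          # file name without extension
    file_names.append(name)
    # os.listdir returns bare names, so join them with the directory
    df = pd.read_csv(os.path.join(path, filename), header=None)
    df.rename(columns={0: name}, inplace=True)    # label the single column
    data_frames.append(df)

# axis=1 places the files side by side, one column per file
combined = pd.concat(data_frames, axis=1)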

Here I renamed each DataFrame's column to match its file name. You can leave that step out and pass ignore_index=True to pd.concat instead, in which case the columns are simply numbered 0, 1, 2, ...
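A self-contained sketch of both variants, assuming two small sample files written to a temporary directory (the `samples` dict and its contents are hypothetical and only stand in for the real files). It uses the names= parameter of pd.read_csv as an alternative to the rename step:

```python
import os
import tempfile
import pandas as pd

# Hypothetical sample data standing in for the real one-column files.
samples = {
    "Q1.txt": "0\n0\n0\n",
    "T21.txt": "51.06\n50.32\n50.90\n",
}

with tempfile.TemporaryDirectory() as path:
    for filename, text in samples.items():
        with open(os.path.join(path, filename), "w") as fh:
            fh.write(text)

    data_frames = []
    for filename in sorted(os.listdir(path)):
        name = os.path.splitext(filename)[0]
        # header=None keeps the first value as data; names= labels the column.
        frame = pd.read_csv(os.path.join(path, filename), header=None, names=[name])
        data_frames.append(frame)

# One column per file, labelled with the file name.
combined = pd.concat(data_frames, axis=1)

# Same data, but ignore_index=True replaces the labels with 0, 1, ...
numbered = pd.concat(data_frames, axis=1, ignore_index=True)

print(combined)
print(list(numbered.columns))
```

If the files differ in length, the column-wise concat aligns rows by index and fills the shorter columns with NaN.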


 