
Separate columns in .txt file in Pandas

The raw.txt file looks like this

e1 47 3 Self-emp-inc Married-civ-spouse Transport-moving White Male Cuba                                                                                                                                                                                  
e2 52 16 Self-emp-not-inc Married-civ-spouse Prof-specialty White Male United-States                                                                                                                                                                      
e3 26 9 Private Divorced Craft-repair White Male United-States                                                                                                                                                                                            
e4 60 9 Private Married-civ-spouse Craft-repair White Male United-States 

I have tried

adult = pd.read_csv("Adult/dataset_full.txt", header=None)

It gives only ONE column. With sep=' ' it gives

<Error tokenizing data. C error: Expected 187 fields in line 3, saw 197>

I have also tried skiprows=, read_fwf() and read_table(); they all give similar results.

Does anyone have any insights on how to separate this file into columns?

If your file.txt is this:

e1 47 3 Self-emp-inc Married-civ-spouse Transport-moving White Male Cuba
e2 52 16 Self-emp-not-inc Married-civ-spouse Prof-specialty White Male United-States
e3 26 9 Private Divorced Craft-repair White Male United-States
e4 60 9 Private Married-civ-spouse Craft-repair White Male United-States

Then you have four rows, each with nine values separated by spaces. So you can:

  • read the file line by line
  • strip and split the line on space
  • pass this to a pandas DataFrame
  • (optional) create headers for columns
  • finally dump this to a .csv file

For example:

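import pandas as pd

# split() with no argument splits on any run of whitespace,
# so trailing padding at the end of a line is ignored.
with open("file.txt") as f:
    rows = [line.split() for line in f if line.strip()]  # skip blank lines

df = pd.DataFrame(rows)
headers = [f"Col{i}" for i in range(1, 10)]  # Col1 .. Col9
df.to_csv("your_table.csv", index=False, header=headers)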
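
As an alternative to splitting the lines by hand, pandas can treat any run of whitespace as a single delimiter by passing the regular expression sep=r"\s+" to read_csv. The sketch below assumes the tokenizing error comes from the uneven trailing padding on each line, which sep=' ' would turn into a different number of empty fields per row. The sample rows are copied from the question, the padding widths are made up for illustration, and io.StringIO stands in for the real file path.

```python
import io

import pandas as pd

# Rows from the question. The trailing spaces are an assumption that
# mimics the padding in the raw file; the widths are arbitrary.
raw = (
    "e1 47 3 Self-emp-inc Married-civ-spouse Transport-moving White Male Cuba        \n"
    "e2 52 16 Self-emp-not-inc Married-civ-spouse Prof-specialty White Male United-States   \n"
    "e3 26 9 Private Divorced Craft-repair White Male United-States     \n"
    "e4 60 9 Private Married-civ-spouse Craft-repair White Male United-States \n"
)

# sep=r"\s+" splits on one or more whitespace characters, so padding
# at the end of a line does not produce extra empty fields.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

# Placeholder column names, following the Col1..Col9 scheme used above.
df.columns = [f"Col{i}" for i in range(1, 10)]

print(df.shape)
print(df.dtypes)
```

With a real file, the io.StringIO(raw) argument would be replaced by the path, for example "Adult/dataset_full.txt". Unlike the manual split, read_csv also infers numeric types for the age-like columns instead of leaving every value as a string.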

