Separate columns in .txt file in Pandas

Question

The raw.txt file looks like this:

e1 47 3 Self-emp-inc Married-civ-spouse Transport-moving White Male Cuba                                                                                                                                                                                  
e2 52 16 Self-emp-not-inc Married-civ-spouse Prof-specialty White Male United-States                                                                                                                                                                      
e3 26 9 Private Divorced Craft-repair White Male United-States                                                                                                                                                                                            
e4 60 9 Private Married-civ-spouse Craft-repair White Male United-States

I have tried:

adult = pd.read_csv("Adult/dataset_full.txt", header=None)

This gives only ONE column. With sep=' ' it gives:

<Error tokenizing data. C error: Expected 187 fields in line 3, saw 197>

I have also tried skiprows=, read_fwf(), and read_table(); they all give similar results.

Does anyone have any insights on how to separate this file into columns?

Answer 1

If your file.txt is this:

e1 47 3 Self-emp-inc Married-civ-spouse Transport-moving White Male Cuba
e2 52 16 Self-emp-not-inc Married-civ-spouse Prof-specialty White Male United-States
e3 26 9 Private Divorced Craft-repair White Male United-States
e4 60 9 Private Married-civ-spouse Craft-repair White Male United-States

Then you have four rows of nine whitespace-separated values each. So you can:

read the file line by line
strip each line and split it on whitespace
pass the resulting lists to a pandas DataFrame
(optional) create headers for the columns
finally dump the DataFrame to a .csv file

For example:

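import pandas as pd

# split() with no argument splits on runs of whitespace and ignores trailing spaces.
with open("file.txt") as f:
    rows = [line.split() for line in f if line.strip()]  # skip blank lines

df = pd.DataFrame(rows)
# Optional: name the nine columns Col1 ... Col9.
headers = [f"Col{i}" for i in range(1, 10)]
df.to_csv("your_table.csv", index=False, header=headers)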
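
As an alternative, here is a minimal sketch that lets pandas do the splitting. It assumes the tokenizing error comes from the trailing spaces in the raw file: with sep=' ' every extra space counts as another empty field, whereas the regular expression r"\s+" treats any run of whitespace as one delimiter. The sample rows are copied from the question and fed in through io.StringIO for illustration only. The trailing spaces are added on purpose, and the Col1 ... Col9 names are placeholders, not real column names from the dataset.

```python
import io

import pandas as pd

# Sample rows from the question; the trailing spaces are deliberate.
raw = (
    "e1 47 3 Self-emp-inc Married-civ-spouse Transport-moving White Male Cuba      \n"
    "e2 52 16 Self-emp-not-inc Married-civ-spouse Prof-specialty White Male United-States   \n"
    "e3 26 9 Private Divorced Craft-repair White Male United-States        \n"
    "e4 60 9 Private Married-civ-spouse Craft-repair White Male United-States\n"
)

# sep=r"\s+" collapses each run of whitespace into a single delimiter,
# so trailing spaces do not create extra empty fields.
adult = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

# Placeholder column names (hypothetical, chosen to match the answer above).
adult.columns = [f"Col{i}" for i in range(1, 10)]

print(adult.shape)
print(adult.head())
```

To use it on the real file, pass the path instead of the StringIO object, e.g. pd.read_csv("file.txt", sep=r"\s+", header=None). Unlike the manual split, this also lets pandas infer numeric types for the second and third columns.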
