
Python Pandas : How to skip columns when reading a file?

I have a table formatted as follows :

foo - bar - 10 2e-5 0.0 some information
quz - baz - 4 1e-2 1 some other description in here

When I read it with pandas using :

a = pd.read_table("file", header=None, sep=" ")

It tells me :

CParserError: Error tokenizing data. C error: Expected 9 fields in line 2, saw 12

What I'd basically like to have is something similar to the skiprows option which would allow me to do something like :

a = pd.read_table("file", header=None, sep=" ", skipcolumns=[8:])

I'm aware that I could re-format this table with awk, but I'd like to know whether a pandas solution exists.

Thanks.

The usecols parameter allows you to select which columns to use:

a = pd.read_table("file", header=None, sep=" ", usecols=range(8))

However, to accept irregular column counts you also need to pass engine='python'. The default C parser still raises the tokenizing error shown in the question.
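
As a minimal sketch of the usecols approach, the snippet below reads the two sample rows from the question out of an in-memory string. The `sample` variable and the use of `io.StringIO` are assumptions made only so the example is self-contained; in practice you would pass the file path as in the call above.

```python
import io

import pandas as pd

# Inline copy of the two sample rows from the question (stands in for "file").
sample = (
    "foo - bar - 10 2e-5 0.0 some information\n"
    "quz - baz - 4 1e-2 1 some other description in here\n"
)

# engine="python" tolerates the rows having different numbers of fields;
# usecols=range(8) keeps the columns at positions 0 through 7.
a = pd.read_table(
    io.StringIO(sample),
    header=None,
    sep=" ",
    usecols=range(8),
    engine="python",
)

print(a)
```

Note that with this data, column 7 holds the first word of the free-text description. If only the seven structured fields are wanted, usecols=range(7) should do that instead.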

If you are using Linux/OS X/Windows Cygwin, you should be able to prepare the file as follows:

cut -d' ' -f1-7 your_file > out.file

Then in Python:

a = pd.read_table("out.file", header=None, sep=" ")

Example:

Input:

foo - bar - 10 2e-5 0.0 some information
quz - baz - 4 1e-2 1 some other description in here

Output:

foo - bar - 10 2e-5 0.0
quz - baz - 4 1e-2 1

You can run this command manually on the command line, or call it from within Python using the subprocess module.
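
Calling cut from Python could look roughly like the sketch below. The helper name `keep_leading_fields` and its parameters are hypothetical, and it assumes a `cut` executable is available on the PATH (Linux, OS X, or Cygwin).

```python
import subprocess


def keep_leading_fields(src, dst, last_field=7):
    """Write the first `last_field` space-separated fields of src to dst.

    Hypothetical helper: it shells out to the external `cut` command,
    so it only works where `cut` is installed.
    """
    with open(dst, "w") as out:
        subprocess.run(
            ["cut", "-d", " ", "-f", f"1-{last_field}", src],
            stdout=out,
            check=True,  # raise CalledProcessError if cut fails
        )


# Example usage (file names as in the answer above):
# keep_leading_fields("your_file", "out.file")
# a = pd.read_table("out.file", header=None, sep=" ")
```

Passing the arguments as a list avoids shell quoting issues with the space delimiter.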
