Python Pandas: How to skip columns when reading a file?

Question

I have a table formatted as follows:

foo - bar - 10 2e-5 0.0 some information
quz - baz - 4 1e-2 1 some other description in here

When I open it with pandas like this:

a = pd.read_table("file", header=None, sep=" ")

It tells me:

CParserError: Error tokenizing data. C error: Expected 9 fields in line 2, saw 12
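A minimal way to reproduce this, assuming the two sample rows are held in an in-memory string instead of "file". Pandas takes the expected field count from the first line (9 fields), so the 12-field second line is rejected. Recent pandas versions raise this as `pandas.errors.ParserError`; `CParserError` is the older name shown above.

```python
import io

import pandas as pd

# The question's two sample rows, held in memory instead of a file on disk.
sample = (
    "foo - bar - 10 2e-5 0.0 some information\n"
    "quz - baz - 4 1e-2 1 some other description in here\n"
)

message = None
try:
    pd.read_table(io.StringIO(sample), header=None, sep=" ")
except pd.errors.ParserError as exc:
    # The first line fixes the expected width at 9 fields,
    # so the 12-field second line triggers the tokenizer error.
    message = str(exc)

print(message)
```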

What I'd basically like is something similar to the skiprows option, which would let me do something like:

a = pd.read_table("file", header=None, sep=" ", skipcolumns=[8:])

I'm aware that I could reformat this table with awk, but I'd like to know whether a pandas solution exists.

Thanks.

Answer 1

The usecols parameter lets you select which columns to read:

a = pd.read_table("file", header=None, sep=" ", usecols=range(8))

However, to accept rows with differing column counts, you also need to use engine='python'.
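A self-contained sketch of this answer, assuming the question's two sample rows are supplied through `io.StringIO` rather than a file on disk. `usecols=range(8)` keeps the columns labelled 0 to 7 and drops everything after them, and `engine="python"` is passed as the answer advises.

```python
import io

import pandas as pd

# The question's sample rows; line 2 has more fields than line 1.
sample = (
    "foo - bar - 10 2e-5 0.0 some information\n"
    "quz - baz - 4 1e-2 1 some other description in here\n"
)

# Keep only columns 0-7; any extra trailing fields are ignored.
a = pd.read_table(
    io.StringIO(sample),
    header=None,
    sep=" ",
    usecols=range(8),
    engine="python",
)
print(a)
```

Because `header=None` labels columns by integer position, `range(8)` means the first eight fields; `range(7)` would stop before the free-text description.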

Answer 2

If you are using Linux, OS X, or Cygwin on Windows, you should be able to prepare the file as follows:

cut -d' ' -f1-7 your_file > out.file

Then in Python:

a = pd.read_table("out.file", header=None, sep=" ")

Example:

Input:

foo - bar - 10 2e-5 0.0 some information
quz - baz - 4 1e-2 1 some other description in here

Output:

foo - bar - 10 2e-5 0.0
quz - baz - 4 1e-2 1

You can run this command manually on the command line, or call it from within Python using the subprocess module.
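A sketch of the subprocess route, assuming a POSIX `cut` binary on `PATH` and Python 3.7 or later (for `capture_output`). The helper `read_first_fields` and the temporary input file are hypothetical, used only so the example runs on its own. The trimmed text is parsed from memory, so no `out.file` is written.

```python
import io
import subprocess
import tempfile
from pathlib import Path

import pandas as pd


def read_first_fields(path, n_fields=7):
    """Hypothetical helper: trim each line with `cut`, then parse it with pandas.

    Assumes fields are separated by single spaces, as in the sample file.
    """
    result = subprocess.run(
        ["cut", "-d", " ", "-f", f"1-{n_fields}", str(path)],
        check=True,           # raise CalledProcessError if cut fails
        capture_output=True,  # collect stdout instead of redirecting to a file
        text=True,            # decode stdout to str
    )
    # Parse the trimmed output directly from memory.
    return pd.read_table(io.StringIO(result.stdout), header=None, sep=" ")


sample = (
    "foo - bar - 10 2e-5 0.0 some information\n"
    "quz - baz - 4 1e-2 1 some other description in here\n"
)

# Write the sample to a throwaway file so the helper has a real path to read.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "your_file"
    src.write_text(sample)
    df = read_first_fields(src)

print(df)
```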

Answer 1: 15 votes, posted 2014-06-23 13:02:44
Answer 2: -2 votes, posted 2014-06-23 13:30:14
