Python Pandas: How to skip columns when reading a file?

Question

I have a table formatted as follows:

foo - bar - 10 2e-5 0.0 some information
quz - baz - 4 1e-2 1 some other description in here

When I open it with pandas:

a = pd.read_table("file", header=None, sep=" ")

it tells me:

CParserError: Error tokenizing data. C error: Expected 9 fields in line 2, saw 12

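What I would basically like is something similar to the skiprows option, but for columns, so that I could write something like this (skipcolumns is a made-up parameter, not a real one):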

a = pd.read_table("file", header=None, sep=" ", skipcolumns=[8:])

I know I could reformat the table with awk, but I would like to know whether a pandas solution exists.

Thanks.

Answer 1

The usecols parameter lets you select which columns to read:

a = pd.read_table("file", header=None, sep=" ", usecols=range(8))

However, to accept rows with an irregular number of fields, you also need to pass engine='python'.
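Putting the two pieces together, a minimal sketch using the sample rows from the question might look like the following. The data is read from an in-memory string rather than a file so that the snippet is self-contained. Note that usecols=range(7) is an assumption here: it keeps only the seven fields before the free-text description, whereas range(8) as written above would also keep the first word of the description.

```python
import io

import pandas as pd

# Sample rows copied from the question: 9 fields in the first line, 12 in the second.
sample = (
    "foo - bar - 10 2e-5 0.0 some information\n"
    "quz - baz - 4 1e-2 1 some other description in here\n"
)

# usecols limits parsing to the leading columns; engine="python" is the
# engine the answer recommends for rows with differing field counts.
a = pd.read_table(
    io.StringIO(sample),
    header=None,
    sep=" ",
    usecols=range(7),  # assumption: keep only the 7 fields before the description
    engine="python",
)

print(a)
```

Replacing io.StringIO(sample) with the path to the real file should give the same result on the full table.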

Answer 2

If you are on Linux / OS X / Windows with Cygwin, you should be able to prepare the file as follows:

cut -d' ' -f1-7 your_file > out.file

Then in Python:

a = pd.read_table("out.file", header=None, sep=" ")

Example:

Input:

foo - bar - 10 2e-5 0.0 some information
quz - baz - 4 1e-2 1 some other description in here

Output:

foo - bar - 10 2e-5 0.0
quz - baz - 4 1e-2 1

You can run this command manually on the command line, or simply call it from Python using the subprocess module.
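As a rough sketch of the subprocess route, something like the following could work on a system where cut is on the PATH. The helper name cut_leading_fields and the temporary directory are made up for this illustration; the file names your_file and out.file follow the command above.

```python
import subprocess
import tempfile
from pathlib import Path


def cut_leading_fields(src, dst, last_field=7):
    """Write the first `last_field` space-separated fields of each line of src to dst.

    Hypothetical helper; it simply shells out to the `cut` utility.
    """
    with open(dst, "w") as out:
        subprocess.run(
            ["cut", "-d", " ", f"-f1-{last_field}", str(src)],
            stdout=out,
            check=True,  # raise CalledProcessError if cut fails
        )


# Demo on the sample rows from the question, in a throwaway directory.
workdir = Path(tempfile.mkdtemp())
src = workdir / "your_file"
dst = workdir / "out.file"
src.write_text(
    "foo - bar - 10 2e-5 0.0 some information\n"
    "quz - baz - 4 1e-2 1 some other description in here\n"
)

cut_leading_fields(src, dst)
trimmed = dst.read_text()
print(trimmed)
```

The resulting out.file can then be read with pd.read_table("out.file", header=None, sep=" ") as shown earlier.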