Python Pandas: How to skip columns when reading a file?
I have a table formatted as follows:
foo - bar - 10 2e-5 0.0 some information
quz - baz - 4 1e-2 1 some other description in here
When I open it with pandas using:
a = pd.read_table("file", header=None, sep=" ")
It tells me:
CParserError: Error tokenizing data. C error: Expected 9 fields in line 2, saw 12
What I'd basically like is something similar to the skiprows option, which would let me do something like:
a = pd.read_table("file", header=None, sep=" ", skipcolumns=[8:])
I'm aware that I could re-format this table with awk, but I'd like to know whether a Pandas solution exists or not.
Thanks.
The usecols parameter allows you to select which columns to use:
a = pd.read_table("file", header=None, sep=" ", usecols=range(8))
However, to accept irregular column counts you also need to use engine='python'.
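As a minimal sketch of the usecols approach, the snippet below parses the two sample rows from the question out of an in-memory string instead of a file. The names SAMPLE and read_first_columns are hypothetical and used only for illustration, and io.StringIO stands in for the real file path.

```python
import io

import pandas as pd

# The two sample rows from the question; the rows have 9 and 12 fields.
SAMPLE = (
    "foo - bar - 10 2e-5 0.0 some information\n"
    "quz - baz - 4 1e-2 1 some other description in here\n"
)


def read_first_columns(source, n_cols=8):
    """Read only the first n_cols space-separated columns of source.

    Hypothetical helper: source may be a path or a file-like object.
    """
    return pd.read_table(
        source,
        header=None,
        sep=" ",
        usecols=list(range(n_cols)),  # keep columns 0 .. n_cols - 1
        engine="python",  # the Python parser, as suggested above for ragged rows
    )


df = read_first_columns(io.StringIO(SAMPLE))
print(df)
```

With a real file, passing the path in place of the StringIO object should behave the same way.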
If you are using Linux/OS X/Windows Cygwin, you should be able to prepare the file as follows:
cut -d' ' -f1-7 your_file > out.file
Then in Python:
a = pd.read_table("out.file", header=None, sep=" ")
Example:
Input:
foo - bar - 10 2e-5 0.0 some information
quz - baz - 4 1e-2 1 some other description in here
Output:
foo - bar - 10 2e-5 0.0
quz - baz - 4 1e-2 1
You can run this command manually on the command line, or simply call it from within Python using the subprocess module.
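Calling cut from Python could look roughly like the sketch below. It assumes a cut executable is available on the PATH (as on Linux, OS X, or Cygwin); the helper name cut_fields and its parameters are hypothetical, chosen only for this illustration.

```python
import subprocess


def cut_fields(src_path, dst_path, last_field=7):
    """Write the first last_field space-separated fields of each line
    of src_path to dst_path by running the external cut command.

    Hypothetical helper; assumes cut is installed and on the PATH.
    """
    with open(dst_path, "w") as out:
        subprocess.run(
            ["cut", "-d", " ", f"-f1-{last_field}", src_path],
            stdout=out,  # redirect cut's output into the destination file
            check=True,  # raise CalledProcessError if cut fails
        )


# Usage, with the file names from the answer above:
# cut_fields("your_file", "out.file")
# a = pd.read_table("out.file", header=None, sep=" ")
```

Passing the arguments as a list avoids going through a shell, so the file names need no quoting.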