
Python/Pandas: how to read_csv and at the same time ignore rows that have #?


My files come in two formats: some have # lines at the beginning and some don't. I want to read_csv this matrix into a pandas DataFrame and ignore the rows starting with # before populating the DataFrame. My headers should be ID, SID, AID and so on, so I think I can read a file by skipping the first 4 rows, and I know how to do that. But the problem is that some files do not have the first 4 # rows and start directly with the ID SID AID ... headers.

When I read the file into a DataFrame, I guess it assigns the column name as #PI.

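The pandas read_csv function allows you to specify a comment character via comment='#'. Any line that begins with # is skipped entirely, so the header is taken from the first line that is not a comment. Note that a # appearing in the middle of a line also cuts off the rest of that line.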
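A minimal sketch of this, assuming comma-separated files (pass a different `sep` if yours are tab- or space-delimited). The file contents and the `read_matrix` helper are made up for illustration; only the ID, SID, AID headers and the #PI line come from the question.

```python
import io

import pandas as pd

# Hypothetical file with four leading comment lines.
with_comments = """#PI example
#comment line 2
#comment line 3
#comment line 4
ID,SID,AID
1,10,100
2,20,200
"""

# Hypothetical file that starts directly with the header row.
without_comments = """ID,SID,AID
1,10,100
2,20,200
"""


def read_matrix(source):
    # comment="#" drops every line starting with '#', so the same call
    # handles both file formats without skiprows.
    return pd.read_csv(source, comment="#")


df_a = read_matrix(io.StringIO(with_comments))
df_b = read_matrix(io.StringIO(without_comments))
print(df_a)
print(df_b)
```

With real files you would pass the path instead of a `StringIO` object. Both calls should give the same DataFrame, since the comment lines never reach the header detection.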

Why not just read in all rows using read_csv and then filter out the lines that start with # using .loc?

Something like:

df.loc[~df['col'].str.startswith('#')]
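One catch with filtering afterwards: if the # lines come first, read_csv takes the first of them as the header (which is where the #PI column name comes from). A rough sketch of one way around that, assuming the column names are known in advance and the comment lines contain no separator; the sample text is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical file content; the comment lines contain no commas.
text = """#PI example
#second comment line
ID,SID,AID
1,10,100
2,20,200
"""

columns = ["ID", "SID", "AID"]  # assumed to be known up front

# header=None keeps every line as a data row; dtype=str avoids mixed types.
raw = pd.read_csv(io.StringIO(text), header=None, names=columns, dtype=str)

is_comment = raw["ID"].str.startswith("#")
is_header = raw["ID"] == "ID"  # the header line is now an ordinary row

# Drop comment and header rows, then convert the remaining values to integers.
df = raw.loc[~(is_comment | is_header)].reset_index(drop=True).astype(int)
print(df)
```

This is more fragile than comment='#', so it is probably only worth it if you need to inspect the # lines before discarding them.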


 