Python / Pandas: how to read_csv while ignoring the rows that start with #?

Question

My files come in two formats: some have # lines at the beginning and some don't. I want to read the matrix above into a pandas DataFrame with read_csv, ignoring the rows that start with # before the DataFrame is populated. My headers should be ID, SID, AID and so on, so I think I can read a file by skipping the first 4 rows, and I know how to do that. The problem is that some files do not have those first 4 # rows and start directly with the ID SID AID ... headers.

When I read such a file into a data frame, I guess it assigns the column name as #PI.

Answer 1

The pandas read_csv function lets you specify a comment character via comment='#'. Any line that begins with # is then ignored, so the same call works whether or not the file has the leading # lines. Note that text following a # later in a line is dropped as well.
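A minimal sketch of this, using made-up file contents (the comma delimiter, the sample values, and the variable names are assumptions, not taken from the question). It reads one input with leading # lines and one without, using the same call:

```python
import io

import pandas as pd

# Hypothetical file contents: one variant with leading '#' lines, one without.
with_comments = """#PI some metadata
#another metadata line
ID,SID,AID
1,10,100
2,20,200
"""
without_comments = """ID,SID,AID
1,10,100
2,20,200
"""

# comment='#' makes the parser skip lines that start with '#',
# so the header row is found in both variants.
df_a = pd.read_csv(io.StringIO(with_comments), comment="#")
df_b = pd.read_csv(io.StringIO(without_comments), comment="#")

print(df_a.columns.tolist())
print(df_a.equals(df_b))
```

If the real files are whitespace-delimited rather than comma-delimited, the `sep` argument would need to be set accordingly.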

Answer 2

Why not just read in all rows using read_csv and then filter out the lines that start with # using .loc?

Something like:

df.loc[~df['col'].str.startswith('#')]
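For this filter to help with the header problem, the file probably has to be read without a header first, so that the # lines land in the data rather than in the column names. The following is only a sketch under assumptions not stated in the answer: the sample contents are invented, the # lines happen to have the same number of fields as the data rows (otherwise the parser may raise an error), and all values are kept as strings.

```python
import io

import pandas as pd

# Hypothetical contents; the '#' lines have as many fields as the data rows.
raw = """#PI,a,b
#note,c,d
ID,SID,AID
1,10,100
2,20,200
"""

# header=None keeps every line as a data row; dtype=str lets .str work on column 0.
df = pd.read_csv(io.StringIO(raw), header=None, dtype=str)

# Drop the rows whose first field starts with '#'.
kept = df.loc[~df[0].str.startswith("#")].reset_index(drop=True)

# Promote the first remaining row (ID, SID, AID) to the header.
result = kept.iloc[1:].reset_index(drop=True)
result.columns = kept.iloc[0].tolist()

print(result)
```

Compared with comment='#', this needs extra steps and leaves the columns as strings, so it seems better suited to cases where the # rows must be inspected before being discarded.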

Answer 1: score 3, accepted, 2019-02-05 18:04:36
Answer 2: score 0, 2019-02-04 21:53:14
