Python/Pandas: how to read_csv and at the same time ignore rows that start with #?
My files come in two formats: some have # lines at the beginning and some don't. I want to read_csv the matrix above into a pandas DataFrame and ignore the rows starting with # before populating the DataFrame.
My headers should be ID, SID, AID and so on, so I think I can read a file by skipping the first 4 rows, and I know how to do that. But the problem is that some files do not have the first 4 # rows and start directly with the ID SID AID ... headers.
When I read such a file into a DataFrame, I guess it assigns the column name as #PI.
Why not just read in all rows using read_csv and then filter out the lines that start with # using .loc?
Something like:
df.loc[~df['col'].str.startswith('#')]
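As an alternative to filtering after the read, read_csv has a comment parameter that may handle both file formats in one call. Lines that are fully commented out are skipped before the header row is chosen, so the ID SID AID line should become the header whether or not # lines come first. The sketch below is only an illustration: the sample contents, the comma delimiter, and the helper name read_ignoring_hash_lines are assumptions, since the real files are not shown in the question.

```python
import io

import pandas as pd

# Hypothetical sample data: the real file contents are not shown in the question.
# A comma delimiter is assumed; pass sep=... if the files use something else.
WITH_HASH_LINES = """#PI some metadata
#second metadata line
#third metadata line
#fourth metadata line
ID,SID,AID
1,10,100
2,20,200
"""

WITHOUT_HASH_LINES = """ID,SID,AID
1,10,100
2,20,200
"""


def read_ignoring_hash_lines(source):
    """Read a CSV, skipping every line that starts with '#'.

    With comment="#", pandas drops fully commented lines before it
    picks the header row, so no fixed skiprows count is needed.
    """
    return pd.read_csv(source, comment="#")


# io.StringIO stands in for a file path here.
df_a = read_ignoring_hash_lines(io.StringIO(WITH_HASH_LINES))
df_b = read_ignoring_hash_lines(io.StringIO(WITHOUT_HASH_LINES))

print(df_a.columns.tolist())
print(df_a.equals(df_b))
```

One caveat with this approach: comment="#" also truncates a data line at any # that appears in the middle of it, so it only fits if # never occurs inside the actual values.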