Pandas: reading a CSV file with a variable number of leading lines, skipping lines that start with a special character

Question

When reading a CSV file with the pandas read_csv method, how do I skip lines at the top of the file if the number of lines to skip is not known in advance?

I have a CSV file that contains some metadata at the beginning, followed by the header and the actual data.

The metadata lines always start with a # sign and are always at the top of the CSV file.
The number of metadata lines is not fixed.

Example for the file sample_file.csv:

# Meta-Data Line 1
# Meta-Data Line 2
# Meta-Data Line 3
col1,col2,col3
a,b,c
d,e,f
g,h,i

How would I use the pandas read_csv function and its skiprows parameter to read this CSV?

df = pd.read_csv('sample_file.csv', skiprows=?)

Does pandas 0.19.x or greater support this use case?

Answer 1

The comment parameter is what you're looking for:

df = pd.read_csv('sample_file.csv', comment='#')
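As a quick check, here is a minimal sketch that runs the sample from the question through read_csv. It assumes the file content is supplied from an in-memory io.StringIO buffer rather than from sample_file.csv on disk, purely so the snippet is self-contained.

```python
import io

import pandas as pd

# Same content as sample_file.csv in the question, held in memory
# (an assumption for self-containment; a file path works the same way).
sample = """# Meta-Data Line 1
# Meta-Data Line 2
# Meta-Data Line 3
col1,col2,col3
a,b,c
d,e,f
g,h,i
"""

# Lines that start with '#' are ignored, however many there are,
# so the first remaining line becomes the header.
df = pd.read_csv(io.StringIO(sample), comment='#')
print(df)
```

No line count is needed here: every fully commented line is dropped before the header is inferred.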

From the documentation:

comment : str, default None

Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing '#empty\na,b,c\n1,2,3' with header=0 will result in 'a,b,c' being treated as the header.
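Because comment also stops parsing at a # that appears in the middle of a data line, a file whose data may contain # could instead be read with skiprows, as the question originally asked. The sketch below is one possible way to do that, not something taken from the pandas documentation: count_leading_comment_lines is a hypothetical helper, and the sample data (including the value b#1) is made up for illustration.

```python
import io
import itertools

import pandas as pd


def count_leading_comment_lines(lines, marker='#'):
    """Hypothetical helper: count the consecutive lines at the top
    of an iterable of lines that start with the marker."""
    leading = itertools.takewhile(lambda line: line.startswith(marker), lines)
    return sum(1 for _ in leading)


# Illustrative data (an assumption): one value contains a '#'.
sample = """# Meta-Data Line 1
# Meta-Data Line 2
col1,col2,col3
a,b#1,c
d,e,f
"""

# First pass: count the metadata lines at the top.
n_meta = count_leading_comment_lines(io.StringIO(sample))

# Second pass: skip exactly that many lines; '#' inside the data is kept.
df = pd.read_csv(io.StringIO(sample), skiprows=n_meta)
print(df)
```

With a real file, the first pass would open the file, count the leading lines, and close it before calling read_csv with the path. The trade-off is reading the top of the file twice in exchange for leaving # characters in the data untouched.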

Answer 1 (5, accepted, 2017-01-30 22:03:54)
