
Pandas: read a CSV file with a variable number of rows to skip, marked by a special character at the beginning of the row

When reading a CSV file with the pandas read_csv function, how do I skip the leading lines if the number of lines to skip is not known in advance?

I have a CSV file that contains some metadata at the beginning of the file, followed by the header and the actual data.

  • Each metadata line starts with a # sign, and the metadata is always at the top of the CSV file.
  • The number of metadata lines is not fixed.

Example contents of the file sample_file.csv:

# Meta-Data Line 1
# Meta-Data Line 2
# Meta-Data Line 3
col1,col2,col3
a,b,c
d,e,f
g,h,i

How would I use the pandas read_csv function and its skiprows parameter to read this CSV?

df = pd.read_csv('sample_file.csv', skiprows=?)

Does pandas 0.19.x or later support this use case?

The comment parameter is what you're looking for:

df = pd.read_csv('sample_file.csv', comment='#')
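To try this without creating a file, here is a minimal sketch that feeds the sample from the question to read_csv through an in-memory buffer. The use of io.StringIO and the variable name sample are only for illustration; with a real file you would pass the path as shown above.

```python
import io

import pandas as pd

# Same content as sample_file.csv in the question, held in memory.
sample = """# Meta-Data Line 1
# Meta-Data Line 2
# Meta-Data Line 3
col1,col2,col3
a,b,c
d,e,f
g,h,i
"""

# Lines that start with '#' are dropped before the header is located,
# so the number of metadata lines does not need to be known.
df = pd.read_csv(io.StringIO(sample), comment="#")

print(df.columns.tolist())
print(df)
```

The first non-comment line (col1,col2,col3) becomes the header, and the three data rows follow.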

From the documentation:

comment : str, default None

Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing '#empty\na,b,c\n1,2,3' with header=0 will result in 'a,b,c' being treated as the header.
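One thing to keep in mind: because comment discards the remainder of any line, a # inside a data field is cut off as well. If the data may contain # characters, an alternative is to count the leading metadata lines yourself and pass that number to skiprows, as the question originally asked. The following is a rough sketch under that assumption; the helper count_leading_comment_lines and the small sample are hypothetical and not part of pandas.

```python
import io

import pandas as pd


def count_leading_comment_lines(lines, marker="#"):
    """Count the consecutive lines at the top that start with `marker`.

    Hypothetical helper for illustration; `lines` is any iterable of
    strings, such as an open text file.
    """
    count = 0
    for line in lines:
        if not line.startswith(marker):
            break
        count += 1
    return count


# Illustrative data: two metadata lines, then a field containing '#'.
sample = "# meta 1\n# meta 2\ncol1,col2\nx#1,y\nz,w\n"

# Option 1: comment='#' also truncates the data line at the '#'.
df_comment = pd.read_csv(io.StringIO(sample), comment="#")

# Option 2: skip exactly the leading metadata lines and keep the data intact.
n_skip = count_leading_comment_lines(io.StringIO(sample))
df_skip = pd.read_csv(io.StringIO(sample), skiprows=n_skip)

print(df_comment)
print(df_skip)
```

For a file on disk, the same helper could be called on the open file object (for example inside a with open('sample_file.csv') as f: block) before passing the path and the resulting count to read_csv. If the data never contains #, comment='#' remains the simpler choice.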
