Pandas: read a CSV file with a variable number of rows to skip, marked by a special character at the beginning of the row
When reading a CSV file with the pandas read_csv method, how do I skip lines if the number of lines to skip is not known in advance?
I have a CSV file that contains some metadata at the beginning, followed by the header and the actual data.
Example file sample_file.csv:
# Meta-Data Line 1
# Meta-Data Line 2
# Meta-Data Line 3
col1,col2,col3
a,b,c
d,e,f
g,h,i
How would I use the pandas read_csv function and its skiprows parameter to read this CSV?
df = pd.read_csv('sample_file.csv', skiprows=?)
Does pandas 0.19.X or greater support this use case?
The comment parameter is what you're searching for:
df = pd.read_csv('sample_file.csv', comment='#')
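To check the call above without creating a file on disk, here is a minimal sketch that feeds the sample from the question through io.StringIO. The in-memory buffer and the variable name sample are assumptions made for illustration; with a real file you would pass the path as in the line above.

```python
import io

import pandas as pd

# Same content as sample_file.csv in the question, held in memory.
sample = """# Meta-Data Line 1
# Meta-Data Line 2
# Meta-Data Line 3
col1,col2,col3
a,b,c
d,e,f
g,h,i
"""

# Lines starting with '#' are dropped, so the first remaining
# line (col1,col2,col3) is used as the header.
df = pd.read_csv(io.StringIO(sample), comment="#")
print(df)
```

The number of metadata lines never has to be known, because every line that starts with the comment character is ignored.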
From the documentation:
comment : str, default None
Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing '#empty\na,b,c\n1,2,3' with header=0 will result in 'a,b,c' being treated as the header.
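Because comment stops parsing at the comment character wherever it appears on a line, an unquoted '#' inside a data value would be cut off as well. If the data may contain '#', one alternative is to count the leading metadata lines first and pass that count to skiprows, which is what the question originally asked about. The sketch below assumes the metadata block is contiguous at the top of the file; count_leading_comment_lines is a hypothetical helper, not part of pandas, and the sample data is made up for illustration.

```python
import io

import pandas as pd


def count_leading_comment_lines(handle, prefix="#"):
    """Count consecutive lines at the start of `handle` that begin with `prefix`.

    Hypothetical helper for illustration; not a pandas function.
    """
    count = 0
    for line in handle:
        if not line.startswith(prefix):
            break
        count += 1
    return count


# Made-up sample: two metadata lines, then data containing a literal '#'.
sample = "# Meta-Data Line 1\n# Meta-Data Line 2\ncol1,col2\na#1,b\nc,d\n"

# First pass: find how many leading lines to skip.
n_skip = count_leading_comment_lines(io.StringIO(sample))

# Second pass: skip exactly that many lines and parse the rest normally,
# without setting comment, so '#' inside values is preserved.
df = pd.read_csv(io.StringIO(sample), skiprows=n_skip)
print(df)
```

With a file on disk, the same idea applies by opening the path once for counting and then passing the path and the count to read_csv.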