pandas - read_csv with missing values in the header line
I have this kind of CSV file:
date,a,b,c
2014,12,29,7,12,45
2014,12,30,7,13,12
2014,12,31,6.5,6,5
So the first row does not explicitly name all the columns; it assumes you understand that the date spans the first three columns.
How do I tell read_csv to treat the first three columns as one date column (while keeping the other labels)?
You can parse your columns directly as a date if you use the parse_dates argument.
parse_dates : boolean, list of ints or names, list of lists, or dict, default False
If True -> try parsing the index.
If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
{'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'.
A fast-path exists for iso8601-formatted dates.
For your file, you can do something like this:
pd.read_csv(file_path, names=['y', 'm', 'd', 'a', 'b', 'c'], header=0,
            parse_dates={'date': [0, 1, 2]}, index_col='date')
              a   b   c
date
2014-12-29  7.0  12  45
2014-12-30  7.0  13  12
2014-12-31  6.5   6   5
The missing values in the header line are handled by passing the names argument together with header=0 (to overwrite the existing header). Then it is possible to specify which columns should be parsed as a date.
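As far as I know, recent pandas releases (2.2 onward) deprecate combining several columns through parse_dates, so here is a minimal sketch of an alternative that keeps the same names/header=0 trick and assembles the date afterwards with pd.to_datetime. The column names year, month and day are my own choice (pd.to_datetime needs those names to assemble a date from a DataFrame), and the file content is inlined via io.StringIO only to make the example self-contained.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the example runs on its own.
csv_text = """date,a,b,c
2014,12,29,7,12,45
2014,12,30,7,13,12
2014,12,31,6.5,6,5
"""

# header=0 discards the incomplete header row; names supplies all six labels.
df = pd.read_csv(
    io.StringIO(csv_text),
    names=["year", "month", "day", "a", "b", "c"],
    header=0,
)

# Assemble one datetime column from the year/month/day columns.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Drop the helper columns and use the assembled date as the index.
df = df.drop(columns=["year", "month", "day"]).set_index("date")

print(df)
```

This should give the same frame as the parse_dates call: a, b and c indexed by a single date column.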