[英]Python read csv to Dataframe, stuggeling with date columns
Python read csv to Dataframe, stuggeling with date columns Python 将 csv 读取到 Dataframe,与日期列发生冲突
Hi all,大家好,
I have problems by reading in a csv which looks like:我在阅读 csv 时遇到问题,它看起来像:
col_A;col_B;col_C;Col_Date_1;Col_Date_2;Col_Date_3
57;-;60;03.02.2020;-;06.07.2020
126;8;-;03.02.2020;04.03.2020;06.07.2020
-;45;-;30.01.2020;29.02.2020;29.06.2020
106;83;189;-;29.02.2020;29.06.2020
-;12;84;30.01.2020;29.02.2020;-
|col_A|col_B|col_C|Col_Date_1 |Col_Date_2 |Col_Date_3|
----------------------------------------------------
|57 |- |60 |03.02.2020 |- |06.07.2020|
|126 |8 |- |03.02.2020 |04.03.2020 |06.07.2020|
|- |45 |- |30.01.2020 |29.02.2020 |29.06.2020|
|106 |83 |189 |- |29.02.2020 |29.06.2020|
|- |12 |84 |30.01.2020 |29.02.2020 |- |
Here is how I tried to read in the CSV.这是我尝试阅读 CSV 的方法。
import pandas as pd
df_puma = pd.read_csv(test.csv, sep=";",dayfirst=True, parse_dates=['Col_Date_1','Col_Date_2','Col_Date_3'], encoding='latin-1')
Unfortunately, both kinds of columns (the first 3 integers and the last 3 with dates) are not automatically in the right type.不幸的是,这两种列(前 3 个整数和最后 3 个带日期的列)都不是自动正确的类型。
df.info()
----------
col_A 404 non-null object
col_B 404 non-null object
col_C 404 non-null object
Col_Date_1 404 non-null object
Col_Date_2 404 non-null object
Col_Date_3 404 non-null object
Well, I hoped at least the date columns should be recognized as a kind of date, unfortunately not:(. Like:好吧,我希望至少日期列应该被识别为一种日期,不幸的是不是:(。像:
df.info()
----------
col_A 404 non-null int64
col_B 404 non-null int64
col_C 404 non-null int64
Col_Date_1 404 non-null datetime64[ns]
Col_Date_2 404 non-null datetime64[ns]
Col_Date_3 404 non-null datetime64[ns]
Could someone give me a hint, how to to get the data in the right type?有人可以给我一个提示,如何获取正确类型的数据? In my mind would it be like:
在我看来会是这样的:
col_A;col_B;col_C;Col_Date_1;Col_Date_2;Col_Date_3
57;NaN;60;03.02.2020;NaT;06.07.2020
126;8;NaN;03.02.2020;04.03.2020;06.07.2020
NaN;45;NaN;30.01.2020;29.02.2020;29.06.2020
106;83;189;NaT;29.02.2020;29.06.2020
NaN;12;84;30.01.2020;29.02.2020;NaT
|col_A|col_B|col_C|Col_Date_1 |Col_Date_2 |Col_Date_3|
----------------------------------------------------
|57 |NaN |60 |03.02.2020 |NaT |06.07.2020|
|126 |8 |NaN |03.02.2020 |04.03.2020 |06.07.2020|
|NaN |45 |NaN |30.01.2020 |29.02.2020 |29.06.2020|
|106 |83 |189 |NaT |29.02.2020 |29.06.2020|
|NaN |12 |84 |30.01.2020 |29.02.2020 |NaT |
Do I have to iterate through all the columns and rows and clean up the "-" entities?我是否必须遍历所有列和行并清理“-”实体? I still on a quiet newbie level in Python and don't know what is the best solution...
我在 Python 中仍然处于安静的新手级别,不知道什么是最好的解决方案......
Hope you guys can help me.希望你们能帮助我。
Replace your -
values with nan and then parse the dates用 nan 替换您的
-
值,然后解析日期
from io import StringIO
import pandas as pd
s = """col_A;col_B;col_C;Col_Date_1;Col_Date_2;Col_Date_3
57;-;60;03.02.2020;-;06.07.2020
126;8;-;03.02.2020;04.03.2020;06.07.2020
-;45;-;30.01.2020;29.02.2020;29.06.2020
106;83;189;-;29.02.2020;29.06.2020
-;12;84;30.01.2020;29.02.2020;-"""
df = pd.read_csv(StringIO(s), sep=';', na_values='-',
parse_dates=[3,4,5], dayfirst=True)
col_A col_B col_C Col_Date_1 Col_Date_2 Col_Date_3
0 57.0 NaN 60.0 2020-02-03 NaT 2020-07-06
1 126.0 8.0 NaN 2020-02-03 2020-03-04 2020-07-06
2 NaN 45.0 NaN 2020-01-30 2020-02-29 2020-06-29
3 106.0 83.0 189.0 NaT 2020-02-29 2020-06-29
4 NaN 12.0 84.0 2020-01-30 2020-02-29 NaT
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.