How to split a CSV into 2 dataframes based on a condition

My idea is to separate the two kinds of date string and then convert both dataframes to the same datetime format. I tried this code:

data['date'] = pd.to_datetime(data['date'])
data['date'] = data['date'].dt.strftime('%Y-%m-%d')

but the output has some errors. 13/02/2020 becomes 2020-02-13, which is what I want, but 12/02/2020 becomes 2020-12-02.

My dataframe has 2 date formats: YYYY-MM-DD and DD/MM/YYYY.

[Image: dataframe]

I need to split it into 2 dataframes, with all the rows whose date is YYYY-MM-DD going into df1.

The data type of the column is object.

All the rows whose date is DD/MM/YYYY should go into df2.

Does anyone know how to code it?

If you don't need to convert to datetimes, use Series.str.contains with boolean indexing:

mask = df['date'].str.contains('-')
df1 = df[mask].copy()
df2 = df[~mask].copy()
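
As a quick check of the mask approach, here is a minimal sketch on made-up data. The sample rows and the `id` column are assumptions for illustration, not the asker's real file. The `regex=False` and `na=False` arguments are optional additions that treat `-` literally and send missing values to `df2` instead of raising an error.

```python
import pandas as pd

# Hypothetical sample data mixing both date formats as plain strings.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "date": ["2020-02-13", "13/02/2020", "12/02/2020", "2020-02-12"],
})

# True where the string contains '-', i.e. the YYYY-MM-DD rows.
mask = df["date"].str.contains("-", regex=False, na=False)

df1 = df[mask].copy()   # YYYY-MM-DD rows
df2 = df[~mask].copy()  # DD/MM/YYYY rows

print(df1["date"].tolist())  # ['2020-02-13', '2020-02-12']
print(df2["date"].tolist())  # ['13/02/2020', '12/02/2020']
```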

If you need datetimes, pass errors='coerce' to to_datetime so that values not matching the given format become missing values (NaT), then remove the rows with missing values:

df1 = (df.assign(date = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce'))
         .dropna(subset=['date']))

df2 = (df.assign(date = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce'))
         .dropna(subset=['date']))
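
To see what errors='coerce' does with an explicit format, here is a small sketch on a made-up Series (the values are assumptions for illustration). Each call parses only the strings that match its format and turns the rest into NaT. Because the format is explicit, 12/02/2020 is read as 12 February rather than 2 December.

```python
import pandas as pd

# Hypothetical values covering both formats.
raw = pd.Series(["2020-02-13", "13/02/2020", "12/02/2020"])

# Strings that do not match the given format become NaT instead of raising.
iso = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
dmy = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

print(iso.isna().tolist())  # [False, True, True]
print(dmy.isna().tolist())  # [True, False, False]

# With an explicit day-first format, 12/02/2020 is 12 February.
print(dmy[2].day, dmy[2].month)  # 12 2
```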

EDIT: If you need the output column filled with the correct datetimes, parse the column once per format and fill the missing values of one Series with the values of the other using Series.fillna:

date1 = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')

df['date'] = date1.fillna(date2)
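
Putting the fillna step together with the formatting step from the question, a minimal end-to-end sketch might look like this. The sample values are made up, and the final strftime call is only needed if the dates should end up as strings again.

```python
import pandas as pd

# Hypothetical column mixing YYYY-MM-DD and DD/MM/YYYY strings.
df = pd.DataFrame({"date": ["2020-02-13", "13/02/2020", "12/02/2020"]})

date1 = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")
date2 = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")

# Take the ISO parse where it worked, otherwise the day-first parse,
# then format everything back to one string layout.
df["date"] = date1.fillna(date2).dt.strftime("%Y-%m-%d")

print(df["date"].tolist())  # ['2020-02-13', '2020-02-13', '2020-02-12']
```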

You can use the fact that the two formats use different separators to find the dates.

If your dataframe is in this format:

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3],
                   'date': ["30/8/2020", "30/8/2021", "30/8/2022", "2019-10-24", "2019-10-25", "2020-10-24"]})

with either "-" or "/" separating the parts of the date,

you can write a function that looks for this character and apply it to the date column:

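def find(string):
    # True for DD/MM/YYYY-style values, which contain a '/'
    # (also works when the day has a single digit, e.g. '1/8/2020')
    return '/' in string

# keep only the rows whose date uses '/'
df[df['date'].apply(find)]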
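
If the goal is to convert each value rather than only filter rows, a per-value function can also try both formats in turn. The sketch below uses only the standard library. The helper name `parse_mixed_date` and the `FORMATS` tuple are hypothetical and not part of either answer. It is a swapped-in alternative to the separator check, and on large columns it would likely be slower than the vectorized to_datetime calls.

```python
from datetime import datetime

# The two layouts named in the question, tried in this order.
FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_mixed_date(text):
    """Return a datetime for either supported layout, or None if neither fits."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next layout
    return None


print(parse_mixed_date("2019-10-24"))  # 2019-10-24 00:00:00
print(parse_mixed_date("12/02/2020"))  # 2020-02-12 00:00:00
print(parse_mixed_date("not a date"))  # None
```

Such a function could then be passed to `df['date'].apply(...)` in the same way as `find` above.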
