Python & Pandas: How do I create a column of fake data containing dates in a specific format?

I'm trying to create a fake column of dates within a Pandas dataframe with the following format: year.month (e.g. 2022.01 for January 2022). I have ~200,000 rows in the dataframe and I would like to randomly assign each of them a date ranging from 2010.01 to 2020.12. How can I do this using Pandas? Ideally the dtype for this new column would be float (I am trying to recreate a training example I found, and this is how it formats its dates).

Combine pandas.date_range and numpy.random.choice:

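import numpy as np
import pandas as pd

# All months from 2010-01 to 2020-12 as floats such as 2010.01.
# freq='MS' (month start) keeps December 2020 in the range; with
# freq='M' (month end) the range would stop at 2020-11-30.
dates = (pd.date_range('2010-01', '2020-12', freq='MS')
           .strftime('%Y.%m').astype(float)
        )

N = 1000  # number of rows; use len(df) for an existing dataframe
df = pd.DataFrame({'date': np.random.choice(dates, size=N)})

print(df)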

NB. Using floats is a tricky choice, as you cannot control the trailing zeros: October 2010 is stored as the number 2010.1, so it may be displayed as 2010.1 rather than 2010.10.
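To make the trailing-zero caveat concrete, here is a small sketch. The Series `s` and the name `as_text` are only illustrative and not part of the original answer. It shows that converting the October value to a string drops the zero, and that formatting to two decimals gives back the year.month look (as strings, not floats).

```python
import pandas as pd

# Illustrative values: October 2010, January 2010, December 2020
s = pd.Series([2010.10, 2010.01, 2020.12])

# Converting a single float to text drops the trailing zero
print(str(s.iloc[0]))

# Formatting each value with two decimals restores the year.month
# appearance, at the cost of turning the column into strings
as_text = s.map('{:.2f}'.format)
print(as_text.tolist())
```

If the column has to stay float, the zero can only be controlled at display time, for example with a float format option when printing.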

Example output:

        date
0    2015.03
1    2014.01
2    2014.06
3    2011.10
4    2010.11
..       ...
995  2018.07
996  2019.01
997  2015.05
998  2017.09
999  2016.03

[1000 rows x 1 columns]
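For the ~200,000 rows mentioned in the question, a possible variant (not the approach shown above, just a sketch) is to draw the year and the month separately as integers and combine them arithmetically, which avoids the string round-trip. The seed, `N`, and the variable names are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed used only to make the sketch reproducible
N = 200_000                     # assumed row count, taken from the question

years = rng.integers(2010, 2021, size=N)  # 2010..2020 (upper bound is exclusive)
months = rng.integers(1, 13, size=N)      # 1..12 (upper bound is exclusive)

# year + month/100 gives values such as 2015.03; rounding to two
# decimals removes floating-point noise from the addition
df = pd.DataFrame({'date': (years + months / 100).round(2)})

print(df['date'].dtype)
print(df['date'].min(), df['date'].max())
```

Because every year and every month is equally likely, each of the 132 year.month combinations has the same probability, as with the random choice over the date range.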

