Pandas: how to create random dummy data

I often find myself in a situation where I want to test some function on a sample dataframe.

It's super easy to create a random dataframe with numbers, like this:

pd.DataFrame(np.random.randn(5, 3), columns=list('ABC')) or pd.DataFrame(np.random.randint(2,10,(5,3)), columns=list('ABC')) if you want some more control over the values in your dummy data.
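For reference, here is a small sketch of the same two one-liners written with NumPy's newer Generator interface, so that the dummy data is reproducible between runs. The seed value and the names rng, df_float and df_int are illustrative assumptions, not part of the original question.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the "random" frame identical on every run.
# The seed (0) is arbitrary and chosen only for illustration.
rng = np.random.default_rng(seed=0)

# Standard-normal floats, same shape as np.random.randn(5, 3)
df_float = pd.DataFrame(rng.standard_normal((5, 3)), columns=list("ABC"))

# Integers in [2, 10), same range as np.random.randint(2, 10, (5, 3))
df_int = pd.DataFrame(rng.integers(2, 10, size=(5, 3)), columns=list("ABC"))

print(df_float)
print(df_int)
```

Re-running with the same seed should give the same frames, which helps when a test on the sample dataframe has to be repeatable.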

I am wondering if there is a more general library out there that helps you create dummy data of various types (e.g. datetime, categorical, ...)?

Looketh and you shall find.

I changed it ever so slightly to get rid of the numpy warning:

import pandas as pd
import numpy as np
import datetime

# 24 rows: fixed repeating labels in A and B, random values in C through F
dft = pd.DataFrame({
    'A': ['spam', 'eggs', 'spam', 'eggs'] * 6,
    'B': ['alpha', 'beta', 'gamma'] * 8,
    # random dates drawn from 2013-01-01 through 2013-01-03
    'C': [np.random.choice(pd.date_range(datetime.datetime(2013, 1, 1), datetime.datetime(2013, 1, 3))) for i in range(24)],
    'D': np.random.randn(24),            # standard-normal floats
    'E': np.random.randint(2, 10, 24),   # integers in [2, 10)
    'F': [np.random.choice(['rand_1', 'rand_2', 'rand_4', 'rand_6']) for i in range(24)],
})

dft
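If you need this kind of frame often, the snippet above can be wrapped in a small helper. The following is only a sketch under assumptions: the function name make_dummy_frame, its parameters, and the column names are hypothetical and do not come from pandas or any library. It uses a seeded NumPy generator and produces datetime, categorical, float, integer and boolean columns.

```python
import numpy as np
import pandas as pd


def make_dummy_frame(n_rows=24, seed=None):
    """Hypothetical helper: build a mixed-dtype dummy DataFrame."""
    rng = np.random.default_rng(seed)

    # Candidate values to sample from (arbitrary, for illustration only)
    dates = pd.date_range("2013-01-01", "2013-01-03")
    labels = ["alpha", "beta", "gamma"]

    return pd.DataFrame({
        # datetime column: pick random positions in the date range
        "when": dates[rng.integers(0, len(dates), size=n_rows)],
        # categorical column with a fixed set of categories
        "category": pd.Categorical(rng.choice(labels, size=n_rows), categories=labels),
        # standard-normal floats
        "value": rng.standard_normal(n_rows),
        # integers in [2, 10)
        "count": rng.integers(2, 10, size=n_rows),
        # booleans, roughly half True
        "flag": rng.random(n_rows) < 0.5,
    })


df = make_dummy_frame(n_rows=10, seed=42)
print(df.dtypes)
```

Passing a seed keeps the output stable between runs; leaving it as None gives fresh random data each time.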
