Auto generate dummy data with Python

I have two text files, IDs and Calendar. In the IDs file I have one ID on each row, as below:

123
124
125
.
.
.

In the Calendar file I have every date of 2019, as follows:

1/1/2019
1/2/2019
1/3/2019
.
.
.
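
If keeping a separate calendar file is not a requirement, the same list of dates could be produced in code. The following is only a sketch: the helper name `dates_in_year` is made up for illustration, and it assumes the sample above is in month/day/year order without zero padding.

```python
from datetime import date, timedelta

def dates_in_year(year):
    """Yield every date in `year` as M/D/YYYY (assumed format, no zero padding)."""
    day = date(year, 1, 1)
    while day.year == year:
        yield f"{day.month}/{day.day}/{day.year}"
        day += timedelta(days=1)

calendar_2019 = list(dates_in_year(2019))
print(len(calendar_2019), calendar_2019[0], calendar_2019[-1])
```

Since 2019 is not a leap year, this gives 365 dates, one for each row of the calendar file described above.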

What I am trying to do is read each ID from the IDs text file and create a whole year of records for it in the following format:

id, date, sms_sent, call_in_minutes, call_out_minutes, data_usage

id: each ID picked from the IDs file
date: each date from the calendar
sms_sent: a random number between 0 and 25
call_in_minutes: a random number between 5 and 30
call_out_minutes: a random number between 5 and 30
data_usage: a random number between 5 and 50
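
As a rough sketch of how one record with these ranges could be built, the snippet below keeps the bounds in a single table. The names `FIELD_RANGES` and `make_record` are hypothetical, and it assumes that "between" is meant inclusively at both ends.

```python
import random

# Inclusive (low, high) bounds taken from the field list above.
FIELD_RANGES = {
    "sms_sent": (0, 25),
    "call_in_minutes": (5, 30),
    "call_out_minutes": (5, 30),
    "data_usage": (5, 50),
}

def make_record(customer_id, day):
    """Return one row: id, date, then a random value for each usage field."""
    # random.randint includes both endpoints, so 0 and 25 can both occur.
    values = [random.randint(low, high) for low, high in FIELD_RANGES.values()]
    return [customer_id, day] + values

print(make_record("123", "1/1/2019"))
```

Keeping the bounds in one dictionary means a range only has to be changed in one place if the requirements shift.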

I want to create 365 entries (one per calendar date) for the first ID, then move on to the second ID, and so on.

I would appreciate any help, as I am just starting with Python.

Update:

OK, so I have come up with this very crude code, which works with a smaller list of numbers. Basically, I am loading the 365 calendar days into one list and the IDs into another.

The solution is not very elegant. Holding 365 dates in memory is not a problem, but the bigger list of numbers crashes the system. Is there a way to make it more efficient, so that the IDs are loaded one at a time and each ID is fully processed before the next one is read?
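
One possible way to process IDs one at a time is a generator that keeps only the short date list in memory and reads the IDs lazily. This is a sketch rather than a definitive answer: `iter_records` is a hypothetical name, and the two sample lists stand in for the real files so the snippet runs on its own.

```python
import random

def iter_records(id_lines, dates):
    """Yield one record string per (ID, date) pair, finishing each ID before the next."""
    for raw_id in id_lines:            # a file object hands over one line at a time
        customer_id = raw_id.strip()
        if not customer_id:
            continue                   # ignore blank lines
        for day in dates:
            yield ", ".join([
                customer_id,
                day,
                str(random.randint(0, 25)),   # sms_sent
                str(random.randint(5, 30)),   # call_in_minutes
                str(random.randint(5, 30)),   # call_out_minutes
                str(random.randint(5, 50)),   # data_usage
            ])

# Small in-memory stand-ins for the two files.
sample_ids = ["123\n", "124\n"]
sample_dates = ["1/1/2019", "1/2/2019"]
records = list(iter_records(sample_ids, sample_dates))
for record in records:
    print(record)
```

With the real files, the dates would be read into a list once, and the open IDs file would be passed directly as `id_lines`. Iterating over the generator without wrapping it in `list()` keeps only one record in memory at a time.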

Following is the code I have written so far. I have also written the code for generating the random data.


import random

lineList = list()
numbers = list()

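# Read each file in its own loop. Nesting the ID loop inside the date loop
# only worked because the ID file was already exhausted after the first date.
with open('calendar.csv') as f, open('small_list.txt') as n:
    for line in f:
        lineList.append(line.rstrip('\n'))
    for number in n:
        numbers.append(number.rstrip('\n'))

# The comprehension builds every (date, ID) pair in memory before printing.
print("List comprehension:")
for x, y in [(x, y) for x in lineList for y in numbers]:
    print(x, y)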




# iterate over the list
# for clientdate in lineList:
#   print(clientdate + ', ' + str(random.randint(0,25)) + ', ' + str(random.randint(10,60)) + ', ' + str(random.randint(5,25)) + ', ' +  str(random.randint(10,50)))


Thanks

OK, so I cracked it. The solution is not very elegant, but it is memory efficient and gets the job done.


import random

# Sample 3 - a more pythonic way with efficient memory usage. Proper usage of with and file iterators.
# The output file is opened once instead of once per record.
with open("calendar.csv") as file, open('output.txt', 'a+') as output:
    for line in file:
        line = line.strip()  # preprocess line
        # Take action on the line instead of storing it in a list:
        # more memory efficient at the cost of execution speed.
        with open('small_list.txt') as sm:
            for data in sm:
                data = data.strip()
                # Build the record once so the console and the file show the same values.
                record = line + ', ' + data + ', ' + str(random.randint(0,25)) + ', ' + str(random.randint(10,60)) + ', ' + str(random.randint(5,25)) + ', ' + str(random.randint(10,50))
                print(record)
                output.write(record + '\n')
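
A variant of the code above could use the standard `csv` module, write the header from the question, and emit all dates for one ID before moving to the next. Treat it as a sketch under assumptions: `write_usage_csv` and `HEADER` are made-up names, the ranges are the ones listed in the question (which differ from the ones hard-coded above), and in-memory text stands in for the real files.

```python
import csv
import io
import random

HEADER = ["id", "date", "sms_sent", "call_in_minutes", "call_out_minutes", "data_usage"]

def write_usage_csv(id_lines, date_lines, out):
    """Write a header plus one row per (ID, date) pair to the open file `out`."""
    # The calendar is small (one year), so it is safe to hold in a list.
    dates = [line.strip() for line in date_lines if line.strip()]
    writer = csv.writer(out)
    writer.writerow(HEADER)
    rows = 0
    for raw_id in id_lines:            # streamed line by line, never stored as a list
        customer_id = raw_id.strip()
        if not customer_id:
            continue
        for day in dates:
            writer.writerow([
                customer_id,
                day,
                random.randint(0, 25),   # sms_sent
                random.randint(5, 30),   # call_in_minutes
                random.randint(5, 30),   # call_out_minutes
                random.randint(5, 50),   # data_usage
            ])
            rows += 1
    return rows

# Demonstration with in-memory text instead of files on disk.
buffer = io.StringIO()
row_count = write_usage_csv(["123\n", "124\n"], ["1/1/2019\n", "1/2/2019\n"], buffer)
print(row_count)
print(buffer.getvalue())
```

With real files, the two inputs would be the open `small_list.txt` and `calendar.csv` objects, and the output file would be opened with `newline=''`, as the `csv` documentation recommends.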

