Student - np.random.choice: How to isolate and tally hit frequency within a np.random.choice range

I'm currently learning Python and am very new to NumPy and pandas.

I have pieced together a random generator with a range. It uses NumPy, but I am unable to isolate each individual result in order to count the iterations that fall within a sub-range of my random range.

Goal: Count the iterations where "Random >= 1000" and then add 1 to the cell that corresponds to that iteration tally. A very basic example:

#Random generator begins... these are first four random generations
Randomiteration0 = 175994 (Random >= 1000)
Randomiteration1 = 1199 (Random >= 1000)
Randomiteration2 = 873399 (Random >= 1000)
Randomiteration3 = 322 (Random < 1000)

#used to +1 to the fourth row of column A in CSV
finalIterationTally = 4

#total times random < 1000 throughout entire session. Placed in cell B1
hits = 1
#Rinse and repeat to custom set generations quantity...

(The logic would then be to add 1 to A4 in the spreadsheet. If the iteration tally had been 7, then add 1 to A7, and so on. So essentially, I am measuring the distance between each "hit" and the frequency of each distance.)

My current code includes a CSV export portion. I no longer need to export each individual random result; I only need to export the frequency of each iteration distance between hits. This is where I am stumped.

Cheers

import pandas as pd
import numpy as np

#set random generation quantity
generations=int(input("How many generations?\n###:"))

#random range and generator
choices = range(1, 100000)
samples = np.random.choice(choices, size=generations)

#start a new column every my_break rows
my_break = 1000000
if generations > my_break:
    #pad with NaN up to the next multiple of my_break (no padding if already a multiple)
    n_empty = (-generations) % my_break
    samples = np.append(samples, [np.nan] * n_empty).reshape((-1, my_break)).T

#export results to CSV
(pd.DataFrame(samples)
 .to_csv('eval_test.csv', index=False, header=False))

#leave uncommented when testing with 10 generations or so
print(samples)

I believe you are mixing up iterations and generations. It sounds like you want 4 iterations for each of N generations, but your code at the bottom does not express the "4" anywhere. Pulling all your variables out to the top of your script can help you organize it better. pandas is great for parsing complicated CSVs, but for this case you don't really need it. You probably don't even need NumPy.

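import numpy as np

# all tunable values in one place
THRESHOLD = 1000    # a sample is counted when it is >= THRESHOLD
CHOICES = 10000     # samples are drawn from 1 .. CHOICES-1
ITERATIONS = 4      # samples drawn per generation
GENERATIONS = 100   # number of generations to run

choices = range(1, CHOICES)

# output[k] = number of generations in which exactly k samples reached the threshold
output = np.zeros(ITERATIONS + 1, dtype=int)

for _ in range(GENERATIONS):
    samples = np.random.choice(choices, size=ITERATIONS)
    count = int(np.sum(samples >= THRESHOLD))
    output[count] += 1

# write the tallies as a single comma-separated row
with open('eval_test.csv', 'w') as f:
    f.write(",".join(map(str, output)) + '\n')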
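
The question as worded asks for something slightly different: the distance between consecutive hits, and how often each distance occurs. Below is a minimal sketch of that tally, assuming a "hit" is a draw below 1000 (as in the question's example) and that the distance is counted from the previous hit, or from the start, up to and including the hit itself. The names gap_frequencies and write_tally and the output file name are hypothetical, chosen only for illustration.

```python
import csv

import numpy as np

THRESHOLD = 1000  # assumption: a draw below this value counts as a "hit"


def gap_frequencies(samples, threshold=THRESHOLD):
    """Return (counts, hits): counts[d] is how many hits arrived after exactly d draws."""
    samples = np.asarray(samples)
    # 0-based positions of every hit in the sequence of draws
    hit_positions = np.flatnonzero(samples < threshold)
    # draws since the previous hit (or since the start), including the hit itself
    gaps = np.diff(hit_positions, prepend=-1)
    # index 0 of the result is never used, because every gap is at least 1
    counts = np.bincount(gaps)
    return counts, int(hit_positions.size)


def write_tally(path, counts, hits):
    """Write the tally for distance d to row d of column A, and total hits to cell B1."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for distance, count in enumerate(counts[1:], start=1):
            writer.writerow([count, hits] if distance == 1 else [count])


# the four draws from the question: three misses, then a hit on the fourth draw
counts, hits = gap_frequencies([175994, 1199, 873399, 322])
print(counts[1:], hits)  # [0 0 0 1] 1
```

For a real run, samples could come from np.random.choice as in the question, followed by a call such as write_tally('gap_tally.csv', counts, hits). Only the tallies are exported, so the file stays small no matter how many generations are drawn.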
