Student - np.random.choice: How to isolate and tally hit frequency within a np.random.choice range

Question

Currently learning Python and very new to NumPy & pandas.

I have pieced together a random number generator that draws from a fixed range. It uses NumPy, but I can't work out how to isolate each individual result so that I can count how many generations pass between results that fall in a sub-range (below 1000) of that range.

Goal: Count how many consecutive generations have "Random >= 1000" before a hit (a value below 1000) occurs, counting the hit itself, and then add 1 to the spreadsheet cell that corresponds to that count. A very basic example:

# Random generator begins... these are the first four random generations
Randomiteration0 = 175994 (Random >= 1000)
Randomiteration1 = 1199 (Random >= 1000)
Randomiteration2 = 873399 (Random >= 1000)
Randomiteration3 = 322 (Random < 1000, a hit)

# generations up to and including this hit; used to add 1 to row 4 of column A in the CSV
finalIterationTally = 4

# total number of times Random < 1000 across the entire session; placed in cell B1
hits = 1
# Rinse and repeat until the user-set number of generations is reached...

(The logic is then to add 1 to cell A4 in the spreadsheet. If the tally had been 7, I would add 1 to A7, and so on. Essentially, I am measuring the distance between consecutive hits and how often each distance occurs.)
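The tally described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the asker's code: the function name `tally_gaps` is hypothetical, a hit is any value below 1000, and each distance counts the hit itself (three misses then a hit gives 4, matching `finalIterationTally` in the example).

```python
def tally_gaps(samples, threshold=1000):
    """Hypothetical helper: count how many generations it takes to reach each hit.

    A hit is a sample below `threshold`. The distance for a hit is the number
    of generations since the previous hit (or the start), counting the hit
    itself. Returns (frequencies, hits), where frequencies[d] is how often a
    distance of d occurred. Misses after the last hit are not counted, since
    that run has not finished.
    """
    frequencies = {}
    hits = 0
    run = 0  # generations since the previous hit
    for value in samples:
        run += 1
        if value < threshold:
            hits += 1
            frequencies[run] = frequencies.get(run, 0) + 1
            run = 0
    return frequencies, hits


# The four generations from the example above
freq, hits = tally_gaps([175994, 1199, 873399, 322])
print(freq, hits)  # {4: 1} 1
```

A dictionary keeps only the distances that actually occurred; the spreadsheet row for distance `d` would simply receive `frequencies[d]`.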

My current code (below) includes a CSV export. I no longer need to export each individual random result; I only need to export how often each distance between hits occurs. This is the part I am stuck on.

Cheers

import pandas as pd
import numpy as np

# number of random values to generate
generations = int(input("How many generations?\n###:"))

# draw random integers from 1 to 99999 (range() excludes its upper bound)
choices = range(1, 100000)
samples = np.random.choice(choices, size=generations)

# split the results into columns of at most 1,000,000 rows for Excel
my_break = 1000000
if generations > my_break:
    # NaN padding needed to fill the last column (0 if it divides evenly)
    n_empty = -generations % my_break
    samples = np.append(samples, [np.nan] * n_empty).reshape((-1, my_break)).T

# export results to CSV
(pd.DataFrame(samples)
 .to_csv('eval_test.csv', index=False, header=False))

# left in for testing with about 10 generations
print(samples)
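One way to replace the per-sample export with the per-distance tally is a vectorized NumPy sketch like the one below. The names `gap_frequencies` and `tally_rows` are hypothetical; the sketch assumes a hit is a value below 1000, that each distance includes the hit itself, and that misses after the final hit are dropped because that run has not finished. It writes with the standard `csv` module instead of pandas.

```python
import csv
import sys

import numpy as np


def gap_frequencies(samples, threshold=1000):
    """Hypothetical helper: return (frequencies, hits), where frequencies[d]
    counts hits that came d generations after the previous hit (or after the
    start). Misses after the last hit are ignored."""
    samples = np.asarray(samples)
    hit_positions = np.flatnonzero(samples < threshold)  # indices of hits
    # Treat index -1 as a virtual hit before the first sample, so the first
    # distance is (first hit index + 1), counting the hit itself.
    gaps = np.diff(np.concatenate(([-1], hit_positions)))
    return np.bincount(gaps), len(hit_positions)


def tally_rows(frequencies, hits):
    """Hypothetical helper: lay the tally out as the question describes.
    Row n of column A holds how often distance n occurred; cell B1 holds
    the total hit count."""
    rows = [[int(count)] for count in frequencies[1:]]  # distance 0 never occurs
    if rows:
        rows[0].append(hits)
    return rows


# Demo with the question's four example values plus three more
samples = [175994, 1199, 873399, 322, 5, 2000, 7]
frequencies, hits = gap_frequencies(samples)
csv.writer(sys.stdout).writerows(tally_rows(frequencies, hits))
# 1,3
# 1
# 0
# 1
```

To use it with the script above, keep the `samples = np.random.choice(...)` line, drop the reshape and pandas export, and write `csv.writer(f).writerows(tally_rows(*gap_frequencies(samples)))` inside `with open('eval_test.csv', 'w', newline='') as f:`.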

Answer 1

I believe you are mixing up iterations and generations. It sounds like you want 4 iterations for each of N generations, but your code never uses that "4" anywhere. Pulling all your variables out to the top of your script can help you organize it. pandas is great for parsing complicated CSVs, but you don't really need it here, and you may not even need NumPy.

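import numpy as np

THRESHOLD = 1000    # values below this are hits
CHOICES = 10000     # samples are drawn from 1..CHOICES-1
ITERATIONS = 4      # samples drawn per generation
GENERATIONS = 100   # number of generations

choices = range(1, CHOICES)

# output[k] = number of generations in which exactly k samples were >= THRESHOLD
output = np.zeros(ITERATIONS + 1)

for _ in range(GENERATIONS):
    samples = np.random.choice(choices, size=ITERATIONS)
    count = sum(1 for x in samples if x >= THRESHOLD)
    output[count] += 1

output = map(str, map(int, output.tolist()))

with open('eval_test.csv', 'w') as f:
    f.write(",".join(output) + '\n')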
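If this interpretation is what you want, the per-generation loop can also be vectorized. The sketch below is an alternative, not part of the original answer: it swaps in `np.random.default_rng().integers` (which draws from 1 to CHOICES-1, the same values as `range(1, CHOICES)`), counts values at or above THRESHOLD as the question defines a non-hit, and tallies with `np.bincount`.

```python
import numpy as np

THRESHOLD = 1000
CHOICES = 10000
ITERATIONS = 4
GENERATIONS = 100

rng = np.random.default_rng()
# One row per generation, one column per iteration; values in 1..CHOICES-1
samples = rng.integers(1, CHOICES, size=(GENERATIONS, ITERATIONS))
# How many samples in each generation were at or above the threshold (0..ITERATIONS)
count_per_generation = (samples >= THRESHOLD).sum(axis=1)
# output[k] = number of generations with exactly k such samples
output = np.bincount(count_per_generation, minlength=ITERATIONS + 1)
print(output)
```

Because `output` already holds integers, `",".join(map(str, output))` produces the same kind of CSV line as the export code above.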

solution1
0 2016-08-01 20:53:41
