I want to write a loop that generates 1000 t-test results from random samples drawn from two different populations. My loop basically does what is required; the only issue is that I would like to append the printed values to a dataframe.
results = pd.DataFrame({'Effect Size':[], 'p-value':[]})
for i in range(1000):
    sample1 = np.random.normal(0,1,1000)
    sample2 = np.random.normal(.05,1,1000)
    effect_size, pvalue = stats.ttest_ind(a=sample1, b=sample2, equal_var=True)
    results = pd.DataFrame(print(effect_size,pvalue))
results.head()
The output I get however is this one:
-1.6143890836641985 0.10660095803269495
-2.0260421693695845 0.0428931041087038
-2.7052945035320413 0.006882349977869199
-0.650014611610562 0.5157575104187226
0.35589181647004076 0.721959156357101
-1.8580323211600547 0.0633114210246122
-2.1346234965598185 0.03291315538511747
-1.5619392256304192 0.11846067349115201
-1.4286159705357937 0.15327094637955832
-2.5338588520198324 0.011357254651096133
-1.125224663298795 0.2606289939128222
-1.8130036805024503 0.06998125666628215
-0.0350581349501468 0.9720368863172242
-0.14942653694599559 0.881232154213759
-1.3726021387765257 0.17003011697766837
-0.391077951258786 0.6957813156125576
-1.8118048538852072 0.07016643231973188
_
My desired output is to store those 2 values in the 2 separate columns of the dataframe I created above. Any solutions?
Collect your results first then create the dataframe:
import pandas as pd
import numpy as np
import scipy.stats as stats
results = []
for i in range(1000):
    sample1 = np.random.normal(0,1,1000)
    sample2 = np.random.normal(.05,1,1000)
    effect_size, pvalue = stats.ttest_ind(a=sample1, b=sample2, equal_var=True)
    results.append([effect_size, pvalue])
df = pd.DataFrame(results, columns=['Effect Size', 'p-value'])
Output:
>>> df
Effect Size p-value
0 -1.490185 0.136333
1 -1.541894 0.123258
2 -1.761850 0.078248
3 -1.423281 0.154811
4 -1.399392 0.161851
.. ... ...
995 -2.137380 0.032688
996 -0.510703 0.609615
997 0.260885 0.794208
998 -3.361631 0.000789
999 -1.648494 0.099409
[1000 rows x 2 columns]
Update: You can avoid the loop:
# I used only 10 iterations here for better understanding
sample1 = np.random.normal(0, 1, (10, 1000))
sample2 = np.random.normal(.05, 1, (10, 1000))
effect_size, pvalue = stats.ttest_ind(a=sample1, b=sample2, axis=1, equal_var=True)
df = pd.DataFrame({'Effect Size': effect_size, 'p-value': pvalue})
print(df)
# Output
Effect Size p-value
0 -1.154039 0.248622
1 -0.590073 0.555208
2 -0.722039 0.470355
3 -1.088286 0.276600
4 -1.337602 0.181178
5 -0.756837 0.449237
6 -1.875409 0.060882
7 -1.532000 0.125681
8 -1.032455 0.301984
9 -2.358115 0.018464
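As a sanity check on the vectorized call, here is a minimal sketch comparing it with a row-by-row loop on the same data. The seeded generator `np.random.default_rng(0)` and the names `n_tests`, `n_obs`, `t_vec`, `p_vec`, `t_loop` and `p_loop` are my own additions for reproducibility and are not part of the original answer. Note that the first value returned by `stats.ttest_ind` is the t statistic, which the thread stores under the column name 'Effect Size'.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)  # seed chosen arbitrarily, only for reproducibility
n_tests, n_obs = 10, 1000

# One row per simulated experiment
sample1 = rng.normal(0, 1, (n_tests, n_obs))
sample2 = rng.normal(0.05, 1, (n_tests, n_obs))

# Vectorized: one t-test per row (axis=1)
t_vec, p_vec = stats.ttest_ind(a=sample1, b=sample2, axis=1, equal_var=True)

# Loop: the same tests, one row at a time
loop_results = [
    stats.ttest_ind(a=sample1[i], b=sample2[i], equal_var=True)
    for i in range(n_tests)
]
t_loop = np.array([r.statistic for r in loop_results])
p_loop = np.array([r.pvalue for r in loop_results])

df = pd.DataFrame({'Effect Size': t_vec, 'p-value': p_vec})
print(np.allclose(t_vec, t_loop), np.allclose(p_vec, p_loop))
```

Both approaches should agree up to floating-point rounding, so the choice between them is mainly about speed and memory: the vectorized form holds all samples in memory at once.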
This should work: assign each row with the .loc indexer and get rid of the print call.
results = pd.DataFrame({'Effect Size':[], 'p-value':[]})
for i in range(1000):
    sample1 = np.random.normal(0,1,1000)
    sample2 = np.random.normal(.05,1,1000)
    effect_size, pvalue = stats.ttest_ind(a=sample1, b=sample2, equal_var=True)
    results.loc[i,:] = [effect_size,pvalue]
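Growing a dataframe one row at a time with .loc works, but it tends to get slow as the number of rows increases, because the frame is enlarged on every assignment. A possible variant, sketched here under my own assumptions (the preallocated array `values`, the name `n_iter`, and the seeded generator are not from the original answer), fills a NumPy array inside the loop and builds the dataframe once at the end:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)  # seed chosen arbitrarily, only for reproducibility
n_iter = 1000

# Preallocate one row per iteration: column 0 = t statistic, column 1 = p-value
values = np.empty((n_iter, 2))

for i in range(n_iter):
    sample1 = rng.normal(0, 1, 1000)
    sample2 = rng.normal(0.05, 1, 1000)
    effect_size, pvalue = stats.ttest_ind(a=sample1, b=sample2, equal_var=True)
    values[i] = (effect_size, pvalue)

# Build the dataframe once, after the loop
results = pd.DataFrame(values, columns=['Effect Size', 'p-value'])
print(results.head())
```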
Python's print returns None, so passing its result to the DataFrame constructor doesn't add the values to that dataframe; you just get an empty one.
You construct a new dataframe on each iteration of the loop, so you never add new items to it.
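To add new items to a dataframe, concatenate and assign the result back (DataFrame.append was removed in pandas 2.0, and it returned a new dataframe rather than modifying the existing one): results = pd.concat([results, pd.DataFrame({'col1': [effect_size], 'col2': [pvalue]})], ignore_index=True)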
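The points above can be checked with a small sketch. It first shows that print evaluates to None, so the constructor receives no data, and then collects one single-row frame per iteration and joins them with pd.concat, which is available in current pandas releases. The names `returned`, `empty` and `frames`, the 5 iterations, and the seeded generator are illustrative assumptions of mine.

```python
import numpy as np
import pandas as pd
from scipy import stats

# print() writes to stdout; the call itself evaluates to None
returned = print("t statistic", "p-value")
empty = pd.DataFrame(returned)  # same as pd.DataFrame(None): no rows, no columns

rng = np.random.default_rng(0)  # seed chosen arbitrarily, only for reproducibility
frames = []
for i in range(5):  # 5 iterations only, to keep the example short
    sample1 = rng.normal(0, 1, 1000)
    sample2 = rng.normal(0.05, 1, 1000)
    effect_size, pvalue = stats.ttest_ind(a=sample1, b=sample2, equal_var=True)
    # Scalars are wrapped in lists so each frame has exactly one row
    frames.append(pd.DataFrame({'Effect Size': [effect_size], 'p-value': [pvalue]}))

# Concatenate once after the loop instead of growing the frame each iteration
results = pd.concat(frames, ignore_index=True)
print(results)
```

Concatenating once at the end avoids copying the accumulated rows on every iteration, which is the same idea as collecting results in a list first.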