
My loop does not append the results of print() to the dataframe I created

I want to write a loop that generates 1000 t-test results from random samples drawn from two different populations. The loop basically does what is required; the only issue is that I would like to append the printed values to a dataframe.

results = pd.DataFrame({'Effect Size':[], 'p-value':[]})

for i in range(1000):
    sample1 = np.random.normal(0,1,1000)
    sample2 = np.random.normal(.05,1,1000)
    effect_size, pvalue = stats.ttest_ind(a=sample1, b=sample2, equal_var=True)
    results = pd.DataFrame(print(effect_size,pvalue))
    
results.head()

The output I get however is this one:

-1.6143890836641985 0.10660095803269495
-2.0260421693695845 0.0428931041087038
-2.7052945035320413 0.006882349977869199
-0.650014611610562 0.5157575104187226
0.35589181647004076 0.721959156357101
-1.8580323211600547 0.0633114210246122
-2.1346234965598185 0.03291315538511747
-1.5619392256304192 0.11846067349115201
-1.4286159705357937 0.15327094637955832
-2.5338588520198324 0.011357254651096133
-1.125224663298795 0.2606289939128222
-1.8130036805024503 0.06998125666628215
-0.0350581349501468 0.9720368863172242
-0.14942653694599559 0.881232154213759
-1.3726021387765257 0.17003011697766837
-0.391077951258786 0.6957813156125576
-1.8118048538852072 0.07016643231973188


My desired output is to put those 2 values in 2 separate columns of the dataframe I created above. Any solutions?

Collect your results first, then create the dataframe:

import pandas as pd
import numpy as np
import scipy.stats as stats

results = []
for i in range(1000):
    sample1 = np.random.normal(0,1,1000)
    sample2 = np.random.normal(.05,1,1000)
    effect_size, pvalue = stats.ttest_ind(a=sample1, b=sample2, equal_var=True)
    results.append([effect_size, pvalue])

df = pd.DataFrame(results, columns=['Effect Size', 'p-value'])

Output:

>>> df
     Effect Size   p-value
0      -1.490185  0.136333
1      -1.541894  0.123258
2      -1.761850  0.078248
3      -1.423281  0.154811
4      -1.399392  0.161851
..           ...       ...
995    -2.137380  0.032688
996    -0.510703  0.609615
997     0.260885  0.794208
998    -3.361631  0.000789
999    -1.648494  0.099409

[1000 rows x 2 columns]
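A variant of the same pattern, as a rough sketch: the object returned by `ttest_ind` exposes `statistic` and `pvalue` by name, and the rows can be collected as dicts so the column names travel with the values. The seeded `rng` generator and the reduced iteration count are assumptions added here for reproducibility, not part of the original answer. Note that the first returned value is the t statistic, which this thread labels 'Effect Size'.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility

rows = []
for _ in range(100):  # fewer iterations than the original 1000, for brevity
    sample1 = rng.normal(0, 1, 1000)
    sample2 = rng.normal(0.05, 1, 1000)
    res = stats.ttest_ind(sample1, sample2, equal_var=True)
    # res.statistic is the t statistic; the column name follows the question
    rows.append({'Effect Size': res.statistic, 'p-value': res.pvalue})

# Build the dataframe once, after the loop has finished
df = pd.DataFrame(rows)
print(df.shape)
```

Building the frame once at the end avoids re-creating or growing a dataframe inside the loop.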

Update: you can avoid the loop:

# I used only 10 iterations here for better understanding
sample1 = np.random.normal(0, 1, (10, 1000))
sample2 = np.random.normal(.05, 1, (10, 1000))
effect_size, pvalue = stats.ttest_ind(a=sample1, b=sample2, axis=1, equal_var=True)
df = pd.DataFrame({'Effect Size': effect_size, 'p-value': pvalue})
print(df)

# Output
   Effect Size   p-value
0    -1.154039  0.248622
1    -0.590073  0.555208
2    -0.722039  0.470355
3    -1.088286  0.276600
4    -1.337602  0.181178
5    -0.756837  0.449237
6    -1.875409  0.060882
7    -1.532000  0.125681
8    -1.032455  0.301984
9    -2.358115  0.018464
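One way to convince yourself that the `axis=1` call does the same work as the loop is to run both on the same data and compare. The sketch below assumes a seeded generator (`rng`) so that both versions see identical samples; the variable names `t_vec`, `t_loop`, and so on are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # assumed seed so both versions share data
sample1 = rng.normal(0, 1, (10, 1000))     # 10 experiments, 1000 draws each
sample2 = rng.normal(0.05, 1, (10, 1000))

# Vectorized: one t-test per row
t_vec, p_vec = stats.ttest_ind(sample1, sample2, axis=1, equal_var=True)

# Loop: the same tests, one row pair at a time
loop_results = [stats.ttest_ind(a, b, equal_var=True)
                for a, b in zip(sample1, sample2)]
t_loop = np.array([r.statistic for r in loop_results])
p_loop = np.array([r.pvalue for r in loop_results])

print(np.allclose(t_vec, t_loop), np.allclose(p_vec, p_loop))
```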

This should work: use .loc to assign each row and get rid of the print call.

results = pd.DataFrame({'Effect Size':[], 'p-value':[]})
for i in range(1000):
    sample1 = np.random.normal(0,1,1000)
    sample2 = np.random.normal(.05,1,1000)
    effect_size, pvalue = stats.ttest_ind(a=sample1, b=sample2, equal_var=True)
    results.loc[i,:] = [effect_size,pvalue]
    
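  1. Python's print returns None, so passing its result to pd.DataFrame doesn't add the values to that dataframe.

  2. You construct a new dataframe on each iteration of the loop and overwrite results with it, so you never add new items to the one you created.

  3. To add new rows to a dataframe, use pd.concat and assign the result back: results = pd.concat([results, pd.DataFrame({'Effect Size': [effect_size], 'p-value': [pvalue]})], ignore_index=True). (DataFrame.append was removed in pandas 2.0, and it also returned a new dataframe rather than modifying results in place.)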
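The three points above can be checked with a short sketch. The numeric pairs are placeholder values standing in for t-test output, and `frames` and `returned` are illustrative names; the single-row frames are gathered in a list and concatenated once.

```python
import pandas as pd

# Point 1: print writes to stdout and returns None
returned = print(1.5, 0.2)
print(returned is None)

# Points 1 and 2: pd.DataFrame(None) is just a new, empty dataframe
print(pd.DataFrame(returned).empty)

# Point 3: build one-row frames and concatenate them
frames = []
for effect_size, pvalue in [(-1.61, 0.107), (-2.03, 0.043)]:  # placeholder values
    frames.append(pd.DataFrame({'Effect Size': [effect_size],
                                'p-value': [pvalue]}))
results = pd.concat(frames, ignore_index=True)
print(results)
```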
