Iterate through a data frame to generate random numbers in Python

Starting with this data frame, I want to generate 100 random numbers per row, using the hmean column for loc and the hstd column for scale.

I am starting with a data frame that I change to an array. I want to iterate through the entire data frame and produce the following output.

My code below will only return the answer for row zero.

     Name      amax        hmean       hstd         amin
0    Bill    22.924545   22.515861   0.375822    22.110000
1    Bob     26.118182   24.713880   0.721507    23.738400
2    Becky   23.178606   22.722464   0.454028    22.096752

This code produces one row of output instead of three:

from scipy import stats
import pandas as pd

def h2f(df, n):
    for index, row in df.iterrows(): 
        list1 = []
        nr = df.as_matrix()
        ff = stats.norm.rvs(loc=nr[index,2], scale=nr[index,3], size = n)
        list1.append(ff)
        return list1

df2 = h2f(data, 100)
pd.DataFrame(df2)

This is the output of my code

0       1          2        3         4      ...    99         100            
0   22.723833 22.208324  22.280701 22.416486     22.620035   22.55817   

This is the desired output

0         1         2            3      ...     99         100            
0   22.723833    22.208324   22.280701       22.416486  22.620035    
1   21.585776    22.190145   22.206638       21.927285  22.561882
2   22.357906    22.680952   21.4789         22.641407  22.341165           

Dedent return list1 so it is not in the for-loop. Otherwise, the function returns after only one pass through the loop.

Also move list1 = [] outside the for-loop so list1 does not get re-initialized with every pass through the loop:

import io
from scipy import stats
import pandas as pd

def h2f(df, n):
    list1 = []
    for index, row in df.iterrows(): 
        mean, std = row['hmean'], row['hstd']
        ff = stats.norm.rvs(loc=mean, scale=std, size=n)
        list1.append(ff)
    return list1

content = '''\
     Name      amax        hmean       hstd         amin
0    Bill    22.924545   22.515861   0.375822    22.110000
1    Bob     26.118182   24.713880   0.721507    23.738400
2    Becky   23.178606   22.722464   0.454028    22.096752'''

df = pd.read_table(io.StringIO(content), sep=r'\s+')
df2 = pd.DataFrame(h2f(df, 100))
print(df2)

PS. It is inefficient to call nr = df.as_matrix() on each pass through the loop. Since nr never changes, call it at most once, before entering the for-loop. Even better, just use row['hmean'] and row['hstd'] to obtain the desired numbers.
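
As a possible follow-up to the point about efficiency, the loop can be avoided altogether by letting NumPy broadcast the per-row mean and standard deviation. The sketch below is not part of the original answer: the function name h2f_vectorized and the seed parameter are hypothetical, and it swaps scipy's stats.norm.rvs for NumPy's Generator.normal, which draws from the same normal distribution.

```python
import numpy as np
import pandas as pd

def h2f_vectorized(df, n, seed=None):
    """Draw n normal samples per row of df without an explicit Python loop.

    Hypothetical helper: assumes df has numeric 'hmean' and 'hstd' columns.
    """
    rng = np.random.default_rng(seed)
    # Reshape to (rows, 1) so each row's mean/std broadcasts across n columns.
    loc = df['hmean'].to_numpy()[:, None]
    scale = df['hstd'].to_numpy()[:, None]
    samples = rng.normal(loc=loc, scale=scale, size=(len(df), n))
    return pd.DataFrame(samples, index=df.index)

# Same hmean/hstd values as the sample data in the question.
df = pd.DataFrame({
    'Name': ['Bill', 'Bob', 'Becky'],
    'hmean': [22.515861, 24.713880, 22.722464],
    'hstd': [0.375822, 0.721507, 0.454028],
})
df2 = h2f_vectorized(df, 100, seed=0)
print(df2.shape)  # (3, 100): one row of 100 samples per input row
```

For a three-row frame the difference is negligible, but this form scales better because all samples are drawn in a single call.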
