Generating n new rows in a pandas dataframe based on values given in other columns

So, I have the following sample dataframe (included only one row for clarity/simplicity):

df = pd.DataFrame({'base_number': [2],
                   'std_dev': [1]})
df['amount_needed'] = 5
df['upper_bound'] = df['base_number'] + df['std_dev']
df['lower_bound'] = df['base_number'] - df['std_dev']

For each row, I would like to generate as many new rows as the number given by df['amount_needed'] (so 5, in this example). I would like those 5 new rows to be spread evenly across the range from df['lower_bound'] to df['upper_bound'], inclusive. So for the example above, I would like the following result as output:

df_new = pd.DataFrame({'base_number': [1, 1.5, 2, 2.5, 3]})

Of course, this process will be done for all rows in a much larger dataframe, with many other columns which aren't relevant to this particular issue, which is why I'm trying to find a way to automate this process.

Each row of df produces one series (or one data frame). Here's one way to iterate over df and create the series with the values you specified:

import numpy as np
import pandas as pd

for row in df.itertuples():
    # Evenly spaced values from lower_bound to upper_bound, inclusive
    arr = np.linspace(row.lower_bound,
                      row.upper_bound,
                      row.amount_needed)
    s = pd.Series(arr, name='base_number')
    print(s)

0    1.0
1    1.5
2    2.0
3    2.5
4    3.0
Name: base_number, dtype: float64
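If the other columns need to come along with the new rows, one possible variation is to store the linspace values as a list per row and let DataFrame.explode expand them. This is only a sketch: the two-row sample data and the names spread and expanded are made up for illustration, and each row is allowed its own amount_needed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two rows with different amount_needed values
df = pd.DataFrame({'base_number': [2, 4],
                   'std_dev': [1, 2],
                   'amount_needed': [5, 3]})
df['upper_bound'] = df['base_number'] + df['std_dev']
df['lower_bound'] = df['base_number'] - df['std_dev']

# One list of evenly spaced values per original row
spread = [np.linspace(row.lower_bound, row.upper_bound, row.amount_needed).tolist()
          for row in df.itertuples()]

# Overwrite base_number with those lists, then expand to one row per value.
# explode leaves an object column, so cast back to float afterwards.
expanded = (df.assign(base_number=spread)
              .explode('base_number')
              .astype({'base_number': float}))

print(expanded[['base_number', 'std_dev']])
```

The exploded frame repeats the original index label for every generated row, so that index can double as an id pointing back to the source row.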

I ended up building on jsmart's answer to generate a new dataframe, preserving the original ids so that the other columns from the old dataframe can be merged onto the new one by id as needed (whole process shown below):

import itertools

import numpy as np
import pandas as pd

amount_needed = 5
df = pd.DataFrame({'base_number': [2, 4, 8, 0],
                   'std_dev': [1, 2, 3, 0]})
df['amount_needed'] = amount_needed
df['upper_bound'] = df['base_number'] + df['std_dev']
df['lower_bound'] = df['base_number'] - df['std_dev']

# Build one series of evenly spaced values per row, then stack them once
pieces = []
for row in df.itertuples():
    arr = np.linspace(row.lower_bound,
                      row.upper_bound,
                      row.amount_needed)
    pieces.append(pd.Series(arr, name='base_number'))
s1 = pd.concat(pieces)

df_new = pd.DataFrame({'base_number': s1})

# Ids 1..len(df), each repeated amount_needed times, in row order
ids_og = list(range(1, len(df) + 1))
ids_og = [ids_og] * amount_needed
ids_og = sorted(itertools.chain.from_iterable(ids_og))

df_new['id'] = ids_og
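For completeness, here is a rough sketch of the merge step mentioned above, written without the Python loop. It assumes every row uses the same amount_needed and a NumPy version (1.16 or later) in which np.linspace accepts array endpoints. The label column and the names grid and merged are hypothetical, added only to show another column being carried over.

```python
import numpy as np
import pandas as pd

amount_needed = 5
df = pd.DataFrame({'base_number': [2, 4, 8, 0],
                   'std_dev': [1, 2, 3, 0],
                   'label': ['a', 'b', 'c', 'd']})  # hypothetical extra column
df['id'] = np.arange(1, len(df) + 1)

lower = (df['base_number'] - df['std_dev']).to_numpy()
upper = (df['base_number'] + df['std_dev']).to_numpy()

# With array endpoints, linspace returns shape (amount_needed, len(df)):
# column j holds the values generated for row j of df.
grid = np.linspace(lower, upper, amount_needed)

df_new = pd.DataFrame({
    # Each id repeated amount_needed times, in row order
    'id': np.repeat(df['id'].to_numpy(), amount_needed),
    # Transpose so each original row's values are contiguous, then flatten
    'base_number': grid.T.ravel(),
})

# Bring the remaining columns of the original frame back by id
merged = df_new.merge(df.drop(columns='base_number'), on='id', how='left')
print(merged.head(10))
```

A left merge keeps the row order of df_new, so the generated values stay grouped by id.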
