
Create and populate Pandas data frame columns using two existing columns

My data frame has 4 columns and looks as below.

What I have:

ID     start_date  end_date    active
1,111  6/30/2015   8/6/1904    1 to 10
1,111  6/28/2016   3/30/1905   1 to 10
1,111  7/31/2017   6/6/1905    1 to 10
1,111  7/31/2018   6/6/1905    1 to 9
1,111  5/31/2019   12/4/1904   1 to 9
3,033  3/31/2015   5/18/1908   3 to 7
3,033  3/31/2016   11/24/1905  3 to 7
3,033  3/31/2017   1/20/1906   3 to 7
3,033  3/31/2018   1/8/1906    2 to 7
3,033  4/4/2019    2200,0      2 to 8

I want to generate 10 more columns based on the value of the "active" column, as shown below. Is there a way to populate them efficiently?

What I want to achieve:

ID     start_date  end_date    active   Type 1  Type 2  Type 3  Type 4  Type 5  Type 6  Type 7  Type 8  Type 9  Type 10
1,111  6/30/2015   8/6/1904    1 to 10  1       1       1       1       1       1       1       1       1       1
1,111  6/28/2016   3/30/1905   1 to 10  1       1       1       1       1       1       1       1       1       1
1,111  7/31/2017   6/6/1905    1 to 10  1       1       1       1       1       1       1       1       1       1
1,111  7/31/2018   6/6/1905    1 to 9   1       1       1       1       1       1       1       1       1
1,111  5/31/2019   12/4/1904   1 to 9   1       1       1       1       1       1       1       1       1
3,033  3/31/2015   5/18/1908   3 to 7                   1       1       1       1       1
3,033  3/31/2016   11/24/1905  3 to 7                   1       1       1       1       1
3,033  3/31/2017   1/20/1906   3 to 7                   1       1       1       1       1
3,033  3/31/2018   1/8/1906    2 to 7           1       1       1       1       1       1
3,033  4/4/2019    2200,0      2 to 8           1       1       1       1       1       1       1

Use a custom function with np.arange:

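import numpy as np
import pandas as pd

def f(x):
    a = list(map(int, x.split(' to ')))  # "3 to 7" -> [3, 7]
    # one flag per type number in the inclusive range a[0]..a[1]
    return pd.Series(1, index=np.arange(a[0], a[1] + 1))

# types outside a row's range come back as NaN, hence the float columns
df = df.join(df['active'].apply(f).add_prefix('Type '))
print(df)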
      ID start_date    end_date   active  Type 1  Type 2  Type 3  Type 4  \
0  1,111  6/30/2015    8/6/1904  1 to 10     1.0     1.0     1.0     1.0   
1  1,111  6/28/2016   3/30/1905  1 to 10     1.0     1.0     1.0     1.0   
2  1,111  7/31/2017    6/6/1905  1 to 10     1.0     1.0     1.0     1.0   
3  1,111  7/31/2018    6/6/1905   1 to 9     1.0     1.0     1.0     1.0   
4  1,111  5/31/2019   12/4/1904   1 to 9     1.0     1.0     1.0     1.0   
5  3,033  3/31/2015   5/18/1908   3 to 7     NaN     NaN     1.0     1.0   
6  3,033  3/31/2016  11/24/1905   3 to 7     NaN     NaN     1.0     1.0   
7  3,033  3/31/2017   1/20/1906   3 to 7     NaN     NaN     1.0     1.0   
8  3,033  3/31/2018    1/8/1906   2 to 7     NaN     1.0     1.0     1.0   
9  3,033   4/4/2019      2200,0   2 to 8     NaN     1.0     1.0     1.0   

   Type 5  Type 6  Type 7  Type 8  Type 9  Type 10  
0     1.0     1.0     1.0     1.0     1.0      1.0  
1     1.0     1.0     1.0     1.0     1.0      1.0  
2     1.0     1.0     1.0     1.0     1.0      1.0  
3     1.0     1.0     1.0     1.0     1.0      NaN  
4     1.0     1.0     1.0     1.0     1.0      NaN  
5     1.0     1.0     1.0     NaN     NaN      NaN  
6     1.0     1.0     1.0     NaN     NaN      NaN  
7     1.0     1.0     1.0     NaN     NaN      NaN  
8     1.0     1.0     1.0     NaN     NaN      NaN  
9     1.0     1.0     1.0     1.0     NaN      NaN   

Similarly, fill the missing values with 0 and convert to integers to get 0/1 columns:

def f(x):
    a = list(map(int, x.split(' to ')))
    return pd.Series(1, index= np.arange(a[0], a[1] + 1))

df = df.join(df['active'].apply(f).add_prefix('Type ').fillna(0).astype(int))
print (df)
      ID start_date    end_date   active  Type 1  Type 2  Type 3  Type 4  \
0  1,111  6/30/2015    8/6/1904  1 to 10       1       1       1       1   
1  1,111  6/28/2016   3/30/1905  1 to 10       1       1       1       1   
2  1,111  7/31/2017    6/6/1905  1 to 10       1       1       1       1   
3  1,111  7/31/2018    6/6/1905   1 to 9       1       1       1       1   
4  1,111  5/31/2019   12/4/1904   1 to 9       1       1       1       1   
5  3,033  3/31/2015   5/18/1908   3 to 7       0       0       1       1   
6  3,033  3/31/2016  11/24/1905   3 to 7       0       0       1       1   
7  3,033  3/31/2017   1/20/1906   3 to 7       0       0       1       1   
8  3,033  3/31/2018    1/8/1906   2 to 7       0       1       1       1   
9  3,033   4/4/2019      2200,0   2 to 8       0       1       1       1   

   Type 5  Type 6  Type 7  Type 8  Type 9  Type 10  
0       1       1       1       1       1        1  
1       1       1       1       1       1        1  
2       1       1       1       1       1        1  
3       1       1       1       1       1        0  
4       1       1       1       1       1        0  
5       1       1       1       0       0        0  
6       1       1       1       0       0        0  
7       1       1       1       0       0        0  
8       1       1       1       0       0        0  
9       1       1       1       1       0        0  
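
The apply-based function above builds one Series per row. As a rough alternative for larger frames, the same 0/1 flags can be computed in one step with NumPy broadcasting. This is only a sketch, not part of the original answer: the four-row frame and the names bounds, lo, hi, types, flags and type_cols are made up for illustration, and it assumes every "active" value has the exact form "a to b".

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's frame; only 'active' matters here.
df = pd.DataFrame({"active": ["1 to 10", "1 to 9", "3 to 7", "2 to 8"]})

# Split "a to b" into two integer columns (lower and upper bound).
bounds = df["active"].str.split(" to ", expand=True).astype(int)
lo = bounds[0].to_numpy()[:, None]  # shape (n_rows, 1)
hi = bounds[1].to_numpy()[:, None]  # shape (n_rows, 1)

# All type numbers that occur anywhere, shape (n_types,).
types = np.arange(lo.min(), hi.max() + 1)

# Broadcasting compares every row's bounds with every type number at once.
flags = ((types >= lo) & (types <= hi)).astype(int)

type_cols = pd.DataFrame(flags, index=df.index,
                         columns=[f"Type {t}" for t in types])
out = df.join(type_cols)
print(out)
```

The result already holds integer 0/1 values, so no fillna step is needed.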

Another non-loop solution: the idea is to remove duplicate ranges, create indicator columns for the two endpoints with get_dummies, reindex to add the missing columns, and finally fill each range with 1 by multiplying the forward and reversed cumsum values:

df1 = (df.set_index('active', drop=False)
        .pop('active')
        .drop_duplicates()
        .str.get_dummies(' to '))

df1.columns = df1.columns.astype(int)
df1 = df1.reindex(columns=np.arange(df1.columns.min(), df1.columns.max() + 1), fill_value=0)
df1 = (df1.cumsum(axis=1) * df1.iloc[:, ::-1].cumsum(axis=1)).clip(upper=1)
print (df1)
         1   2   3   4   5   6   7   8   9   10
active                                         
1 to 10   1   1   1   1   1   1   1   1   1   1
1 to 9    1   1   1   1   1   1   1   1   1   0
3 to 7    0   0   1   1   1   1   1   0   0   0
2 to 7    0   1   1   1   1   1   1   0   0   0
2 to 8    0   1   1   1   1   1   1   1   0   0

df = df.join(df1.add_prefix('Type '), on='active')
print (df)

      ID start_date    end_date   active  Type 1  Type 2  Type 3  Type 4  \
0  1,111  6/30/2015    8/6/1904  1 to 10       1       1       1       1   
1  1,111  6/28/2016   3/30/1905  1 to 10       1       1       1       1   
2  1,111  7/31/2017    6/6/1905  1 to 10       1       1       1       1   
3  1,111  7/31/2018    6/6/1905   1 to 9       1       1       1       1   
4  1,111  5/31/2019   12/4/1904   1 to 9       1       1       1       1   
5  3,033  3/31/2015   5/18/1908   3 to 7       0       0       1       1   
6  3,033  3/31/2016  11/24/1905   3 to 7       0       0       1       1   
7  3,033  3/31/2017   1/20/1906   3 to 7       0       0       1       1   
8  3,033  3/31/2018    1/8/1906   2 to 7       0       1       1       1   
9  3,033   4/4/2019      2200,0   2 to 8       0       1       1       1   

   Type 5  Type 6  Type 7  Type 8  Type 9  Type 10  
0       1       1       1       1       1        1  
1       1       1       1       1       1        1  
2       1       1       1       1       1        1  
3       1       1       1       1       1        0  
4       1       1       1       1       1        0  
5       1       1       1       0       0        0  
6       1       1       1       0       0        0  
7       1       1       1       0       0        0  
8       1       1       1       0       0        0  
9       1       1       1       1       0        0  
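
To see why multiplying the two cumsum results fills the range, it may help to trace a single row by hand. The sketch below uses a made-up one-row frame (marks) holding the endpoint markers for "3 to 7"; the names marks, forward, backward and inside are illustrative only.

```python
import pandas as pd

cols = list(range(1, 11))

# Endpoint markers for the range "3 to 7" over types 1..10.
marks = pd.DataFrame([[0, 0, 1, 0, 0, 0, 1, 0, 0, 0]], columns=cols)

# Left-to-right cumsum is nonzero from the lower endpoint onward.
forward = marks.cumsum(axis=1)

# Right-to-left cumsum is nonzero from the upper endpoint downward.
backward = marks.iloc[:, ::-1].cumsum(axis=1)

# Multiplication aligns on column labels, so the reversed column order of
# `backward` does not matter; the product is nonzero only inside the range.
inside = (forward * backward).clip(upper=1)

print(forward.loc[0, cols].tolist())   # [0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
print(backward.loc[0, cols].tolist())  # [2, 2, 2, 1, 1, 1, 1, 0, 0, 0]
print(inside.loc[0, cols].tolist())    # [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
```

The clip is needed because the product is 2 at the two endpoints, where one of the running sums has already counted both markers.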
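
Another option is to expand each range into a '|'-separated string of its members and let str.get_dummies build the indicator columns (the resulting columns are sorted as strings, so Type 10 comes right after Type 1):

def f(s):
    a, b = map(int, s.split('to'))
    return '|'.join(map(str, range(a, b + 1)))

df.drop(columns='active').join(df.active.apply(f).str.get_dummies().add_prefix('Type '))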

      ID start_date    end_date  Type 1  Type 10  Type 2  Type 3  Type 4  Type 5  Type 6  Type 7  Type 8  Type 9
0  1,111  6/30/2015    8/6/1904       1        1       1       1       1       1       1       1       1       1
1  1,111  6/28/2016   3/30/1905       1        1       1       1       1       1       1       1       1       1
2  1,111  7/31/2017    6/6/1905       1        1       1       1       1       1       1       1       1       1
3  1,111  7/31/2018    6/6/1905       1        0       1       1       1       1       1       1       1       1
4  1,111  5/31/2019   12/4/1904       1        0       1       1       1       1       1       1       1       1
5  3,033  3/31/2015   5/18/1908       0        0       0       1       1       1       1       1       0       0
6  3,033  3/31/2016  11/24/1905       0        0       0       1       1       1       1       1       0       0
7  3,033  3/31/2017   1/20/1906       0        0       0       1       1       1       1       1       0       0
8  3,033  3/31/2018    1/8/1906       0        0       1       1       1       1       1       1       0       0
9  3,033   4/4/2019      2200,0       0        0       1       1       1       1       1       1       1       0
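
In the output above, "Type 10" sits between "Type 1" and "Type 2" because str.get_dummies orders its labels as strings. If numeric order is wanted, one possible fix is to re-sort the labels by integer value before adding the prefix. This is a sketch on a made-up two-row frame; the function name expand and the variable dummies are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"active": ["1 to 10", "3 to 7"]})

def expand(s):
    # "3 to 7" -> "3|4|5|6|7"
    a, b = map(int, s.split("to"))
    return "|".join(map(str, range(a, b + 1)))

dummies = df["active"].apply(expand).str.get_dummies()

# The labels are strings here, so "10" sorts before "2".
# Reorder them by integer value, then add the prefix.
dummies = dummies[sorted(dummies.columns, key=int)].add_prefix("Type ")
print(list(dummies.columns))
```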
