
Pandas - populate one dataframe column based on another

Based on the Month2 column, I need to populate the Forecast_for and Forecast_in columns. I wrote the code below, which finds the positions of the words "forecast" and "in" and then uses the .str accessor to slice out the relevant values, but I only get NaN values. Can someone please help? Please also let me know if there is a better method. The ultimate aim is to convert the Forecast_for/Forecast_in columns into a numeric year and month, e.g. December 2018/19 would eventually become Forecast_for_Year = 2018 and Forecast_for_Month = 12.

Thanks in advance!

import pandas as pd

data = {'Month2': ['December 2018/19 forecast in November 2018/19', 'January 2018/19 forecast in November 2018/19', 'March 2018/19 forecast in November 2018/19', 'June 2019/20 forecast in May 2019/20'],
        'len_month2':['','','',''] ,
        'pos_forecast': ['','','',''],
        'pos_in': ['','','',''],
       'Forecast_for': ['','','',''],
       'Forecast_in': ['','','',''],
       'Forecast_for_Year': ['','','',''],
       'Forecast_for_Month': ['','','',''],       
       'Forecast_in_Year': ['','','',''],
       'Forecast_in_Month': ['','','','']}
df = pd.DataFrame(data, columns = ['Month2', 'len_month2', 'pos_forecast', 'pos_in', 'Forecast_for', 'Forecast_in', 
                                   'Forecast_for_Year', 'Forecast_for_Month', 'Forecast_in_Year', 'Forecast_in_Month'])

#Calculate Forecast_for
df['pos_forecast'] = df['Month2'].str.find('forecast')
df['Forecast_for'] = df['Month2'].str[:df['pos_forecast']]

#Calculate Forecast_in
df['pos_in'] = df['Month2'].str.find('in')
df['len_month2'] = df['Month2'].str.len()
df['Forecast_in'] = df['Month2'].str[(df['len_month2'] - df['pos_in']):]
df
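The NaN values most likely come from passing a Series as a slice bound: `.str[...]` applies one scalar slice to every row, so it cannot take a different position per row. Separately, the start of the Forecast_in slice should be the position just after "in", not `len_month2 - pos_in`. A minimal sketch of a row-by-row version, assuming a shortened two-row copy of the sample data and searching for `" in "` with surrounding spaces (both are choices made here for illustration, not part of the original code):

```python
import pandas as pd

df = pd.DataFrame({"Month2": [
    "December 2018/19 forecast in November 2018/19",
    "June 2019/20 forecast in May 2019/20",
]})

pos_forecast = df["Month2"].str.find("forecast")
pos_in = df["Month2"].str.find(" in ")

# Slice each string with its own position, one row at a time.
df["Forecast_for"] = [s[:p].strip() for s, p in zip(df["Month2"], pos_forecast)]
# Start right after " in " (4 characters) rather than at len - pos_in.
df["Forecast_in"] = [s[p + 4:] for s, p in zip(df["Month2"], pos_in)]
```

This keeps the original find-then-slice idea; the vectorized alternatives are usually shorter.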

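You can use the following to extract Forecast_for and Forecast_in. The first pattern matches the first "word, space, digits/slashes" group in the string; the second is anchored with $ so it matches the last one: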

df['Forecast_for'] = df['Month2'].str.extract(r'(\w+\s[\d\/]+)')
df['Forecast_in'] = df['Month2'].str.extract(r'(\w+\s[\d\/]+$)')
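If the strings always follow the "Month YYYY/YY forecast in Month YYYY/YY" layout, a single pattern with named groups could pull out all four pieces in one pass. This is only a sketch: the group names (`for_month`, `for_year`, `in_month`, `in_year`) are made up for the example, and the pattern assumes a four-digit year followed by a two-digit suffix.

```python
import pandas as pd

df = pd.DataFrame({"Month2": [
    "December 2018/19 forecast in November 2018/19",
    "June 2019/20 forecast in May 2019/20",
]})

# Named groups become the column names of the extracted DataFrame.
pattern = (
    r"(?P<for_month>\w+)\s(?P<for_year>\d{4})/\d{2}"
    r"\sforecast\sin\s"
    r"(?P<in_month>\w+)\s(?P<in_year>\d{4})/\d{2}"
)
parts = df["Month2"].str.extract(pattern)

# The years come back as strings; convert them explicitly.
parts["for_year"] = parts["for_year"].astype(int)
parts["in_year"] = parts["in_year"].astype(int)
```

Rows that do not match the pattern would come back as NaN, in which case the `astype(int)` step would need handling for missing values.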

Update

df.Forecast_in_Year = pd.to_datetime(df.Forecast_in).dt.year
df.Forecast_in_Month = pd.to_datetime(df.Forecast_in).dt.month
df.Forecast_for_Year = pd.to_datetime(df.Forecast_for).dt.year
df.Forecast_for_Month = pd.to_datetime(df.Forecast_for).dt.month
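The lines above rely on `pd.to_datetime` guessing how to read a string such as "December 2018/19", and that guess may differ between pandas versions. A more explicit sketch, assuming English month names and that the first four-digit number is the year you want, maps the month name through the standard library's `calendar.month_name`:

```python
import calendar

import pandas as pd

df = pd.DataFrame({"Forecast_for": ["December 2018/19", "June 2019/20"]})

# {"January": 1, ..., "December": 12}; index 0 of month_name is an empty string.
month_to_num = {name: num for num, name in enumerate(calendar.month_name) if name}

# Leading word is the month name; first run of four digits is the year.
month_names = df["Forecast_for"].str.extract(r"^(\w+)", expand=False)
df["Forecast_for_Month"] = month_names.map(month_to_num)
df["Forecast_for_Year"] = (
    df["Forecast_for"].str.extract(r"(\d{4})", expand=False).astype(int)
)
```

The same two steps would apply to the Forecast_in column.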

Output

   Month2                                         len_month2  pos_forecast  pos_in  Forecast_for      Forecast_in       Forecast_for_Year  Forecast_for_Month  Forecast_in_Year  Forecast_in_Month
0  December 2018/19 forecast in November 2018/19  45          17            26      December 2018/19  November 2018/19  2018               12                  2018              11
1  January 2018/19 forecast in November 2018/19   44          16            25      January 2018/19   November 2018/19  2018               1                   2018              11
2  March 2018/19 forecast in November 2018/19     42          14            23      March 2018/19     November 2018/19  2018               3                   2018              11
3  June 2019/20 forecast in May 2019/20           36          13            22      June 2019/20      May 2019/20       2019               6                   2019              5

You can also try splitting the string.

    df['Forecast_for_Month'] = df['Month2'].str.split().str[0]
    df['Forecast_in_Month'] = df['Month2'].str.split().str[4]
    df['Forecast_for_Year'] = df['Month2'].str.split().str[1]
    df['Forecast_in_Year'] = df['Month2'].str.split().str[5]
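Note that the split above yields strings such as "December" and "2018/19", not numbers. As a sketch, the string could be split once with `expand=True` and the year taken from the first four characters, assuming every row has exactly the six whitespace-separated tokens shown in the sample data (`tokens` is just a name chosen here):

```python
import pandas as pd

df = pd.DataFrame({"Month2": [
    "December 2018/19 forecast in November 2018/19",
    "June 2019/20 forecast in May 2019/20",
]})

# One split; columns 0..5 are: month, year span, "forecast", "in", month, year span.
tokens = df["Month2"].str.split(expand=True)

df["Forecast_for_Month"] = tokens[0]
df["Forecast_in_Month"] = tokens[4]
# "2018/19" -> 2018: keep the leading four digits and convert to int.
df["Forecast_for_Year"] = tokens[1].str[:4].astype(int)
df["Forecast_in_Year"] = tokens[5].str[:4].astype(int)
```

The month columns still hold names at this point, so they would need a separate name-to-number conversion.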
