
Use string formatting in pandas DataFrame columns

I have the following data frame (both columns are of type str):

+------+-----------------+
| year | indicator_short |
+------+-----------------+
| 2020 | ind_1           |
| 2019 | ind_2           |
| 2019 | ind_3           |
| N/A  | ind_4           |
+------+-----------------+

I would like to add a new column that contains the concatenation of the two existing columns, formatted like this:

+------+-----------------+--------------------+
| year | indicator_short |   indicator_full   |
+------+-----------------+--------------------+
| 2020 | ind_1           | Indicator_1 (2020) |
| 2019 | ind_2           | Indicator_2 (2019) |
| 2019 | ind_3           | Indicator_3 (2019) |
| N/A  | ind_4           | Indicator_4 (N/A)  |
+------+-----------------+--------------------+

One idea that comes to mind is to use string formatting, something like:

df['indicator_full'][df['indicator_short']=='ind_1'] = 'Indicator_1 ({})'.format(df['year'])

but it gives the wrong result.
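A likely reason the attempt misbehaves is that str.format receives the whole year Series, so the text representation of the entire column is embedded in the string instead of one year per row. Below is a minimal sketch of row-wise formatting, assuming both columns hold strings as stated in the question; the sample frame is rebuilt here purely for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (both columns are strings).
df = pd.DataFrame({
    "year": ["2020", "2019", "2019", "N/A"],
    "indicator_short": ["ind_1", "ind_2", "ind_3", "ind_4"],
})

# Passing a Series to str.format embeds the repr of the whole column,
# index and all, rather than a single year.
broken = "Indicator_1 ({})".format(df["year"])

# Format one row at a time instead, pairing each short name with its year.
df["indicator_full"] = [
    "Indicator_{} ({})".format(short.rsplit("_", 1)[-1], year)
    for short, year in zip(df["indicator_short"], df["year"])
]
print(df)
```

A list comprehension is not the fastest option on large frames, but it keeps the format string readable and handles the N/A rows without special cases.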

I'd go with string concatenation, converting the year column to a string first:

# Strip the trailing ".0" left by the float-to-string conversion.
years = '(' + df['year'].astype(str).str.replace(r'\.0$', '', regex=True) + ')'
# years = '(' + df['year'] + ')' if the year column is already a string
df['indicator_full'] = ('Indicator_' + df.indicator_short.str.rsplit('_').str[-1]) \
                           .str.cat(years, sep=' ')

print(df)
     year indicator_short      indicator_full
0  2020.0           ind_1  Indicator_1 (2020)
1  2019.0           ind_2  Indicator_2 (2019)
2  2019.0           ind_3  Indicator_3 (2019)
3     NaN           ind_4   Indicator_4 (nan)
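When a trailing .0 is stripped with a regular expression, the dot has to be escaped: an unescaped . matches any character, so a pattern such as .0$ would also shorten an integer-like string. The small check below illustrates the difference on two made-up values.

```python
import pandas as pd

# Two illustrative values: a float rendered as text and a plain year string.
years = pd.Series(["2020.0", "2020"])

# Unescaped dot: any character followed by a final "0" is removed.
unescaped = years.str.replace(r".0$", "", regex=True)

# Escaped dot: only a literal ".0" suffix is removed.
escaped = years.str.replace(r"\.0$", "", regex=True)

print(unescaped.tolist())  # ['2020', '20']
print(escaped.tolist())    # ['2020', '2020']
```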

Use Series.str.extract to get the integers from indicator_short, convert the floats in the year column to integers, and finally join everything together:

i = df['indicator_short'].str.extract(r'(\d+)', expand=False)
y = df['year'].astype('Int64').astype(str).replace('<NA>','N/A')

df['indicator_full'] = 'Indicator_' + i + ' (' + y + ')'
print(df)
     year indicator_short      indicator_full
0  2020.0           ind_1  Indicator_1 (2020)
1  2019.0           ind_2  Indicator_2 (2019)
2  2019.0           ind_3  Indicator_3 (2019)
3     NaN           ind_4   Indicator_4 (N/A)
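Replacing the text '<NA>' relies on how a missing value happens to be rendered as a string, which may differ between pandas versions (an assumption worth checking on your install). As a sketch of an alternative, the missing check can be done before the conversion; year_label is a hypothetical helper name, and the float year column mirrors the output shown above.

```python
import numpy as np
import pandas as pd

# Sample data with a float year column and one missing value.
df = pd.DataFrame({
    "year": [2020.0, 2019.0, 2019.0, np.nan],
    "indicator_short": ["ind_1", "ind_2", "ind_3", "ind_4"],
})


def year_label(value):
    """Hypothetical helper: render a float year as text, or 'N/A' if missing."""
    return "N/A" if pd.isna(value) else str(int(value))


# Digits from the short name, e.g. "ind_1" -> "1".
number = df["indicator_short"].str.extract(r"(\d+)", expand=False)
# Map every year, including NaN, through the helper.
years = df["year"].map(year_label)

df["indicator_full"] = "Indicator_" + number + " (" + years + ")"
print(df)
```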

Use .str.cat() to concatenate the two columns after replacing ind with Indicator using .str.replace.

df['indicator_full'] = df.indicator_short.str.replace('ind', 'Indicator').str.cat('(' + df['year'] + ')', sep=' ')
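This one-liner assumes year is already a string column, as the question states; a float column would need an astype(str) first. A self-contained run on the question's data might look like the sketch below, where regex=False is added only to make the literal replacement explicit.

```python
import pandas as pd

# Sample data rebuilt from the question (both columns are strings).
df = pd.DataFrame({
    "year": ["2020", "2019", "2019", "N/A"],
    "indicator_short": ["ind_1", "ind_2", "ind_3", "ind_4"],
})

# "ind_1" -> "Indicator_1", treated as a literal (non-regex) replacement.
names = df["indicator_short"].str.replace("ind", "Indicator", regex=False)

# Wrap each year in parentheses and join with a single space.
df["indicator_full"] = names.str.cat("(" + df["year"] + ")", sep=" ")
print(df)
```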
