I have a pandas DataFrame, and I would like to create a new column containing a substring of the string held in an existing column.
For example, the "race" column contains the value "2016_Lap_JAPANESE_Third_Times.csv", and I would like to extract the word 'japanese'.
My current approach is to check whether each word from a list appears in the value and, if it does, write that word to the new column.
race_names = ['japanese']  # in practice this list is long, and the "race" column contains many different names
for i, row in df_fp2.iterrows():
    for name in race_names:
        if name in df_fp2.loc[i, 'race']:
            df_fp2.loc[i, 'name'] = str(name) + " Grand Prix"
A sample of the DataFrame, converted to a dictionary:
{'driverRef': {151: 'button',
152: 'button',
153: 'button',
154: 'button',
155: 'button'},
'driver_no': {151: 22, 152: 22, 153: 22, 154: 22, 155: 22},
'milliseconds': {151: 1339994.0,
152: 692245.0,
153: 96286.0,
154: 94547.999999999985,
155: 114725.0},
'name': {151: 'J.BUTTON',
152: 'J.BUTTON',
153: 'J.BUTTON',
154: 'J.BUTTON',
155: 'J.BUTTON'},
'race': {151: '2016_Lap_JAPANESE_Third_Times.csv',
152: '2016_Lap_JAPANESE_Third_Times.csv',
153: '2016_Lap_JAPANESE_Third_Times.csv',
154: '2016_Lap_JAPANESE_Third_Times.csv',
155: '2016_Lap_JAPANESE_Third_Times.csv'},
'time': {151: 1339.9939999999999,
152: 692.245,
153: 96.286000000000001,
154: 94.547999999999988,
155: 114.72499999999999}}
This is the array of unique values in the "race" column. Because the words are arranged differently from one value to the next, I cannot simply strip a fixed prefix and suffix around each country name.
array(['2016_Lap_ABU_Third_Times.csv', '2016_Lap_BRASIL_Third_Times.csv',
'2016_Lap_CHINESE_Third_Times.csv',
'2016_Lap_JAPANESE_Third_Times.csv',
'2016_Lap_MAGYAR_Third_Times.csv',
'2016_Lap_SINGAPORE_Third_Times.csv', '2016_Lap_Third_Times.csv',
'2016_Lap_UNITED_Third_Times.csv',
'AUSTRALIAN_2016_Lap_Third_Times.csv',
'BAHRAIN_2016_Lap_Third_Times.csv',
'BELGIAN_2016_Lap_Third_Times.csv',
'CANADA_2016_Lap_Third_Times.csv',
'ESPANA_2016_Lap_Third_Times.csv',
'EUROPE_2016_Lap_Third_Times.csv',
'MALAYSIA_2016_Lap_Third_Times.csv',
'Mexico_2016_Lap_Third_Times.csv',
'RUSSIAN_2016_Lap_Third_Times.csv'], dtype=object)
If race_names contains every word that may need to be extracted, use str.extract:
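import re
race_names = ['japanese']
# Build an alternation pattern; re.escape guards against regex metacharacters in a name
pat = '|'.join(re.escape(x) for x in race_names)
# flags=re.I matches case-insensitively; the extracted text keeps the case found in the column
df['name'] = df['race'].str.extract('(' + pat + ')', expand=False, flags=re.I) + " Grand Prix"
print(df)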
driverRef driver_no milliseconds name \
151 button 22 1339994.0 JAPANESE Grand Prix
152 button 22 692245.0 JAPANESE Grand Prix
153 button 22 96286.0 JAPANESE Grand Prix
154 button 22 94548.0 JAPANESE Grand Prix
155 button 22 114725.0 JAPANESE Grand Prix
race time
151 2016_Lap_JAPANESE_Third_Times.csv 1339.994
152 2016_Lap_JAPANESE_Third_Times.csv 692.245
153 2016_Lap_JAPANESE_Third_Times.csv 96.286
154 2016_Lap_JAPANESE_Third_Times.csv 94.548
155 2016_Lap_JAPANESE_Third_Times.csv 114.725
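As a minimal sketch of the same str.extract idea with more than one name, the snippet below builds a small stand-in DataFrame. The second entry in race_names and the three sample rows are assumptions chosen for illustration, not the asker's full data. It also title-cases the match, which is an optional extra step, and shows that a row with no matching name ends up as NaN rather than raising an error.

```python
import re

import pandas as pd

# Small stand-in for the "race" column (illustrative rows only)
df = pd.DataFrame({"race": [
    "2016_Lap_JAPANESE_Third_Times.csv",
    "Mexico_2016_Lap_Third_Times.csv",
    "2016_Lap_Third_Times.csv",  # contains no race name
]})

# Hypothetical list of names to look for
race_names = ["japanese", "mexico"]

# One capture group containing all names, matched case-insensitively
pat = "(" + "|".join(re.escape(n) for n in race_names) + ")"
extracted = df["race"].str.extract(pat, flags=re.I, expand=False)

# Normalise the case, then append the suffix; rows without a match stay NaN
df["name"] = extracted.str.title() + " Grand Prix"
print(df)
```

Because the match keeps whatever case appears in the file name, a normalising step such as .str.title() or .str.lower() is needed if a consistent spelling is wanted.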
If the list of names is not known in advance, it may also be possible to remove the fixed parts with replace and then clean up with str.strip:
import pandas as pd
df = pd.DataFrame({'race':['2016_Lap_ABU_Third_Times.csv', '2016_Lap_BRASIL_Third_Times.csv',
'2016_Lap_CHINESE_Third_Times.csv',
'2016_Lap_JAPANESE_Third_Times.csv',
'2016_Lap_MAGYAR_Third_Times.csv',
'2016_Lap_SINGAPORE_Third_Times.csv', '2016_Lap_Third_Times.csv',
'2016_Lap_UNITED_Third_Times.csv',
'AUSTRALIAN_2016_Lap_Third_Times.csv',
'BAHRAIN_2016_Lap_Third_Times.csv',
'BELGIAN_2016_Lap_Third_Times.csv',
'CANADA_2016_Lap_Third_Times.csv',
'ESPANA_2016_Lap_Third_Times.csv',
'EUROPE_2016_Lap_Third_Times.csv',
'MALAYSIA_2016_Lap_Third_Times.csv',
'Mexico_2016_Lap_Third_Times.csv',
'RUSSIAN_2016_Lap_Third_Times.csv']})
# Remove the fixed suffix, the word 'Lap' and the year, then strip the leftover underscores
df['name'] = (df['race'].replace([r'_Third_Times\.csv', 'Lap', r'\d+'], '', regex=True)
                        .str.strip('_'))
print(df)
race name
0 2016_Lap_ABU_Third_Times.csv ABU
1 2016_Lap_BRASIL_Third_Times.csv BRASIL
2 2016_Lap_CHINESE_Third_Times.csv CHINESE
3 2016_Lap_JAPANESE_Third_Times.csv JAPANESE
4 2016_Lap_MAGYAR_Third_Times.csv MAGYAR
5 2016_Lap_SINGAPORE_Third_Times.csv SINGAPORE
6 2016_Lap_Third_Times.csv
7 2016_Lap_UNITED_Third_Times.csv UNITED
8 AUSTRALIAN_2016_Lap_Third_Times.csv AUSTRALIAN
9 BAHRAIN_2016_Lap_Third_Times.csv BAHRAIN
10 BELGIAN_2016_Lap_Third_Times.csv BELGIAN
11 CANADA_2016_Lap_Third_Times.csv CANADA
12 ESPANA_2016_Lap_Third_Times.csv ESPANA
13 EUROPE_2016_Lap_Third_Times.csv EUROPE
14 MALAYSIA_2016_Lap_Third_Times.csv MALAYSIA
15 Mexico_2016_Lap_Third_Times.csv Mexico
16 RUSSIAN_2016_Lap_Third_Times.csv RUSSIAN
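Row 6 above has no race name, so the replace approach leaves an empty string there. The following is a sketch of one way to handle that case and append the " Grand Prix" suffix from the question. It assumes that no race name itself contains the substring "Lap" or any digits, and it uses only three sample rows for illustration.

```python
import pandas as pd

df = pd.DataFrame({"race": [
    "2016_Lap_JAPANESE_Third_Times.csv",
    "Mexico_2016_Lap_Third_Times.csv",
    "2016_Lap_Third_Times.csv",  # no race name in this file name
]})

# Drop the fixed suffix, the word "Lap" and the year in a single regex pass,
# then strip the underscores left at either end
name = (df["race"]
        .str.replace(r"_Third_Times\.csv|Lap|\d+", "", regex=True)
        .str.strip("_"))

# Turn empty results into NaN so that no bare " Grand Prix" is produced
df["name"] = name.where(name != "").str.title() + " Grand Prix"
print(df)
```

Leaving unmatched rows as NaN makes them easy to find later with df['name'].isna().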