I need to create a new column from another one. The dataset is created by this code (I extracted only a few rows):
import pandas as pd

new_dataframe = pd.DataFrame({
    "Name": ['John', 'Lukas', 'Bridget', 'Carol', 'Madison'],
    "Notes": ["23 years old. NA", "52 years old. NA",
              "64 years old. NA", "31 years old. Old account.",
              "54 years old. New VIP account."],
    "Status": [True, False, True, True, True]})
which generates the following
Name     Notes                           Status
John     23 years old. NA                True
Lukas    52 years old. NA                False
Bridget  64 years old. NA                True
Carol    31 years old. Old account.      True
Madison  54 years old. New VIP account.  True
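I need to create two new columns that contain the age information: L_Age, with the first three words of Notes (e.g. "23 years old"), and S_Age, with the number alone (e.g. "23").
At the end I should have: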
Name     Notes                           Status  L_Age         S_Age
John     23 years old. NA                True    23 years old  23
Lukas    52 years old. NA                False   52 years old  52
Bridget  64 years old. NA                True    64 years old  64
Carol    31 years old. Old account.      True    31 years old  31
Madison  54 years old. New VIP account.  True    54 years old  54
I do not know how to extract the first three words, then only the first, to create new columns. I have tried with
new_dataframe.loc[new_dataframe.Notes == '', 'L_Age'] = new_dataframe.Notes.str.split()[:3]
new_dataframe.loc[new_dataframe.Notes == '', 'S_Age'] = new_dataframe.Notes.str.split()[0]
but it is wrong (ValueError: Must have equal len keys and value when setting with an iterable).
Help will be appreciated.
You can use this pattern to extract the information and join:
pattern = r'^(?P<L_Age>(?P<S_Age>\d+) years? old)'
new_dataframe = new_dataframe.join(new_dataframe.Notes.str.extract(pattern))
Output:
      Name                           Notes  Status         L_Age  S_Age
0     John                23 years old. NA    True  23 years old     23
1    Lukas                52 years old. NA   False  52 years old     52
2  Bridget                64 years old. NA    True  64 years old     64
3    Carol      31 years old. Old account.    True  31 years old     31
4  Madison  54 years old. New VIP account.    True  54 years old     54
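To see what the two named groups capture, the same pattern can be tried on single strings with Python's built-in re module. This is only a sketch: parse_age is a hypothetical helper name, the second and third sample strings are made up for illustration, and str.extract gives NaN rather than None for rows that do not match.

```python
import re

# Same pattern as in the answer, written as a raw string so that \d
# reaches the regex engine unchanged.
PATTERN = re.compile(r'^(?P<L_Age>(?P<S_Age>\d+) years? old)')

def parse_age(note):
    """Return the named groups as a dict, or None if the note does not start with an age."""
    match = PATTERN.match(note)
    return match.groupdict() if match else None

print(parse_age("23 years old. NA"))              # {'L_Age': '23 years old', 'S_Age': '23'}
print(parse_age("1 year old. New VIP account."))  # {'L_Age': '1 year old', 'S_Age': '1'}
print(parse_age("no age given"))                  # None
```

The outer group L_Age wraps the whole "N years old" phrase, while the inner group S_Age matches only the digits, so one pass over the text yields both columns.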
If I understand correctly:
def get_first_n_words(txt, n):
    l = txt.split(' ')
    assert(len(l)>=n)
    return ' '.join(l[:n])
new_dataframe['L_Age'] = new_dataframe['Notes'].apply(lambda x: get_first_n_words(x, 3))
new_dataframe['S_Age'] = new_dataframe['Notes'].apply(lambda x: get_first_n_words(x, 1))
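The same word-based idea can also be written with pandas' vectorized string methods instead of apply. A minimal sketch, assuming every note starts with the age as in the sample output; the frame df below is a hypothetical two-row sample. Splitting on spaces leaves the period attached to the third word ("old."), so it is stripped explicitly here.

```python
import pandas as pd

# Hypothetical two-row sample in the same shape as the question's data
df = pd.DataFrame({
    "Name": ["John", "Carol"],
    "Notes": ["23 years old. NA", "31 years old. Old account."],
})

# Series in which each element is the list of words of one note
words = df["Notes"].str.split()

# First three words of each row joined back together; rstrip removes
# the period that stays attached to "old."
df["L_Age"] = words.str[:3].str.join(" ").str.rstrip(".")

# First word of each row (still a string, not an integer)
df["S_Age"] = words.str[0]

print(df[["Name", "L_Age", "S_Age"]])
```

In the question's attempt, .str.split()[:3] slices the first three rows of the Series rather than the first three words of each row, which is why .str[:3] is used here.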