I have a column in a pandas DataFrame with items like the following:
SubBrand
Sam William Mathew
Jonty Rodes
Chris Gayle
I want to create a new column (SubBrand_new) like this:
SubBrand_new
0 SWM
1 JR
2 CG
I am using this code:
df1["SubBrand_new"] = "".join([x[0] for x in (df1["SubBrand"].str.split())])
but it does not give me what I am looking for. Can anybody help?
We can split with expand=True, take the first character of each resulting word column, and sum across the columns, i.e.
df['SubBrand'].str.split(expand=True).apply(lambda x: x.str[0]).fillna('').sum(axis=1)
0 SWM
1 JR
2 CG
dtype: object
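To see what each stage of this chain does, here is a sketch that runs it on the question's three names one step at a time. The intermediate variable names words and initials are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"SubBrand": ["Sam William Mathew", "Jonty Rodes", "Chris Gayle"]})

# Step 1: expand=True spreads the words over columns 0, 1, 2;
# rows with fewer words are padded with missing values
words = df["SubBrand"].str.split(expand=True)
print(words)

# Step 2: apply() works column by column; .str[0] takes the first
# character of each word, and the padding stays missing (NaN)
initials = words.apply(lambda col: col.str[0])

# Step 3: replace the missing values with '' so that summing the
# object columns along axis=1 concatenates the letters row by row
result = initials.fillna("").sum(axis=1)
print(result.tolist())  # ['SWM', 'JR', 'CG']
```

The fillna('') step is what keeps the two-word rows from ending up as missing values.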
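You want to apply a function to every row and return a new column with the result. This kind of operation can be done with the .apply() method; a simple = assignment will not do the trick, because your list comprehension iterates over the whole column at once: "".join(...) builds a single string from the first word of every row, and that one string is then assigned to every row. A solution in the spirit of your code would be: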
import pandas as pd
df = pd.DataFrame({'Name': ['Marcus Livius Drussus',
                            'Lucius Cornelius Sulla',
                            'Gaius Julius Caesar']})
# For each name, join the first letter of every word
df['Abrev'] = df.Name.apply(lambda x: "".join([y[0] for y in x.split()]))
which yields:
df
Name Abrev
0 Marcus Livius Drussus MLD
1 Lucius Cornelius Sulla LCS
2 Gaius Julius Caesar GJC
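A minimal reproduction of what goes wrong with the question's plain = assignment, assuming the three example names from the question:

```python
import pandas as pd

df1 = pd.DataFrame({"SubBrand": ["Sam William Mathew", "Jonty Rodes", "Chris Gayle"]})

# Iterating over the Series of lists yields one list per row, so x[0]
# is the first *word* of each row, not the first letter of each word
first_words = [x[0] for x in df1["SubBrand"].str.split()]
print(first_words)  # ['Sam', 'Jonty', 'Chris']

# "".join(...) collapses everything into one scalar string,
# which pandas then broadcasts to every row of the new column
df1["SubBrand_new"] = "".join(first_words)
print(df1["SubBrand_new"].tolist())  # ['SamJontyChris', 'SamJontyChris', 'SamJontyChris']
```

Moving the comprehension inside a per-row function (as apply() does) is what turns "first word of each row" into "first letter of each word".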
EDIT:
I compared it to the other solution, expecting the apply() method with join() to be fairly slow. I was surprised to find that it is in fact faster. Setup:
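import string
import numpy as np
import pandas as pd
N = 3000000
# pd.util.testing was removed in pandas 2.0: draw random 3-character alphanumeric strings with NumPy instead
bank = ["".join(row) for row in np.random.default_rng().choice(list(string.ascii_letters + string.digits), size=(N, 3))]
vec = [bank[3*i] + ' ' + bank[3*i+1] + ' ' + bank[3*i+2] for i in range(N // 3)]
df = pd.DataFrame({'Name': vec})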
I find:
df.Name.apply(lambda x: "".join([y[0] for y in (x.split())]))
executed in 581 ms
df.Name.str.split(expand=True).apply(lambda x: x.str[0]).fillna('').sum(axis=1)
executed in 2.81 s
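A sketch for re-running this comparison yourself with timeit, assuming Python 3 and a smaller N (300,000 rather than 3,000,000) so it finishes quickly. The helper names rand_word, with_apply and with_expand are hypothetical, and the printed timings depend on your hardware and pandas version, so they are not expected to match the figures above.

```python
import random
import string
import timeit

import pandas as pd

# Smaller than the N = 3000000 used above so the sketch runs quickly
N = 300_000
random.seed(0)
alphabet = string.ascii_letters + string.digits


def rand_word(k=3):
    # Random k-character alphanumeric string (hypothetical stand-in for rands_array)
    return "".join(random.choices(alphabet, k=k))


# N // 3 rows, each made of three random 3-character words
df = pd.DataFrame({"Name": [" ".join(rand_word() for _ in range(3)) for _ in range(N // 3)]})


def with_apply(s):
    # Per-row Python function via Series.apply
    return s.apply(lambda x: "".join(y[0] for y in x.split()))


def with_expand(s):
    # Split into columns, take first characters, concatenate across columns
    return s.str.split(expand=True).apply(lambda c: c.str[0]).fillna("").sum(axis=1)


# Check that both approaches agree before comparing their speed
assert with_apply(df.Name).tolist() == with_expand(df.Name).tolist()

for fn in (with_apply, with_expand):
    # Best of three single runs, to reduce noise from other processes
    t = min(timeit.repeat(lambda: fn(df.Name), number=1, repeat=3))
    print(f"{fn.__name__}: {t:.3f} s")
```

Taking the minimum of several runs is a common way to reduce noise; it reports the best case rather than the average.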