
Get the first letter of each word in a string column and create a new column in Python

I have a column in a pandas DataFrame with values like the following:

SubBrand
Sam William Mathew
Jonty Rodes
Chris Gayle

I want to create a new column (SubBrand_new) that holds the initials of each value:

  SubBrand_new
0 SWM
1 JR
2 CG

I am using this piece of code:

df1["SubBrand_new"] = "".join([x[0] for x in (df1["SubBrand"].str.split())])

but I am not getting the result I am looking for. Can anybody help?

You can split with expand=True, take the first character of each word, and sum across each row, i.e.

df['SubBrand'].str.split(expand=True).apply(lambda x : x.str[0]).fillna('').sum(1)

0    SWM
1     JR
2     CG
dtype: object
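To make the one-liner above easier to follow, here is a minimal sketch that breaks it into steps on the data from the question. The intermediate names (words, initials) are only illustrative. Note one swap: the last step joins each row with apply(''.join, axis=1) instead of sum(1), which should give the same strings without depending on how a given pandas version sums text columns.

```python
import pandas as pd

df = pd.DataFrame({'SubBrand': ['Sam William Mathew', 'Jonty Rodes', 'Chris Gayle']})

# Step 1: one column per word; shorter names are padded with missing values
words = df['SubBrand'].str.split(expand=True)

# Step 2: keep only the first character of every word, column by column
initials = words.apply(lambda col: col.str[0])

# Step 3: replace missing values with '' and join the letters of each row
# (swapped in for .sum(1); an assumption, not the answer's exact call)
df['SubBrand_new'] = initials.fillna('').apply(''.join, axis=1)

print(df)
```

The fillna('') step matters because names with fewer words leave gaps in the expanded frame, and those gaps have to become empty strings before the letters are joined.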

You want to apply a function to every row and store the result in a new column. The .apply() method does exactly that. A plain = assignment will not do the trick here, because "".join(...) over the whole column builds one single string, which is then written to every row. A solution in the spirit of your code would be:

import pandas as pd

df = pd.DataFrame({'Name': ['Marcus Livius Drussus',
                            'Lucius Cornelius Sulla',
                            'Gaius Julius Caesar']})
# for each name, split into words and join the first letter of every word
df['Abrev'] = df.Name.apply(lambda x: "".join([y[0] for y in (x.split())]))

Which yields

df
                     Name Abrev
0   Marcus Livius Drussus   MLD
1  Lucius Cornelius Sulla   LCS
2     Gaius Julius Caesar   GJC
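If the column can contain missing values or stray whitespace, the lambda above would fail on the non-string entries. A small sketch of one way to guard against that, assuming an empty string is an acceptable result for a missing name (the helper name initials is made up for illustration):

```python
import pandas as pd

def initials(name):
    # Hypothetical helper: return '' for anything that is not a string
    # (for example None or NaN); that choice is an assumption.
    if not isinstance(name, str):
        return ''
    # split() with no arguments ignores leading, trailing and repeated spaces
    return ''.join(word[0] for word in name.split())

df = pd.DataFrame({'Name': ['Marcus Livius Drussus',
                            None,
                            '  Gaius   Julius Caesar ']})
df['Abrev'] = df['Name'].apply(initials)

print(df)
```

A named function also keeps the .apply() call readable once the logic grows beyond a single expression.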

EDIT:

I compared it with the other solution, expecting the apply() method with join() to be fairly slow. I was surprised to find that it is in fact faster. Setup:

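N = 3000000
# rands_array lives in pd.util.testing, which only older pandas releases provide
bank = pd.util.testing.rands_array(3, N)
# integer division, so that range() also accepts the bound on Python 3
vec = [bank[3*i] + ' ' + bank[3*i+1] + ' ' + bank[3*i+2] for i in range(N // 3)]
df = pd.DataFrame({'Name': vec})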

I find:

df.Name.apply(lambda x: "".join([y[0] for y in (x.split())]))
executed in 581ms

df.Name.str.split(expand=True).apply(lambda x : x.str[0]).fillna('').sum(1)
executed in 2.81s
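For anyone who wants to repeat this kind of measurement, here is a rough sketch using the standard timeit module. It makes several assumptions: the data is generated with random.choices rather than rands_array, N is much smaller than above, and the second candidate is a plain list comprehension that is not from this thread. Timings will vary by machine and pandas version, so none are quoted.

```python
import random
import string
import timeit

import pandas as pd

random.seed(0)   # fixed seed so the test data is reproducible
N = 30000        # arbitrary size, far smaller than in the post

def random_word(length=3):
    # Hypothetical stand-in for rands_array: a random string of letters
    return ''.join(random.choices(string.ascii_letters, k=length))

names = [' '.join(random_word() for _ in range(3)) for _ in range(N)]
df = pd.DataFrame({'Name': names})

def with_apply():
    # the apply()/join() approach from this answer
    return df['Name'].apply(lambda x: ''.join(y[0] for y in x.split()))

def with_listcomp():
    # not from the thread: a plain list comprehension as another candidate
    return pd.Series([''.join(y[0] for y in x.split()) for x in df['Name']],
                     index=df.index)

for func in (with_apply, with_listcomp):
    # best of three single runs, in seconds
    best = min(timeit.repeat(func, number=1, repeat=3))
    print(f'{func.__name__}: {best:.4f} s')
```

Any other candidate can be timed the same way by wrapping it in a zero-argument function and adding it to the loop.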


 