How to get the first letter of each word in a string of dataframe column

I have a dataframe column of first and last names. I want to extract the initials from the names as another column in my dataframe. For the following dataframe:

   Name
0 'Brad Pitt'
1 'Bill Gates'
2 'Elon Musk'

I have come up with a solution:

df['initials'] = [df['Name'][i].split()[0][0] + df['Name'][i].split()[1][0] for i in range(len(df))]

However, this does not work for a name such as 'John David Smith', because I want the first letter of each word in the name. Moreover, since my dataframe is quite large, I would like to know if there is a 'vectorized' solution (without for loops).

Thank you in advance.

If performance is important, use a list comprehension with split and join:

df['initials'] = [' '.join(y[0] for y in x.split()) for x in df['Name']]
print (df)
         Name initials
0   Brad Pitt      B P
1  Bill Gates      B G
2   Elon Musk      E M

Or:

df['initials'] = df['Name'].apply(lambda x: ' '.join(y[0] for y in x.split()))
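
Both variants above assume every entry in Name is a non-empty string. A minimal sketch of a helper that tolerates missing values follows; the function name `initials` and the choice to map non-string entries to an empty string are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

def initials(name, sep=' '):
    """Return the first letter of each whitespace-separated word in name."""
    # Assumption: non-string entries (None, NaN) map to an empty string.
    if not isinstance(name, str):
        return ''
    # str.split() with no argument never yields empty words,
    # so word[0] is always safe here.
    return sep.join(word[0] for word in name.split())

df = pd.DataFrame({'Name': ['Brad Pitt', 'John David Smith', None]})
df['initials'] = [initials(x) for x in df['Name']]
print(df)
```

The same helper can be passed to apply instead: df['Name'].apply(initials).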

Solution with no for, but it is really slow:

df['initials'] = df['Name'].str.split(expand=True).apply(lambda x: x.str[0]).fillna('').agg(' '.join, axis=1).str.rstrip()

Performance for 400k rows:

print (df)
               Name
0         Brad Pitt
1        Bill Gates
2         Elon Musk
3  John David Smith

df = pd.concat([df] * 100000, ignore_index=True)

The second and first solutions are the fastest, followed by the first @mozway answer; the second @mozway solution is the slowest:

In [178]: %%timeit
     ...: df['initials2'] = df['Name'].apply(lambda x: ' '.join(y[0] for y in x.split()))
     ...: 
442 ms ± 3.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [177]: %%timeit
     ...: df['initials1'] = [' '.join(y[0] for y in x.split()) for x in df['Name']]
     ...: 
     ...: 
485 ms ± 7.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [180]: %%timeit
     ...: df['initials'] = df['Name'].str.replace(r'(?<=\w)\w', '', regex=True)
     ...: 
830 ms ± 8.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [179]: %%timeit 
     ...: df['initials3'] = df['Name'].str.split(expand=True).apply(lambda x: x.str[0]).fillna('').agg(' '.join, axis=1).str.rstrip()
     ...: 
18.8 s ± 772 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [181]: %%timeit
     ...: df['initials'] = (df['Name'].str.extractall(r'(?<!\w)(\w)').groupby(level=0).agg(' '.join))                 
     ...: 
     ...: 
25.3 s ± 692 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
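
Outside IPython, the two fastest variants can be compared with the standard timeit module. This is only a sketch: the frame is assumed to be much smaller than the 400k rows used above so that it runs quickly, the function names are hypothetical, and the absolute numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

names = ['Brad Pitt', 'Bill Gates', 'Elon Musk', 'John David Smith']
# Assumption: 1,000 repetitions (4,000 rows) to keep the run short;
# the timings above used 100,000 repetitions (400k rows).
df = pd.concat([pd.DataFrame({'Name': names})] * 1000, ignore_index=True)

def with_list_comprehension():
    return [' '.join(y[0] for y in x.split()) for x in df['Name']]

def with_apply():
    return df['Name'].apply(lambda x: ' '.join(y[0] for y in x.split()))

for func in (with_list_comprehension, with_apply):
    # Average over 5 calls of each function.
    seconds = timeit.timeit(func, number=5) / 5
    print(f'{func.__name__}: {seconds * 1000:.1f} ms per loop')
```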

Quite easy using a short regex:

df['initials'] = df['Name'].str.replace(r'(?<=\w)\w+', '', regex=True)

Alternative:

df['initials'] = (df['Name'].str.extractall(r'(?<!\w)(\w)')
                  .groupby(level=0).agg(' '.join)
                 )

output:

               Name initials
0         Brad Pitt      B P
1        Bill Gates      B G
2         Elon Musk      E M
3  John David Smith    J D S

How it works

solution 1:

  • find each set of letter(s) that is not the first of a word
  • delete it

regex demo
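
The two steps of solution 1 can be tried on plain strings with Python's re module, which is what str.replace uses when regex=True. A small sketch (the sample names are taken from the question):

```python
import re

# (?<=\w) is a lookbehind: the match must be preceded by a word character,
# so \w+ can only match the letters after the first one in each word.
pattern = re.compile(r'(?<=\w)\w+')

for name in ['Brad Pitt', 'John David Smith']:
    # Replacing every match with '' deletes everything but the first letters;
    # the whitespace between words is left untouched.
    print(name, '->', pattern.sub('', name))
```

Because the original whitespace is kept, the initials come out separated by whatever separated the words in the name.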

solution 2:

  • find all first letters
  • join them with a space

regex demo
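
Solution 2 can be sketched the same way on plain strings. Here re.findall stands in for str.extractall, which returns one row per match indexed by the original row and the match number; that is why the pandas version needs groupby(level=0) before joining. The function name `initials_findall` is hypothetical.

```python
import re

# (?<!\w) is a negative lookbehind: the captured character must not be
# preceded by a word character, i.e. it is the first letter of a word.
pattern = re.compile(r'(?<!\w)(\w)')

def initials_findall(name):
    # findall returns the captured first letters in order of appearance.
    return ' '.join(pattern.findall(name))

for name in ['Bill Gates', 'John David Smith']:
    print(name, '->', initials_findall(name))
```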

