Pythonic way to filter columns and then create a new column
I have a .xlsx file that I am opening with this code:
import pandas as pd
df = pd.read_excel(open('file.xlsx','rb'))
df['Description'].head()
and I have the following result, which looks pretty good.
ID | Description
:----- | :-----------------------------
0 | Some Description with no hash
1 | Text with #one hash
2 | Text with #two #hashes
Now I want to create a new column, keeping only the words that start with #, like this one:
ID | Description | Only_Hash
:----- | :----------------------------- | :-----------------
0 | Some Description with no hash | NaN
1 | Text with #one hash | #one
2 | Text with #two #hashes | #two #hashes
I was able to count the lines that contain #:
descriptionWithHash = df['Description'].str.contains('#').sum()
but now I want to create the column like I described above. What is the easiest way to do that?
Regards!
PS: it is supposed to show a table format in the question but I can't figure out why it is displaying incorrectly!
You can use str.findall with str.join:
df['new'] = df['Description'].str.findall(r'(#\w+)').str.join(' ')
print(df)
ID Description new
0 0 Some Description with no hash
1 1 Text with #one hash #one
2 2 Text with #two #hashes #two #hashes
And for NaNs (this needs numpy imported as np):
import numpy as np
df['new'] = df['Description'].str.findall(r'(#\w+)').str.join(' ').replace('', np.nan)
print(df)
ID Description new
0 0 Some Description with no hash NaN
1 1 Text with #one hash #one
2 2 Text with #two #hashes #two #hashes
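If you prefer to avoid regular expressions, here is a minimal sketch of an equivalent approach, assuming the same sample data as in the question and a hypothetical column name Only_Hash: split each description on whitespace and keep only the tokens that start with #.
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question so the snippet runs on its own.
df = pd.DataFrame({'ID': [0, 1, 2],
                   'Description': ['Some Description with no hash',
                                   'Text with #one hash',
                                   'Text with #two #hashes']})

# Keep only whitespace-separated tokens that begin with '#'.
# Rows with no such token produce an empty string, which the
# `or np.nan` fallback turns into NaN.
df['Only_Hash'] = df['Description'].apply(
    lambda s: ' '.join(w for w in s.split() if w.startswith('#')) or np.nan)
print(df)
Note that, unlike the regex version, this keeps any punctuation attached to a hashtag token (e.g. "#one,"), so pick whichever behaviour you need.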
In [126]: df.join(df.Description
...: .str.extractall(r'(\#\w+)')
...: .unstack(-1)
...: .T.apply(lambda x: x.str.cat(sep=' ')).T
...: .to_frame(name='Hash'))
Out[126]:
ID Description Hash
0 0 Some Description with no hash NaN
1 1 Text with #one hash #one
2 2 Text with #two #hashes #two #hashes
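For what it's worth, str.extractall returns one row per match, indexed by (original row, match number), so a shorter way to collapse it back is to group on the original index level and join. This is only a sketch under the same sample df as above, with a hypothetical column name Hash:
# str.extractall gives one row per hashtag, indexed by (row, match).
matches = df['Description'].str.extractall(r'(#\w+)')

# Join the matches that belong to the same original row back together.
# Rows with no hashtag are simply absent from `matches`, so after index
# alignment the assignment leaves NaN in 'Hash' for those rows.
df['Hash'] = matches[0].groupby(level=0).agg(' '.join)
print(df)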