Extract words (letters only) and words containing numbers into separate dataframe columns

I'm trying to extract words that contain only letters into a new column, and any word that contains a number into a different column.

Desired Output:

              query    words_only contains_number
0   Nike Air Max 97  Nike Air Max              97
1     Adidas NMD-R1        Adidas          NMD-R1
2  Nike Air Max 270  Nike Air Max             270

What I've Tried:

I've seen the answer linked below, which gets me part of the way there, but it's not exactly what I need.

How to extract words containing only letters from a text in python?

Minimal Reproducible Example:

# Import the pandas library
import pandas as pd

# Initialize the list of queries (matches the "Desired Output" table)
data = ["Nike Air Max 97", "Adidas NMD-R1", "Nike Air Max 270"]

# Create the pandas DataFrame with an explicit column name
df = pd.DataFrame(data, columns=['query'])

# Print the DataFrame
print(df)

You can use a regex with str.extractall to capture the words with digits and the words without digits in separate groups, then groupby.agg to join each row's matches back into one string per group:

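df[['words_only', 'contains_number']] = (df['query']
 .str.extractall(r'(\S*\d\S*)|([^\s\d]+)') # order matters: try the digit-containing alternative first
 .groupby(level=0).agg(lambda s: ' '.join(s.dropna())) # join each row's matches per capture group
 .loc[:, ::-1] # reverse the two columns so letters-only words come first
)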

Output:

              query    words_only contains_number
0   Nike Air Max 97  Nike Air Max              97
1     Adidas NMD-R1        Adidas          NMD-R1
2  Nike Air Max 270  Nike Air Max             270
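
The "order matters" remark on the regex can be checked with a small sketch using Python's built-in re module on one of the sample queries. The variable names here are only illustrative. With the digit-containing alternative first, "NMD-R1" is captured whole; with the alternatives swapped, the letters-only pattern consumes "NMD-R" and leaves the "1" behind as a separate match.

```python
import re

query = "Adidas NMD-R1"

# Digit-containing alternative first (the order used in the answer).
first = re.findall(r'(\S*\d\S*)|([^\s\d]+)', query)

# Alternatives swapped: the letters-only pattern gets the first try at each position.
swapped = re.findall(r'([^\s\d]+)|(\S*\d\S*)', query)

print(first)    # [('', 'Adidas'), ('NMD-R1', '')]
print(swapped)  # [('Adidas', ''), ('NMD-R', ''), ('', '1')]
```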

regex demo
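
For comparison, here is a rough sketch of a different technique that avoids the regex: split each query on whitespace and sort the tokens by whether they contain a digit. This is not the method from the answer above, and split_words is a hypothetical helper name. It should give the same result for plain ASCII input like the sample data, but it has not been checked against other inputs.

```python
import pandas as pd


def split_words(text):
    """Hypothetical helper: return (letters-only words, digit-containing words)."""
    tokens = text.split()
    with_digits = [t for t in tokens if any(ch.isdigit() for ch in t)]
    without_digits = [t for t in tokens if not any(ch.isdigit() for ch in t)]
    return " ".join(without_digits), " ".join(with_digits)


df = pd.DataFrame({"query": ["Nike Air Max 97", "Adidas NMD-R1", "Nike Air Max 270"]})

# Build the two new columns from the per-row tuples, keeping the original index.
parts = pd.DataFrame(
    df["query"].map(split_words).tolist(),
    index=df.index,
    columns=["words_only", "contains_number"],
)
df = df.join(parts)
print(df)
```

The regex version stays entirely within pandas string methods, while this version trades that for a plain Python function that may be easier to adjust if the rules for a "word" change.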
