Extract words (letters only) and words containing numbers into separate dataframe columns
I'm trying to extract words that contain only letters into one new column, and any word that contains a number into a different column.
Desired Output:
               query    words_only contains_number
0    Nike Air Max 97  Nike Air Max              97
1      Adidas NMD-R1        Adidas          NMD-R1
2   Nike Air Max 270  Nike Air Max             270
What I've Tried:
I've seen the answer here, which gets me part of the way there, but it's not exactly what I need:
How to extract words containing only letters from a text in python?
Minimal Reproducible Example:
import pandas as pd

# Sample queries (the second one matches the "NMD-R1" shown in the desired output)
data = ["Nike Air Max 97", "Adidas NMD-R1", "Nike Air Max 270"]

# Create the DataFrame with an explicit column name
df = pd.DataFrame(data, columns=['query'])

print(df)
You can use a regex with str.extractall to capture the words with and without digits in two separate groups, then groupby.agg to join the matches of each group back into one string per row:
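df[['words_only', 'contains_number']] = (
    df['query']
    .str.extractall(r'(\S*\d\S*)|([^\s\d]+)')  # order is important: digit-containing tokens first
    .groupby(level=0).agg(lambda s: ' '.join(s.dropna()))  # join each row's matches per column
    .loc[:, ::-1]  # reverse the two columns so the digit-free words come first
)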
Output:
               query    words_only contains_number
0    Nike Air Max 97  Nike Air Max              97
1      Adidas NMD-R1        Adidas          NMD-R1
2   Nike Air Max 270  Nike Air Max             270
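To see why the order of the two alternatives matters, it can help to run the same pattern on a single string with the standard re module. The sketch below is only an illustration: the helper split_query and the SWAPPED pattern are hypothetical names that are not part of the answer above. Note also that [^\s\d]+ matches any run of non-space, non-digit characters, so punctuation counts as part of a "word".

```python
import re

# Same pattern as in the answer: group 1 = tokens containing a digit,
# group 2 = runs of characters that are neither whitespace nor digits.
PATTERN = re.compile(r'(\S*\d\S*)|([^\s\d]+)')


def split_query(query):
    """Return (words_only, contains_number) for one string (hypothetical helper)."""
    with_digit, without_digit = [], []
    # findall yields one (group1, group2) tuple per match; the unused group is ''
    for digit_token, plain_token in PATTERN.findall(query):
        if digit_token:
            with_digit.append(digit_token)
        else:
            without_digit.append(plain_token)
    return ' '.join(without_digit), ' '.join(with_digit)


# With the alternatives swapped, the digit-free branch is tried first,
# so a token such as "NMD-R1" is cut into "NMD-R" and "1".
SWAPPED = re.compile(r'([^\s\d]+)|(\S*\d\S*)')

print(split_query("Adidas NMD-R1"))   # ('Adidas', 'NMD-R1')
print(SWAPPED.findall("NMD-R1"))      # [('NMD-R', ''), ('', '1')]
```

Because the regex engine tries the alternatives from left to right at each position, putting the digit-containing branch first keeps mixed tokens whole.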