简体   繁体   中英

Extract words (letters only) and words containing numbers into separate dataframe columns

I'm trying to extract words which contain letter only into a new column, and any word which contains a number into a different column.

Desired Output:

              query    words_only contains_number
0   Nike Air Max 97  Nike Air Max              97
1     Adidas NMD-R1        Adidas          NMD-R1
2  Nike Air Max 270  Nike Air Max             270

What I've Tried:

I've seen the answer here which gets me some of the way there, but it's not exactly what I need.

How to extract words containing only letters from a text in python?

Minimum Reproducible Example:

# Import pandas library
import pandas as pd

# initialize list elements
data = ["Nike Air Max 97", "Adidas NMD R1", "Nike Air Max 270"]

# Create the pandas DataFrame with column name is provided explicitly
df = pd.DataFrame(data, columns=['query'])

# print dataframe.
print(df)

You can use a regex with str.extractall to extract the words with and without digits separately, then groupby.agg to join them separately:

df[['words_only', 'contains_number']] = (df['query']
 .str.extractall(r'(\S*\d\S*)|([^\s\d]+)') # order is important
 .groupby(level=0).agg(lambda s: ' '.join(s.dropna()))
 .loc[:, ::-1] # invert 2 columns
)

Output:

              query    words_only contains_number
0   Nike Air Max 97  Nike Air Max              97
1     Adidas NMD-R1        Adidas          NMD-R1
2  Nike Air Max 270  Nike Air Max             270

regex demo

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM