I have a pandas DataFrame with a column called warranty. It records ways to fix different issues. It looks something like the attached picture.
The goal is to find the word that follows any of the words listed below.
word_list=['replace', 'clean', 'remove']
How can I get this expected output: a column added to the above df with values replace battery wire clean fuel tank remove nail
pandas can use regex to search strings, and you could use this pattern:
(?:replace|clean|remove) (\w+)
You can use Python to generate this pattern:
words = "|".join(word_list)
# raw f-string, so that \w reaches the regex engine without an invalid-escape warning
pattern = rf'(?:{words}) (\w+)'
print('pattern:', pattern)
And later
df['word'] = df['warranty'].str.lower().str.findall(pattern).str[0]
To be safe, I convert the text with .str.lower() first, because the pattern uses lower-case words.
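As an alternative to lowering the text, the match itself can be made case-insensitive. The following is only a sketch, not part of the original answer: it assumes `str.extract` with `flags=re.IGNORECASE` is acceptable in place of `findall`, adds `re.escape` and a `\b` word boundary as extra safety, and uses made-up sample rows.

```python
import re
import pandas as pd

# Made-up sample rows (assumption); the last one contains none of the keywords.
df = pd.DataFrame({
    'warranty': [
        'Replace battery wire from car',
        'CLEAN fuel tank',
        'remove nail from tire',
        'inspect brakes',
    ],
})
word_list = ['replace', 'clean', 'remove']

# Escape each keyword in case it contains regex metacharacters.
words = "|".join(re.escape(w) for w in word_list)
# \b keeps 'clean' from matching inside a longer word such as 'unclean'.
pattern = rf'\b(?:{words})\s+(\w+)'

# With one capture group and expand=False, extract returns a Series;
# rows without a match become NaN instead of raising an error.
df['word'] = df['warranty'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
print(df)
```

Unlike `.str.lower()`, this keeps the original casing of the captured word.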
If replace, clean, or remove is always the first word, then you could simply split(" ") the text and get the second element:
df['word'] = df['warranty'].str.split(' ').str[1]
If you need more complex code, then you could use .apply():
def function(text):
    # ... complex code ...
    return text.split(' ')[1]
df['word'] = df['warranty'].apply(function)
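As an illustration of what the "complex code" could look like, here is a hedged sketch of a function that also handles rows where the keyword is not the first word, or is missing altogether. The name `word_after_keyword` and the sample rows are hypothetical, not from the original answer.

```python
import pandas as pd

word_list = ['replace', 'clean', 'remove']

def word_after_keyword(text, keywords=word_list):
    """Return the word that follows the first keyword, or None if there is none."""
    tokens = str(text).lower().split()
    # Stop one token early so that tokens[i + 1] always exists.
    for i, token in enumerate(tokens[:-1]):
        if token in keywords:
            return tokens[i + 1]
    return None

# Made-up sample rows (assumption): keyword not first, keyword alone, no keyword.
df = pd.DataFrame({
    'warranty': [
        'please replace battery wire',
        'clean fuel tank',
        'remove',
        'check oil',
    ],
})
df['word'] = df['warranty'].apply(word_after_keyword)
print(df)
```

Returning None rather than indexing blindly avoids the IndexError that `text.split(' ')[1]` raises on a single-word row.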
Minimal working code
import pandas as pd

data = {
    'warranty': [
        'replace battery wire from car',
        'clean fuel tank',
        'remove nail from tire',
    ],
}
word_list = ['replace', 'clean', 'remove']
df = pd.DataFrame(data)

# build the pattern from the word list (raw f-string keeps \w intact)
words = "|".join(word_list)
pattern = rf'(?:{words}) (\w+)'
print('pattern:', pattern)

def function(text):
    # ... complex code ...
    return text.split(' ')[1]

# method1: regex, method2: split, method3: apply
df['method1'] = df['warranty'].str.lower().str.findall(pattern).str[0]
df['method2'] = df['warranty'].str.split(' ').str[1]
df['method3'] = df['warranty'].apply(function)
print(df)
Result:
pattern: (?:replace|clean|remove) (\w+)
                        warranty  method1  method2  method3
0  replace battery wire from car  battery  battery  battery
1                clean fuel tank     fuel     fuel     fuel
2          remove nail from tire     nail     nail     nail