I am trying to assign values from a list to a pandas DataFrame column: for each row, I want the list elements that the text starts with.
Code:
import numpy as np
import pandas as pd

searchwords = ['harry','harry potter','lotr','secret garden']
l1 = [1, 2, 3,4,5]
l2 = ['Harry Potter is a great book',
'Harry Potter is very famous',
'I enjoyed reading Harry Potter series',
'LOTR is also a great book along',
'Have you read Secret Garden as well?'
]
df = pd.DataFrame({'id':l1,'text':l2})
df['text'] = df['text'].str.lower()
Data Preview:
id text
0 1 harry potter is a great book
1 2 harry potter is very famous
2 3 i enjoyed reading harry potter series
3 4 lotr is also a great book along
4 5 have you read secret garden as well?
Tried:
df.loc[df['text'].str.startswith(tuple(searchwords)),'tags'] if (df['text'].str.startswith(tuple(searchwords))) == True else np.NaN
Error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What am I doing wrong? I thought I could compare the result to True (== True) in the if/else expression.
Looking for output like this:
id text tags
0 1 harry potter is a great book harry;harry potter
1 2 harry potter is very famous harry;harry potter
2 3 i enjoyed reading harry potter series NaN
3 4 lotr is also a great book along lotr
4 5 have you read secret garden as well? NaN
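The ValueError in the attempt above comes from using a whole Series as an `if` condition. `str.startswith` returns one boolean per row, so `== True` also yields a Series, and `if` cannot reduce that to a single True/False. The sketch below rebuilds the lowercased sample data from the question and shows the error, plus two element-wise uses of the same mask (row selection and `Series.where` as a vectorized if/else); it is an illustration, not the only fix.

```python
import pandas as pd

searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'text': ['harry potter is a great book',
             'harry potter is very famous',
             'i enjoyed reading harry potter series',
             'lotr is also a great book along',
             'have you read secret garden as well?'],
})

# One boolean per row: does this text start with any of the search words?
mask = df['text'].str.startswith(tuple(searchwords))

# `if` needs a single True/False, so Python calls bool() on the Series,
# and pandas raises because it cannot know whether you mean any() or all().
error_message = None
try:
    if mask == True:
        pass
except ValueError as err:
    error_message = str(err)
print(error_message)

# Element-wise alternative 1: use the mask to select rows
matching_rows = df.loc[mask, 'id'].tolist()
print(matching_rows)  # [1, 2, 4]

# Element-wise alternative 2: keep the text where the mask is True, NaN elsewhere
kept = df['text'].where(mask)
print(kept.isna().tolist())  # [False, False, True, False, True]
```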
Try using apply:
df['tags'] = df.text.apply(
lambda text: [searchword for searchword in searchwords if text.startswith(searchword)]
)
This gives you a column tags containing, for each row, a list of the search words that the text starts with.
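A self-contained sketch of the apply version above, rebuilding the lowercased sample data from the question and printing the resulting tags column. Note that a text can match several search words ('harry' and 'harry potter') or none at all.

```python
import pandas as pd

searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'text': ['harry potter is a great book',
             'harry potter is very famous',
             'i enjoyed reading harry potter series',
             'lotr is also a great book along',
             'have you read secret garden as well?'],
})

# For each text, keep every search word it starts with (possibly several, possibly none)
df['tags'] = df.text.apply(
    lambda text: [searchword for searchword in searchwords if text.startswith(searchword)]
)
print(df['tags'].tolist())
# [['harry', 'harry potter'], ['harry', 'harry potter'], [], ['lotr'], []]
```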
If you prefer NaN over empty lists ([]), you can convert them in a second step:
df['tags'] = df.tags.apply(
lambda current_tag: float('nan') if len(current_tag)==0 else current_tag
)
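The question's expected output shows the tags as a single `;`-separated string ('harry;harry potter') rather than a list. Assuming that string format is what's wanted, one way to get it is to join each list with `Series.str.join` and then turn the empty strings left by rows without matches into NaN with `Series.where`. This is a sketch built on the apply approach above, not the only option.

```python
import pandas as pd

searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'text': ['harry potter is a great book',
             'harry potter is very famous',
             'i enjoyed reading harry potter series',
             'lotr is also a great book along',
             'have you read secret garden as well?'],
})

# Collect every search word that each text starts with
tag_lists = df['text'].apply(lambda text: [w for w in searchwords if text.startswith(w)])

# Join each list with ';' (an empty list becomes ''), then keep only non-empty strings;
# where() puts NaN in the rows whose joined string is empty
joined = tag_lists.str.join(';')
df['tags'] = joined.where(joined != '')
print(df[['id', 'tags']])
```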
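Here is another version. It matches whole words anywhere in the text rather than prefixes, so it also tags 'i enjoyed reading harry potter series', and multi-word search terms such as 'harry potter' and 'secret garden' can never match a single word: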
df["tags"] = df["text"].str.split(" ").apply(
    lambda x: list(set(x) & set(searchwords))
)
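To make the difference from the startswith approach concrete, here is a self-contained run of the split-based version on the question's data. One change is assumed for illustration: `sorted()` replaces `list()` so the order of tags is deterministic, since sets are unordered.

```python
import pandas as pd

searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'text': ['harry potter is a great book',
             'harry potter is very famous',
             'i enjoyed reading harry potter series',
             'lotr is also a great book along',
             'have you read secret garden as well?'],
})

# Split each text on spaces and keep the tokens that are also search words.
# Only single-word search terms can match a token, wherever it appears in the text.
df['tags'] = df['text'].str.split(' ').apply(
    lambda words: sorted(set(words) & set(searchwords))
)
print(df['tags'].tolist())
# [['harry'], ['harry'], ['harry'], ['lotr'], []]
```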
If you want NaN instead of an empty list, add the following:
import numpy as np
df['tags'] = df['tags'].apply(lambda x: np.nan if len(x)==0 else x)