
Python pandas - new column based on other columns (String)

I couldn't find this on Stack Overflow, so I wanted to ask the question.

Let's assume that I have two columns, A and B, in a data frame. Each consists of just a bunch of words, and I want to create a new column C which is just True/False based on the following rules:

 If the word in B equals the word in A + 'ing' (or vice versa), then it's True.
 If the word in B equals the word in A + 'ment' (or vice versa), then it's True.

So I defined the following function (it also accepts a doubled final letter before 'ing'):

def parts_of_speech(s1, s2):
    return s1+'ing'==s2 or s1+'ment'==s2 or s1+s1[-1]+'ing'==s2

For instance:

A              B            C
Engage         Engagement   True
Go             Going        True
Axe            Axis         False
Management     Manage       True

I tried the following:

df['C']=df.apply(lambda x: parts_of_speech(x.A, x.B) or 
                           parts_of_speech(x.B, x.A) )

or

df['C']=df.apply(parts_of_speech(df['A'], df['B']) or 
                           parts_of_speech(df['A'], df['B']) )

I get the same error:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I don't know what I did incorrectly. Is there an easy fix for this?

Any help would be greatly appreciated.

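By default, DataFrame.apply passes each column to the function (axis=0), so x.A and x.B are not available inside the lambda. The only change needed in your first example is to add axis=1 so that the function is applied to each row: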

df['C']=df.apply(lambda x: parts_of_speech(x.A, x.B) or parts_of_speech(x.B, x.A),
                 axis=1)
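For reference, here is a minimal self-contained sketch of that fix, assuming the sample data from the question is rebuilt by hand (the DataFrame construction is mine, not from the original post):

```python
import pandas as pd

def parts_of_speech(s1, s2):
    # True when s2 is s1 + 'ing', s1 + 'ment', or s1 with its last letter doubled + 'ing'
    return s1 + 'ing' == s2 or s1 + 'ment' == s2 or s1 + s1[-1] + 'ing' == s2

# Sample data copied from the table in the question
df = pd.DataFrame({
    'A': ['Engage', 'Go', 'Axe', 'Management'],
    'B': ['Engagement', 'Going', 'Axis', 'Manage'],
})

# axis=1 hands each row to the lambda, so row.A and row.B are single strings
df['C'] = df.apply(
    lambda row: parts_of_speech(row.A, row.B) or parts_of_speech(row.B, row.A),
    axis=1,
)
print(df)
```

The warning quoted in the question is probably a separate issue: it typically shows up when df was itself taken as a slice of another DataFrame. If that is the case here, making an explicit copy first (df = df.copy()) before assigning the new column is the usual remedy.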

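For your sample data, you can also sort each row and strip the suffix with a regex:

import numpy as np

# sort each row alphabetically; a stem always sorts before the longer word
# it is a prefix of, so the stem ends up in A and the longer word in B
df[['A','B']] = np.sort(df[['A','B']])

# split off the suffix and compare the remaining stem with A
df['B'].str.extract(r'(\w+)(ment|ing)$', expand=True)[0].eq(df['A'])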
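A runnable version of the regex idea might look like the sketch below. The DataFrame construction, the explicit to_numpy call and the intermediate name stem are my assumptions for illustration; the regex itself is the one from the answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'A': ['Engage', 'Go', 'Axe', 'Management'],
    'B': ['Engagement', 'Going', 'Axis', 'Manage'],
})

# Sort within each row (axis=1) so the shorter stem, being a prefix, lands in A
df[['A', 'B']] = np.sort(df[['A', 'B']].to_numpy(dtype=object), axis=1)

# Capture everything before a trailing 'ment' or 'ing'; rows with no match give NaN
stem = df['B'].str.extract(r'(\w+)(ment|ing)$', expand=True)[0]

# NaN never equals a string, so unmatched rows come out False
df['C'] = stem.eq(df['A'])
print(df)
```

Note that this overwrites A and B with the sorted words, and that it does not cover the doubled-letter rule from the question's function (for 'Running' the captured stem would be 'Runn', not 'Run').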

Or use your approach, but vectorized:

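# sort each row alphabetically so the stem ends up in A and the longer word in B
df[['A','B']] = np.sort(df[['A','B']])

df['A-ing'] = df['A'] + 'ing'
df['A-ment'] = df['A'] + 'ment'

# True where either of the two candidate columns equals B
df.iloc[:, -2:].eq(df['B'], axis=0).any(axis=1)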

Output:

0     True
1     True
2    False
3     True
dtype: bool
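If the doubled-letter rule from the question's function should be covered too, the vectorized idea can be extended as in this sketch. The 'Run'/'Running' row, the candidates frame and the names stem and word are hypothetical additions of mine, not part of the original post:

```python
import numpy as np
import pandas as pd

# Sample data from the question plus one hypothetical doubled-letter row
df = pd.DataFrame({
    'A': ['Engage', 'Go', 'Axe', 'Management', 'Run'],
    'B': ['Engagement', 'Going', 'Axis', 'Manage', 'Running'],
})

# Sort within each row without touching df: column 0 is the stem, column 1 the longer word
pairs = np.sort(df[['A', 'B']].to_numpy(dtype=object), axis=1)
stem = pd.Series(pairs[:, 0], index=df.index)
word = pd.Series(pairs[:, 1], index=df.index)

# One candidate column per rule in parts_of_speech
candidates = pd.DataFrame({
    'ing': stem + 'ing',
    'ment': stem + 'ment',
    'double_ing': stem + stem.str[-1] + 'ing',
})

# A row is True if any candidate equals the longer word
df['C'] = candidates.eq(word, axis=0).any(axis=1)
print(df)
```

Keeping the sorted words in separate Series leaves the original A and B columns unchanged, which may be preferable if the column order carries meaning.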
