Python pandas - new column based on other columns (strings)

Question

I couldn't find this on Stack Overflow, so I wanted to ask the question.

Let's assume that I have two columns, A and B, in a data frame, each consisting of just a bunch of words, and I want to create a new column C which is just TRUE/FALSE based on the following rule:

 If word in B = word in A + 'ing', then it's True, or vice versa.
 If word in B = word in A + 'ment', then it's True, or vice versa.

So I defined the following function:

def parts_of_speech(s1, s2):
    return s1+'ing'==s2 or s1+'ment'==s2 or s1+s1[-1]+'ing'==s2

For instance:

A              B            C
Engage         Engagement   True
Go             Going        True
Axe            Axis         False
Management     Manage       True
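To make the example reproducible, the table above can be built as a DataFrame and the rule checked with a plain Python loop before involving `apply`. This is only a sketch: the variable name `expected` is introduced here for illustration, and the frame is assumed to hold exactly the four sample rows.

```python
import pandas as pd

def parts_of_speech(s1, s2):
    # True when s2 is s1 + 'ing', s1 + 'ment', or s1 with its last letter doubled + 'ing'
    return s1 + 'ing' == s2 or s1 + 'ment' == s2 or s1 + s1[-1] + 'ing' == s2

# Sample data copied from the table in the question
df = pd.DataFrame({
    'A': ['Engage', 'Go', 'Axe', 'Management'],
    'B': ['Engagement', 'Going', 'Axis', 'Manage'],
})

# Check every pair in both directions, without any pandas machinery
expected = [parts_of_speech(a, b) or parts_of_speech(b, a)
            for a, b in zip(df['A'], df['B'])]
print(expected)  # [True, True, False, True]
```

Note that `s1[-1]` raises an IndexError for an empty string, so the function assumes every cell holds a non-empty word.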

I tried the following:

df['C']=df.apply(lambda x: parts_of_speech(x.A, x.B) or 
                           parts_of_speech(x.B, x.A) )

or

df['C']=df.apply(parts_of_speech(df['A'], df['B']) or 
                           parts_of_speech(df['A'], df['B']) )

I get the same error:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I don't know what I did incorrectly. Is there an easy fix for this?

Any help would be greatly appreciated.

Answer 1

By default, DataFrame.apply passes each column to the function (axis=0). The only change needed in your example is to add axis=1 so that the function is applied to each row:

df['C']=df.apply(lambda x: parts_of_speech(x.A, x.B) or parts_of_speech(x.B, x.A),
                 axis=1)
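A self-contained version of the fix, as a sketch. It assumes the four sample rows from the question and repeats the asker's `parts_of_speech` function so that it runs on its own.

```python
import pandas as pd

def parts_of_speech(s1, s2):
    return s1 + 'ing' == s2 or s1 + 'ment' == s2 or s1 + s1[-1] + 'ing' == s2

df = pd.DataFrame({
    'A': ['Engage', 'Go', 'Axe', 'Management'],
    'B': ['Engagement', 'Going', 'Axis', 'Manage'],
})

# axis=1 hands each row to the lambda, so x.A and x.B are single strings
df['C'] = df.apply(
    lambda x: parts_of_speech(x.A, x.B) or parts_of_speech(x.B, x.A),
    axis=1,
)
print(df)
```

The SettingWithCopyWarning quoted in the question is a warning rather than an error, and it typically appears when df itself was taken as a slice of another DataFrame. If that is the case here, creating df with an explicit .copy() is the usual way to avoid it.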

Answer 2

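For your sample data:

import numpy as np

# sort each row alphabetically so the stem ends up in A and the suffixed word in B
df[['A','B']] = np.sort(df[['A','B']])

# strip a trailing 'ment' or 'ing' from B and compare the remaining stem with A
df['B'].str.extract(r'(\w+)(ment|ing)$',expand=True)[0].eq(df['A'])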
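Put together as a runnable sketch, assuming the four sample rows from the question (the intermediate name `stem` is introduced here only for readability):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['Engage', 'Go', 'Axe', 'Management'],
    'B': ['Engagement', 'Going', 'Axis', 'Manage'],
})

# Sort each row alphabetically: a word always sorts before its own suffixed form,
# so the stem lands in A and the longer word in B
df[['A', 'B']] = np.sort(df[['A', 'B']].to_numpy(), axis=1)

# Capture whatever precedes a trailing 'ment' or 'ing'; rows without a match give NaN
stem = df['B'].str.extract(r'(\w+)(ment|ing)$', expand=True)[0]

# NaN never equals a string, so unmatched rows come out False
df['C'] = stem.eq(df['A'])
print(df)
```

Unlike the function in the question, this pattern does not cover the doubled-consonant case (for example Run / Running), because the captured stem would be "Runn".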

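Or use your approach, but vectorized:

# sort each row alphabetically so the stem ends up in A and the suffixed word in B
df[['A','B']] = np.sort(df[['A','B']])

df['A-ing'] = df['A'] + 'ing'
df['A-ment'] = df['A'] + 'ment'

# True where B equals either of the two candidate columns
df.iloc[:,-2:].eq(df['B'], axis=0).any(axis=1)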
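A runnable sketch of the same vectorized idea, assuming the four sample rows from the question. It keeps the candidate words in a separate frame (named `candidates` here purely for illustration) instead of adding helper columns to df, which is a small departure from the snippet above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['Engage', 'Go', 'Axe', 'Management'],
    'B': ['Engagement', 'Going', 'Axis', 'Manage'],
})

# Sort each row alphabetically so the stem lands in A and the suffixed word in B
df[['A', 'B']] = np.sort(df[['A', 'B']].to_numpy(), axis=1)

# Build both candidate words for every row
candidates = pd.DataFrame({
    'A-ing': df['A'] + 'ing',
    'A-ment': df['A'] + 'ment',
})

# Compare each candidate column with B row by row; one match is enough
df['C'] = candidates.eq(df['B'], axis=0).any(axis=1)
print(df)
```

Passing axis=0 to eq aligns df['B'] with the row index; without it, pandas would try to align the Series with the column labels instead.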

Output:

0     True
1     True
2    False
3     True
dtype: bool

Answer 1 | 2 | accepted | 2019-09-19 17:41:53

Answer 2 | 1 | 2019-09-19 17:45:03
