
Python pandas - new column based on other columns (String)

I couldn't find this on Stack Overflow, so I wanted to ask the question.

Let's assume that I have two columns, A and B, in a data frame, each consisting of just a bunch of words, and I want to create a new column C that is simply TRUE/FALSE based on the following rules:

 If word in B = word in A + 'ing', then it's True, or vice versa.
 If word in B = word in A + 'ment', then it's True, or vice versa.

So I defined the following function:

def parts_of_speech(s1, s2):
    return s1+'ing'==s2 or s1+'ment'==s2 or s1+s1[-1]+'ing'==s2

For instance:

  A              B            C
Engage         Engagement   True
Go             Going        True
Axe            Axis         False
Management     Manage       True

I tried the following:

df['C']=df.apply(lambda x: parts_of_speech(x.A, x.B) or 
                           parts_of_speech(x.B, x.A) )

or

df['C']=df.apply(parts_of_speech(df['A'], df['B']) or 
                           parts_of_speech(df['A'], df['B']) )

I get the same error in both cases:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I don't know what I did incorrectly. Is there an easy fix for this?

Any help would be greatly appreciated.

.apply works on columns by default (axis=0). The only change needed in your example is to add axis=1 so that the function is applied to each row:

df['C']=df.apply(lambda x: parts_of_speech(x.A, x.B) or parts_of_speech(x.B, x.A),
                 axis=1)
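
To see the fix end to end, here is a minimal sketch that rebuilds the sample table from the question and applies the function row by row. The DataFrame construction is an assumption based on the table in the question; parts_of_speech is copied from the question unchanged.

```python
import pandas as pd

def parts_of_speech(s1, s2):
    return s1 + 'ing' == s2 or s1 + 'ment' == s2 or s1 + s1[-1] + 'ing' == s2

# Sample data reconstructed from the table in the question (assumed)
df = pd.DataFrame({
    'A': ['Engage', 'Go', 'Axe', 'Management'],
    'B': ['Engagement', 'Going', 'Axis', 'Manage'],
})

# axis=1 passes each row to the lambda, so x.A and x.B are single strings
df['C'] = df.apply(
    lambda x: parts_of_speech(x.A, x.B) or parts_of_speech(x.B, x.A),
    axis=1,
)
print(df['C'].tolist())  # [True, True, False, True]
```

The SettingWithCopyWarning quoted in the question is probably a separate issue: it usually means df was itself created by slicing another DataFrame. If that is the case here, taking an explicit copy first (for example df = df.copy()) before assigning the new column should avoid it.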

For your sample data:

import numpy as np

# sort each row alphabetically so that B holds the longer word
df[['A','B']] = np.sort(df[['A','B']])

# split off the suffix and compare the remaining stem with A
df['B'].str.extract(r'(\w+)(ment|ing)$',expand=True)[0].eq(df['A'])
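
A self-contained sketch of the suffix-splitting idea on the question's sample data follows. The DataFrame is reconstructed from the question's table, and the names pair and stem are only illustrative. Unlike the snippet above, it leaves columns A and B untouched. Sorting each row alphabetically is enough here because a word always sorts before any longer word it is a prefix of. This pattern does not cover the doubled-letter rule from parts_of_speech.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question (assumed)
df = pd.DataFrame({
    'A': ['Engage', 'Go', 'Axe', 'Management'],
    'B': ['Engagement', 'Going', 'Axis', 'Manage'],
})

# Sort each row alphabetically without overwriting the original columns
pair = pd.DataFrame(
    np.sort(df[['A', 'B']].to_numpy(), axis=1),
    columns=['short', 'long'],
    index=df.index,
)

# Capture the stem in front of a trailing 'ment' or 'ing';
# rows without such a suffix yield NaN and therefore compare as False
stem = pair['long'].str.extract(r'(\w+)(ment|ing)$', expand=True)[0]

df['C'] = stem.eq(pair['short'])
print(df['C'].tolist())  # [True, True, False, True]
```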

Or use your approach, but vectorized:

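# sort each row alphabetically so that B holds the longer word
df[['A','B']] = np.sort(df[['A','B']])

# build both candidate words from A
df['A-ing'] = df['A'] + 'ing'
df['A-ment'] = df['A'] + 'ment'

# True where B equals either candidate
df[['A-ing','A-ment']].eq(df['B'], axis=0).any(axis=1)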

Output:

0     True
1     True
2    False
3     True
dtype: bool
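
Note that the vectorized variants only test the 'ing' and 'ment' suffixes, while parts_of_speech in the question also accepts a doubled last letter before 'ing'. One possible way to cover that rule as well is sketched below. The data is hypothetical (the Run / Running row is not in the question), and the names short, long_ and candidates are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row exercises the doubled-letter rule
df = pd.DataFrame({
    'A': ['Engage', 'Running', 'Axe'],
    'B': ['Engagement', 'Run', 'Axis'],
})

# Sort each row so the shorter word of a matching pair comes first
pair = np.sort(df[['A', 'B']].to_numpy(), axis=1)
short = pd.Series(pair[:, 0], index=df.index)
long_ = pd.Series(pair[:, 1], index=df.index)

# One column per rule from parts_of_speech
candidates = pd.concat(
    [short + 'ing', short + 'ment', short + short.str[-1] + 'ing'],
    axis=1,
)

# A row is True if the longer word equals any of the three candidates
df['C'] = candidates.eq(long_, axis=0).any(axis=1)
print(df['C'].tolist())  # [True, True, False]
```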
