
Data frame columns difference

I have a data frame with three columns: FIRST, SECOND, and THIRD. I want to find the difference between the first column and the combination of SECOND and THIRD, that is, the words in FIRST that appear in neither of the other two columns, and store the result in a FINAL column.

DF1=

   FIRST                         SECOND       THIRD

  "NEWYORK" IS NICE CITY             NICE      IS CITY
  "LONDON" WINTER MUCHBETTER         LONDON     WINTER                               
  "CANADA" IS EVEN MORECOLDER        CANADA     IS EVEN                          
  "PARIS" IS  MOREBEAUTIFUL      MOREBEAUTIFUL  IS   

I want my data frame to look like this:

DF1=

     FIRST                         SECOND       THIRD                            FINAL
  "NEWYORK" IS NICE CITY             NICE       IS CITY                            NEWYORK
  "LONDON" WINTER MUCHBETTER         LONDON     WINTER                             MUCHBETTER
  "CANADA" IS EVEN MORECOLDER        CANADA     IS EVEN                           MORECOLDER 
  "PARIS" IS  MOREBEAUTIFUL      MOREBEAUTIFUL    IS                                  PARIS

You can do this as follows:

import pandas as pd

# Creating the dataframe
columns = ['FIRST','SECOND','THIRD']
data = [
    ['NEWYORK IS NICE CITY', 'NICE', 'IS CITY'],
    ['LONDON WINTER MUCHBETTER', 'LONDON', 'WINTER'],
    ['CANADA IS EVEN MORECOLDER', 'CANADA', 'IS EVEN']
]

df = pd.DataFrame(data=data,columns=columns)

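# Creating FINAL column
# Step 1: join SECOND and THIRD, then split the result into a list of words per row
df['FINAL'] = df[['SECOND', 'THIRD']].agg(' '.join, axis=1).str.split()
# Step 2: keep the words of FIRST that are not in that list and join them into a string
# (sets are unordered, so if several words remain their order is not guaranteed)
df['FINAL'] = df.apply(lambda x: ' '.join(set(x['FIRST'].split()) - set(x['FINAL'])), axis=1)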

What happens here is that the FINAL column is first filled with the SECOND and THIRD columns joined together and split into a list of words. Then, for each row, a set is built from the words of FIRST (split on spaces), and a set built from that temporary FINAL value is subtracted from it. The result is a set containing only the words that appear in FIRST but in neither SECOND nor THIRD.
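To make the set subtraction concrete, here is a minimal sketch of that step for a single row, written in plain Python without pandas. The variable names `other_words` and `leftover` are only illustrative.

```python
# One row from the example data, walked through by hand
first = 'NEWYORK IS NICE CITY'
second = 'NICE'
third = 'IS CITY'

# Words from SECOND and THIRD combined into one set
other_words = set(f'{second} {third}'.split())

# Words of FIRST that appear in neither SECOND nor THIRD
leftover = set(first.split()) - other_words
print(leftover)
```

For this row the only word left over is NEWYORK.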

Then all you have to do is turn the set back into a string, using join.
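Because a set is unordered, the approach above is only predictable when exactly one word is left over. If several words may remain, or if FIRST contains the quoted words shown in the question's data, a variant along these lines might be easier to reason about. This is only a sketch: the helper name `leftover_words` is hypothetical, and stripping the double quotes is an assumption about how the quoted values should be handled.

```python
import pandas as pd

def leftover_words(row):
    """Return the words of FIRST that appear in neither SECOND nor THIRD."""
    # Words to exclude, split on any run of whitespace
    exclude = set(f"{row['SECOND']} {row['THIRD']}".split())
    # Assumption: the double quotes around the first word are not part of the word
    words = [w.strip('"') for w in row['FIRST'].split()]
    # Filtering a list (instead of subtracting sets) keeps the original word order
    return ' '.join(w for w in words if w not in exclude)

df = pd.DataFrame(
    {
        'FIRST': ['"NEWYORK" IS NICE CITY', '"PARIS" IS  MOREBEAUTIFUL'],
        'SECOND': ['NICE', 'MOREBEAUTIFUL'],
        'THIRD': ['IS CITY', 'IS'],
    }
)
df['FINAL'] = df.apply(leftover_words, axis=1)
print(df)
```

Calling `split()` without an argument also tolerates the double spaces that appear in some rows of the question's data.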
