Pandas compare strings in two columns within the same dataframe with conditional output to new column

I have two columns within a data frame containing strings. For example:

import pandas as pd
import numpy as np

# the fish items are included so the data matches the printed table below
data = [['Oct-2019', 'Oranges + Grapes + Pears + Salmon', 'Grapes + Pears'],
        ['Nov-2019', 'Oranges + Grapes + Pears + Trout', 'Oranges + Grapes + Pears']]

df = pd.DataFrame(data, columns=['Date', 'Previous shopping list', 'Recent shopping list'])
print(df)

Fish = ['Salmon', 'Trout']
Fruit = ['Oranges', 'Grapes', 'Pears']

     Date     PSL                 RSL
0  Oct-2019   Oranges + Grapes    Grapes + Pears
              + Pears + Salmon                     

1  Nov-2019   Oranges + Grapes    Oranges + Grapes
              + Pears + Trout     + Pears  

I want to compare the strings in both columns and write a text output to a new column that says what has changed between the two lists. For example, a column that checks for the strings related to "Fruit" and outputs which fruit has been dropped from the recent shopping list compared to the previous shopping list. See the desired output below:

     Date     PSL                 RSL               Fruit lost   Fish Lost
0  Oct-2019   Oranges + Grapes    Grapes + Pears    Oranges      Salmon
              + Pears + Salmon                     

1  Nov-2019   Oranges + Grapes    Oranges + Grapes               Trout
              + Pears + Trout     + Pears  

How can I achieve this using pandas? Apologies if this was not clear the first time!

Thank you for any suggestion/help!

The exact function you use to process the data depends on the output you require for each combination. Hopefully the code below gives you enough to build a solution for your problem:

# process data so each row contains a list of elements
# (split on '+' and drop the surrounding whitespace)
df['PSL_processed'] = df['Previous shopping list'].str.split(r'\s*\+\s*')
df['RSL_processed'] = df['Recent shopping list'].str.split(r'\s*\+\s*')

def compare_items(x):
    if set(x.PSL_processed) == set(x.RSL_processed):
        return 'No change'
    elif set(x.PSL_processed) - set(x.RSL_processed):
        # at least one item from the previous list is missing from the recent list
        return 'Lost'
    # add in conditional logic here, to meet specification

df.apply(compare_items, axis=1)

The official documentation for DataFrame.apply() is well written.
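One way the "add in conditional logic here" placeholder might be filled in is sketched below. It assumes the sample data follows the question's printed table (with the fish items included), and the names `to_set`, `describe_change` and the `Change` column are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Sample data, assumed to match the question's printed table
df = pd.DataFrame(
    [['Oct-2019', 'Oranges + Grapes + Pears + Salmon', 'Grapes + Pears'],
     ['Nov-2019', 'Oranges + Grapes + Pears + Trout', 'Oranges + Grapes + Pears']],
    columns=['Date', 'Previous shopping list', 'Recent shopping list'],
)

def to_set(text):
    # hypothetical helper: split on '+' and trim whitespace around each item
    return {item.strip() for item in text.split('+') if item.strip()}

def describe_change(row):
    previous = to_set(row['Previous shopping list'])
    recent = to_set(row['Recent shopping list'])
    lost = sorted(previous - recent)    # in the previous list only
    gained = sorted(recent - previous)  # in the recent list only
    if not lost and not gained:
        return 'No change'
    parts = []
    if lost:
        parts.append('Lost: ' + ', '.join(lost))
    if gained:
        parts.append('Gained: ' + ', '.join(gained))
    return '; '.join(parts)

# axis=1 passes each row to the function
df['Change'] = df.apply(describe_change, axis=1)
print(df[['Date', 'Change']])
```

Sorting the set differences keeps the text output stable from run to run, since sets have no guaranteed order.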

To check whether the string "Oranges" is present in "Recent shopping list" and create a new column "Oranges Lost" based on the result:

df['Oranges Lost'] = np.where(df['Recent shopping list'].str.contains('Oranges'), 'No Change', 'Lost')
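Note that the one-liner above only looks at the recent list, so a row would be marked "Lost" even if Oranges were never on the previous list. A possible variant, sketched under the assumption that the sample data matches the question's printed table, checks both columns and repeats the test for every item in `Fruit`:

```python
import numpy as np
import pandas as pd

# Sample data, assumed to match the question's printed table
df = pd.DataFrame(
    [['Oct-2019', 'Oranges + Grapes + Pears + Salmon', 'Grapes + Pears'],
     ['Nov-2019', 'Oranges + Grapes + Pears + Trout', 'Oranges + Grapes + Pears']],
    columns=['Date', 'Previous shopping list', 'Recent shopping list'],
)
Fruit = ['Oranges', 'Grapes', 'Pears']

for item in Fruit:
    # regex=False treats the item as a plain substring, not a pattern
    in_previous = df['Previous shopping list'].str.contains(item, regex=False)
    in_recent = df['Recent shopping list'].str.contains(item, regex=False)
    # an item counts as lost only if it was present before and is absent now
    df[item + ' Lost'] = np.where(in_previous & ~in_recent, 'Lost', 'No Change')

print(df[['Date', 'Oranges Lost', 'Grapes Lost', 'Pears Lost']])
```

This is plain substring matching, so an item name that is contained in a longer item name would also match; splitting the lists into sets avoids that.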

So Mark's solution works well to grab the difference between the lists:

# process data so each row contains a list of elements
# (split on '+' so the separator itself never ends up in a list)
df['PSL_processed'] = df['Previous shopping list'].str.split(r'\s*\+\s*')
df['RSL_processed'] = df['Recent shopping list'].str.split(r'\s*\+\s*')

def compare_items(x):
    # items in the previous list that are missing from the recent list
    return set(x.PSL_processed) - set(x.RSL_processed)

df['Products_lost'] = df.apply(compare_items, axis=1)

print(df)

On top of that, to find which of the lost products are fruit and which are fish, I used the following:

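for idx, row in df.iterrows():
    # record a lost fruit, if any (a later match overwrites an earlier one)
    for c in Fruit:
        if c in row['Products_lost']:
            df.loc[idx, 'Fruit lost'] = c  # .ix was removed in pandas 1.0
    # check fish independently, so it is found even when no fruit was lost
    for c in Fish:
        if c in row['Products_lost']:
            df.loc[idx, 'Fish lost'] = c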

Seems to work well!
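The row-by-row loop keeps only one lost item per category. A loop-free alternative that joins every lost item in a category might look like the sketch below. It assumes the sample data matches the question's printed table, and the `categories` dict and the `split_items` helper are hypothetical names introduced for illustration.

```python
import pandas as pd

# Sample data, assumed to match the question's printed table
df = pd.DataFrame(
    [['Oct-2019', 'Oranges + Grapes + Pears + Salmon', 'Grapes + Pears'],
     ['Nov-2019', 'Oranges + Grapes + Pears + Trout', 'Oranges + Grapes + Pears']],
    columns=['Date', 'Previous shopping list', 'Recent shopping list'],
)

# output column name -> items belonging to that category
categories = {
    'Fruit lost': ['Oranges', 'Grapes', 'Pears'],
    'Fish lost': ['Salmon', 'Trout'],
}

def split_items(series):
    # hypothetical helper: turn 'A + B' into the set {'A', 'B'}
    return series.str.split('+').apply(lambda items: {i.strip() for i in items})

# one set of lost items per row: previous minus recent
lost = [
    previous - recent
    for previous, recent in zip(split_items(df['Previous shopping list']),
                                split_items(df['Recent shopping list']))
]

for column, members in categories.items():
    # keep the category's own ordering; an empty string means nothing was lost
    df[column] = [', '.join(m for m in members if m in items) for items in lost]

print(df[['Date', 'Fruit lost', 'Fish lost']])
```

Adding another category then only requires a new entry in `categories`.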


