
Comparing two string columns of a dataframe with values such as "PO", "GO", etc. and creating a third column with the values "High", "Low" and "No Change"

I have two columns in a dataframe. The first is named previous_code and the second is named New_code. These columns hold values such as "PO", "GO", "RO", etc. The codes have a priority; for example, "PO" has a higher priority than "GO". I want to compare the values of these two columns and put the result in a new column as "High" or "Low", or as "No Change" in case both columns hold the same code. Below is an example of what the dataframe looks like:

CustID | previous_code | New_code
345    | PO            | GO
367    | RO            | PO
385    | PO            | RO
455    | GO            | GO

Expected output dataframe:

CustID | previous_code | New_code | Change
345    | PO            | GO       | Low
367    | RO            | PO       | High
385    | PO            | RO       | Low
455    | GO            | GO       | No Change

If someone could write demo code for this in PySpark or pandas, that would be helpful.

Thanks in advance.

If I understood the ordering correctly, this should work fine:

import pandas as pd
import numpy as np

# Sample data from the question
data = {'CustID':[345,367,385,455],'previous_code':['PO','RO','PO','GO'],'New_code':['GO','PO','RO','GO']}
df = pd.DataFrame(data)

# Rank for each code: a smaller number means a higher priority
mapping = {'PO':1,'GO':2,'RO':3}
df['previous_aux'] = df['previous_code'].map(mapping)
df['new_aux'] = df['New_code'].map(mapping)

# Equal ranks -> 'No change'; new code has a smaller rank (higher priority) -> 'High'; otherwise 'Low'
df['output'] = np.where(df['previous_aux'] == df['new_aux'],'No change',np.where(df['previous_aux'] > df['new_aux'],'High','Low'))

# Keep only the original columns plus the result
df = df[['CustID','previous_code','New_code','output']]
print(df)

Output:

   CustID previous_code New_code     output
0     345            PO       GO        Low
1     367            RO       PO       High
2     385            PO       RO        Low
3     455            GO       GO  No change
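
One caveat with the mapping approach: a code that is missing from the mapping becomes NaN after .map(), and every comparison against NaN is False, so such a row would quietly end up as 'Low'. Below is a minimal sketch of a variant that uses np.select and flags those rows explicitly. The function name add_change, the constant PRIORITY, and the 'Unknown' label are assumptions made for illustration and are not part of the question; the ranking is the same guess at the ordering as above (PO > GO > RO).

```python
import numpy as np
import pandas as pd

# Assumed ranking, same as the answer above: a smaller number means a higher priority.
PRIORITY = {'PO': 1, 'GO': 2, 'RO': 3}


def add_change(df, prev_col='previous_code', new_col='New_code'):
    """Return a copy of df with a 'Change' column comparing the two code columns."""
    prev_rank = df[prev_col].map(PRIORITY)
    new_rank = df[new_col].map(PRIORITY)

    # np.select picks the label of the first condition that is True for each row.
    conditions = [
        prev_rank.isna() | new_rank.isna(),  # a code is missing from PRIORITY
        prev_rank == new_rank,               # same priority
        prev_rank > new_rank,                # new code has the higher priority
        prev_rank < new_rank,                # new code has the lower priority
    ]
    labels = ['Unknown', 'No Change', 'High', 'Low']  # 'Unknown' is a hypothetical label

    out = df.copy()
    out['Change'] = np.select(conditions, labels, default='Unknown')
    return out


# Sample data from the question
df = pd.DataFrame({
    'CustID': [345, 367, 385, 455],
    'previous_code': ['PO', 'RO', 'PO', 'GO'],
    'New_code': ['GO', 'PO', 'RO', 'GO'],
})
result = add_change(df)
print(result)
```

Because the helper ranks are kept in local variables rather than in temporary columns, there is nothing to drop afterwards, and the result column is named Change as in the expected output.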


