Comparing two string columns of a dataframe (values such as "PO", "GO") and creating a third column with the values "High", "Low" and "No Change"
I have two columns in a dataframe. Column one is named previous_code and column two is named New_code. These columns hold values such as "PO", "GO", "RO" etc. The codes have a priority; for example, "PO" has a higher priority than "GO". I want to compare the values of these two columns and put the result in a new column as "High" or "Low", or as "No Change" in case both columns have the same code.
Below is an example of what the dataframe looks like:
CustID | previous_code | New_code
345    | PO            | GO
367    | RO            | PO
385    | PO            | RO
455    | GO            | GO
Expected output dataframe:
CustID | previous_code | New_code | Change
345    | PO            | GO       | Low
367    | RO            | PO       | High
385    | PO            | RO       | Low
455    | GO            | GO       | No Change
If someone could write demo code for this in PySpark or pandas, that would be helpful.
Thanks in advance.
If I understood the ordering correctly, this should work fine:
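import pandas as pd
import numpy as np

# Sample data from the question
data = {'CustID':[345,367,385,455],'previous_code':['PO','RO','PO','GO'],'New_code':['GO','PO','RO','GO']}
df = pd.DataFrame(data)

# Assumed priority order: a lower number means a higher priority (PO > GO > RO)
mapping = {'PO':1,'GO':2,'RO':3}

# Translate each code into its numeric rank (codes missing from the mapping become NaN)
df['previous_aux'] = df['previous_code'].map(mapping)
df['new_aux'] = df['New_code'].map(mapping)

# Equal ranks -> 'No change'; previous rank number larger than new -> moved to a higher priority -> 'High'; otherwise 'Low'
df['output'] = np.where(df['previous_aux'] == df['new_aux'],'No change',np.where(df['previous_aux'] > df['new_aux'],'High','Low'))

# Keep only the original columns plus the result
df = df[['CustID','previous_code','New_code','output']]
print(df)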
Output:
CustID previous_code New_code output
0 345 PO GO Low
1 367 RO PO High
2 385 PO RO Low
3 455 GO GO No change
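As a possible variation on the same idea, here is a minimal sketch that uses numpy.select instead of nested np.where, so no helper columns are needed and codes missing from the mapping are labelled instead of silently falling into 'Low'. The function name add_change_column, the PRIORITY dict and the 'Unknown' label are assumptions made for illustration. Only "PO above GO" is stated in the question; RO's place at the bottom follows the mapping used above, so adjust the ranks to your real priority order.

```python
import numpy as np
import pandas as pd

# Assumed ranks: a smaller number means a higher priority.
PRIORITY = {"PO": 1, "GO": 2, "RO": 3}


def add_change_column(df, prev_col="previous_code", new_col="New_code"):
    """Return a copy of df with a 'Change' column comparing code priorities."""
    prev_rank = df[prev_col].map(PRIORITY)
    new_rank = df[new_col].map(PRIORITY)

    # Conditions are checked in order; the first one that matches wins.
    conditions = [
        prev_rank.isna() | new_rank.isna(),  # a code is not in PRIORITY
        prev_rank == new_rank,               # same code priority
        new_rank < prev_rank,                # new code has higher priority
        new_rank > prev_rank,                # new code has lower priority
    ]
    choices = ["Unknown", "No Change", "High", "Low"]

    out = df.copy()
    out["Change"] = np.select(conditions, choices, default="Unknown")
    return out


df = pd.DataFrame({
    "CustID": [345, 367, 385, 455],
    "previous_code": ["PO", "RO", "PO", "GO"],
    "New_code": ["GO", "PO", "RO", "GO"],
})
result = add_change_column(df)
print(result)
```

For the sample data this should reproduce the Change column from the question (Low, High, Low, No Change), and the input dataframe is left unmodified because the function works on a copy.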