Comparing DataFrame columns and adding two more columns based on the comparison in Python Pandas
I have a DataFrame like this:
  category  uid sales_1 sales_2
0  Grocery    1      XX      XX
1  Grocery    2      XX      ZZ
2   Sports    3      XX      ZZ
3  Grocery    4      ZZ      XX
4   Beauty    5      ZZ      ZZ
5   Beauty    6      ZZ      ZZ
6   Sports    7      ZZ      XX
7  Grocery    8      ZZ      XX
...
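I need to compare the sales_1 column with the sales_2 column. The result of the comparison should be reflected in two new columns, first and second. If sales_1 == sales_2, the values in the two new columns should be "no changes" and "OK". If sales_1 != sales_2, the values should be "changed" and "gap". In the end I want a DataFrame like the following: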
  category  uid sales_1 sales_2       first second
0  Grocery    1      XX      XX  no changes     OK
1  Grocery    2      XX      ZZ     changed    gap
2   Sports    3      XX      ZZ     changed    gap
3  Grocery    4      ZZ      XX     changed    gap
4   Beauty    5      ZZ      ZZ  no changes     OK
5   Beauty    6      ZZ      ZZ  no changes     OK
6   Sports    7      ZZ      XX     changed    gap
7  Grocery    8      ZZ      XX     changed    gap
...
I would really appreciate any suggestions.
You can use the where() function from numpy:
import numpy as np

df['first'] = np.where(df.sales_1 == df.sales_2, 'no changes', 'changed')
df['second'] = np.where(df.sales_1 == df.sales_2, 'OK', 'gap')
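For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the eight sample rows from the question and applies np.where to them. The variable name unchanged is my own choice, and computing the comparison once and reusing it is just a small convenience, not something the answer requires.

```python
import numpy as np
import pandas as pd

# Sample data copied from the eight rows shown in the question.
df = pd.DataFrame({
    "category": ["Grocery", "Grocery", "Sports", "Grocery",
                 "Beauty", "Beauty", "Sports", "Grocery"],
    "uid": [1, 2, 3, 4, 5, 6, 7, 8],
    "sales_1": ["XX", "XX", "XX", "ZZ", "ZZ", "ZZ", "ZZ", "ZZ"],
    "sales_2": ["XX", "ZZ", "ZZ", "XX", "ZZ", "ZZ", "XX", "XX"],
})

# Element-wise comparison, computed once and reused for both columns.
# `unchanged` is an illustrative name, not from the original answer.
unchanged = df["sales_1"] == df["sales_2"]

# np.where(condition, value_if_true, value_if_false) works row by row.
df["first"] = np.where(unchanged, "no changes", "changed")
df["second"] = np.where(unchanged, "OK", "gap")

print(df)
```

One thing to keep in mind: missing values (NaN) never compare equal to each other, so a row where both columns are NaN would be labelled "changed" by this comparison.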
You can first assign a default value to the first and second columns, then use a boolean mask to overwrite the rows where the sales changed.
import pandas as pd
df = pd.DataFrame(
    {
        'category': ['Grocery', 'Sports', 'Beauty'],
        'sales_1': ['XX', 'ZZ', 'XX'],
        'sales_2': ['XX', 'XY', 'ZZ'],
    }
)
changed_sales = df['sales_1'] != df['sales_2']
df['first'] = 'no changes'
df.loc[changed_sales, 'first'] = 'changed'
df['second'] = 'OK'
df.loc[changed_sales, 'second'] = 'gap'
print(df)
Output:
  category sales_1 sales_2       first second
0  Grocery      XX      XX  no changes     OK
1   Sports      ZZ      XY     changed    gap
2   Beauty      XX      ZZ     changed    gap
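As a variation on the same idea (this is my own sketch, not part of the answer above), the boolean mask can also be translated directly into labels with Series.map and a small dictionary, which avoids the separate default-then-overwrite steps. It assumes the same three-row sample frame.

```python
import pandas as pd

# Same three-row sample frame as in the answer above.
df = pd.DataFrame({
    "category": ["Grocery", "Sports", "Beauty"],
    "sales_1": ["XX", "ZZ", "XX"],
    "sales_2": ["XX", "XY", "ZZ"],
})

# True where the two sales columns differ.
changed_sales = df["sales_1"] != df["sales_2"]

# Map each boolean to its label; one dictionary per new column.
df["first"] = changed_sales.map({True: "changed", False: "no changes"})
df["second"] = changed_sales.map({True: "gap", False: "OK"})

print(df)
```

Whether this reads better than the .loc version is a matter of taste; both rely on the same mask.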
You can use a list comprehension:
df['first'] = ["no changes" if s1 == s2 else "changed" for s1, s2 in zip(df['sales_1'], df['sales_2'])]
df['second'] = ["OK" if s1 == s2 else "gap" for s1, s2 in zip(df['sales_1'], df['sales_2'])]
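If iterating over the columns twice bothers you, a possible single-pass version builds both labels per row and then splits them into two lists. This is only a sketch under the assumption that the frame is non-empty; the names labels, first and second are mine.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["Grocery", "Sports", "Beauty"],
    "sales_1": ["XX", "ZZ", "XX"],
    "sales_2": ["XX", "XY", "ZZ"],
})

# One pass over the rows: each element is a (first, second) pair.
labels = [
    ("no changes", "OK") if s1 == s2 else ("changed", "gap")
    for s1, s2 in zip(df["sales_1"], df["sales_2"])
]

# Transpose the list of pairs into two lists, one per new column.
# Note: zip(*labels) yields nothing for an empty frame, so this
# unpacking assumes at least one row.
first, second = map(list, zip(*labels))
df["first"] = first
df["second"] = second

print(df)
```

For large frames the vectorized np.where approach will usually be faster than any Python-level loop, so this is mainly useful when the per-row logic is too irregular to vectorize.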