Pandas: Calculating a value in a separate data frame column based on a range of values in another data frame column (Python)
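I am using Python 3.9 and am trying to compute output values in one dataframe column based on ranges of values in another column.
For example, df['a'] holds integers between 0 and 50, in no particular order.
I am trying to create another column in the same dataframe, named df['output_column'], based on if statements.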
import pandas as pd
import numpy as np
p = 'a'
if df[p] in range(0, 7):
    df['output_column'] = 95
elif df[p] in range(8, 14):
    df['output_column'] = 90
elif df[p] in range(15, 21):
    df['output_column'] = 85
elif df[p] in range(22, 28):
    df['output_column'] = 80
else:
    df['output_column'] = 75
However, I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [18], in <module>
1 p = 'a'
----> 3 if df[p] in range(0, 7):
4 df['output_column'] = 95
5 elif df[p] in range(8, 14):
File ~\path_to_pandas\pandas\core\generic.py:1535, in NDFrame.__nonzero__(self)
1533 @final
1534 def __nonzero__(self):
-> 1535 raise ValueError(
1536 f"The truth value of a {type(self).__name__} is ambiguous. "
1537 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
1538 )
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I fix this?
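You can use pd.cut to do this:
# Bins are right-closed by default: (-inf, 8], (8, 15], (15, 22], (22, 29], (29, inf)
df['output'] = pd.cut(df[p],
                      bins=[-np.inf, 8, 15, 22, 29, np.inf],
                      labels=[95, 90, 85, 80, 75]).astype(int)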
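Which label a boundary value such as 8 receives depends on the `right` argument of pd.cut. The following is a minimal sketch on hypothetical sample data (the question does not show the contents of df) comparing the default right-closed bins with left-closed ones:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original question does not show df.
df = pd.DataFrame({"a": [0, 7, 8, 14, 15, 22, 29, 50]})

bins = [-np.inf, 8, 15, 22, 29, np.inf]
labels = [95, 90, 85, 80, 75]

# Default right=True: intervals are (-inf, 8], (8, 15], ... so 8 maps to 95.
right_closed = pd.cut(df["a"], bins=bins, labels=labels).astype(int)

# right=False: intervals are [-inf, 8), [8, 15), ... so 8 maps to 90.
left_closed = pd.cut(df["a"], bins=bins, labels=labels, right=False).astype(int)

print(pd.DataFrame({"a": df["a"],
                    "right_closed": right_closed,
                    "left_closed": left_closed}))
```

If the intended groups are 0 to 7, 8 to 14 and so on, right=False is probably the variant that matches them.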
You can use .between() to define the ranges and then np.select() to fill the new output_column.
import pandas as pd
import numpy as np
ranges = [df['a'].between(0, 6),
          df['a'].between(7, 13),
          df['a'].between(14, 20),
          df['a'].between(21, 27),
          df['a'].between(28, 999)]
values = [95,90, 85, 80, 75]
df['output_column'] = np.select(ranges, values)
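np.select() returns 0 by default for rows that match none of the conditions, so a catch-all range such as between(28, 999) can be swapped for the `default` argument. The following is a minimal sketch of that variation, again on hypothetical sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original question does not show df.
df = pd.DataFrame({"a": [0, 6, 7, 13, 14, 27, 28, 50]})

# Series.between() is inclusive on both ends by default.
conditions = [
    df["a"].between(0, 6),
    df["a"].between(7, 13),
    df["a"].between(14, 20),
    df["a"].between(21, 27),
]
values = [95, 90, 85, 80]

# Rows matching none of the conditions receive the default value.
df["output_column"] = np.select(conditions, values, default=75)
print(df)
```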
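Alternatively, set a default value and overwrite it threshold by threshold with df.loc:
# Start with the value for the lowest range, then overwrite from each threshold upward
df["output_column"] = 95
df.loc[df[p] >= 8, "output_column"] = 90
df.loc[df[p] >= 15, "output_column"] = 85
df.loc[df[p] >= 22, "output_column"] = 80
df.loc[df[p] >= 29, "output_column"] = 75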
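For context on the ValueError in the question: an `if` statement needs a single True or False, but a comparison involving a Series yields one boolean per row. The sketch below, on hypothetical sample data, reproduces the error and then uses an element-wise mask with df.loc instead. Using Series.isin() here is one option chosen for illustration, not something taken from the answers above.

```python
import pandas as pd

# Hypothetical sample data; the original question does not show df.
df = pd.DataFrame({"a": [3, 10, 40]})

# A Series has no single truth value, so testing it in `if` raises ValueError.
try:
    if df["a"] in range(0, 7):
        pass
    raised = False
except ValueError:
    raised = True

# Element-wise membership gives one boolean per row instead.
mask = df["a"].isin(range(0, 7))

# The mask selects the rows to overwrite; all other rows keep the default.
df["output_column"] = 75
df.loc[mask, "output_column"] = 95
print(raised, df["output_column"].tolist())
```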