How to populate a new column in an existing pandas dataframe
I have a pandas dataframe that looks like this:
X Y Z
0 9.5 -2.3 4.13
1 17.5 3.3 0.22
2 NaN NaN -5.67
...
I want to add 2 more columns, Is Invalid and Is Outlier.
Is Invalid will just keep track of the invalid/NaN values in that given row. So for the row at index 2, Is Invalid will have a value of 2. For rows with valid entries, Is Invalid will display 0.
Is Outlier will just check whether that given row has outlier data. This will just be True/False.
At the moment, this is my code:
import numpy as np
import pandas as pd
dt = np.fromfile(path, dtype='float')
df = pd.DataFrame(dt.reshape(-1, 3), columns=['X', 'Y', 'Z'])
How can I go about adding these features?
x='''Z,Y,X,W,V,U,T
1,2,3,4,5,6,60
17.5,3.3,.22,22.11,-19,44,0
,,-5.67,,,,
'''
import pandas as pd, io, scipy.stats
df = pd.read_csv(io.StringIO(x))
df
Sample input:
Z Y X W V U T
0 1.0 2.0 3.00 4.00 5.0 6.0 60.0
1 17.5 3.3 0.22 22.11 -19.0 44.0 0.0
2 NaN NaN -5.67 NaN NaN NaN NaN
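Transformations:
# Count the NaN values in each row
df['is_invalid'] = df.isna().sum(axis=1)
# Count per-row values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]; iloc[:, :-1] skips the new is_invalid column
df['is_outlier'] = df.iloc[:, :-1].apply(lambda r: (r < (r.quantile(0.25) - 1.5*scipy.stats.iqr(r))) | (r > (r.quantile(0.75) + 1.5*scipy.stats.iqr(r))), axis=1).sum(axis=1)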
df
Final output:
Z Y X W V U T is_invalid is_outlier
0 1.0 2.0 3.00 4.00 5.0 6.0 60.0 0 1
1 17.5 3.3 0.22 22.11 -19.0 44.0 0.0 0 0
2 NaN NaN -5.67 NaN NaN NaN NaN 6 0
Explanation for outlier: The valid range is from Q1 - 1.5*IQR to Q3 + 1.5*IQR. Since it needs to be calculated per row, we use apply and pass each row (r). To count outliers, we flip the range, i.e. anything less than Q1 - 1.5*IQR or greater than Q3 + 1.5*IQR is counted. Note that is_outlier therefore holds a count of outlying values per row rather than True/False.
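The same per-row rule can probably also be written without apply, and it can yield the True/False flag the question asked for alongside the count. The following is only a sketch under a few assumptions: the names csv_text, values, outlier_mask and outlier_count are made up for illustration, and the IQR is computed with pandas' own quantile instead of scipy.stats.iqr. One consequence is that a row with a single valid value gets an IQR of 0 here rather than NaN, which still yields no outliers for the sample data.

```python
import io
import pandas as pd

# Same sample data as above (csv_text is an illustrative name)
csv_text = """Z,Y,X,W,V,U,T
1,2,3,4,5,6,60
17.5,3.3,.22,22.11,-19,44,0
,,-5.67,,,,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep the raw measurements separate from the derived columns
values = df.copy()

# Per-row quartiles; NaN entries are ignored by quantile
q1 = values.quantile(0.25, axis=1)
q3 = values.quantile(0.75, axis=1)
iqr = q3 - q1

# Compare every cell with the bounds of its own row (axis=0 aligns on the row index)
below = values.lt(q1 - 1.5 * iqr, axis=0)
above = values.gt(q3 + 1.5 * iqr, axis=0)
outlier_mask = below | above

df["is_invalid"] = values.isna().sum(axis=1)     # number of NaN values per row
df["outlier_count"] = outlier_mask.sum(axis=1)   # how many values fall outside the range
df["is_outlier"] = outlier_mask.any(axis=1)      # True if the row has at least one outlier

print(df[["is_invalid", "outlier_count", "is_outlier"]])
```

Working from a separate values copy avoids the iloc[:, :-1] slice, so the order in which the derived columns are added no longer matters.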