How to populate a new column in an existing pandas dataframe

Question

I have a pandas dataframe that looks like this:

   X     Y     Z
0  9.5 -2.3   4.13
1  17.5 3.3   0.22
2  NaN  NaN  -5.67
...

I want to add two more columns, Is Invalid and Is Outlier. Is Invalid counts the invalid/NaN values in each row. So for the row at index 2, Is Invalid will have a value of 2. For rows with only valid entries, Is Invalid will be 0.

Is Outlier checks whether the row contains outlier data. It will just be True/False.

At the moment, this is my code:

import numpy as np
import pandas as pd
dt = np.fromfile(path, dtype='float')
df = pd.DataFrame(dt.reshape(-1, 3), columns=['X', 'Y', 'Z'])  # the keyword is `columns`, not `column`

How can I go about adding these features?

Answer 1

x='''Z,Y,X,W,V,U,T
1,2,3,4,5,6,60
17.5,3.3,.22,22.11,-19,44,0
,,-5.67,,,,
'''

import pandas as pd, io, scipy.stats
df = pd.read_csv(io.StringIO(x))
df

Sample input:

      Z    Y     X      W     V     U     T
0   1.0  2.0  3.00   4.00   5.0   6.0  60.0
1  17.5  3.3  0.22  22.11 -19.0  44.0   0.0
2   NaN  NaN -5.67    NaN   NaN   NaN   NaN

Transformations:

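# Count the NaN values in each row.
df['is_invalid'] = df.isna().sum(axis=1)
# Count the values per row that fall outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]; iloc[:, :-1] skips the new is_invalid column.
df['is_outlier'] = df.iloc[:,:-1].apply(lambda r: (r < (r.quantile(0.25) - 1.5*scipy.stats.iqr(r))) | ( r > (r.quantile(0.75) + 1.5*scipy.stats.iqr(r))) , axis=1).sum(axis = 1)
df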

Final output:

      Z    Y     X      W     V     U     T  is_invalid  is_outlier
0   1.0  2.0  3.00   4.00   5.0   6.0  60.0           0           1
1  17.5  3.3  0.22  22.11 -19.0  44.0   0.0           0           0
2   NaN  NaN -5.67    NaN   NaN   NaN   NaN           6           0

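Explanation for outlier: The valid range is from Q1 - 1.5*IQR to Q3 + 1.5*IQR. Since it needs to be calculated per row, we use apply and pass each row (r). To count outliers, we flip the range, i.e. anything less than Q1 - 1.5*IQR or greater than Q3 + 1.5*IQR is counted. Note that is_outlier here holds the number of outliers in the row; compare it with 0 (is_outlier > 0) to get the True/False flag asked for in the question.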
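The same rule can also be written without a per-row apply. The following is only a sketch under a few assumptions: `add_row_flags` and `value_cols` are hypothetical names introduced here, the IQR is computed as Q3 - Q1 from pandas row-wise quantiles instead of scipy.stats.iqr, and is_outlier is stored as True/False rather than as a count.

```python
import io

import pandas as pd

# Same sample data as in the answer above.
csv_text = """Z,Y,X,W,V,U,T
1,2,3,4,5,6,60
17.5,3.3,.22,22.11,-19,44,0
,,-5.67,,,,
"""


def add_row_flags(df, value_cols):
    """Hypothetical helper: add a per-row NaN count and a boolean outlier flag."""
    values = df[value_cols]
    out = df.copy()
    # Number of NaN entries in each row.
    out["is_invalid"] = values.isna().sum(axis=1)
    # Row-wise quartiles; pandas skips NaN when computing quantiles.
    q1 = values.quantile(0.25, axis=1)
    q3 = values.quantile(0.75, axis=1)
    iqr = q3 - q1
    # Compare every value with its own row's bounds (axis=0 aligns on the row index).
    below = values.lt(q1 - 1.5 * iqr, axis=0)
    above = values.gt(q3 + 1.5 * iqr, axis=0)
    # True if at least one value in the row lies outside the valid range.
    out["is_outlier"] = (below | above).any(axis=1)
    return out


df = pd.read_csv(io.StringIO(csv_text))
result = add_row_flags(df, list(df.columns))
print(result[["is_invalid", "is_outlier"]])
```

For the frame in the question, the call would presumably be add_row_flags(df, ['X', 'Y', 'Z']). Keep in mind that with only three values per row the 1.5*IQR rule cannot flag anything, so a per-column or domain-specific outlier definition may be a better fit there.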
