[英]Replace values on dataset and apply quartile rule by row on pandas
I have a dataset with lots of variables.我有一个包含很多变量的数据集。 So I've extracted the numeric ones:所以我提取了数字:
numeric_columns = transposed_df.select_dtypes(np.number)
Then I want to replace all 0 values for 0.0001然后我想将所有 0 值替换为 0.0001
transposed_df[numeric_columns.columns] = numeric_columns.where(numeric_columns.eq(0, axis=0), 0.0001)
And here is the first problem.这是第一个问题。 This line is not replacing the 0 values with 0.0001, but is replacing all non zero values with 0.0001.此行不是用 0.0001 替换 0 值,而是用 0.0001 替换所有非零值。
Also after this (replacing the 0 values by 0.0001) I want to replace all values there are less than the 1th quartile of the row to -1 and leave the others as they were.同样在此之后(将 0 值替换为 0.0001)我想将所有小于该行的第 1 个四分位数的值替换为 -1 并将其他值保持原样。 But I am not managing how.但我不管理如何。
To answer your first question回答你的第一个问题
In [36]: from pprint import pprint
In [37]: pprint( numeric_columns.where.__doc__)
('\n'
'Replace values where the condition is False.\n'
'\n'
'Parameters\n'
'----------\n'
because of that your all the values except 0 are getting replaced因为你除了 0 之外的所有值都被替换了
Use DataFrame.mask
and for second condition compare by DataFrame.quantile
:使用DataFrame.mask
和第二个条件比较DataFrame.quantile
:
transposed_df = pd.DataFrame({
'A':list('abcdef'),
'B':[0,0.5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,0,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
})
numeric_columns = transposed_df.select_dtypes(np.number)
m1 = numeric_columns.eq(0)
m2 = numeric_columns.lt(numeric_columns.quantile(q=0.25, axis=1), axis=0)
transposed_df[numeric_columns.columns] = numeric_columns.mask(m1, 0.0001).mask(m2, -1)
print (transposed_df)
A B C D E F
0 a -1.0 7 1.0 5 a
1 b -1.0 8 3.0 3 a
2 c 4.0 9 -1.0 6 a
3 d 5.0 -1 7.0 9 b
4 e 5.0 2 -1.0 2 b
5 f 4.0 3 -1.0 4 b
EDIT:编辑:
from scipy.stats import zscore
print (transposed_df[numeric_columns.columns].apply(zscore))
B C D E
0 -2.236068 0.570352 -0.408248 0.073521
1 0.447214 0.950586 0.408248 -0.808736
2 0.447214 1.330821 -0.816497 0.514650
3 0.447214 -0.570352 2.041241 1.838037
4 0.447214 -1.330821 -0.408248 -1.249865
5 0.447214 -0.950586 -0.816497 -0.367607
EDIT1:编辑1:
transposed_df = pd.DataFrame({
'A':list('abcdef'),
'B':[0,1,1,1,1,1],
'C':[1,8,9,4,2,3],
'D':[1,3,0,7,1,0],
'E':[1,3,6,9,2,4],
'F':list('aaabbb')
})
numeric_columns = transposed_df.select_dtypes(np.number)
from scipy.stats import zscore
df1 = pd.DataFrame(numeric_columns.apply(zscore, axis=1).tolist(),index=transposed_df.index)
transposed_df[numeric_columns.columns] = df1
print (transposed_df)
A B C D E F
0 a -1.732051 0.577350 0.577350 0.577350 a
1 b -1.063410 1.643452 -0.290021 -0.290021 a
2 c -0.816497 1.360828 -1.088662 0.544331 a
3 d -1.402136 -0.412393 0.577350 1.237179 b
4 e -1.000000 1.000000 -1.000000 1.000000 b
5 f -0.632456 0.632456 -1.264911 1.264911 b
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.