Pandas: Selecting and modifying dataframe based on even more complex criteria
I was looking at this thread and this one, and though my question is similar, it has a few differences. I have a dataframe full of floats that I want to replace with strings. Say:
A B C
A 0 1.5 13
B 0.5 100.2 7.3
C 1.3 34 0.01
I want to replace the values in this table according to several criteria, but only the first replacement works:
df[df<1]='N' # Works
df[(df>1)&(df<10)]='L' # Doesn't work
df[(df>10)&(df<50)]='M' # Doesn't work
df[df>50]='H' # Doesn't work
If I instead restrict the selection in the 2nd line to cells of type float, it still doesn't work:
((df.applymap(type)==float) & (df<10) & (df>1)) #Doesn't work
I was wondering how to apply pd.DataFrame().mask here, or any other way. How should I solve this?
Alternatively, I know I could go column by column and apply the substitutions to each series, but this seems a bit counterproductive.
Edit: Could anyone explain why the 4 simple assignments above do not work?
Use numpy.select with the DataFrame constructor:
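import numpy as np
import pandas as pd

# Build every mask from the original float values before replacing anything
m1 = df < 1
m2 = (df > 1) & (df < 10)
m3 = (df > 10) & (df < 50)
m4 = df > 50
vals = list('NLMH')
# default='' keeps the fallback a string, like the choices
df = pd.DataFrame(np.select([m1, m2, m3, m4], vals, default=''), index=df.index, columns=df.columns)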
print (df)
A B C
A N L M
B N H L
C L M N
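A likely reason the four sequential assignments in the question fail, while building the masks first works: once 'N' has been written into the frame, it holds a mix of strings and floats, and an ordering comparison such as df > 1 can no longer be evaluated. The sketch below illustrates this on a rebuilt copy of the example data. The names broken, masks and out are made up for the illustration, and the frame is cast to object dtype up front (an assumption, so that mixed values can be stored on any pandas version).

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame(
    [[0.0, 1.5, 13.0], [0.5, 100.2, 7.3], [1.3, 34.0, 0.01]],
    index=list('ABC'), columns=list('ABC'),
)

# Sequential approach: after 'N' is written, the frame mixes str and float,
# so the next ordering comparison is expected to raise TypeError.
broken = df.astype(object)
broken[broken < 1] = 'N'
try:
    broken > 1
    comparison_failed = False
except TypeError:
    comparison_failed = True

# Masks-first approach: every condition is evaluated on the float data,
# and only then are the labels written into an object-dtype copy.
masks = {
    'N': df < 1,
    'L': (df > 1) & (df < 10),
    'M': (df > 10) & (df < 50),
    'H': df > 50,
}
out = df.astype(object)
for label, mask in masks.items():
    out[mask] = label

print(comparison_failed)
print(out)
```

Because the masks never see the replaced values, the order in which the labels are written does not matter here.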
By using pd.cut:
pd.cut(df.stack(),[-1,1,10,50,np.inf],labels=list('NLMH')).unstack()
Out[309]:
A B C
A N L M
B N H L
C L M N
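One detail worth checking with pd.cut is how values that fall exactly on a bin edge are labelled. By default the bins are closed on the right, so 1 lands in the first bin; passing right=False closes them on the left instead. A small sketch, using a made-up Series s with a few edge values (not data from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen to sit on or near the bin edges
s = pd.Series([0.5, 1.0, 9.99, 10.0, 50.0, 100.2])
bins = [-1, 1, 10, 50, np.inf]
labels = list('NLMH')

# Default: bins are (-1, 1], (1, 10], (10, 50], (50, inf]
right_closed = pd.cut(s, bins, labels=labels)

# right=False: bins are [-1, 1), [1, 10), [10, 50), [50, inf)
left_closed = pd.cut(s, bins, labels=labels, right=False)

print(list(right_closed))  # ['N', 'N', 'L', 'L', 'M', 'H']
print(list(left_closed))   # ['N', 'L', 'L', 'M', 'H', 'H']
```

Either way, every value inside the overall range gets a label, unlike the strict inequalities in the question, which leave exact edge values unassigned.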
You can use searchsorted:
labels = np.array(list('NLMH'))
breaks = np.array([1, 10, 50])
pd.DataFrame(
    labels[breaks.searchsorted(df.values)].reshape(df.shape),
    df.index, df.columns)
A B C
A N L M
B N H L
C L M N
# In-place variant: write the labels back into the existing df
labels = np.array(list('NLMH'))
breaks = np.array([1, 10, 50])
df[:] = labels[breaks.searchsorted(df.values)].reshape(df.shape)
df
A B C
A N L M
B N H L
C L M N
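To see why the searchsorted approach works, it may help to look at the intermediate indices: searchsorted returns, for each value, the position at which it would be inserted into breaks, and that position is then used to pick a label. A minimal sketch on a handful of made-up values (the array values is hypothetical; labels and breaks are as above):

```python
import numpy as np

labels = np.array(list('NLMH'))
breaks = np.array([1, 10, 50])

# Hypothetical sample values, including one that sits exactly on a break
values = np.array([0.0, 1.5, 13.0, 100.2, 10.0])

# Default side='left': a value equal to a break goes to the lower bin
idx = breaks.searchsorted(values)
result = labels[idx]

print(idx.tolist())     # [0, 1, 2, 3, 1]
print(result.tolist())  # ['N', 'L', 'M', 'H', 'L']
```

Passing side='right' to searchsorted would instead send a value equal to a break into the higher bin.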
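Method chaining with pandas.DataFrame.mask
Note: mask itself is not deprecated. The "Deprecated since version 0.21" notice in its documentation applies to the raise_on_error argument, which was replaced by errors.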
df.mask(df.lt(1), 'N').mask(df.gt(1) & df.lt(10), 'L') \
.mask(df.gt(10) & df.lt(50), 'M').mask(df.gt(50), 'H')
A B C
A N L M
B N H L
C L M N
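The mask chain uses strict inequalities, so a value exactly equal to 1, 10 or 50 matches none of the conditions and is left as it was. A short sketch on a made-up Series s (not data from the question) shows the effect. The cast to object dtype is an assumption made here so that the result can hold both floats and strings regardless of pandas version.

```python
import pandas as pd

# Hypothetical values, two of which sit exactly on a threshold
s = pd.Series([0.5, 1.0, 5.0, 10.0, 75.0])

# Conditions are computed on the float Series; replacements are applied
# to an object-dtype copy that can hold floats and strings side by side.
out = (
    s.astype(object)
    .mask(s.lt(1), 'N')
    .mask(s.gt(1) & s.lt(10), 'L')
    .mask(s.gt(10) & s.lt(50), 'M')
    .mask(s.gt(50), 'H')
)

print(out.tolist())  # ['N', 1.0, 'L', 10.0, 'H']
```

If edge values should be labelled too, one of the comparisons in each pair can be switched to le or ge.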