
Set values on the diagonal of pandas.DataFrame

I have a pandas DataFrame and I would like to set its diagonal to 0:

import numpy
import pandas

df = pandas.DataFrame(numpy.random.rand(5,5))
df

Out[6]:
     0           1           2           3           4
0    0.536596    0.674319    0.032815    0.908086    0.215334
1    0.735022    0.954506    0.889162    0.711610    0.415118
2    0.119985    0.979056    0.901891    0.687829    0.947549
3    0.186921    0.899178    0.296294    0.521104    0.638924
4    0.354053    0.060022    0.275224    0.635054    0.075738
5 rows × 5 columns

Now I want to set the diagonal to 0:

for i in range(len(df.index)):
    for j in range(len(df.columns)):
        if i==j:
            df.loc[i,j] = 0
df
Out[9]:
     0           1           2           3           4
0    0.000000    0.674319    0.032815    0.908086    0.215334
1    0.735022    0.000000    0.889162    0.711610    0.415118
2    0.119985    0.979056    0.000000    0.687829    0.947549
3    0.186921    0.899178    0.296294    0.000000    0.638924
4    0.354053    0.060022    0.275224    0.635054    0.000000
5 rows × 5 columns

But there must be a more Pythonic way than that?

In [21]: df.values[tuple([np.arange(df.shape[0])] * 2)] = 0

In [22]: df
Out[22]: 
          0         1         2         3         4
0  0.000000  0.931374  0.604412  0.863842  0.280339
1  0.531528  0.000000  0.641094  0.204686  0.997020
2  0.137725  0.037867  0.000000  0.983432  0.458053
3  0.594542  0.943542  0.826738  0.000000  0.753240
4  0.357736  0.689262  0.014773  0.446046  0.000000

Note that this will only work if df has the same number of rows as columns. Another way, which works for arbitrary shapes, is to use np.fill_diagonal:

In [36]: np.fill_diagonal(df.values, 0)
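As a minimal sketch of the "arbitrary shapes" point, the following applies np.fill_diagonal to plain NumPy arrays that are wider and taller than they are square. The array names are made up for illustration, and the example deliberately avoids df.values so that it only shows the shape behaviour.

```python
import numpy as np

# A wide (3 x 5) and a tall (5 x 3) array of ones; names are illustrative only.
wide = np.ones((3, 5))
tall = np.ones((5, 3))

# fill_diagonal modifies the array in place and returns None.
np.fill_diagonal(wide, 0)
np.fill_diagonal(tall, 0)

# With the default wrap=False, only cells (0, 0), (1, 1), (2, 2) are touched.
print(wide)
print(tall)
```

In both cases only the first min(rows, columns) diagonal cells change.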

Both approaches in unutbu's answer assume that labels are irrelevant (they operate on the underlying values).

The OP's code works with .loc and so is label-based instead (i.e. it puts a 0 in cells whose row and column labels match, rather than in cells located on the diagonal). Admittedly, this is irrelevant in the specific example given, in which labels are just positions.

Being in need of "label-based" diagonal filling (working with a DataFrame describing an incomplete adjacency matrix), the simplest approach I could come up with was:

def pd_fill_diagonal(df, value):
    idces = df.index.intersection(df.columns)
    stacked = df.stack(dropna=False)
    stacked.update(pd.Series(value,
                             index=pd.MultiIndex.from_arrays([idces,
                                                              idces])))
    df.loc[:, :] = stacked.unstack()
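To make the label-versus-position distinction concrete, here is a small sketch that uses a plain loop with df.at rather than the stack/unstack approach above. The frame and its labels are hypothetical; the column order is deliberately different from the row order so that the "label diagonal" does not coincide with the positional one.

```python
import pandas as pd

# Hypothetical frame: rows are a, b, c but columns are ordered c, a, b.
df = pd.DataFrame(1.0, index=["a", "b", "c"], columns=["c", "a", "b"])

# Labels that appear on both axes; each one names a cell (label, label).
shared = df.index.intersection(df.columns)
for label in shared:
    df.at[label, label] = 0.0

print(df)
```

Here the zeros land at positions (0, 1), (1, 2) and (2, 0), while the positional diagonal is left untouched. The loop is not vectorized, so it is meant as an illustration rather than a replacement for the function above.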

This solution is vectorized and very fast and, unlike the other suggested solutions, works for any column names and any size of the df matrix.

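def pd_fill_diagonal(df_matrix, value=0):
    mat = df_matrix.values
    n = min(mat.shape)  # diagonal length, so non-square frames work too
    mat[range(n), range(n)] = value
    # keep the original row and column labels on the result
    return pd.DataFrame(mat, index=df_matrix.index, columns=df_matrix.columns)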

Performance on a DataFrame with 507 rows and columns:

%timeit pd_fill_diagonal(df, 0)

1000 loops, best of 3: 145 µs per loop

Here is a hack that worked for me:

def set_diag(self, values): 
    n = min(len(self.index), len(self.columns))
    self.values[tuple([np.arange(n)] * 2)] = values
pd.DataFrame.set_diag = set_diag

x = pd.DataFrame(np.random.randn(10, 5))
x.set_diag(0)

Using np.fill_diagonal(df.values, 1) is the easiest, but you need to make sure your columns all have the same data type. I had a mixture of np.float64 and Python floats, and it would only affect the NumPy values. To fix this, you have to cast everything to NumPy types.

Another way to accomplish this is to build the "anti-identity" matrix (ones everywhere, zeros on the diagonal) and multiply your DataFrame by it elementwise.

df * abs(np.eye(len(df))-1)

Here is a way with np.identity:

df.where(np.identity(df.shape[0]) != 1,0)

Output:

          0         1         2         3         4
0  0.000000  0.674319  0.032815  0.908086  0.215334
1  0.735022  0.000000  0.889162  0.711610  0.415118
2  0.119985  0.979056  0.000000  0.687829  0.947549
3  0.186921  0.899178  0.296294  0.000000  0.638924
4  0.354053  0.060022  0.275224  0.635054  0.000000
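The same masking idea can be sketched for non-square frames by building the mask with np.eye, which accepts separate row and column counts. This is a variation on the answer above rather than its exact code; the frame, its labels and the variable names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical 3 x 4 frame with string labels.
df = pd.DataFrame(
    np.arange(1.0, 13.0).reshape(3, 4),
    index=list("xyz"),
    columns=list("abcd"),
)

# Boolean mask that is True exactly on the positional diagonal.
on_diagonal = np.eye(*df.shape, dtype=bool)

# mask() returns a new frame with 0 wherever the mask is True; df is not modified.
result = df.mask(on_diagonal, 0)
print(result)
```

Because mask returns a new DataFrame, this variant does not depend on writing through df.values.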

All the answers given that rely on modifying DataFrame.values depend on undocumented behavior. The values property is allowed to return a copy of the data, but the solutions that modify values assume it returns a view. Sometimes it does return a view, but the pandas documentation makes no guarantees about when it will.
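One way to sidestep the view-versus-copy question is to request an explicit copy, fill its diagonal, and build a new DataFrame with the original labels. The sketch below assumes all columns share one numeric dtype; the helper name fill_diagonal_copy is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_diagonal_copy(df, value=0):
    """Return a new DataFrame with the positional diagonal set to value."""
    # copy=True guarantees a fresh, writable array that is independent of df.
    arr = df.to_numpy(copy=True)
    np.fill_diagonal(arr, value)
    # Reattach the original row and column labels.
    return pd.DataFrame(arr, index=df.index, columns=df.columns)


df = pd.DataFrame(np.ones((3, 4)), index=list("xyz"), columns=list("abcd"))
out = fill_diagonal_copy(df)
print(out)
```

This costs one extra copy of the data, but the original frame is left unchanged and the result does not depend on whether values happens to return a view.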
