Editing values in a Pandas DataFrame using data from another part of the DataFrame

Question

I was hoping to use indexing on one part of a Pandas DataFrame to edit values corresponding to another index. Here is an example:

>>> import pandas as pd
>>> from numpy.random import randn
>>> x = pd.DataFrame(randn(3, 3), columns=[1, 2, 3], index=['a', 'b', 'c'])
>>> print(x)

        1         2         3
a -1.007344  0.234990  0.772736
b  0.658360  1.330051 -0.269388
c  0.010871  1.035687  0.230169


>>> index1 = x.index[0:2]
>>> index2 = x.index[1:3]
>>> y = x.copy()  # copy, so editing x below leaves y untouched
>>> x.loc[index1, 3] = x.loc[index2, 2]
>>> print(x)


        1         2         3
a -1.007344  0.234990       NaN
b  0.658360  1.330051  1.330051
c  0.010871  1.035687  0.230169

The latter output is rather unexpected. What does work is the following:

>>> y.loc[index1, 3] = y.loc[index2, 2].values
>>> print(y)

       1         2         3
a -1.007344  0.234990  1.330051
b  0.658360  1.330051  1.035687
c  0.010871  1.035687  0.230169

However, this latter solution is inconvenient for a number of things I would like to do. For example, I would like to write:

x.loc[index1, 3] = x.loc[index2, 2]+2

or

x.loc[index1, 3] = x.loc[index1, 3] + x.loc[index2, 2]

etc.

Is there another way around this problem?

Thanks in advance!

Answer 1

Pandas is great for aligning based on index. The "unexpected" result is actually understandable if you think of

x.loc[index1, 3]

as a Series with index labels ['a', 'b'], and the assignment

x.loc[index1, 3] = x.loc[index2, 2]

assigns new values from x.loc[index2, 2], which is a Series with index labels ['b', 'c']. The right-hand side aligns with the Series on the left only at the label 'b', so that label gets a new value, while the label 'a' is set to NaN because the right-hand side has no value for that index.
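To make the alignment visible, here is a small sketch with fixed numbers instead of random ones. The float frame built with np.arange is my own stand-in for the question's data, not something from the original post.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the question's random frame (an assumption).
x = pd.DataFrame(
    np.arange(9, dtype=float).reshape(3, 3),
    columns=[1, 2, 3],
    index=["a", "b", "c"],
)
index1 = x.index[0:2]  # labels 'a', 'b'
index2 = x.index[1:3]  # labels 'b', 'c'

rhs = x.loc[index2, 2]  # Series labelled 'b' and 'c'
x.loc[index1, 3] = rhs  # matched by label, not by position

# 'a' has no partner in rhs and becomes NaN; 'b' takes rhs['b'];
# 'c' is outside index1 and is left alone.
print(x)
```

Only the label 'b' appears on both sides, so it is the only row in index1 that receives a real value.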

When you want Pandas to disregard the index, you need to pass an object on the right-hand side that has no index. So, as you showed,

y.loc[index1, 3] = y.loc[index2, 2].values

produces the desired result. 产生期望的结果。
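As a rough illustration of why this works, the sketch below checks what the right-hand side actually is once the index is dropped. It uses .to_numpy(), which the pandas documentation recommends over .values; the fixed-number frame is again an assumption of mine rather than the question's data.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the question's random frame (an assumption).
y = pd.DataFrame(
    np.arange(9, dtype=float).reshape(3, 3),
    columns=[1, 2, 3],
    index=["a", "b", "c"],
)
index1 = y.index[0:2]  # labels 'a', 'b'
index2 = y.index[1:3]  # labels 'b', 'c'

rhs = y.loc[index2, 2].to_numpy()  # plain ndarray, the labels are gone
print(type(rhs).__name__, rhs)

y.loc[index1, 3] = rhs  # nothing to align on, so values go in by position
print(y)
```

With no labels left on the right-hand side, the two values land in rows 'a' and 'b' in order, so the lengths on both sides have to match.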

Similarly, for your more complicated assignments, you could use

x.loc[index1, 3] = x.loc[index2, 2].values + 2

or

x.loc[index1, 3] += x.loc[index2, 2].values

(Note that the second assignment uses the in-place addition operator, +=.)
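Both forms can be checked on fixed numbers. In this sketch make_frame is a hypothetical helper of mine that rebuilds the same frame before each assignment, and .to_numpy() stands in for .values.

```python
import numpy as np
import pandas as pd


def make_frame():
    # Hypothetical helper: a fresh, deterministic stand-in for the question's frame.
    return pd.DataFrame(
        np.arange(9, dtype=float).reshape(3, 3),
        columns=[1, 2, 3],
        index=["a", "b", "c"],
    )


x = make_frame()
index1 = x.index[0:2]  # labels 'a', 'b'
index2 = x.index[1:3]  # labels 'b', 'c'

# Column 2 at 'b', 'c' is [4, 7]; adding 2 gives [6, 9] for rows 'a', 'b'.
x.loc[index1, 3] = x.loc[index2, 2].to_numpy() + 2
plus_two = x.loc[index1, 3].tolist()

# Start over, then add [4, 7] to the existing [2, 5] in column 3.
x = make_frame()
x.loc[index1, 3] += x.loc[index2, 2].to_numpy()
in_place = x.loc[index1, 3].tolist()

print(plus_two, in_place)
```

In the += case the left-hand Series keeps its own labels 'a' and 'b', so writing the sum back aligns cleanly.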

If you have a lot of assignments that ignore the index, then perhaps you should be using a NumPy array instead of a Pandas DataFrame.

import pandas as pd
import numpy as np

x = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=[1, 2, 3], index=['a', 'b', 'c'])
arr = x.to_numpy(copy=True)  # writable copy; x.values can be a read-only view in newer pandas
print(arr)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

index1 = slice(0,2)
index2 = slice(1,3)
arr[index1, 2] = arr[index2, 1]
print(arr)
# [[0 1 4]
#  [3 4 7]
#  [6 7 8]]

# Instead of x.loc[index1, 3] = x.loc[index2, 2]+2 
arr[index1, 2] = arr[index2, 1] + 2
print(arr)
# [[0 1 6]
#  [3 4 9]
#  [6 7 8]]

# Instead of x.loc[index1, 3] = x.loc[index1, 3] + x.loc[index2, 2]
arr[index1, 2] += arr[index2, 1]
print(arr)
# [[ 0  1 10]
#  [ 3  4 16]
#  [ 6  7  8]]

x.loc[:,:] = arr
print(x)
#    1  2   3
# a  0  1  10
# b  3  4  16
# c  6  7   8
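If you would rather stay in pandas, another option (not part of the answer above, just a sketch) is to relabel the right-hand side so that it lines up with the target rows. Series.set_axis returns a copy with new index labels; the fixed-number frame is my own stand-in for the question's data.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the question's random frame (an assumption).
x = pd.DataFrame(
    np.arange(9, dtype=float).reshape(3, 3),
    columns=[1, 2, 3],
    index=["a", "b", "c"],
)
index1 = x.index[0:2]  # labels 'a', 'b'
index2 = x.index[1:3]  # labels 'b', 'c'

# Take the values at 'b', 'c' and relabel them as 'a', 'b'.
rhs = x.loc[index2, 2].set_axis(index1)

# Ordinary label-aligned arithmetic now pairs the rows as intended.
x.loc[index1, 3] = x.loc[index1, 3] + rhs
print(x)
```

This keeps the assignment close to the form asked for in the question, at the cost of one explicit relabelling step.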

Answer 1
1 · accepted · 2014-07-14 20:51:13
