Editing values in a pandas DataFrame using data from another part of the DataFrame
I was hoping to use indexing on one part of a Pandas DataFrame to edit values corresponding to another index. Here is an example:
>>> from pandas import DataFrame
>>> from numpy.random import randn
>>> x = DataFrame(randn(3, 3), columns=[1, 2, 3], index=['a', 'b', 'c'])
>>> print(x)
          1         2         3
a -1.007344  0.234990  0.772736
b  0.658360  1.330051 -0.269388
c  0.010871  1.035687  0.230169
>>> index1 = x.index[0:2]
>>> index2 = x.index[1:3]
>>> y = x  # y is another name for the same DataFrame, not a copy
>>> x.loc[index1, 3] = x.loc[index2, 2]
>>> print(x)
          1         2         3
a -1.007344  0.234990       NaN
b  0.658360  1.330051  1.330051
c  0.010871  1.035687  0.230169
The latter output is rather unexpected. What does work instead is the following:
>>> y.loc[index1, 3] = y.loc[index2, 2].values
>>> print(y)
          1         2         3
a -1.007344  0.234990  1.330051
b  0.658360  1.330051  1.035687
c  0.010871  1.035687  0.230169
However, this latter solution is inconvenient for a number of applications I have in mind. For example, I would like to write:
x.loc[index1, 3] = x.loc[index2, 2] + 2
or
x.loc[index1, 3] = x.loc[index1, 3] + x.loc[index2, 2]
etc.
Is there another way around this problem?
Thanks in advance!
Pandas is great for aligning based on index. The "unexpected" result is actually understandable if you think of x.loc[index1, 3] as a Series with index labels ['a', 'b'], and of the assignment
x.loc[index1, 3] = x.loc[index2, 2]
as assigning new values from x.loc[index2, 2], which is a Series with index labels ['b', 'c']. Since the data on the right-hand side aligns with the Series on the left only at the label 'b', that label gets a new value, while the label 'a' is set to NaN because the right-hand side has no value for that label.
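To make the alignment visible without random numbers, here is a small sketch that repeats the question's assignment on a deterministic frame (np.arange stands in for randn; everything else mirrors the question). The right-hand Series carries the labels 'b' and 'c', so only 'b' matches a target row.

```python
import numpy as np
import pandas as pd

# Deterministic float data instead of randn, so the result is reproducible.
x = pd.DataFrame(np.arange(9.0).reshape(3, 3), columns=[1, 2, 3], index=["a", "b", "c"])
index1 = x.index[0:2]  # target labels 'a', 'b'
index2 = x.index[1:3]  # source labels 'b', 'c'

rhs = x.loc[index2, 2]
print(rhs.index.tolist())  # ['b', 'c']

# pandas reindexes rhs to the target labels before assigning:
# 'b' exists on both sides and gets rhs['b'] (4.0),
# 'a' is missing from rhs and therefore becomes NaN.
x.loc[index1, 3] = rhs
print(x)
```

Row 'c' is not part of index1, so it keeps its original value of 8.0.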
When you want Pandas to disregard the index, you need to pass an object on the right-hand side that has no index. So, as you showed,
y.loc[index1, 3] = y.loc[index2, 2].values
produces the desired result.
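A minimal sketch of the positional version on the same deterministic data as above. It uses .to_numpy() (available since pandas 0.24) in place of .values; both return a plain NumPy array without an index, so the values are written in order. The array length has to match the number of target rows, otherwise pandas raises an error.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.arange(9.0).reshape(3, 3), columns=[1, 2, 3], index=["a", "b", "c"])
index1 = x.index[0:2]  # target rows 'a', 'b'
index2 = x.index[1:3]  # source rows 'b', 'c'

# The ndarray has no labels, so the first value goes to 'a' and the second to 'b'.
x.loc[index1, 3] = x.loc[index2, 2].to_numpy()
print(x[3].tolist())  # [4.0, 7.0, 8.0]
```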
Similarly, for your more complicated assignments, you could use
x.loc[index1, 3] = x.loc[index2, 2].values + 2
or
x.loc[index1, 3] += x.loc[index2, 2].values
(Note that the second assignment uses the in-place addition operator, +=.)
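Another route, not part of the original answer and offered only as a sketch: give the source values the target labels first. The result is still a Series, so the arithmetic from the question works unchanged and aligns row by row. The variable name rhs is illustrative, and the data are a deterministic stand-in for the question's random frame.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.arange(9.0).reshape(3, 3), columns=[1, 2, 3], index=["a", "b", "c"])
index1 = x.index[0:2]  # target rows 'a', 'b'
index2 = x.index[1:3]  # source rows 'b', 'c'

# Illustrative helper variable: the values of column 2 at rows 'b', 'c',
# relabelled as rows 'a', 'b' so that label alignment now does what we want.
rhs = pd.Series(x.loc[index2, 2].to_numpy(), index=index1)

# Equivalent of x.loc[index1, 3] = x.loc[index2, 2] + 2
x.loc[index1, 3] = rhs + 2
print(x[3].tolist())  # [6.0, 9.0, 8.0]

# Equivalent of x.loc[index1, 3] = x.loc[index1, 3] + x.loc[index2, 2]
x.loc[index1, 3] = x.loc[index1, 3] + rhs
print(x[3].tolist())  # [10.0, 16.0, 8.0]
```

Relabelling once and reusing rhs keeps each expression readable, at the cost of building one extra Series.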
If you have a lot of assignments that ignore the index, then perhaps you should be using a NumPy array instead of a Pandas DataFrame.
import numpy as np
import pandas as pd

x = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=[1, 2, 3], index=['a', 'b', 'c'])
# Work on a plain NumPy array; copy=True guarantees it is writable
# (under pandas Copy-on-Write, .values may return a read-only view).
arr = x.to_numpy(copy=True)
print(arr)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

# Positional row selections playing the role of index1 / index2.
index1 = slice(0, 2)
index2 = slice(1, 3)

# Instead of x.loc[index1, 3] = x.loc[index2, 2]
arr[index1, 2] = arr[index2, 1]
print(arr)
# [[0 1 4]
#  [3 4 7]
#  [6 7 8]]

# Instead of x.loc[index1, 3] = x.loc[index2, 2] + 2
arr[index1, 2] = arr[index2, 1] + 2
print(arr)
# [[0 1 6]
#  [3 4 9]
#  [6 7 8]]

# Instead of x.loc[index1, 3] = x.loc[index1, 3] + x.loc[index2, 2]
arr[index1, 2] += arr[index2, 1]
print(arr)
# [[ 0  1 10]
#  [ 3  4 16]
#  [ 6  7  8]]

# Copy the results back into the DataFrame.
x.loc[:, :] = arr
print(x)
#    1  2   3
# a  0  1  10
# b  3  4  16
# c  6  7   8