
Vectorized update to pandas DataFrame?

I have a dataframe in which I'd like to update a column with values from an array. The array has a different length from the dataframe, but I have the indices of the dataframe rows that I'd like to update.

I can do this by looping through the rows (below), but I expect a vectorized approach would be much more efficient. I can't seem to get the syntax right.

In the example below I fill the column with NaN and then assign the values one index at a time in a loop.

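df['newcol'] = np.nan

# Slow approach: assign one row at a time.
# df.loc avoids the chained assignment df['newcol'][i] = ...
for i, value in zip(update_idx, new_values):
    df.loc[i, 'newcol'] = value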

If you already have a list of indices, you can use loc to perform label (row) selection and pass the new column name in the same call. Existing rows that are not selected are assigned NaN:

df.loc[update_idx, 'newcol'] = new_values

Example:

In [4]:
df = pd.DataFrame({'a':np.arange(5), 'b':np.random.randn(5)}, index = list('abcde'))
df

Out[4]:
   a         b
a  0  1.800300
b  1  0.351843
c  2  0.278122
d  3  1.387417
e  4  1.202503

In [5]:    
idx_list = ['b','d','e']
df.loc[idx_list, 'c'] = np.arange(3)
df

Out[5]:
   a         b   c
a  0  1.800300 NaN
b  1  0.351843   0
c  2  0.278122 NaN
d  3  1.387417   1
e  4  1.202503   2
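
As a self-contained sketch of the same idea using the question's variable names: the data below is made up for illustration, and `positions` and `newcol2` are hypothetical names that are not in the original post. It assumes `new_values` holds exactly one value per entry in `update_idx`, in the same order. The second assignment covers the case where the indices are integer positions rather than row labels, by translating them to labels first.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the question's df, update_idx and new_values
df = pd.DataFrame({'a': np.arange(5)}, index=list('abcde'))
update_idx = ['b', 'd', 'e']                # row labels to update
new_values = np.array([10.0, 20.0, 30.0])   # one value per label, same order

# Label-based update: start from NaN, then assign all selected rows at once
df['newcol'] = np.nan
df.loc[update_idx, 'newcol'] = new_values

# Position-based variant (hypothetical): map integer positions to labels first
positions = [1, 3, 4]
df['newcol2'] = np.nan
df.loc[df.index[positions], 'newcol2'] = new_values

print(df)
```

If the number of values does not match the number of selected rows, pandas raises an error instead of assigning partially, so it is worth checking `len(update_idx) == len(new_values)` beforehand.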


 