Setting values with pandas.DataFrame

Question

Given this DataFrame:

import pandas

dates = pandas.date_range('2016-01-01', periods=5, freq='H')
s = pandas.Series([0, 1, 2, 3, 4], index=dates)
df = pandas.DataFrame([(1, 2, s, 8)], columns=['a', 'b', 'foo', 'bar'])
df.set_index(['a', 'b'], inplace=True)

df

I would like to replace the Series in there with a new one that is simply the old one resampled to a daily period (i.e. x.resample('D').sum().dropna()).

When I try:

df['foo'][0] = df['foo'][0].resample('D').sum().dropna()

That seems to work. However, I get a warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

The question is, how should I do this instead?

Notes

Things I have tried that do not work (with or without resampling, the assignment raises an exception):

df.iloc[0].loc['foo'] = df.iloc[0].loc['foo']
df.loc[(1, 2), 'foo'] = df.loc[(1, 2), 'foo']
df.loc[df.index[0], 'foo'] = df.loc[df.index[0], 'foo']

A bit more information about the data (in case it is relevant):

The real DataFrame has more columns in the multi-index. They are not all necessarily integers, but more generally numerical and categorical. The index is unique (i.e. there is only one row with a given index value).
The real DataFrame has, of course, many more rows in it (thousands).
There are not necessarily only two columns in the DataFrame, and there may be more than one column containing a Series type. Columns usually contain series, categorical data and numerical data as well. Any single column is always single-typed (either numerical, or categorical, or series).
The series contained in each cell usually have a variable length (i.e. two series/cells in the DataFrame do not, except by pure coincidence, have the same length, and will probably never have the same index anyway, as dates vary between series as well).

Using Python 3.5.1 and Pandas 0.18.1.

Answer 1

This should work:

df.iat[0, df.columns.get_loc('foo')] = df['foo'][0].resample('D').sum().dropna()

Pandas is complaining about chained indexing, but when you avoid chained indexing it has trouble assigning a whole Series to a single cell. With iat you can force that kind of assignment. I don't think it is a preferable thing to do, but it seems to be a working solution.
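As a minimal sketch of the iat approach on the DataFrame from the question: the names foo_pos, old and cell are illustrative only, and the hourly frequency is passed as a pd.Timedelta so the snippet does not depend on how a particular pandas release spells the hourly alias. It assumes the 'foo' column has object dtype, which is what lets a single cell hold a whole Series.

```python
import pandas as pd

# Rebuild the question's DataFrame: one row whose 'foo' cell holds a Series.
dates = pd.date_range('2016-01-01', periods=5, freq=pd.Timedelta(hours=1))
s = pd.Series([0, 1, 2, 3, 4], index=dates)
df = pd.DataFrame([(1, 2, s, 8)], columns=['a', 'b', 'foo', 'bar'])
df = df.set_index(['a', 'b'])

# Positional scalar access: a single indexing step, so no chained assignment.
foo_pos = df.columns.get_loc('foo')   # integer position of the 'foo' column
old = df.iat[0, foo_pos]              # the hourly Series stored in the cell
df.iat[0, foo_pos] = old.resample('D').sum()

cell = df.iat[0, foo_pos]             # the daily Series now stored in the cell
print(cell)
```

Because iat reads and writes exactly one cell by position, the assignment happens on df itself rather than on an intermediate object returned by df['foo'].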

Answer 2

Hierarchical data in pandas

It really seems like you should consider restructuring your data to take advantage of pandas features such as MultiIndex and DatetimeIndex. This will allow you to still operate on an index in the typical way while being able to select on multiple columns across the hierarchical data (a, b, and bar).

Restructured Data

import pandas as pd

# Define Index
dates = pd.date_range('2016-01-01', periods=5, freq='H')
# Define Series
s = pd.Series([0, 1, 2, 3, 4], index=dates)

# Place Series in Hierarchical DataFrame
# from_arrays expects one array-like per level, so each value is wrapped in a list
heirIndex = pd.MultiIndex.from_arrays([[1], [2], [8]], names=['a','b', 'bar'])
df = pd.DataFrame(s, columns=heirIndex)

print(df)

a                    1
b                    2
bar                  8
2016-01-01 00:00:00  0
2016-01-01 01:00:00  1
2016-01-01 02:00:00  2
2016-01-01 03:00:00  3
2016-01-01 04:00:00  4
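To illustrate what selecting across the hierarchical columns can look like, here is a small sketch. The two-column frame and the names frame, only_a1 and bar5 are made up for illustration; they are not part of the original answer.

```python
import pandas as pd

# Two value columns, each labelled by the levels (a, b, bar).
cols = pd.MultiIndex.from_tuples([(1, 2, 8), (2, 5, 5)], names=['a', 'b', 'bar'])
index = pd.date_range('2016-01-01', periods=3, freq=pd.Timedelta(hours=1))
frame = pd.DataFrame([[0, 10], [1, 20], [2, 30]], index=index, columns=cols)

# Cross-section: every column whose level 'a' equals 1.
only_a1 = frame.xs(1, level='a', axis=1)

# Boolean mask on a single level: every column whose 'bar' equals 5.
bar5 = frame.loc[:, frame.columns.get_level_values('bar') == 5]

print(only_a1)
print(bar5)
```

xs drops the level it selected on by default, while the boolean mask keeps all three levels in the result.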

Resampling

With the data in this format, resampling becomes very simple.

# Simple Direct Resampling
df_resampled = df.resample('D').sum().dropna()

print(df_resampled)

a            1
b            2
bar          8
2016-01-01  10

Update (from data description)

If the data has variable-length Series, each with a different index, and non-numeric categories, that is OK. Let's make an example:

# Define Series
dates = pd.date_range('2016-01-01', periods=5, freq='H')
s = pd.Series([0, 1, 2, 3, 4], index=dates)

# Define Series
dates2 = pd.date_range('2016-01-14', periods=6, freq='H')
s2 = pd.Series([-200, 10, 24, 30, 40, 100], index=dates2)
# Define DataFrames (from_arrays expects one array-like per level)
df1 = pd.DataFrame(s, columns=pd.MultiIndex.from_arrays([[1], [2], [8], ['cat1']], names=['a','b', 'bar','c']))
df2 = pd.DataFrame(s2, columns=pd.MultiIndex.from_arrays([[2], [5], [5], ['cat3']], names=['a','b', 'bar','c']))

df = pd.concat([df1, df2])
print(df)

a                      1      2
b                      2      5
bar                    8      5
c                   cat1   cat3
2016-01-01 00:00:00  0.0    NaN
2016-01-01 01:00:00  1.0    NaN
2016-01-01 02:00:00  2.0    NaN
2016-01-01 03:00:00  3.0    NaN
2016-01-01 04:00:00  4.0    NaN
2016-01-14 00:00:00  NaN -200.0
2016-01-14 01:00:00  NaN   10.0
2016-01-14 02:00:00  NaN   24.0
2016-01-14 03:00:00  NaN   30.0
2016-01-14 04:00:00  NaN   40.0
2016-01-14 05:00:00  NaN  100.0

The only issue arises after resampling: each row now has a NaN in at least one column, so a plain dropna() would drop everything. You will want to pass how='all' so that only rows where every column is NaN are dropped, like this:

# Simple Direct Resampling
df_resampled = df.resample('D').sum().dropna(how='all')

print(df_resampled)

a              1    2
b              2    5
bar            8    5
c           cat1 cat3
2016-01-01  10.0  NaN
2016-01-14   NaN  4.0
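The difference between the default dropna() and dropna(how='all') can be sketched on a simplified two-column version of the data. The names wide, daily, any_dropped and all_dropped, and the column names 'x' and 'y', are illustrative. One assumption to flag: newer pandas releases return 0 rather than NaN when summing an empty or all-NaN bin, so this sketch passes min_count=1 to sum() to keep such bins as NaN; that argument is not part of the original answer and is not available in the pandas 0.18.1 used in the question.

```python
import pandas as pd

hour = pd.Timedelta(hours=1)
s1 = pd.Series([0, 1, 2, 3, 4],
               index=pd.date_range('2016-01-01', periods=5, freq=hour), name='x')
s2 = pd.Series([-200, 10, 24, 30, 40, 100],
               index=pd.date_range('2016-01-14', periods=6, freq=hour), name='y')

# Side by side, each column is NaN wherever the other one has data.
wide = pd.concat([s1, s2], axis=1)

# min_count=1 (an assumption for newer pandas) keeps empty daily bins as NaN.
daily = wide.resample('D').sum(min_count=1)

any_dropped = daily.dropna()            # default how='any': drops every row here
all_dropped = daily.dropna(how='all')   # drops only rows that are entirely NaN

print(all_dropped)
```

With the default how='any', a row is dropped as soon as one column is NaN, which here removes all rows; how='all' keeps the two days that have data in at least one column.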

Answer 3

Simply set df.is_copy = False before assigning the new value.

3 answers

Answer 1: score 3, accepted, 2016-06-01 14:44:10
Answer 2: score 0, 2016-06-10 04:18:15
Answer 3: score 0, 2016-06-10 11:25:25
