Python pandas: A value is trying to be set on a copy of a slice from a DataFrame

Question

Could you suggest how to rewrite the following lines according to http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

df.drop('PACKETS', axis=1, inplace=True)

produces

/home/app/ip-spotlight/code/app/ipacc/plugin/ix.py:74: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df.drop('PACKETS', axis=1, inplace=True)

df.replace(numpy.nan, "", inplace=True)

produces

/home/app/ip-spotlight/code/app/ipacc/plugin/ix.py:68: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df.replace(numpy.nan, "", inplace=True)

On the other hand, here is an example of a line I was able to rewrite based on that documentation:

df.loc[:, ('SRC_PREFIX')]   = df[ ['SRC_NET', 'SRC_MASK'] ].apply(lambda x: "/".join(x), axis=1)

But I cannot figure out how to rewrite cases 1 and 2.

EDIT: So far, the code looks like this (df is the dataframe of interest). Initially, there is some type casting:

df = pandas.DataFrame(data['payload'], columns=sorted(data['header'], key=data['header'].get))
df = df.astype({
    'SRC_AS'                : "object",
    'DST_AS'                : "object",
    'COMMS'                 : "object",
    'SRC_COMMS'             : "object",
    'AS_PATH'               : "object",
    'SRC_AS_PATH'           : "object",
    'PREF'                  : "object",
    'SRC_PREF'              : "object",
    'MED'                   : "object",
    'SRC_MED'               : "object",
    'PEER_SRC_AS'           : "object",
    'PEER_DST_AS'           : "object",
    'PEER_SRC_IP'           : "object",
    'PEER_DST_IP'           : "object",
    'IN_IFACE'              : "object",
    'OUT_IFACE'             : "object",
    'SRC_NET'               : "object",
    'DST_NET'               : "object",
    'SRC_MASK'              : "object",
    'DST_MASK'              : "object",
    'PROTOCOL'              : "object",
    'TOS'                   : "object",
    'SAMPLING_RATE'         : "uint64",
    'EXPORT_PROTO_VERSION'  : "object",
    'PACKETS'               : "object",
    'BYTES'                 : "uint64",
})

Then the module's calculate function is called:

mod.calculate(data['identifier'], data['timestamp'], df)

And the calculate function is defined as follows:

def calculate(identifier, timestamp, df):
    try:
        #   Filter based on AORTA IX.
        lut_ipaddr = lookup_ipaddr()
        df = df[ (df.PEER_SRC_IP.isin( lut_ipaddr )) ]
        if df.shape[0] > 0:
            logger.info('analyzing message `{}`'.format(identifier))
            #   Preparing for input.
            df.replace("", numpy.nan, inplace=True)
            #   Data wrangling. Calculate traffic rate. Reduce.
            df.loc[:, ('BPS')]          = 8*df['BYTES']*df['SAMPLING_RATE']/300
            df.drop(columns=['SAMPLING_RATE', 'EXPORT_PROTO_VERSION', 'PACKETS', 'BYTES'], inplace=True)
            #   Data wrangling. Formulate prefixes using CIDR notation. Reduce.
            df.loc[:, ('SRC_PREFIX')]   = df[ ['SRC_NET', 'SRC_MASK'] ].apply(lambda x: "/".join(x), axis=1)
            df.loc[:, ('DST_PREFIX')]   = df[ ['DST_NET', 'DST_MASK'] ].apply(lambda x: "/".join(x), axis=1)
            df.drop(columns=['SRC_NET', 'SRC_MASK', 'DST_NET' ,'DST_MASK'], inplace=True)
            #   Populate using lookup tables.
            df.loc[:, ('NETELEMENT')]   = df['PEER_SRC_IP'].apply(lookup_netelement)
            df.loc[:, ('IN_IFNAME')]    = df.apply(lambda x: lookup_iface(x['NETELEMENT'], x['IN_IFACE']), axis=1)
            df.loc[:, ('OUT_IFNAME')]   = df.apply(lambda x: lookup_iface(x['NETELEMENT'], x['OUT_IFACE']), axis=1)
            # df.loc[:, ('SRC_ASNAME')]   = df.apply(lambda x: lookup_asn(x['SRC_AS']), axis=1)
            #   Add a timestamp.
            df.loc[:, ('METERED_ON')]   = arrow.get(timestamp, "YYYYMMDDHHmm").format("YYYY-MM-DD HH:mm:ss")
            #   Preparing for input.
            df.replace(numpy.nan, "", inplace=True)
            #   Finalize !
            return identifier, timestamp, df.to_dict(orient="records")
        else:
            logger.info('going through message `{}` no IX bgp/netflow data were found'.format(identifier))
    except Exception as e:
        logger.error('processing message `{}` at `{}` caused `{}`'.format(identifier,timestamp,repr(e)), exc_info=True)
    return identifier, timestamp, None

Answer 1

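OK. I really don't know what is happening under the hood of pandas. Still, I'll try to give a minimal example that shows where the problem comes from and how to fix it. First, create the dataframe: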

import numpy as np
import pandas as pd
df = pd.DataFrame(dict(x=[0, 1, 2],
                       y=[0, 0, 5]))

Then, since you pass the dataframe to a function, I will do the same, but with 2 nearly identical functions:

def func(dfx):
    # Analog of your df = df[df.PEER_SRC_IP.isin(lut_ipaddr)]
    dfx = dfx[dfx['x'] > 1.5]
    # Analog of your df.replace("", numpy.nan, inplace=True)
    dfx.replace(5, np.nan, inplace=True)


def func_with_copy(dfx):
    dfx = dfx[dfx['x'] > 1.5].copy()  # explicitly making a copy
    dfx.replace(5, np.nan, inplace=True)

Now call them on the initial df:

func_with_copy(df)
print(df)

gives

   x  y
0  0  0
1  1  0
2  2  5

with no warning. Whereas calling this:

func(df)
print(df)

gives the same output, but with a warning:

/usr/local/lib/python3.6/site-packages/ipykernel_launcher.py:6: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

So this looks like a "false positive". There is a good comment about false positives here: link

Strangely, if you perform exactly the same operations on the dataframe without passing it to a function, you will not see this warning. ¯\_(ツ)_/¯
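One way to check the "false positive" reading is to confirm that the caller's dataframe is left untouched and that only the filtered result changes. The following is a minimal sketch reusing the toy dataframe above; the names `func_returning`, `snapshot` and `result` are made up for illustration, and depending on the pandas version the call may still emit the warning.

```python
import numpy as np
import pandas as pd


def func_returning(dfx):
    # Same steps as func() above (no explicit copy), but the result is returned
    # so that it can be inspected by the caller.
    dfx = dfx[dfx['x'] > 1.5]
    dfx.replace(5, np.nan, inplace=True)
    return dfx


df = pd.DataFrame(dict(x=[0, 1, 2],
                       y=[0, 0, 5]))
snapshot = df.copy()           # independent copy used for comparison
result = func_returning(df)

# The caller's dataframe is unchanged ...
print(df.equals(snapshot))
# ... while the filtered result has had its 5 replaced by NaN.
print(result)
```

If the first print shows True, the in-place replace only touched the filtered object, which is what the function intended anyway.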

My suggestion is to use .copy()
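Applied to the two cases in the question, the same idea might look like the sketch below. It is not the asker's actual module: `calculate_sketch`, the sample rows and the `lut_ipaddr` list are hypothetical, and only a few of the original columns are kept. The filter result is copied once, and `replace` / `drop` are reassigned instead of being called with `inplace=True`.

```python
import numpy as np
import pandas as pd


def calculate_sketch(df, lut_ipaddr):
    # Filter, then take an explicit copy so later writes target an independent frame.
    df = df[df['PEER_SRC_IP'].isin(lut_ipaddr)].copy()
    # Case 2 from the question, written as a reassignment.
    df = df.replace("", np.nan)
    df['BPS'] = 8 * df['BYTES'] * df['SAMPLING_RATE'] / 300
    # Case 1 from the question, written as a reassignment.
    df = df.drop(columns=['SAMPLING_RATE', 'PACKETS', 'BYTES'])
    df = df.replace(np.nan, "")
    return df


# Hypothetical sample data, for illustration only.
frame = pd.DataFrame({
    'PEER_SRC_IP':   ['10.0.0.1', '10.0.0.1', '10.0.0.2'],
    'IN_IFACE':      ['1', '', '3'],
    'SAMPLING_RATE': [100, 100, 100],
    'PACKETS':       ['5', '6', '7'],
    'BYTES':         [300, 600, 900],
}).astype({'PEER_SRC_IP': 'object', 'IN_IFACE': 'object', 'PACKETS': 'object'})

out = calculate_sketch(frame, ['10.0.0.1'])
print(out)
```

The `.copy()` on its own should be enough to silence the warning; reassigning instead of using `inplace=True` is an optional stylistic choice that also keeps the caller's dataframe untouched.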

Solution 1
1 Accepted 2017-11-11 21:44:05
