python pandas：嘗試在DataFrame的切片副本上設置一個值

Question

您能否建議根據http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy改寫以下幾行

df.drop('PACKETS', axis=1, inplace=True)

產生

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df.drop('PACKETS', axis=1, inplace=True)
/home/app/ip-spotlight/code/app/ipacc/plugin/ix.py:74: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

df.replace(numpy.nan, "", inplace=True)

產生

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df.replace(numpy.nan, "", inplace=True)
/home/app/ip-spotlight/code/app/ipacc/plugin/ix.py:68: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

另一方面，以下是基於上述原理如何重寫它的示例

df.loc[:, ('SRC_PREFIX')]   = df[ ['SRC_NET', 'SRC_MASK'] ].apply(lambda x: "/".join(x), axis=1)

但是我不知道如何重寫案例1和2？

編輯：到目前為止，代碼看起來像這樣（ df是感興趣的數據幀）。 因此，最初是某種類型的轉換：

df = pandas.DataFrame(data['payload'], columns=sorted(data['header'], key=data['header'].get))
        df = df.astype({
            'SRC_AS'                : "object",
            'DST_AS'                : "object",
            'COMMS'                 : "object",
            'SRC_COMMS'             : "object",
            'AS_PATH'               : "object",
            'SRC_AS_PATH'           : "object",
            'PREF'                  : "object",
            'SRC_PREF'              : "object",
            'MED'                   : "object",
            'SRC_MED'               : "object",
            'PEER_SRC_AS'           : "object",
            'PEER_DST_AS'           : "object",
            'PEER_SRC_IP'           : "object",
            'PEER_DST_IP'           : "object",
            'IN_IFACE'              : "object",
            'OUT_IFACE'             : "object",
            'SRC_NET'               : "object",
            'DST_NET'               : "object",
            'SRC_MASK'              : "object",
            'DST_MASK'              : "object",
            'PROTOCOL'              : "object",
            'TOS'                   : "object",
            'SAMPLING_RATE'         : "uint64",
            'EXPORT_PROTO_VERSION'  : "object",
            'PACKETS'               : "object",
            'BYTES'                 : "uint64",
        })

然后將模塊的calculate功能稱為：

mod.calculate(data['identifier'], data['timestamp'], df)

並且calculate函數的定義如下：

def calculate(identifier, timestamp, df):
    try:
        #   Filter based on AORTA IX.
        lut_ipaddr = lookup_ipaddr()
        df = df[ (df.PEER_SRC_IP.isin( lut_ipaddr )) ]
        if df.shape[0] > 0:
            logger.info('analyzing message `{}`'.format(identifier))
            #   Preparing for input.
            df.replace("", numpy.nan, inplace=True)
            #   Data wrangling. Calculate traffic rate. Reduce.
            df.loc[:, ('BPS')]          = 8*df['BYTES']*df['SAMPLING_RATE']/300
            df.drop(columns=['SAMPLING_RATE', 'EXPORT_PROTO_VERSION', 'PACKETS', 'BYTES'], inplace=True)
            #   Data wrangling. Formulate prefixes using CIDR notation. Reduce.
            df.loc[:, ('SRC_PREFIX')]   = df[ ['SRC_NET', 'SRC_MASK'] ].apply(lambda x: "/".join(x), axis=1)
            df.loc[:, ('DST_PREFIX')]   = df[ ['DST_NET', 'DST_MASK'] ].apply(lambda x: "/".join(x), axis=1)
            df.drop(columns=['SRC_NET', 'SRC_MASK', 'DST_NET' ,'DST_MASK'], inplace=True)
            #   Populate using lookup tables.
            df.loc[:, ('NETELEMENT')]   = df['PEER_SRC_IP'].apply(lookup_netelement)
            df.loc[:, ('IN_IFNAME')]    = df.apply(lambda x: lookup_iface(x['NETELEMENT'], x['IN_IFACE']), axis=1)
            df.loc[:, ('OUT_IFNAME')]   = df.apply(lambda x: lookup_iface(x['NETELEMENT'], x['OUT_IFACE']), axis=1)
            # df.loc[:, ('SRC_ASNAME')]   = df.apply(lambda x: lookup_asn(x['SRC_AS']), axis=1)
            #   Add a timestamp.
            df.loc[:, ('METERED_ON')]   = arrow.get(timestamp, "YYYYMMDDHHmm").format("YYYY-MM-DD HH:mm:ss")
            #   Preparing for input.
            df.replace(numpy.nan, "", inplace=True)
            #   Finalize !
            return identifier, timestamp, df.to_dict(orient="records")
        else:
            logger.info('going through message `{}` no IX bgp/netflow data were found'.format(identifier))
    except Exception as e:
        logger.error('processing message `{}` at `{}` caused `{}`'.format(identifier,timestamp,repr(e)), exc_info=True)
    return identifier, timestamp, None

Answer 1

好。 我真的不知道大熊貓下發生了什么。 但是，我仍然嘗試提供一些最小的示例，以向您展示問題的根源和解決方法。 首先，創建數據框：

import numpy as np
import pandas as pd
df = pd.DataFrame(dict(x=[0, 1, 2],
                       y=[0, 0, 5]))

然后，當您將數據幀傳遞給一個函數時，我將執行相同的操作，但對於2個幾乎相同的函數：

def func(dfx):
    # Analog of your df = df[df.PEER_SRC_IP.isin(lut_ipaddr)]
    dfx = dfx[dfx['x'] > 1.5]
    # Analog of your df.replace("", numpy.nan, inplace=True)
    dfx.replace(5, np.nan, inplace=True)
def func_with_copy(dfx):
    dfx = dfx[dfx['x'] > 1.5].copy()  # explicitly making a copy
    dfx.replace(5, np.nan, inplace=True)

現在，將它們稱為初始df：

func_with_copy(df)
print(df)

給

而且沒有警告。 並調用此：

func(df)
print(df)

給出相同的輸出：

但警告：

/usr/local/lib/python3.6/site-packages/ipykernel_launcher.py:6: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

因此，這看起來像是“誤報”。 這是對誤報的好評論：鏈接

奇怪的是，如果您對數據框執行完全相同的操作，但未將其傳遞給函數，則不會看到此警告。 ¯\\ _（ツ）_ /¯

我的建議是使用.copy()

python pandas：嘗試在DataFrame的切片副本上設置一個值

問題描述

1 個解決方案

解決方案1
1 已采納 2017-11-11 21:44:05

python pandas：嘗試在DataFrame的切片副本上設置一個值

問題描述

1 個解決方案

解決方案1 1 已采納 2017-11-11 21:44:05

解決方案1
1 已采納 2017-11-11 21:44:05