Create a new column based on existing columns

Question

I have a pandas DataFrame that looks like this.

Deviated_price  standard_price   
744,600          789,276
693,600          789,276
693,600          735,216
                 735,216
744,600          735,216
                 735,216

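I want to create a new column called Net_standard_price. Its values will be based on the Deviated_price and standard_price columns.

If Deviated_price is not blank, Net_standard_price should be blank. If Deviated_price is blank, Net_standard_price should contain the standard_price value.

Net_standard_price should look like this.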

Deviated_price  standard_price  Net_standard_price   
                 789,276           789,276
693,600          789,276
693,600          735,216
                 735,216           735,216
744,600          735,216
                 735,216           735,216

I tried the np.where code below, but Net_standard_price comes out blank for every record.

df['Net_standard_price'] = np.where(df['Deviated_price'] != '',
                                        '', df['standard_price'])

What is the most efficient way to do this?

Answer 1

Moving the work into the NumPy domain gives some performance gain:

import pandas as pd
import numpy as np
from timeit import Timer


def make_df():
    # 10,000 rows of random values stored as strings; roughly half of
    # Deviated_price is then set to missing (None).
    random_state = np.random.RandomState()
    df = pd.DataFrame(random_state.random((10000, 2)), columns=['Deviated_price', 'standard_price'], dtype=str)
    # np.bool was removed in NumPy 1.24, so use the builtin bool, and assign
    # through .loc to avoid chained assignment.
    mask = random_state.randint(0, 2, len(df)).astype(bool)
    df.loc[mask, 'Deviated_price'] = None
    return df


def test1(df):
    # The question's code: it compares against '', so missing values
    # (None/NaN) are not detected as blank.
    df['Net_standard_price'] = np.where(df['Deviated_price'] != '',
                                        '', df['standard_price'])

def test2(df):
    # Detect missing values explicitly with isna().
    df['Net_standard_price'] = np.where(df['Deviated_price'].isna(), df['standard_price'], None)

def test3(df):
    # Work on the underlying NumPy arrays instead of the Series.
    standard = df['standard_price'].values
    deviated = df['Deviated_price'].values
    net_standard_price = standard.copy()
    # Blank out the rows where Deviated_price is present.
    net_standard_price[pd.notna(deviated)] = ''
    df['Net_standard_price'] = net_standard_price

timing = Timer(setup='df = make_df()', stmt='test1(df)', globals=globals()).timeit(500)
print('test1: ', timing)

timing = Timer(setup='df = make_df()', stmt='test2(df)', globals=globals()).timeit(500)
print('test2: ', timing)

timing = Timer(setup='df = make_df()', stmt='test3(df)', globals=globals()).timeit(500)
print('test3: ', timing)

test1:  0.42146812000000006
test2:  0.417552648
test3:  0.2913768969999999
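
As for why the original np.where blanks every row: a likely cause, assuming the blank cells are stored as NaN rather than as empty strings (the question does not show how the data was loaded), is that NaN != '' is True for every row. A minimal sketch on the question's sample data, where the DataFrame construction and the is_blank helper name are illustrative and not from the original post:

```python
import numpy as np
import pandas as pd

# Sample data from the question; blanks are assumed to be NaN.
df = pd.DataFrame({
    "Deviated_price": ["744,600", "693,600", "693,600", np.nan, "744,600", np.nan],
    "standard_price": ["789,276", "789,276", "735,216", "735,216", "735,216", "735,216"],
}, dtype=object)

# The original condition: NaN != "" is True, so every row is blanked.
all_blank = np.where(df["Deviated_price"] != "", "", df["standard_price"])

# Treat NaN, empty strings, and whitespace-only strings as blank.
is_blank = df["Deviated_price"].fillna("").str.strip().eq("")
df["Net_standard_price"] = np.where(is_blank, df["standard_price"], "")

print(df["Net_standard_price"].tolist())
# ['', '', '', '735,216', '', '735,216']
```

If the blanks really are empty strings, the same is_blank mask still catches them, so the sketch does not depend on which representation the data uses.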
