Create new column based on existing columns
I have a Pandas DataFrame that looks like this.
Deviated_price  standard_price
744,600         789,276
693,600         789,276
693,600         735,216
                735,216
744,600         735,216
                735,216
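I want to create a new column named Net_standard_price. Its value depends on the Deviated_price and standard_price columns.
If Deviated_price is not blank, Net_standard_price should be blank. If Deviated_price is blank, Net_standard_price should contain the standard_price value.
Net_standard_price should look like this.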
Deviated_price  standard_price  Net_standard_price
                789,276         789,276
693,600         789,276
693,600         735,216
                735,216         735,216
744,600         735,216
                735,216         735,216
I tried the np.where code below, but Net_standard_price is blank for all records.
df['Net_standard_price'] = np.where(df['Deviated_price'] != '',
'', df['standard_price'])
What is the most efficient way to do this?
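One likely reason the attempt above leaves every row blank: if the empty cells are really missing values (NaN or None) rather than empty strings, then df['Deviated_price'] != '' is True on every row, so np.where always picks ''. A minimal sketch, assuming the blanks are NaN and using a small hand-made frame with values borrowed from the question:

```python
import numpy as np
import pandas as pd

# Small sample modelled on the question; blanks are ASSUMED to be NaN.
df = pd.DataFrame({
    "Deviated_price": ["744,600", "693,600", np.nan, np.nan],
    "standard_price": ["789,276", "735,216", "735,216", "735,216"],
})

# A missing value never equals '', so this mask is True on every row.
always_true = df["Deviated_price"] != ""

# Test for missing values explicitly instead.
is_blank = df["Deviated_price"].isna()
df["Net_standard_price"] = np.where(is_blank, df["standard_price"], "")
```

If the real data does hold empty strings, the mask would need to be `df["Deviated_price"] == ""` instead; checking a few cells with `repr()` shows which case applies.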
Working directly on the underlying NumPy arrays gives a modest performance improvement:
import pandas as pd
import numpy as np
from timeit import Timer


def make_df():
    # 10,000 rows of random floats stored as strings
    random_state = np.random.RandomState()
    df = pd.DataFrame(random_state.random((10000, 2)), columns=['Deviated_price', 'standard_price'], dtype=str)
    # Blank out roughly half of Deviated_price.
    # np.bool was removed from NumPy, so use the built-in bool;
    # .loc avoids chained assignment.
    mask = random_state.randint(0, 2, len(df)).astype(bool)
    df.loc[mask, 'Deviated_price'] = None
    return df


def test1(df):
    # The np.where call from the question
    df['Net_standard_price'] = np.where(df['Deviated_price'] != '',
                                        '', df['standard_price'])


def test2(df):
    # np.where with an explicit missing-value test
    df['Net_standard_price'] = np.where(df['Deviated_price'].isna(), df['standard_price'], None)


def test3(df):
    # Raw arrays: copy standard_price, then blank rows through a boolean mask.
    # Note: make_df marks missing values with None, not '', so adapt the mask to your data.
    temp = df['standard_price'].values
    temp2 = df['Deviated_price'].values
    net_standard_price = temp.copy()
    net_standard_price[temp2 == ''] = ''
    df['Net_standard_price'] = net_standard_price


# Each function is run 500 times on one freshly built frame
timing = Timer(setup='df = make_df()', stmt='test1(df)', globals=globals()).timeit(500)
print('test1: ', timing)
timing = Timer(setup='df = make_df()', stmt='test2(df)', globals=globals()).timeit(500)
print('test2: ', timing)
timing = Timer(setup='df = make_df()', stmt='test3(df)', globals=globals()).timeit(500)
print('test3: ', timing)
test1: 0.42146812000000006
test2: 0.417552648
test3: 0.2913768969999999
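The three functions above use different masks, and the `== ''` mask in test3 only matches literal empty strings. A sketch of the same raw-array idea applied to the rule in the question (blank the result wherever Deviated_price is present) could look like this. The helper name net_standard_price is made up for illustration, and missing values are assumed to be None or NaN:

```python
import numpy as np
import pandas as pd


def net_standard_price(df):
    """Return standard_price where Deviated_price is missing, '' elsewhere."""
    # Work on plain object arrays rather than on Series.
    standard = df["standard_price"].to_numpy(dtype=object)
    deviated = df["Deviated_price"].to_numpy(dtype=object)
    # pd.isna flags both None and NaN inside an object array.
    has_deviation = ~pd.isna(deviated)
    result = standard.copy()
    result[has_deviation] = ""
    return result


# Hypothetical sample with both kinds of missing marker.
df = pd.DataFrame({
    "Deviated_price": ["744,600", None, "693,600", np.nan],
    "standard_price": ["789,276", "789,276", "735,216", "735,216"],
})
df["Net_standard_price"] = net_standard_price(df)
```

This version has not been timed here, so the numbers above should not be read as applying to it.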