Populating a dataframe by looking up another dataframe in pandas
I have a pandas DataFrame (df) like the one below:
AccountName  AccountName2  DateTime
abc          guest         2016-06-10 20:46
             guest         2016-06-10 21:32
def                        2016-06-10 23:11
                           2016-06-10 23:31
ghi                        2016-06-10 24:41
I need to derive a new DataFrame (df1) from the one above. df1 should have two fields, ResultAccount and DateTime.
if(df["AccountName"] != ' '):
    df1["ResultAccount"] = df["AccountName"]
elif(df["AccountName2"] != ' '):
    df1["ResultAccount"] = df["AccountName2"]
else:
    df1["ResultAccount"] = "none"
This is the approach I followed, but df1 is not populated as expected. Any help would be appreciated.
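I think you can first replace the single-space strings ' ' with NaN, and then apply a custom function f that uses first_valid_index. The output DataFrame is then built from the Series ResultAccount and df.DateTime: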
import pandas as pd
import numpy as np
df = pd.DataFrame({'AccountName': {0: 'abc', 1: ' ', 2: 'def', 3: ' ', 4: 'ghi'},
                   'AccountName2': {0: 'guest', 1: 'guest', 2: ' ', 3: ' ', 4: ' '},
                   'DateTime': {0: '2016-06-10 20:46', 1: '2016-06-10 21:32', 2: '2016-06-10 23:11', 3: '2016-06-10 23:31', 4: '2016-06-10 24:41'}})
print (df)
  AccountName AccountName2          DateTime
0         abc        guest  2016-06-10 20:46
1                    guest  2016-06-10 21:32
2         def               2016-06-10 23:11
3                           2016-06-10 23:31
4         ghi               2016-06-10 24:41
df[['AccountName','AccountName2']] = df[['AccountName','AccountName2']].replace(' ',np.nan)
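def f(x):
    # x is one row; first_valid_index() is the label of its first non-NaN value
    if x.first_valid_index() is None:
        # both account columns are missing
        return 'None'
    else:
        return x[x.first_valid_index()]

# apply f row by row (axis=1)
ResultAccount = (df[['AccountName','AccountName2']].apply(f, axis=1))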
df1 = pd.DataFrame({'ResultAccount':ResultAccount ,'DateTime':df.DateTime},
columns=['ResultAccount','DateTime'])
print (df1)
  ResultAccount          DateTime
0           abc  2016-06-10 20:46
1         guest  2016-06-10 21:32
2           def  2016-06-10 23:11
3          None  2016-06-10 23:31
4           ghi  2016-06-10 24:41
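To see why f picks the right column, it may help to look at first_valid_index on a single row in isolation. The sketch below uses two made-up rows (the names row_with_value and row_all_missing are only for illustration) shaped like one row of df[['AccountName','AccountName2']] after the replace step:

```python
import numpy as np
import pandas as pd

# Hypothetical rows, shaped like one row of the two account columns
# after ' ' has been replaced with NaN.
row_with_value = pd.Series({'AccountName': np.nan, 'AccountName2': 'guest'})
row_all_missing = pd.Series({'AccountName': np.nan, 'AccountName2': np.nan})

# Label of the first non-NaN entry, or None when every entry is NaN.
first_label = row_with_value.first_valid_index()
no_label = row_all_missing.first_valid_index()

print(first_label)                  # AccountName2
print(row_with_value[first_label])  # guest
print(no_label)                     # None
```

Because the columns are passed as ['AccountName','AccountName2'], AccountName wins whenever it is present, which matches the if/elif order in the question.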
You can use np.select. It is a multi-condition generalization of np.where:
import numpy as np
import pandas as pd
df = pd.DataFrame(
{'AccountName': ['abc', ' ', 'def', ' ', 'ghi'],
'AccountName2': ['guest', 'guest', ' ', ' ', ' '],
'DateTime': ['2016-06-10 20:46', '2016-06-10 21:32', '2016-06-10 23:11', '2016-06-10 23:31', '2016-06-10 24:41']})
conditions = [df['AccountName'] != ' ', df['AccountName2'] != ' ']
choices = [df["AccountName"], df["AccountName2"]]
df['ResultAccount'] = np.select(conditions, choices, default='none')
yields
  AccountName AccountName2          DateTime ResultAccount
0         abc        guest  2016-06-10 20:46           abc
1                    guest  2016-06-10 21:32         guest
2         def               2016-06-10 23:11           def
3                           2016-06-10 23:31          none
4         ghi               2016-06-10 24:41           ghi
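As a rough illustration of what "multi-condition generalization of np.where" means here, the same result can be written as two nested np.where calls. This is only a sketch on the sample data from this answer; the names via_where and via_select are introduced just for the comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'AccountName':  ['abc', ' ', 'def', ' ', 'ghi'],
    'AccountName2': ['guest', 'guest', ' ', ' ', ' '],
})

# Nested np.where: the first condition is checked first, the second only
# matters where the first is False, and 'none' is the final fallback.
via_where = np.where(df['AccountName'] != ' ', df['AccountName'],
                     np.where(df['AccountName2'] != ' ', df['AccountName2'], 'none'))

# np.select: conditions are tried in list order; the first True one wins.
conditions = [df['AccountName'] != ' ', df['AccountName2'] != ' ']
choices = [df['AccountName'], df['AccountName2']]
via_select = np.select(conditions, choices, default='none')

print(list(via_where))
print(list(via_select))
```

With only two conditions the nested form is still readable, but each extra condition adds another level of nesting, whereas np.select just takes one more entry in each list.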