Adding a new row to a dataframe in pandas for every iteration
This is similar to the question above.
carrier_plan_identifier ... hios_issuer_identifier
1 AUSK ... 99806.0
2 AUSM ... 99806.0
3 AUSN ... 99806.0
4 AUSS ... 99806.0
5 AUST ... 99806.0
I need to select multiple columns, say carrier_plan_identifier, wellthie_issuer_identifier and hios_issuer_identifier.
Using these 3 columns, I need to run a select query, for example:
select id from table_name where carrier_plan_identifier = 'something' and wellthie_issuer_identifier = 'something' and hios_issuer_identifier = 'something'
Then I need to add the resulting id column back to the existing dataframe.
Currently, I am doing something like this:
for index, frame in df_with_servicearea.iterrows():
    if frame['service_area_id'] and frame['issuer_id']:
        # reading from medical plans table
        medical_plan_id = getmodeldata.get_medicalplans(sess, frame['issuer_id'], frame['hios_plan_identifier'], frame['plan_year'],
                                                        frame['group_or_individual_plan_type'])
        # frame is a row Series yielded by iterrows(); adding a key to it does not add a column to df_with_servicearea
        frame['medical_plan_id'] = medical_plan_id
        # append() returns a new DataFrame; the result is discarded here
        df_with_servicearea.append(frame)
When I do frame['medical_plan_id'] = medical_plan_id, nothing is added. However, when I do df_with_servicearea['medical_plan_id'] = medical_plan_id, only the last value from the loop is added, to all rows. I am not sure whether this is the right approach.
Update: after changing the last line to
df_with_servicearea = df_with_servicearea.append(frame)
I get 4 rows instead of the 2 rows that should be there:
wellthie_issuer_identifier ... medical_plan_id
0 UHC99806 ... NaN
1 UHC99806 ... NaN
0 UHC99806 ... 879519.0
1 UHC99806 ... 879520.0
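The 4 rows come from appending each modified row after the untouched originals, which keep NaN in medical_plan_id; the repeated index labels 0, 1, 0, 1 match the output above. The sketch below reproduces that with made-up data; it uses pd.concat because DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. The plan_ids mapping is hypothetical.

```python
import pandas as pd

# Two-row stand-in for df_with_servicearea
df = pd.DataFrame({"wellthie_issuer_identifier": ["UHC99806", "UHC99806"]})
plan_ids = {0: 879519.0, 1: 879520.0}  # hypothetical lookup results per row

result = df
for index, frame in df.iterrows():
    frame["medical_plan_id"] = plan_ids[index]
    # Appending adds the modified row after the existing rows; the
    # original rows (without a plan id) are never removed.
    result = pd.concat([result, frame.to_frame().T])

print(len(result))                             # 4
print(result["medical_plan_id"].isna().sum())  # 2
```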
Update 2: implemented based on Mayank's answer. Hi Mayank, this is what you suggested:
for index, frame in df_with_servicearea.iterrows():
    if frame['service_area_id'] and frame['issuer_id']:
        # reading from medical plans table
        df_new = getmodeldata.get_medicalplans(sess, frame['issuer_id'], frame['hios_plan_identifier'], frame['plan_year'],
                                               frame['group_or_individual_plan_type'])
        df_new.columns = ['medical_plan_id', 'issuer_id', 'hios_plan_identifier', 'plan_year',
                          'group_or_individual_plan_type']
        new_df = pd.merge(df_with_servicearea, df_new, on=['issuer_id', 'hios_plan_identifier', 'plan_year', 'group_or_individual_plan_type'], how='left')
        print(new_df)
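If the merge runs inside the loop, as it appears to, new_df is rebuilt from the full frame on every iteration and only sees the current single-row df_new, so at most the last row ends up with a plan id. A sketch of the alternative is to collect every lookup result, concatenate once, and merge once after the loop. fake_get_medicalplans and the sample data are hypothetical stand-ins for getmodeldata.get_medicalplans and the real tables.

```python
import pandas as pd

KEYS = ["issuer_id", "hios_plan_identifier", "plan_year",
        "group_or_individual_plan_type"]

def fake_get_medicalplans(issuer_id, hios_plan_identifier, plan_year, plan_type):
    """Hypothetical stand-in for the real lookup: returns a one-row
    DataFrame shaped like the SELECT result after the column rename."""
    plan_id = 879519 if hios_plan_identifier == "AUSK" else 879520
    return pd.DataFrame([[plan_id, issuer_id, hios_plan_identifier,
                          plan_year, plan_type]],
                        columns=["medical_plan_id"] + KEYS)

df_with_servicearea = pd.DataFrame({
    "service_area_id": [7, 7],
    "issuer_id": [99806, 99806],
    "hios_plan_identifier": ["AUSK", "AUSM"],
    "plan_year": [2018, 2018],
    "group_or_individual_plan_type": ["Individual", "Individual"],
})

# Collect one lookup result per qualifying row instead of merging each time
lookups = []
for _, frame in df_with_servicearea.iterrows():
    if frame["service_area_id"] and frame["issuer_id"]:
        lookups.append(fake_get_medicalplans(*(frame[k] for k in KEYS)))

# A single concat and a single left merge once the loop is done
df_new = pd.concat(lookups, ignore_index=True).drop_duplicates()
new_df = pd.merge(df_with_servicearea, df_new, on=KEYS, how="left")
print(new_df[["hios_plan_identifier", "medical_plan_id"]])
```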
This is my get_medicalplans function, which runs the select query:
import pandas as pd

def get_medicalplans(self, sess, issuerid, hios_plan_identifier, plan_year, group_or_individual_plan_type):
    try:
        medical_plan = sess.query(MedicalPlan.id, MedicalPlan.issuer_id, MedicalPlan.hios_plan_identifier,
                                  MedicalPlan.plan_year, MedicalPlan.group_or_individual_plan_type).filter(
            MedicalPlan.issuer_id == issuerid,
            MedicalPlan.hios_plan_identifier == hios_plan_identifier,
            MedicalPlan.plan_year == plan_year,
            MedicalPlan.group_or_individual_plan_type == group_or_individual_plan_type)
        sess.commit()
        # returns a DataFrame with one row per matching plan, not a scalar id
        return pd.read_sql(medical_plan.statement, medical_plan.session.bind)
    except Exception:
        # undo the open transaction on failure and re-raise the error
        sess.rollback()
        raise
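Because the function ends in pd.read_sql, it returns a whole DataFrame rather than a single id, which is why Update 2 needs a merge. The sketch below shows the same filter as a plain parameterised query against an in-memory SQLite table; the medical_plans table, its rows, and the id alias are assumptions for illustration, not the real schema.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory table standing in for the MedicalPlan model
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE medical_plans (
    id INTEGER PRIMARY KEY, issuer_id INTEGER, hios_plan_identifier TEXT,
    plan_year INTEGER, group_or_individual_plan_type TEXT)""")
conn.executemany(
    "INSERT INTO medical_plans VALUES (?, ?, ?, ?, ?)",
    [(879519, 99806, "AUSK", 2018, "Individual"),
     (879520, 99806, "AUSM", 2018, "Individual")],
)

def get_medicalplans(conn, issuer_id, hios_plan_identifier, plan_year, plan_type):
    """Sketch: same four-column filter as the ORM query, values bound as params."""
    sql = """SELECT id AS medical_plan_id, issuer_id, hios_plan_identifier,
                    plan_year, group_or_individual_plan_type
             FROM medical_plans
             WHERE issuer_id = ? AND hios_plan_identifier = ?
               AND plan_year = ? AND group_or_individual_plan_type = ?"""
    return pd.read_sql(sql, conn,
                       params=(issuer_id, hios_plan_identifier, plan_year, plan_type))

plans = get_medicalplans(conn, 99806, "AUSM", 2018, "Individual")
print(plans)
```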
The simplest way to solve your problem is to change the last line to:
df_with_servicearea = df_with_servicearea.append(frame)
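Appending keeps the original rows as well, as the 4-row output in the question's update shows. If the goal is to keep the loop but fill a column in place, one option is to collect the looked-up values keyed by row label and assign them as a Series after the loop, so pandas aligns them on the index. This is a sketch with made-up data; lookup_plan_id is a hypothetical stand-in for the database call.

```python
import numpy as np
import pandas as pd

def lookup_plan_id(issuer_id, hios_plan_identifier):
    """Hypothetical stand-in for the per-row database lookup."""
    return {"AUSK": 879519, "AUSM": 879520}.get(hios_plan_identifier, np.nan)

df = pd.DataFrame({
    "service_area_id": [7, 0],
    "issuer_id": [99806, 99806],
    "hios_plan_identifier": ["AUSK", "AUSM"],
})

# Collect results keyed by row label instead of modifying df mid-iteration
plan_ids = {}
for index, frame in df.iterrows():
    if frame["service_area_id"] and frame["issuer_id"]:
        plan_ids[index] = lookup_plan_id(frame["issuer_id"],
                                         frame["hios_plan_identifier"])

# The Series aligns on the index; skipped rows get NaN
df["medical_plan_id"] = pd.Series(plan_ids, dtype="float64")
print(df)
```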
However, if you want to add a new column, use:
import numpy as np

df_with_servicearea['medical_plan_id'] = df_with_servicearea.apply(
    lambda row:
        getmodeldata.get_medicalplans(sess,
                                      row['issuer_id'],
                                      row['hios_plan_identifier'],
                                      row['plan_year'],
                                      row['group_or_individual_plan_type'])
        if row['service_area_id'] and row['issuer_id']
        else np.nan,
    axis=1)  # pass each row, not each column, to the lambda
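The axis argument decides what the lambda receives: with the default axis=0 it gets each column, and with axis=1 it gets each row, which is what row['issuer_id'] needs. The sketch below makes that visible using s.name, then fills the column with a hypothetical PLAN_IDS dictionary standing in for the database lookup.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"service_area_id": [7, 0],
                   "issuer_id": [99806, 99806],
                   "hios_plan_identifier": ["AUSK", "AUSM"]})

# Default axis=0: the function receives each column, named by column label
by_column = df.apply(lambda s: s.name).tolist()
# axis=1: the function receives each row, named by index label
by_row = df.apply(lambda s: s.name, axis=1).tolist()
print(by_column, by_row)

PLAN_IDS = {"AUSK": 879519, "AUSM": 879520}  # hypothetical lookup table
df["medical_plan_id"] = df.apply(
    lambda row: PLAN_IDS.get(row["hios_plan_identifier"], np.nan)
    if row["service_area_id"] and row["issuer_id"] else np.nan,
    axis=1,
)
print(df)
```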
Try this:
Given that you want to update the original df based on these 3 columns:
1.) Adjust the query you run against the database so that the select clause also includes these columns: carrier_plan_identifier, wellthie_issuer_identifier and hios_issuer_identifier.
select id,carrier_plan_identifier, wellthie_issuer_identifier,hios_issuer_identifier from table_name where carrier_plan_identifier = 'something' and wellthie_issuer_identifier = 'something' and hios_issuer_identifier = 'something'
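2.) Create a dataframe from the above result, naming the columns so they can be used as merge keys:
df = pd.DataFrame(cur.fetchall(), columns=['id', 'carrier_plan_identifier', 'wellthie_issuer_identifier', 'hios_issuer_identifier'])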
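Without column names, pd.DataFrame(cur.fetchall()) labels the columns 0, 1, 2, 3, and merging on the identifier names would fail. A DB-API cursor exposes the names in cursor.description (the first item of each entry is the column name), so they can be taken from the query itself. This sketch uses an in-memory SQLite table; table_name is the placeholder from the query above and the rows are made up. It fetches every row without a WHERE clause, which is an assumption about how the lookup table would be loaded once.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, carrier_plan_identifier TEXT, "
             "wellthie_issuer_identifier TEXT, hios_issuer_identifier REAL)")
conn.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?)",
                 [(1, "AUSK", "UHC99806", 99806.0),
                  (2, "AUSM", "UHC99806", 99806.0)])

cur = conn.cursor()
cur.execute("SELECT id, carrier_plan_identifier, wellthie_issuer_identifier, "
            "hios_issuer_identifier FROM table_name")
# Each entry of cursor.description is a 7-item sequence; item 0 is the name
columns = [col[0] for col in cur.description]
df = pd.DataFrame(cur.fetchall(), columns=columns)
print(df.columns.tolist())
```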
3.) The df above now has the id column plus the other 3 columns. Merge this df with original_df on these columns: carrier_plan_identifier, wellthie_issuer_identifier and hios_issuer_identifier
original_df = pd.merge(original_df,df, on=['carrier_plan_identifier','wellthie_issuer_identifier','hios_issuer_identifier'],how='outer')
(Edit: changed the left join to an outer join.)
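The choice of how matters for the row count: a left join keeps exactly the rows of original_df, while an outer join also keeps query rows that have no match in original_df. This sketch uses made-up data in which the query result contains one plan (AUSZ) that original_df does not have.

```python
import pandas as pd

KEYS = ["carrier_plan_identifier", "wellthie_issuer_identifier",
        "hios_issuer_identifier"]

original_df = pd.DataFrame({
    "carrier_plan_identifier": ["AUSK", "AUSM", "AUSN"],
    "wellthie_issuer_identifier": ["UHC99806"] * 3,
    "hios_issuer_identifier": [99806.0] * 3,
})
# Made-up query result: matches AUSK and AUSM, plus one plan not in original_df
df = pd.DataFrame({
    "id": [879519, 879520, 879999],
    "carrier_plan_identifier": ["AUSK", "AUSM", "AUSZ"],
    "wellthie_issuer_identifier": ["UHC99806"] * 3,
    "hios_issuer_identifier": [99806.0] * 3,
})

left = pd.merge(original_df, df, on=KEYS, how="left")    # AUSN gets NaN id
outer = pd.merge(original_df, df, on=KEYS, how="outer")  # AUSZ row is added too
print(len(left), len(outer))  # 3 4
```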
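So you need to understand what happens here. I am merging the query dataframe (df) with the original df on the carrier_plan_identifier, wellthie_issuer_identifier and hios_issuer_identifier columns, which appends the id column (since it does not exist in the original). Wherever a match is found, the value of the id column from df is copied into original_df; where there is no match, the id column will be NaN. You don't need any loops. Just try my code.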
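The match/no-match behaviour described above can be checked directly with merge's indicator parameter, which adds a _merge column saying whether each row was found in both frames. Passing validate="many_to_one" additionally makes pandas raise if the query result has duplicate key combinations, which would otherwise silently duplicate rows of original_df. The data below is made up for illustration.

```python
import pandas as pd

KEYS = ["carrier_plan_identifier", "wellthie_issuer_identifier",
        "hios_issuer_identifier"]

original_df = pd.DataFrame({
    "carrier_plan_identifier": ["AUSK", "AUSN"],
    "wellthie_issuer_identifier": ["UHC99806"] * 2,
    "hios_issuer_identifier": [99806.0] * 2,
})
df = pd.DataFrame({  # made-up query result with a single match
    "id": [879519],
    "carrier_plan_identifier": ["AUSK"],
    "wellthie_issuer_identifier": ["UHC99806"],
    "hios_issuer_identifier": [99806.0],
})

merged = pd.merge(original_df, df, on=KEYS, how="left",
                  indicator=True,           # adds the _merge column
                  validate="many_to_one")   # raises if df has duplicate keys
print(merged[["carrier_plan_identifier", "id", "_merge"]])
```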
This adds the id column to original_df for all matching rows. Rows for which no match is found get NaN as their id.
You can replace NaN with any value you like, for example:
original_df = original_df.fillna("")
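One side effect worth knowing: filling the whole frame with "" turns the numeric id column into an object (mixed) column. A sketch of two alternatives, under the assumption that a numeric id column is wanted: fill only the id column with a sentinel value, or keep the gaps as missing values in pandas' nullable Int64 dtype. The sample data and the -1 sentinel are illustrative choices.

```python
import numpy as np
import pandas as pd

original_df = pd.DataFrame({
    "carrier_plan_identifier": ["AUSK", "AUSN"],
    "id": [879519.0, np.nan],  # NaN forces a float column after the merge
})

# Filling every column with "" makes id an object column
blank = original_df.fillna("")
print(blank["id"].dtype)  # object

# Alternative 1: fill only the id column with a numeric sentinel
filled = original_df.fillna({"id": -1})
# Alternative 2: keep missing ids as <NA> in a nullable integer column
nullable = original_df.astype({"id": "Int64"})
print(filled["id"].tolist())
print(nullable["id"].tolist())
```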
Let me know if this helps.