Replacing one row with another row at a certain index of a DataFrame and changing a cell value
I have a sample CSV like this:
   keys                key_regex                       datatype    detailed_datatype  precedence  val_regex  val_regex_2     val_regex_3  max_words  alpha_char_check
0  billingAddress      original_billing_key_regex      alphabetic  address            primary     NaN        NaN             NaN          NaN        NaN
1  deliveryAddress     original_delivery_key_regex     alphabetic  address            primary     NaN        NaN             NaN          NaN        NaN
2  notifyParty         original_notify_party_regex     alphabetic  alphabetic         primary     NaN        NaN             NaN          NaN        NaN
3  originAddress       original_seller_address_regex   alphabetic  address            primary     NaN        NaN             NaN          NaN        NaN
4  billingAddressAlt   alternative_billing_key_regex   alphabetic  address            tertiary    NaN        NaN             NaN          NaN        NaN
5  deliveryAddressAlt  alternative_delivery_key_regex  alphabetic  address            tertiary    NaN        NaN             NaN          5.0        1.0
6  originAddressAlt    alternative_seller_key_regex    alphabetic  address            tertiary    NaN        sample_val_re1  NaN          NaN        0.0
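I am trying to replace each row whose `keys` value is a key in `tertiary_row_replacement_dict` with the row whose `keys` value is the corresponding dict value, and then change that row's `precedence` value from 'tertiary' to 'primary', while keeping the original `keys` name and the same index position as before. The alternative rows themselves should then be removed.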
The expected output is:
   keys                key_regex                       datatype    detailed_datatype  precedence  val_regex  val_regex_2     val_regex_3  max_words  alpha_char_check
0  billingAddress      alternative_billing_key_regex   alphabetic  address            primary     NaN        NaN             NaN          NaN        NaN
1  deliveryAddress     alternative_delivery_key_regex  alphabetic  address            primary     NaN        NaN             NaN          5.0        1.0
2  notifyParty         original_notify_party_regex     alphabetic  alphabetic         primary     NaN        NaN             NaN          NaN        NaN
3  originAddress       alternative_seller_key_regex    alphabetic  address            primary     NaN        sample_val_re1  NaN          NaN        0.0
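There are 3 original CSVs. Each of them is large and contains many similar cases, i.e. a key with primary precedence plus an alternative key with tertiary precedence. I have a dictionary mapping each key to its alternative key, like this: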
tertiary_row_replacement_dict = {
    "originAddress": "originAddressAlt",
    "deliveryAddress": "deliveryAddressAlt",
    # "totalAmount": "totalAmountAlt",
    "billingAddress": "billingAddressAlt",
    # ....
}
Assuming the keys and the corresponding values of this dict always appear in the CSV, I have this code:
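# Overwrite each primary row with its tertiary alternative, then drop the alternative row
for k, new_k in tertiary_row_replacement_dict.items():
    t2 = df.loc[df['keys'] == new_k].index[0]  # index label of the alternative row
    df.loc[df.loc[df['keys'] == k].index[0]] = [i if i != 'tertiary' else 'primary' for i in df.loc[t2]]
    # restore the primary key name, turn any 'tertiary' cell into 'primary', drop the alternative row
    df = df.replace([new_k, 'tertiary'], [k, 'primary']).drop([t2])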
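It does what I want. It takes about 0.034 seconds on the test CSV alone, and it is probably not the best or most optimized way to simply replace rows and cell values. Is there a faster alternative, given that we know in advance which rows to replace? (Using the dict is not mandatory; it could be a list of tuples or a list of lists if that gives a better speed trade-off.)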
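A minimal sketch of one vectorized alternative, assuming (as the question states) that every key and alternative key occurs exactly once in the `keys` column. Instead of calling `replace` and `drop` on the whole frame once per dict entry, it looks up the index labels of all primary and alternative rows at once, copies the alternative values onto the primary rows column by column, and drops all alternative rows in one call. The helper name `replace_with_alternatives` is hypothetical, and whether it is faster on the real files would need to be measured.

```python
import pandas as pd


def replace_with_alternatives(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Hypothetical helper: overwrite each primary row with its alternative
    row, keep the primary 'keys' name and index position, mark the row as
    'primary', and drop the alternative rows.

    Assumes every key and alternative key occurs exactly once in df['keys'].
    """
    # Map each 'keys' value to its index label in a single pass
    label_of = pd.Series(df.index, index=df['keys'])
    primary_idx = label_of.loc[list(mapping.keys())].to_numpy()
    alt_idx = label_of.loc[list(mapping.values())].to_numpy()

    out = df.copy()
    # Copy one column at a time, so each column receives values of its own dtype
    for col in df.columns.drop('keys'):
        out.loc[primary_idx, col] = df.loc[alt_idx, col].to_numpy()
    out.loc[primary_idx, 'precedence'] = 'primary'
    return out.drop(index=alt_idx)


# Usage (df and tertiary_row_replacement_dict as in the question):
# df = replace_with_alternatives(df, tertiary_row_replacement_dict)
```

If the mapping is kept as a list of `(key, alt_key)` tuples instead, it can be passed as `dict(pairs)`. The original index labels of the primary rows are preserved, so no `reset_index` is needed.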
You can use `replace` to map the tertiary keys to their primary keys, and `groupby().first()` to fill in the information:
inverse_dict = {v:k for k,v in tertiary_row_replacement_dict.items()}
(df.groupby(df['keys'].replace(inverse_dict))
.first()
.reset_index(drop=True)
)
Output:
keys key_regex datatype detailed_datatype precedence val_regex val_regex_2 val_regex_3 max_words alpha_char_check
-- --------------- ----------------------------- ---------- ------------------- ------------ ----------- -------------- ------------- ----------- ------------------
0 billingAddress original_billing_key_regex alphabetic address primary nan nan nan nan nan
1 deliveryAddress original_delivery_key_regex alphabetic address primary nan nan nan 5 1
2 notifyParty original_notify_party_regex alphabetic alphabetic primary nan nan nan nan nan
3 originAddress original_seller_address_regex alphabetic address primary nan sample_val_re1 nan nan 0
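`groupby().first()` returns the first non-null value of each column within each group. Where both the primary and the alternative row have a value (here `key_regex`), the primary row therefore wins, which is why the output above keeps `original_*_key_regex` while the expected output in the question uses the alternative regexes. Also, `groupby` sorts the group labels by default, and that sorted order only happens to match the original row order in this sample. A minimal sketch of a variant that lets the alternative rows' non-null values win and keeps the order of first appearance, assuming the alternative rows should override every column except `keys` (the helper name `merge_alternatives` is hypothetical):

```python
import pandas as pd


def merge_alternatives(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Hypothetical variant of the groupby answer: collapse each
    key/alternative pair into one row, letting the alternative row's
    non-null values win, and keep the order of first appearance."""
    inverse = {alt: key for key, alt in mapping.items()}
    is_alt = df['keys'].isin(list(inverse))
    # Put all alternative rows after the other rows, so last() prefers them
    ordered = pd.concat([df[~is_alt], df[is_alt]])
    labels = ordered['keys'].replace(inverse)
    # last() takes the last non-null value of each column within each group;
    # sort=False keeps groups in order of first appearance instead of sorting them
    out = ordered.groupby(labels, sort=False).last()
    out['keys'] = out.index  # restore the primary key names
    out['precedence'] = out['precedence'].replace('tertiary', 'primary')
    return out[df.columns].reset_index(drop=True)
```

Like the loop in the question, this turns every remaining 'tertiary' precedence into 'primary', including any tertiary row that has no partner in the dict.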