Python Dataframe : Update the values of a column in a dataframe based on another dataframe
Update values with earlier date based on list by another dataframe in Python
I want to update the Value column of the dataframe dateEANdf with the values from the dataframe valueEANdf, but only in the rows with the earlier date.
An excerpt of valueEANdf looks like this:
EAN-Unique Value
3324324 3.0
asd2343 2.0
Xjkhfsd 1.2
5234XAR 4.5
3434343 2.6
An excerpt of dateEANdf looks like this. It contains each EAN twice, once with an earlier and once with a later date:
EAN-Unique Date Start Value
3324324 2018-06-01 yes
3324324 2019-04-30 no
asd2343 2015-03-23 yes
asd2343 2015-07-11 no
Xjkhfsd 1999-04-12 yes
Xjkhfsd 2001-02-01 no
5234XAR 2000-12-13 yes
5234XAR 2013-12-13 no
3434343 1972-05-23 yes
3434343 1980-11-01 no
The updated dateEANdf should look like this:
EAN-Unique Date Start Value
3324324 2018-06-01 yes 3.0
3324324 2019-04-30 no
asd2343 2015-03-23 yes 2.0
asd2343 2015-07-11 no
Xjkhfsd 1999-04-12 yes 1.2
Xjkhfsd 2001-02-01 no
5234XAR 2000-12-13 yes 4.5
5234XAR 2013-12-13 no
3434343 1972-05-23 yes 2.6
3434343 1980-11-01 no
My attempt was:
dateEANdf.loc[ (dateEANdf['EAN-Unique'].isin(valueEANdf.unique().tolist())) & ( dateEANdf['Start'] == 'yes') , 'Value' ] = valueEANdf['Value']
However, this places the values seemingly at random "somewhere", not in the rows with the earlier date. How can I fix this?
Thanks.
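A likely reason for the scattered values, sketched under the assumption that both frames carry the default 0..n-1 index: during `.loc` assignment pandas aligns the right-hand Series on index labels instead of filling the selected rows in order. The frames below are rebuilt from the excerpts above, and `.isin` is applied to the `EAN-Unique` column, which is an assumed reading of the attempt.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the excerpts in the question (construction is an assumption).
eans = ["3324324", "asd2343", "Xjkhfsd", "5234XAR", "3434343"]
valueEANdf = pd.DataFrame({"EAN-Unique": eans, "Value": [3.0, 2.0, 1.2, 4.5, 2.6]})
dateEANdf = pd.DataFrame({
    "EAN-Unique": [e for e in eans for _ in range(2)],
    "Date": pd.to_datetime([
        "2018-06-01", "2019-04-30", "2015-03-23", "2015-07-11", "1999-04-12",
        "2001-02-01", "2000-12-13", "2013-12-13", "1972-05-23", "1980-11-01",
    ]),
    "Start": ["yes", "no"] * 5,
    "Value": np.nan,
})

mask = dateEANdf["EAN-Unique"].isin(valueEANdf["EAN-Unique"]) & dateEANdf["Start"].eq("yes")

attempt = dateEANdf.copy()
# The right-hand side has index labels 0..4, the selected rows have labels 0, 2, 4, 6, 8.
# Only labels 0, 2 and 4 exist on both sides, so rows 6 and 8 stay NaN and
# rows 2 and 4 receive the values that belong to other EANs.
attempt.loc[mask, "Value"] = valueEANdf["Value"]
print(attempt)
```

Row 2 (asd2343) ends up with 1.2 rather than 2.0 because label 2 in valueEANdf belongs to Xjkhfsd, which matches the "somewhere" behaviour described above.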
Try slicing with loc, then map:
s = dateEANdf['Start'].eq('yes')
dateEANdf.loc[s, 'Value'] = (dateEANdf.loc[s, 'EAN-Unique']
                             .map(valueEANdf.set_index('EAN-Unique')['Value'])
                            )
Or map the whole series and then apply where:
dateEANdf['Value'] = (dateEANdf['EAN-Unique'].map(valueEANdf.set_index('EAN-Unique')['Value'])
                      .where(dateEANdf['Start'].eq('yes'))
                     )
Output:
EAN-Unique Date Start Value
0 3324324 2018-06-01 yes 3.0
1 3324324 2019-04-30 no NaN
2 asd2343 2015-03-23 yes 2.0
3 asd2343 2015-07-11 no NaN
4 Xjkhfsd 1999-04-12 yes 1.2
5 Xjkhfsd 2001-02-01 no NaN
6 5234XAR 2000-12-13 yes 4.5
7 5234XAR 2013-12-13 no NaN
8 3434343 1972-05-23 yes 2.6
9 3434343 1980-11-01 no NaN
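For reference, a self-contained sketch of the `loc` plus `map` approach. The frame construction is an assumption rebuilt from the excerpts in the question, and the name `lookup` is illustrative; the approach relies on `EAN-Unique` being unique in valueEANdf.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the excerpts in the question (construction is an assumption).
eans = ["3324324", "asd2343", "Xjkhfsd", "5234XAR", "3434343"]
valueEANdf = pd.DataFrame({"EAN-Unique": eans, "Value": [3.0, 2.0, 1.2, 4.5, 2.6]})
dateEANdf = pd.DataFrame({
    "EAN-Unique": [e for e in eans for _ in range(2)],
    "Date": pd.to_datetime([
        "2018-06-01", "2019-04-30", "2015-03-23", "2015-07-11", "1999-04-12",
        "2001-02-01", "2000-12-13", "2013-12-13", "1972-05-23", "1980-11-01",
    ]),
    "Start": ["yes", "no"] * 5,
    "Value": np.nan,
})

# Series keyed by EAN, so map() looks values up by key rather than by row position.
lookup = valueEANdf.set_index("EAN-Unique")["Value"]

# Boolean mask of the rows flagged as the earlier date.
s = dateEANdf["Start"].eq("yes")

# The mapped Series keeps the row labels of the selected rows, so the
# assignment lands on exactly those rows.
dateEANdf.loc[s, "Value"] = dateEANdf.loc[s, "EAN-Unique"].map(lookup)
print(dateEANdf)
```

Because the mapped result carries the same index labels as the selected rows, index alignment works in favour of the assignment here.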
You can do a merge, then use np.where to update the values:
# If 'Value' is not already a column of 'dateEANdf', leave out the `.drop('Value', axis=1)` call
dateEANdf = dateEANdf.drop('Value', axis=1).merge(valueEANdf, how='left', on='EAN-Unique')
dateEANdf['Value'] = np.where(dateEANdf['Start'] == 'no', np.nan, dateEANdf['Value'])
dateEANdf
Out[1]:
EAN-Unique Date Start Value
0 3324324 2018-06-01 yes 3.0
1 3324324 2019-04-30 no NaN
2 asd2343 2015-03-23 yes 2.0
3 asd2343 2015-07-11 no NaN
4 Xjkhfsd 1999-04-12 yes 1.2
5 Xjkhfsd 2001-02-01 no NaN
6 5234XAR 2000-12-13 yes 4.5
7 5234XAR 2013-12-13 no NaN
8 3434343 1972-05-23 yes 2.6
9 3434343 1980-11-01 no NaN
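A self-contained sketch of the merge variant, with the frames rebuilt from the question's excerpts (an assumption). The `validate="many_to_one"` argument is an addition that is not in the answer: it asks pandas to raise if valueEANdf ever contains duplicate EANs, which would otherwise silently multiply rows. The name `merged` is illustrative.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the excerpts in the question (construction is an assumption).
eans = ["3324324", "asd2343", "Xjkhfsd", "5234XAR", "3434343"]
valueEANdf = pd.DataFrame({"EAN-Unique": eans, "Value": [3.0, 2.0, 1.2, 4.5, 2.6]})
dateEANdf = pd.DataFrame({
    "EAN-Unique": [e for e in eans for _ in range(2)],
    "Date": pd.to_datetime([
        "2018-06-01", "2019-04-30", "2015-03-23", "2015-07-11", "1999-04-12",
        "2001-02-01", "2000-12-13", "2013-12-13", "1972-05-23", "1980-11-01",
    ]),
    "Start": ["yes", "no"] * 5,
})

# Left merge keeps every row of dateEANdf and attaches the matching Value.
merged = dateEANdf.merge(
    valueEANdf, how="left", on="EAN-Unique", validate="many_to_one"
)

# Keep the value only on rows flagged as the earlier date.
merged["Value"] = np.where(merged["Start"].eq("yes"), merged["Value"], np.nan)
print(merged)
```

After the merge every row carries its EAN's value, so the `np.where` step is what restricts it to the earlier-date rows.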
import pandas as pd
import numpy as np
You can also do it like this:
# Look up each EAN's value; .iloc[0] extracts the single matching scalar
dateEANdf['Value'] = dateEANdf['EAN-Unique'].apply(
    lambda ean: valueEANdf.loc[valueEANdf['EAN-Unique'] == ean, 'Value'].iloc[0])
This gives you:
EAN-Unique Date Start Value
0 3324324 2018-06-01 yes 3.0
1 3324324 2019-04-30 no 3.0
2 asd2343 2015-03-23 yes 2.0
3 asd2343 2015-07-11 no 2.0
4 Xjkhfsd 1999-04-12 yes 1.2
5 Xjkhfsd 2001-02-01 no 1.2
6 5234XAR 2000-12-13 yes 4.5
7 5234XAR 2013-12-13 no 4.5
8 3434343 1972-05-23 yes 2.6
9 3434343 1980-11-01 no 2.6
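Then clear every second value in the Value column, following the thread "Pandas every nth row". This relies on the default integer index and on each EAN's rows alternating earlier, then later: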
dateEANdf.loc[1::2,'Value']=np.nan
This results in:
EAN-Unique Date Start Value
0 3324324 2018-06-01 yes 3.0
1 3324324 2019-04-30 no NaN
2 asd2343 2015-03-23 yes 2.0
3 asd2343 2015-07-11 no NaN
4 Xjkhfsd 1999-04-12 yes 1.2
5 Xjkhfsd 2001-02-01 no NaN
6 5234XAR 2000-12-13 yes 4.5
7 5234XAR 2013-12-13 no NaN
8 3434343 1972-05-23 yes 2.6
9 3434343 1980-11-01 no NaN
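The positional slice `1::2` depends on the row order. A sketch that instead derives the earlier row from the Date column itself, assuming Date parses as datetime; the reversed row order and the names `lookup` and `is_earliest` are illustrative and not part of the answers above.

```python
import pandas as pd

# Frames rebuilt from the excerpts in the question (construction is an assumption).
eans = ["3324324", "asd2343", "Xjkhfsd", "5234XAR", "3434343"]
valueEANdf = pd.DataFrame({"EAN-Unique": eans, "Value": [3.0, 2.0, 1.2, 4.5, 2.6]})
dateEANdf = pd.DataFrame({
    "EAN-Unique": [e for e in eans for _ in range(2)],
    "Date": pd.to_datetime([
        "2018-06-01", "2019-04-30", "2015-03-23", "2015-07-11", "1999-04-12",
        "2001-02-01", "2000-12-13", "2013-12-13", "1972-05-23", "1980-11-01",
    ]),
    "Start": ["yes", "no"] * 5,
})

# Reverse the rows so the later date comes first; a positional 1::2 slice
# would now blank the wrong rows.
dateEANdf = dateEANdf.iloc[::-1].reset_index(drop=True)

# True on the row whose Date equals the minimum Date within its EAN group.
is_earliest = dateEANdf["Date"].eq(
    dateEANdf.groupby("EAN-Unique")["Date"].transform("min")
)

# Look values up by EAN, then keep them only on the earliest row per EAN.
lookup = valueEANdf.set_index("EAN-Unique")["Value"]
dateEANdf["Value"] = dateEANdf["EAN-Unique"].map(lookup).where(is_earliest)
print(dateEANdf)
```

If two rows of the same EAN shared the same earliest date, both would receive the value; whether that is acceptable depends on the data.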