Merging rows in a pandas dataframe based on multiple values
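This is essentially related to the question about merging values of a dataframe where other columns match, but since that question has already been answered and I could not find the right modification for my different problem, I am opening this new thread. I hope that is OK. On to the problem. I have the following data: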
date car_brand color city stolen
"2020-01-01" porsche red paris False
"2020-01-01" porsche red london False
"2020-01-01" porsche red munich False
"2020-01-01" porsche red madrid False
"2020-01-01" porsche red rome False
"2020-01-01" porsche blue berlin False
"2020-01-01" porsche blue tokyo False
"2020-01-01" porsche blue peking False
"2020-01-01" porsche white liverpool False
"2020-01-01" porsche white oslo False
"2020-01-01" porsche white barcelona False
"2020-01-01" porsche white miami False
"2020-01-02" porsche red paris False
"2020-01-02" porsche red london False
"2020-01-02" porsche red munich False
"2020-01-02" porsche red madrid False
"2020-01-02" porsche red rome False
"2020-01-02" porsche blue berlin False
"2020-01-02" porsche blue tokyo False
"2020-01-02" porsche blue peking False
"2020-01-02" porsche white liverpool False
"2020-01-02" porsche white oslo False
"2020-01-02" porsche white barcelona False
"2020-01-02" porsche white miami False
"2020-01-03" porsche red paris False
"2020-01-03" porsche red london False
"2020-01-03" porsche red munich False
"2020-01-03" porsche red madrid True
"2020-01-03" porsche red rome False
"2020-01-03" porsche blue berlin False
"2020-01-03" porsche blue tokyo False
"2020-01-03" porsche blue peking False
"2020-01-03" porsche white liverpool False
"2020-01-03" porsche white oslo False
"2020-01-03" porsche white barcelona False
"2020-01-03" porsche white miami False
"2020-01-04" porsche red paris False
"2020-01-04" porsche red london False
"2020-01-04" porsche red munich False
"2020-01-04" porsche red madrid False
"2020-01-04" porsche red rome False
"2020-01-04" porsche blue berlin False
"2020-01-04" porsche blue tokyo False
"2020-01-04" porsche blue peking False
"2020-01-04" porsche white liverpool False
"2020-01-04" porsche white oslo False
"2020-01-04" porsche white barcelona False
"2020-01-04" porsche white miami False
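What I want to do is create a dataframe according to the following rule: if, for consecutive days, the boolean in "stolen" matches for all entries, then I want to merge the date column. For example, in the sample above the boolean entries match for "2020-01-01" and "2020-01-02". So overall I would like to get the following result: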
date car_brand color city stolen
["2020-01-01","2020-01-02"] porsche red paris False
["2020-01-01","2020-01-02"] porsche red london False
["2020-01-01","2020-01-02"] porsche red munich False
["2020-01-01","2020-01-02"] porsche red madrid False
["2020-01-01","2020-01-02"] porsche red rome False
["2020-01-01","2020-01-02"] porsche blue berlin False
["2020-01-01","2020-01-02"] porsche blue tokyo False
["2020-01-01","2020-01-02"] porsche blue peking False
["2020-01-01","2020-01-02"] porsche white liverpool False
["2020-01-01","2020-01-02"] porsche white oslo False
["2020-01-01","2020-01-02"] porsche white barcelona False
["2020-01-01","2020-01-02"] porsche white miami False
["2020-01-03"] porsche red paris False
["2020-01-03"] porsche red london False
["2020-01-03"] porsche red munich False
["2020-01-03"] porsche red madrid True
["2020-01-03"] porsche red rome False
["2020-01-03"] porsche blue berlin False
["2020-01-03"] porsche blue tokyo False
["2020-01-03"] porsche blue peking False
["2020-01-03"] porsche white liverpool False
["2020-01-03"] porsche white oslo False
["2020-01-03"] porsche white barcelona False
["2020-01-03"] porsche white miami False
["2020-01-04"] porsche red paris False
["2020-01-04"] porsche red london False
["2020-01-04"] porsche red munich False
["2020-01-04"] porsche red madrid False
["2020-01-04"] porsche red rome False
["2020-01-04"] porsche blue berlin False
["2020-01-04"] porsche blue tokyo False
["2020-01-04"] porsche blue peking False
["2020-01-04"] porsche white liverpool False
["2020-01-04"] porsche white oslo False
["2020-01-04"] porsche white barcelona False
["2020-01-04"] porsche white miami False
To keep it short, the code below does not build the dataframe from the sample data.
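For readers who want to run the answer, here is one possible way to build `df` from the sample data in the question. This is a minimal sketch, not the original poster's code: the `cities` and `dates` helpers are hypothetical names introduced only for this illustration, and the values are copied from the table in the question.

```python
import pandas as pd

# City lists per colour, copied from the sample data in the question.
# (Hypothetical helper names, used only for this illustration.)
cities = {
    "red": ["paris", "london", "munich", "madrid", "rome"],
    "blue": ["berlin", "tokyo", "peking"],
    "white": ["liverpool", "oslo", "barcelona", "miami"],
}
dates = ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]

# One row per (date, colour, city), in the same order as the question's table.
rows = [
    {"date": d, "car_brand": "porsche", "color": color, "city": city, "stolen": False}
    for d in dates
    for color, city_list in cities.items()
    for city in city_list
]
df = pd.DataFrame(rows)

# The only stolen car in the sample: the red porsche in madrid on 2020-01-03.
df.loc[(df["date"] == "2020-01-03") & (df["city"] == "madrid"), "stolen"] = True
```

The dates are left as strings here, as in the question's table.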
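The key technique is a new column whose value changes as the stolen status changes from one date to the next: it increments every time the value changes.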
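To see that run-numbering idea in isolation, here is a small sketch on a hypothetical four-day flag series. It uses a shift-and-compare variant of the same idea rather than a `diff()` call, and the names `stolen_per_day`, `changed` and `stolen_grp` are chosen for illustration.

```python
import pandas as pd

# Hypothetical per-day flag: True if any car was stolen on that day.
stolen_per_day = pd.Series(
    [False, False, True, False],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
)

# True wherever the flag differs from the previous day
# (the first day has no predecessor, so it always starts a run).
changed = stolen_per_day.ne(stolen_per_day.shift())

# The cumulative sum of the change markers gives an increasing run id.
stolen_grp = changed.cumsum()
print(stolen_grp.tolist())  # [1, 1, 2, 3]
```

The first two days share run id 1, which is what allows their dates to be merged later, while the third and fourth days each get their own id.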
import pandas as pd

# df is assumed to already hold the sample data from the question
df["date"] = pd.to_datetime(df["date"])
# start a new group whenever the "any car stolen on this date" flag changes
df2 = (
    df.groupby("date")["stolen"].max().to_frame()
    .reset_index()
    .assign(stolen_grp=lambda dfa: (dfa.stolen.diff() != 0).cumsum())
    .drop(columns="stolen")
)
# put stolen_grp back into the dataframe
df = df.merge(df2, on="date")
# same technique, breaking on days a car has been stolen
(
    df
    .groupby([c for c in df.columns if c != "date"])["date"]
    # only include a date if it is the first one or directly follows the previous date
    .agg(lambda x: [xx for i, xx in enumerate(x) if i == 0 or xx == (list(x)[i-1] + pd.DateOffset(1))])
    .reset_index()
    .drop(columns="stolen_grp")
)
Output (first rows):
car_brand color city stolen date
porsche blue berlin False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
porsche blue berlin False [2020-01-03 00:00:00]
porsche blue berlin False [2020-01-04 00:00:00]
porsche blue peking False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
porsche blue peking False [2020-01-03 00:00:00]
porsche blue peking False [2020-01-04 00:00:00]
porsche blue tokyo False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
porsche blue tokyo False [2020-01-03 00:00:00]
porsche blue tokyo False [2020-01-04 00:00:00]
porsche red london False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
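As a variation on the approach above, the following sketch compares the whole "stolen" pattern of each day with the previous day, instead of only the per-day maximum, and also starts a new run when a calendar day is missing. It is an illustration under stated assumptions, not the answer's method: the function name `merge_consecutive_days` and the `sample` frame are hypothetical, and every day is assumed to list every (car_brand, color, city) combination exactly once.

```python
import pandas as pd


def merge_consecutive_days(df: pd.DataFrame) -> pd.DataFrame:
    """Merge dates of consecutive days whose full 'stolen' pattern is identical.

    Assumes every day lists every (car_brand, color, city) exactly once.
    """
    keys = ["car_brand", "color", "city"]
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])

    # One row per day, one column per car; 1 = stolen, 0 = not stolen.
    wide = out.set_index(["date"] + keys)["stolen"].unstack(keys).sort_index().astype(int)

    # A new run starts when any car's flag differs from the previous day ...
    pattern_changed = wide.ne(wide.shift()).any(axis=1)
    # ... or when the date is not exactly one day after the previous date.
    day_gap = wide.index.to_series().diff().ne(pd.Timedelta(days=1))
    run_id = (pattern_changed | day_gap).cumsum()

    # Attach the run id to every row, then collect the dates of each run per car.
    out["run"] = out["date"].map(run_id)
    merged = (
        out.sort_values("date")
        .groupby(keys + ["stolen", "run"], sort=False)["date"]
        .agg(list)
        .reset_index()
        .sort_values("run", kind="stable")
        .drop(columns="run")
        .reset_index(drop=True)
    )
    return merged[["date"] + keys + ["stolen"]]


# Hypothetical two-car sample: madrid is stolen on 2020-01-03 only.
sample = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"] * 2,
    "car_brand": ["porsche"] * 8,
    "color": ["red"] * 8,
    "city": ["paris"] * 4 + ["madrid"] * 4,
    "stolen": [False] * 6 + [True, False],
})
merged = merge_consecutive_days(sample)
print(merged)
```

Comparing the full pattern matters when different cars are stolen on adjacent days: the per-day maximum would be True on both days, so a grouping based on it alone would place them in the same run even though the entries differ.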