Merging rows in a pandas dataframe based on multiple values

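This is essentially related to "Merging values of a dataframe where other columns match", but since that question has already been answered and I could not find the right modification for my different problem, I am opening this new thread. I hope that is okay. On to the problem. I have the following data: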

 date              car_brand    color     city      stolen
 "2020-01-01"      porsche      red       paris     False
 "2020-01-01"      porsche      red       london    False
 "2020-01-01"      porsche      red       munich    False
 "2020-01-01"      porsche      red       madrid    False
 "2020-01-01"      porsche      red       rome      False
 "2020-01-01"      porsche      blue      berlin    False 
 "2020-01-01"      porsche      blue      tokyo     False
 "2020-01-01"      porsche      blue      peking    False
 "2020-01-01"      porsche      white     liverpool False 
 "2020-01-01"      porsche      white     oslo      False
 "2020-01-01"      porsche      white     barcelona False
 "2020-01-01"      porsche      white     miami     False
 "2020-01-02"      porsche      red       paris     False
 "2020-01-02"      porsche      red       london    False
 "2020-01-02"      porsche      red       munich    False
 "2020-01-02"      porsche      red       madrid    False
 "2020-01-02"      porsche      red       rome      False
 "2020-01-02"      porsche      blue      berlin    False
 "2020-01-02"      porsche      blue      tokyo     False
 "2020-01-02"      porsche      blue      peking    False
 "2020-01-02"      porsche      white     liverpool False 
 "2020-01-02"      porsche      white     oslo      False
 "2020-01-02"      porsche      white     barcelona False
 "2020-01-02"      porsche      white     miami     False 
 "2020-01-03"      porsche      red       paris     False
 "2020-01-03"      porsche      red       london    False
 "2020-01-03"      porsche      red       munich    False
 "2020-01-03"      porsche      red       madrid    True
 "2020-01-03"      porsche      red       rome      False
 "2020-01-03"      porsche      blue      berlin    False
 "2020-01-03"      porsche      blue      tokyo     False
 "2020-01-03"      porsche      blue      peking    False
 "2020-01-03"      porsche      white     liverpool False 
 "2020-01-03"      porsche      white     oslo      False
 "2020-01-03"      porsche      white     barcelona False 
 "2020-01-03"      porsche      white     miami     False 
 "2020-01-04"      porsche      red       paris     False
 "2020-01-04"      porsche      red       london    False
 "2020-01-04"      porsche      red       munich    False
 "2020-01-04"      porsche      red       madrid    False
 "2020-01-04"      porsche      red       rome      False 
 "2020-01-04"      porsche      blue      berlin    False
 "2020-01-04"      porsche      blue      tokyo     False
 "2020-01-04"      porsche      blue      peking    False 
 "2020-01-04"      porsche      white     liverpool False
 "2020-01-04"      porsche      white     oslo      False
 "2020-01-04"      porsche      white     barcelona False
 "2020-01-04"      porsche      white     miami     False

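I would like to know how to build a dataframe as follows: if the boolean "stolen" matches for all entries on consecutive days, I want to merge the date column for those days. In the example above, the boolean entries match for "2020-01-01" and "2020-01-02". So overall, I want to obtain the following result: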

 date                             car_brand    color     city      stolen
 ["2020-01-01","2020-01-02"]      porsche      red       paris     False
 ["2020-01-01","2020-01-02"]      porsche      red       london    False
 ["2020-01-01","2020-01-02"]      porsche      red       munich    False
 ["2020-01-01","2020-01-02"]      porsche      red       madrid    False
 ["2020-01-01","2020-01-02"]      porsche      red       rome      False
 ["2020-01-01","2020-01-02"]      porsche      blue      berlin    False 
 ["2020-01-01","2020-01-02"]      porsche      blue      tokyo     False
 ["2020-01-01","2020-01-02"]      porsche      blue      peking    False
 ["2020-01-01","2020-01-02"]      porsche      white     liverpool False 
 ["2020-01-01","2020-01-02"]      porsche      white     oslo      False
 ["2020-01-01","2020-01-02"]      porsche      white     barcelona False
 ["2020-01-01","2020-01-02"]      porsche      white     miami     False
 ["2020-01-03"]                   porsche      red       paris     False
 ["2020-01-03"]                   porsche      red       london    False
 ["2020-01-03"]                   porsche      red       munich    False
 ["2020-01-03"]                   porsche      red       madrid    True
 ["2020-01-03"]                   porsche      red       rome      False
 ["2020-01-03"]                   porsche      blue      berlin    False
 ["2020-01-03"]                   porsche      blue      tokyo     False
 ["2020-01-03"]                   porsche      blue      peking    False
 ["2020-01-03"]                   porsche      white     liverpool False 
 ["2020-01-03"]                   porsche      white     oslo      False
 ["2020-01-03"]                   porsche      white     barcelona False 
 ["2020-01-03"]                   porsche      white     miami     False 
 ["2020-01-04"]                   porsche      red       paris     False
 ["2020-01-04"]                   porsche      red       london    False
 ["2020-01-04"]                   porsche      red       munich    False
 ["2020-01-04"]                   porsche      red       madrid    False
 ["2020-01-04"]                   porsche      red       rome      False 
 ["2020-01-04"]                   porsche      blue      berlin    False
 ["2020-01-04"]                   porsche      blue      tokyo     False
 ["2020-01-04"]                   porsche      blue      peking    False 
 ["2020-01-04"]                   porsche      white     liverpool False
 ["2020-01-04"]                   porsche      white     oslo      False
 ["2020-01-04"]                   porsche      white     barcelona False
 ["2020-01-04"]                   porsche      white     miami     False

To keep it short, the code does not build the dataframe from the sample data.
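If you want to run the answer end to end, one possible way to rebuild the sample frame from the question is sketched here. The names `dates`, `cities_by_color` and `rows` are my own; only the column names and values are taken from the question.

```python
import pandas as pd

# Dates and cities as listed in the question's sample data.
dates = ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]
cities_by_color = {
    "red": ["paris", "london", "munich", "madrid", "rome"],
    "blue": ["berlin", "tokyo", "peking"],
    "white": ["liverpool", "oslo", "barcelona", "miami"],
}

# One row per (date, color, city); the only stolen car in the sample
# is the red one in madrid on 2020-01-03.
rows = [
    {
        "date": d,
        "car_brand": "porsche",
        "color": color,
        "city": city,
        "stolen": d == "2020-01-03" and city == "madrid",
    }
    for d in dates
    for color, cities in cities_by_color.items()
    for city in cities
]
df = pd.DataFrame(rows)
```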

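The core technique is a new column that changes whenever the stolen status changes from one date to the next, i.e. a counter that increments on each value change.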
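To see that counter in isolation, here is a minimal sketch on a hand-made Series of daily flags (the names `flags` and `run_id` are only for illustration). Comparing each value with its predecessor marks the change points, and a cumulative sum turns those marks into a group number.

```python
import pandas as pd

# Hypothetical daily "any car stolen" flags for four consecutive days.
flags = pd.Series(
    [False, False, True, False],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
)

# True wherever the value differs from the previous day (the first day always counts
# as a change because it is compared against a missing value).
changed = flags.ne(flags.shift())

# The running total of change points gives each run of equal values its own id.
run_id = changed.cumsum()
print(run_id.tolist())  # [1, 1, 2, 3]
```

The first two days share id 1, so they are the ones whose dates get merged.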

import pandas as pd

df["date"] = pd.to_datetime(df["date"])

# start a new group whenever the per-date "any car stolen" flag changes
df2 = (
    df.groupby("date")["stolen"].max().to_frame()
    .reset_index()
    # compare each date with the previous one; the running sum numbers the runs
    .assign(stolen_grp=lambda dfa: dfa["stolen"].ne(dfa["stolen"].shift()).cumsum())
    .drop(columns="stolen")
)

# put stolen_grp back into the dataframe
df = df.merge(df2, on="date")

# same idea per car: collect the dates of each (car, stolen, stolen_grp) group
result = (
    df
    .groupby([c for c in df.columns if c != "date"])["date"]
    # keep the first date, then only dates exactly one day after their predecessor
    .agg(lambda x: [d for i, d in enumerate(x) if i == 0 or d == x.iloc[i - 1] + pd.DateOffset(1)])
    .reset_index()
    .drop(columns="stolen_grp")
)

Sample output

car_brand color   city  stolen                                       date
  porsche  blue berlin   False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
  porsche  blue berlin   False                      [2020-01-03 00:00:00]
  porsche  blue berlin   False                      [2020-01-04 00:00:00]
  porsche  blue peking   False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
  porsche  blue peking   False                      [2020-01-03 00:00:00]
  porsche  blue peking   False                      [2020-01-04 00:00:00]
  porsche  blue  tokyo   False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
  porsche  blue  tokyo   False                      [2020-01-03 00:00:00]
  porsche  blue  tokyo   False                      [2020-01-04 00:00:00]
  porsche   red london   False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
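Note that the approach above keys the groups on the daily maximum of stolen, and the list comprehension drops any date that does not directly follow its predecessor. If you need the stricter reading of the question (merge two adjacent days only when every car has the same stolen flag on both), a variation might look like the sketch below. The helper name `merge_consecutive_dates` and the small `sample` frame are hypothetical, and I am assuming each car appears at most once per date.

```python
import pandas as pd

def merge_consecutive_dates(df):
    """Collapse runs of consecutive days on which all stolen flags are identical."""
    keys = ["car_brand", "color", "city"]
    df = df.assign(date=pd.to_datetime(df["date"]))

    run_of_date = {}
    run, prev_date, prev_sig = 0, None, None
    # groupby iterates the dates in sorted order; sorting by car first makes
    # the per-date signatures comparable.
    for date, group in df.sort_values(keys).groupby("date"):
        sig = tuple(group[keys + ["stolen"]].itertuples(index=False, name=None))
        # A new run starts on the first date, on any change in the flags,
        # or when the date is not exactly one day after the previous one.
        if prev_date is None or sig != prev_sig or date - prev_date != pd.Timedelta(days=1):
            run += 1
        run_of_date[date] = run
        prev_date, prev_sig = date, sig

    df["run"] = df["date"].map(run_of_date)
    return (
        df.groupby(keys + ["stolen", "run"], sort=False)["date"]
        .agg(list)
        .reset_index()
        .drop(columns="run")
    )

# Small made-up sample: two cars over four days, madrid stolen on day three.
sample = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"] * 2,
    "car_brand": ["porsche"] * 8,
    "color": ["red"] * 8,
    "city": ["paris"] * 4 + ["madrid"] * 4,
    "stolen": [False, False, False, False, False, False, True, False],
})
merged = merge_consecutive_dates(sample)
print(merged)
```

Here the first two days end up in one list per car, while the third and fourth day each get their own row, because the madrid flag flips on day three and back on day four.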
