Merge rows in a pandas dataframe based on multiple values

Question

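This is essentially related to merging values of a dataframe where other columns match. That question has already been answered, and I couldn't find the right modification for my different problem, so I opened this new thread. I hope that's okay. On to the question: I have the following data: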

 date              car_brand    color     city      stolen
 "2020-01-01"      porsche      red       paris     False
 "2020-01-01"      porsche      red       london    False
 "2020-01-01"      porsche      red       munich    False
 "2020-01-01"      porsche      red       madrid    False
 "2020-01-01"      porsche      red       rome      False
 "2020-01-01"      porsche      blue      berlin    False 
 "2020-01-01"      porsche      blue      tokyo     False
 "2020-01-01"      porsche      blue      peking    False
 "2020-01-01"      porsche      white     liverpool False 
 "2020-01-01"      porsche      white     oslo      False
 "2020-01-01"      porsche      white     barcelona False
 "2020-01-01"      porsche      white     miami     False
 "2020-01-02"      porsche      red       paris     False
 "2020-01-02"      porsche      red       london    False
 "2020-01-02"      porsche      red       munich    False
 "2020-01-02"      porsche      red       madrid    False
 "2020-01-02"      porsche      red       rome      False
 "2020-01-02"      porsche      blue      berlin    False
 "2020-01-02"      porsche      blue      tokyo     False
 "2020-01-02"      porsche      blue      peking    False
 "2020-01-02"      porsche      white     liverpool False 
 "2020-01-02"      porsche      white     oslo      False
 "2020-01-02"      porsche      white     barcelona False
 "2020-01-02"      porsche      white     miami     False 
 "2020-01-03"      porsche      red       paris     False
 "2020-01-03"      porsche      red       london    False
 "2020-01-03"      porsche      red       munich    False
 "2020-01-03"      porsche      red       madrid    True
 "2020-01-03"      porsche      red       rome      False
 "2020-01-03"      porsche      blue      berlin    False
 "2020-01-03"      porsche      blue      tokyo     False
 "2020-01-03"      porsche      blue      peking    False
 "2020-01-03"      porsche      white     liverpool False 
 "2020-01-03"      porsche      white     oslo      False
 "2020-01-03"      porsche      white     barcelona False 
 "2020-01-03"      porsche      white     miami     False 
 "2020-01-04"      porsche      red       paris     False
 "2020-01-04"      porsche      red       london    False
 "2020-01-04"      porsche      red       munich    False
 "2020-01-04"      porsche      red       madrid    False
 "2020-01-04"      porsche      red       rome      False 
 "2020-01-04"      porsche      blue      berlin    False
 "2020-01-04"      porsche      blue      tokyo     False
 "2020-01-04"      porsche      blue      peking    False 
 "2020-01-04"      porsche      white     liverpool False
 "2020-01-04"      porsche      white     oslo      False
 "2020-01-04"      porsche      white     barcelona False
 "2020-01-04"      porsche      white     miami     False

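I would like to know how to build a new dataframe based on the following rule: if the boolean `stolen` values of all entries match on consecutive days, those dates should be merged in the `date` column. For example, in the sample above the boolean entries match for "2020-01-01" and "2020-01-02". So, overall, I want to get the following result: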

 date                             car_brand    color     city      stolen
 ["2020-01-01","2020-01-02"]      porsche      red       paris     False
 ["2020-01-01","2020-01-02"]      porsche      red       london    False
 ["2020-01-01","2020-01-02"]      porsche      red       munich    False
 ["2020-01-01","2020-01-02"]      porsche      red       madrid    False
 ["2020-01-01","2020-01-02"]      porsche      red       rome      False
 ["2020-01-01","2020-01-02"]      porsche      blue      berlin    False 
 ["2020-01-01","2020-01-02"]      porsche      blue      tokyo     False
 ["2020-01-01","2020-01-02"]      porsche      blue      peking    False
 ["2020-01-01","2020-01-02"]      porsche      white     liverpool False 
 ["2020-01-01","2020-01-02"]      porsche      white     oslo      False
 ["2020-01-01","2020-01-02"]      porsche      white     barcelona False
 ["2020-01-01","2020-01-02"]      porsche      white     miami     False
 ["2020-01-03"]                   porsche      red       paris     False
 ["2020-01-03"]                   porsche      red       london    False
 ["2020-01-03"]                   porsche      red       munich    False
 ["2020-01-03"]                   porsche      red       madrid    True
 ["2020-01-03"]                   porsche      red       rome      False
 ["2020-01-03"]                   porsche      blue      berlin    False
 ["2020-01-03"]                   porsche      blue      tokyo     False
 ["2020-01-03"]                   porsche      blue      peking    False
 ["2020-01-03"]                   porsche      white     liverpool False 
 ["2020-01-03"]                   porsche      white     oslo      False
 ["2020-01-03"]                   porsche      white     barcelona False 
 ["2020-01-03"]                   porsche      white     miami     False 
 ["2020-01-04"]                   porsche      red       paris     False
 ["2020-01-04"]                   porsche      red       london    False
 ["2020-01-04"]                   porsche      red       munich    False
 ["2020-01-04"]                   porsche      red       madrid    False
 ["2020-01-04"]                   porsche      red       rome      False 
 ["2020-01-04"]                   porsche      blue      berlin    False
 ["2020-01-04"]                   porsche      blue      tokyo     False
 ["2020-01-04"]                   porsche      blue      peking    False 
 ["2020-01-04"]                   porsche      white     liverpool False
 ["2020-01-04"]                   porsche      white     oslo      False
 ["2020-01-04"]                   porsche      white     barcelona False
 ["2020-01-04"]                   porsche      white     miami     False
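A minimal sketch that implements this rule literally, under three assumptions: `car_brand`, `color` and `city` together identify one car, every car appears on every date, and adjacent dates in sorted order count as consecutive (calendar gaps are not checked). It pivots the data to one row per date, starts a new group whenever any car's `stolen` flag differs from the previous date, and collects the dates of each run into a list. `merge_matching_days` and `KEY_COLS` are hypothetical names, and the data is a reduced two-city version of the sample.

```python
import pandas as pd

# Assumption: these columns identify one car, and every car appears on every date.
KEY_COLS = ["car_brand", "color", "city"]


def merge_matching_days(df: pd.DataFrame) -> pd.DataFrame:
    """Merge consecutive dates on which every car has the same `stolen` value."""
    # One row per date, one column per car; each cell is that car's `stolen` flag.
    pattern = df.set_index(["date"] + KEY_COLS)["stolen"].unstack(KEY_COLS).sort_index()
    # A date starts a new group if any car's flag differs from the previous date
    # (the first date always does, because the shifted first row is all NaN).
    day_grp = (pattern != pattern.shift()).any(axis=1).cumsum()
    # Attach the group number to every original row, then collect the dates per car.
    out = (
        df.assign(day_grp=df["date"].map(day_grp))
        .groupby(["day_grp"] + KEY_COLS + ["stolen"], sort=False)["date"]
        .agg(list)
        .reset_index()
    )
    return out[["date"] + KEY_COLS + ["stolen"]]


# Reduced sample: two cities over four dates, Madrid stolen on 2020-01-03 only.
df = pd.DataFrame({
    "date": ["2020-01-01"] * 2 + ["2020-01-02"] * 2 + ["2020-01-03"] * 2 + ["2020-01-04"] * 2,
    "car_brand": ["porsche"] * 8,
    "color": ["red"] * 8,
    "city": ["paris", "madrid"] * 4,
    "stolen": [False, False, False, False, False, True, False, False],
})

result = merge_matching_days(df)
print(result)
```

Because this compares the whole per-car pattern rather than a single day-level flag, two consecutive days on which different cars were stolen would stay in separate groups.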

Answer 1

For brevity, the code below does not build the dataframe from the sample data.
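If you want to run the code yourself, the sample data can be rebuilt with a short loop. This is a sketch that assumes the question's table is complete (4 dates, 12 cities, all Porsche) and that the only stolen car is the red one in Madrid on 2020-01-03; dates are kept as strings, as in the question.

```python
import pandas as pd

# Cities per color, in the order they appear in the question's table.
cities_by_color = {
    "red": ["paris", "london", "munich", "madrid", "rome"],
    "blue": ["berlin", "tokyo", "peking"],
    "white": ["liverpool", "oslo", "barcelona", "miami"],
}
dates = ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]

rows = [
    {
        "date": date,
        "car_brand": "porsche",
        "color": color,
        "city": city,
        # The only stolen car in the sample: Madrid on 2020-01-03.
        "stolen": date == "2020-01-03" and city == "madrid",
    }
    for date in dates
    for color, cities in cities_by_color.items()
    for city in cities
]
df = pd.DataFrame(rows)
print(df.shape)  # (48, 5)
```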

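The key technique is a new column whose value increments each time the day-level `stolen` value changes from one date to the next, so consecutive dates with the same value share a group number.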
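A minimal sketch of that run-labelling idea on its own, assuming the day-level flag has already been computed (here typed in by hand from the sample, where only 2020-01-03 has a stolen car). It compares each value with `shift()`, a common pandas idiom for detecting changes, and turns the change points into group numbers with `cumsum()`.

```python
import pandas as pd

# Day-level "any car stolen?" flag per date, using the values from the sample data.
flag = pd.Series(
    [False, False, True, False],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
)

# True wherever a value differs from the one before it (the first value always does,
# because shift() leaves NaN in the first position).
starts_new_run = flag != flag.shift()

# A running total of the run starts numbers the runs.
run_id = starts_new_run.cumsum()
print(run_id.tolist())  # [1, 1, 2, 3]
```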

import pandas as pd

df["date"] = pd.to_datetime(df["date"])

# one row per date; start a new group whenever the day-level
# "was any car stolen?" flag changes from the previous date
df2 = (
    df.groupby("date")["stolen"].max().to_frame()
    .reset_index()
    .assign(stolen_grp=lambda dfa: (dfa.stolen != dfa.stolen.shift()).cumsum())
    .drop(columns="stolen")
)

# put stolen_grp back into the dataframe
df = df.merge(df2, on="date")

# group on every column except date (stolen_grp included), so each run of days gets its own row
out = (
    df
    .groupby([c for c in df.columns if c != "date"])["date"]
    # only include a date if it is the first in its group or exactly one day after the previous one
    .agg(lambda x: [d for i, d in enumerate(x) if i == 0 or d == x.iloc[i - 1] + pd.DateOffset(days=1)])
    .reset_index()
    .drop(columns="stolen_grp")
)
print(out)

Sample output (first 10 rows)

car_brand color   city  stolen                                       date
  porsche  blue berlin   False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
  porsche  blue berlin   False                      [2020-01-03 00:00:00]
  porsche  blue berlin   False                      [2020-01-04 00:00:00]
  porsche  blue peking   False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
  porsche  blue peking   False                      [2020-01-03 00:00:00]
  porsche  blue peking   False                      [2020-01-04 00:00:00]
  porsche  blue  tokyo   False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
  porsche  blue  tokyo   False                      [2020-01-03 00:00:00]
  porsche  blue  tokyo   False                      [2020-01-04 00:00:00]
  porsche   red london   False [2020-01-01 00:00:00, 2020-01-02 00:00:00]
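Note that the answer builds `stolen_grp` from a day-level `max()`, i.e. "was any car stolen that day?", rather than from the full per-car pattern the question describes. On the sample data this makes no difference because only one day has a theft. The hypothetical two-day data below (a different city stolen on each day) is a sketch showing where the two approaches diverge.

```python
import pandas as pd

# Hypothetical two-day sample: a different car is stolen on each day.
df = pd.DataFrame({
    "date": ["2020-01-05", "2020-01-05", "2020-01-06", "2020-01-06"],
    "city": ["paris", "rome", "paris", "rome"],
    "stolen": [True, False, False, True],
})

# Day-level flag as in the answer: "was any car stolen that day?"
any_stolen = df.groupby("date")["stolen"].max()
grp_any = (any_stolen != any_stolen.shift()).cumsum()

# Full per-car pattern: one row per date, one column per city.
pattern = df.pivot(index="date", columns="city", values="stolen")
grp_exact = (pattern != pattern.shift()).any(axis=1).cumsum()

print(grp_any.tolist())    # [1, 1]  both days share one group
print(grp_exact.tolist())  # [1, 2]  the days are kept apart
```

If your real data can have thefts on consecutive days, comparing the full pattern is the safer choice.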

Answer 1 (score: 1, accepted 2021-01-13 17:04:48)
