pandas dataframe - find longest consecutive rows with a certain condition

Given a pandas DataFrame named 'df' as follows:

             A
2015-05-01  True
2015-05-02  True
2015-05-03  False
2015-05-04  False
2015-05-05  False
2015-05-06  False
2015-05-07  True
2015-05-08  False
2015-05-09  False

I would like to return a slice containing the longest run of consecutive rows in which column 'A' reads 'False'. Can this be done?

You can use cumsum to detect changes in column A, because booleans can be summed in Python.
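As a quick illustration of the boolean-summing point, here is a minimal sketch on a small made-up Series (the name `flags` and its values are assumptions for the example, not part of the question's data):

```python
import pandas as pd

# Hypothetical toy data, only to show how booleans behave under sum/cumsum
flags = pd.Series([True, True, False, False, True])

# True counts as 1 and False as 0 when summed
total = int(flags.sum())

# The running sum increases on True rows and stays flat on False rows
running = flags.cumsum().tolist()

print(total)    # 3
print(running)  # [1, 2, 2, 2, 3]
```

Note that the running sum stays flat across False rows, so each False row shares a label with the True row just before it.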

import pandas as pd

# Test data
df = pd.DataFrame([True, True, False, False, False, False, True, False, False], 
              index=pd.to_datetime(['2015-05-01', '2015-05-02', '2015-05-03',
                                   '2015-05-04', '2015-05-05', '2015-05-06',
                                   '2015-05-07', '2015-05-08', '2015-05-09']), 
              columns=['A'])

# We have to ensure that the index is sorted
df.sort_index(inplace=True)
# Resetting the index to create a column
df.reset_index(inplace=True)

# Grouping by the cumsum and counting the number of dates and getting their min and max
df = df.groupby(df['A'].cumsum()).agg(
    {'index': ['count', 'min', 'max']})

# Removing useless column level
df.columns = df.columns.droplevel()

print(df)
#    count        min        max
# A                             
# 1      1 2015-05-01 2015-05-01
# 2      5 2015-05-02 2015-05-06
# 3      3 2015-05-07 2015-05-09

# Getting the max
df[df['count']==df['count'].max()]

#    count        min        max
# A                             
# 2      5 2015-05-02 2015-05-06

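Sorry to bring back an old post, but I noticed that the result of Romain's answer is slightly off: the counts are incorrect, which makes the result inaccurate. The count column should have 4 entries, [2, 4, 1, 2], with a maximum of 4.

To demonstrate the problem, I have broken it down a bit (df is the same test data as in the accepted answer above). You can see that the resulting groups are incorrect: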

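# sort (no inplace=True here: with inplace=True the call returns None)
dfS = df.sort_index()
# reset
dfSR = dfS.reset_index()
# group on the cumulative sum of the reset frame's own 'A' column
dfG = dfSR.groupby(dfSR['A'].cumsum())

# show resulting groups
for group in dfG: print(group)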

# (1,        index     A
# 0 2015-05-01  True)
# (2,        index      A
# 1 2015-05-02   True
# 2 2015-05-03  False
# 3 2015-05-04  False
# 4 2015-05-05  False
# 5 2015-05-06  False)
# (3,        index      A
# 6 2015-05-07   True
# 7 2015-05-08  False
# 8 2015-05-09  False)
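One way to see why the groups printed above come out wrong is to compare their sizes with run lengths counted independently. The sketch below is only a cross-check, using `itertools.groupby` from the standard library as the reference; the variable names are assumptions for the example.

```python
from itertools import groupby

import pandas as pd

a = pd.Series([True, True, False, False, False, False, True, False, False])

# Sizes of the groups produced by grouping on the plain cumulative sum
cumsum_sizes = a.groupby(a.cumsum()).size().tolist()

# Reference: lengths of the runs of identical consecutive values
true_runs = [(value, len(list(run))) for value, run in groupby(a.tolist())]

print(cumsum_sizes)  # [1, 5, 3]
print(true_runs)     # [(True, 2), (False, 4), (True, 1), (False, 2)]
```

The cumulative sum only increases on True rows, so every True row opens a new group and each run of False rows is attached to the True row before it. That is where the count of 5 comes from.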

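Thanks to DSM's answer and, of course, Romain's answer, combining the techniques from the two posts gives the correct result. They are already explained in the posts they come from, so I will just leave the code below.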

import pandas as pd

df = pd.DataFrame([True, True, False, False, False, False, True, False, False], 
              index=pd.to_datetime(['2015-05-01', '2015-05-02', '2015-05-03',
                                   '2015-05-04', '2015-05-05', '2015-05-06',
                                   '2015-05-07', '2015-05-08', '2015-05-09']), 
              columns=['A'])

df.sort_index(inplace=True)
df.reset_index(inplace=True)

dfBool = df['A'] != df['A'].shift()
dfCumsum = dfBool.cumsum()

groups = df.groupby(dfCumsum)

for g in groups: print(g)

groupCounts = groups.agg({'index':['count', 'min', 'max']})
groupCounts.columns = groupCounts.columns.droplevel()

print('\n', groupCounts, '\n')

maxCount = groupCounts[groupCounts['count'] == groupCounts['count'].max()]

print(maxCount, '\n')

Output:

(1,        index     A
0 2015-05-01  True
1 2015-05-02  True)
(2,        index      A
2 2015-05-03  False
3 2015-05-04  False
4 2015-05-05  False
5 2015-05-06  False)
(3,        index     A
6 2015-05-07  True)
(4,        index      A
7 2015-05-08  False
8 2015-05-09  False)

    count        min        max
A                             
1      2 2015-05-01 2015-05-02
2      4 2015-05-03 2015-05-06
3      1 2015-05-07 2015-05-07
4      2 2015-05-08 2015-05-09 

   count        min        max
A                             
2      4 2015-05-03 2015-05-06
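The code above finds the longest run of any value, which happens to be a False run in this data. The question asks specifically for the rows where 'A' is False, so the following is a minimal sketch of one possible way to restrict the search to False runs and return the matching rows. The names `run_id`, `false_runs` and `longest_false` are assumptions for the example, and ties between equally long runs are not handled specially.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [True, True, False, False, False, False, True, False, False]},
    index=pd.date_range("2015-05-01", periods=9, freq="D"),
)

# A new run starts wherever the value differs from the previous row
run_id = (df["A"] != df["A"].shift()).cumsum()

# Keep only the run labels of rows where A is False, then count rows per run
false_runs = run_id[~df["A"]]
run_lengths = false_runs.value_counts()

# Label of a longest False run
longest_id = run_lengths.idxmax()

# Select the rows of that run from the original frame
longest_false = df[run_id == longest_id]
print(longest_false)
```

For the test data this selects the four rows from 2015-05-03 to 2015-05-06. If the column contains no False at all, `idxmax` raises on the empty result, so real code would need to check for that case first.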
