
Pandas - Groupby find groups within groups

This is a repost of a question I started at this link, but I have realized the problem is considerably more complex.

import pandas as pd

df = pd.DataFrame({'a': ['A1', 'A1', 'A1', 'A2', 'A2','A3','A3', 'A4', 'A3', 'A2', "A4", "A4", "A4"],
                   'value': ["7:00","10:00","20:00","9:00","7:00","9:00","8:00","15:00","19:00", "9:30", "15:30", "16:00", "16:30"],
                   "value2": [3,1,2,4,2,3,3,5,3,2,1,5,7],
                   'value3': ["Apple", "Orange", "Apple", "Kiwi", "Orange", "Orange", "Apple", "Apple", "Apple", "Apple", "Orange", "Orange","Apple"],
                   "value4": ["Throw", "Eat", 'Throw', "Keep", "Eat", "Eat", "Throw", "Throw", "Throw", "Throw", "Eat", "Eat", "Chuck"]})

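What I want is: 1) For each ID (variable "a"), select all instances under "value3" where the value is "Orange" and is later followed by "Apple". They don't have to be back to back; there can be any number of other values in between. But the Orange must come before the Apple in time (column "value").

2) Then split these Orange-then-Apple instances into two groups: 1) the first group is where the Orange has value2 = 1; 2) the second is where the Orange's value2 is not equal to 1 (so all the rest are grouped together). The tricky case is A4, which has two Oranges, with value2 of 1 and 5. It should go into the value2 = 1 group, because that Orange occurs first.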

Update: sorry, it seems my expected output did not get copied and pasted:

value2     value3     count
1          orange     2
all other  orange     2

See if this works, though I'll wait to see whether someone else can give you a simpler and shorter version:

# Keep the first row for each (a, value3) pair
df1 = df[['a', 'value3']].drop_duplicates()
# Merge back onto the original frame by index
merge = df1.merge(df, how='left', left_index=True, right_index=True)
# Select only the required columns
merge = merge[['value2', 'value3_x']]
# Rename the columns
merge = merge.rename(columns={'value3_x': 'value3'})
# Filter the data (copy so the assignments below do not act on a view)
merge = merge[merge.value3 == 'Orange'].copy()
# Convert the values to strings
merge['value2'] = merge.value2.astype(str)
# Relabel value2: '1' stays, everything else becomes 'all other'
merge['value2'] = merge.value2.apply(lambda x: '1' if x == '1' else 'all other')
# Group and count
merge.groupby(['value2', 'value3']).value3.count()
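
The snippet above keeps the first row per (a, value3) pair in row order and does not itself check that the Orange precedes an Apple in time. As a rough alternative sketch, the version below sorts each ID by the "value" column and only counts an ID when its earliest Orange is followed by a later Apple. It assumes "value" is always an "H:MM" string on a single day; the helper name first_orange_value2 and the "minutes" column are made up for illustration and are not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["A1", "A1", "A1", "A2", "A2", "A3", "A3", "A4", "A3", "A2", "A4", "A4", "A4"],
    "value": ["7:00", "10:00", "20:00", "9:00", "7:00", "9:00", "8:00", "15:00",
              "19:00", "9:30", "15:30", "16:00", "16:30"],
    "value2": [3, 1, 2, 4, 2, 3, 3, 5, 3, 2, 1, 5, 7],
    "value3": ["Apple", "Orange", "Apple", "Kiwi", "Orange", "Orange", "Apple",
               "Apple", "Apple", "Apple", "Orange", "Orange", "Apple"],
})

# Assumption: "value" is "H:MM". Convert to minutes since midnight so that
# "7:00" sorts before "10:00" (plain string sorting would get this wrong).
parts = df["value"].str.split(":", expand=True).astype(int)
df = df.assign(minutes=parts[0] * 60 + parts[1])


def first_orange_value2(group):
    """Return value2 of the earliest Orange if some Apple comes after it, else None."""
    group = group.sort_values("minutes", kind="stable")
    oranges = group[group["value3"] == "Orange"]
    if oranges.empty:
        return None
    first_orange = oranges.iloc[0]
    # If no Apple follows the earliest Orange, none follows a later Orange either.
    later_apples = group[(group["value3"] == "Apple")
                         & (group["minutes"] > first_orange["minutes"])]
    if later_apples.empty:
        return None
    return first_orange["value2"]


rows = []
for key, group in df.groupby("a"):
    v = first_orange_value2(group)
    if v is not None:
        rows.append({"a": key,
                     "value2": "1" if v == 1 else "all other",
                     "value3": "Orange"})

result_per_id = pd.DataFrame(rows)
counts = (result_per_id.groupby(["value2", "value3"])
          .size()
          .reset_index(name="count"))
print(counts)
```

On the sample data this puts A1 and A4 in the "1" group and A2 and A3 in "all other", which gives a count of 2 for each, as in the expected output in the question. A4 lands in the "1" group because its 15:30 Orange (value2 = 1) comes before its 16:00 Orange (value2 = 5).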
