I have data that looks something like this:
Start Time      End Time        Trip Duration  Start Station  End Station
01/01/17 15:09  01/01/17 15:14  321            A              B
01/02/17 15:09  01/02/17 15:14  321            C              D
12/03/17 15:09  12/03/17 15:14  321            E              F
05/01/17 15:09  05/01/17 15:14  321            B              D
17/02/17 15:09  17/02/17 15:14  321            A              B
12/04/17 15:09  12/04/17 15:14  321            E              H
13/05/17 15:09  13/05/17 15:14  321            S              K
17/01/17 15:09  17/01/17 15:14  321            A              B
Using the following code, I am able to find the most common start station:
start_station = filtered['Start Station'].mode()[0]
I need to find the most common trip, i.e. the most frequent combination of start station and end station. For the data above, the most common trip is between A and B.
Can anyone tell me how to find the most common trip?
Use GroupBy.size with nlargest, or with sort_values and iloc to select the last value.
remove_unused_levels drops the MultiIndex level values that no longer appear once the Series has been reduced to the selected row.
a = (df.groupby(['Start Station','End Station'])
       .size()
       .nlargest(1)
       .index.remove_unused_levels()
       .tolist()
    )
Or:
a = (df.groupby(['Start Station','End Station'])
       .size()
       .sort_values()
       .iloc[[-1]]
       .index.remove_unused_levels()
       .tolist()
    )
print(a)
[('A', 'B')]
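To try this without the original file, here is a minimal sketch that rebuilds only the two station columns from the question's sample data. The variable names `pair_counts` and `most_common_trip` are made up for illustration. It uses `idxmax` as a shorter alternative to the chains above; on a Series with a MultiIndex it should return the winning pair as a plain tuple.

```python
import pandas as pd

# Station columns from the question's sample data (other columns omitted)
df = pd.DataFrame({
    "Start Station": ["A", "C", "E", "B", "A", "E", "S", "A"],
    "End Station":   ["B", "D", "F", "D", "B", "H", "K", "B"],
})

# Number of rows for each (start, end) pair
pair_counts = df.groupby(["Start Station", "End Station"]).size()

# Index label of the largest count; a tuple because the index is a MultiIndex
most_common_trip = pair_counts.idxmax()
print(most_common_trip)
```

Note that `idxmax` reports only the first pair if several trips are tied for the highest count.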
If you want the output as a DataFrame:
df1 = (df.groupby(['Start Station','End Station'])
         .size()
         .reset_index(name='count')
         .nlargest(1, 'count')[['Start Station','End Station']]
      )
print(df1)
  Start Station End Station
0             A           B
If you also need the counts, try this:
import pandas as pd
df = pd.DataFrame({'Start':['A','B','C','D','A'],'End':['B']*5,'Trip Duration':[321]*5})
df.groupby(['Start','End'])['Trip Duration'].count().sort_values(ascending=False, na_position='first')
I might do this:
trip = (filtered["Start Station"] + " -> " + filtered["End Station"]).mode()[0]
# A -> B
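One thing to keep in mind with this approach is that `mode()` returns a Series, not a single string, and it holds more than one entry when several trips are tied. A minimal sketch, assuming `filtered` holds the station columns from the question's sample data (the names `trips`, `modes` and `most_common` are illustrative only):

```python
import pandas as pd

# Stand-in for the question's filtered DataFrame (station columns only)
filtered = pd.DataFrame({
    "Start Station": ["A", "C", "E", "B", "A", "E", "S", "A"],
    "End Station":   ["B", "D", "F", "D", "B", "H", "K", "B"],
})

# One label per row, e.g. "A -> B"
trips = filtered["Start Station"] + " -> " + filtered["End Station"]

# mode() returns a Series; it has several entries if trips are tied
modes = trips.mode()

# Take the first entry to get a single label
most_common = modes[0]
print(most_common)
```

Checking `len(modes)` before indexing is one way to detect a tie.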
Have a look at the pandas documentation on GroupBy (split-apply-combine).
It describes a wide range of aggregation functions you can apply to each group.
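As a rough illustration of what those aggregation functions allow, the sketch below counts trips per station pair and computes the mean duration in one pass using named aggregation. The sample values come from the question; the output column names `trips` and `mean_duration` are assumptions chosen for this example.

```python
import pandas as pd

# Sample data from the question (time columns omitted)
df = pd.DataFrame({
    "Start Station": ["A", "C", "E", "B", "A", "E", "S", "A"],
    "End Station":   ["B", "D", "F", "D", "B", "H", "K", "B"],
    "Trip Duration": [321] * 8,
})

# Named aggregation: each keyword becomes an output column,
# built from a (source column, aggregation function) pair
summary = (
    df.groupby(["Start Station", "End Station"])
      .agg(trips=("Trip Duration", "count"),
           mean_duration=("Trip Duration", "mean"))
      .sort_values("trips", ascending=False)
)

# The first row is the most common trip
print(summary.head(1))
```

Here `count` ignores missing values in `Trip Duration`, so it matches the row count only when that column has no NaN entries.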
Using groupby:
import pandas as pd
counts = df.groupby(["Start_Station","End_Station"]).count()
print(counts)
                           Start_Time  End_Time  Trip_Duration  trip_id
Start_Station End_Station
A             B                     3         3              3        3
B             D                     1         1              1        1
C             D                     1         1              1        1
E             F                     1         1              1        1
              H                     1         1              1        1
S             K                     1         1              1        1
Using value_counts and a dummy column:
import pandas as pd
df["trip_id"] = df.Start_Station + df.End_Station
counts = df["trip_id"].value_counts()
print(counts)
AB    3
BD    1
EH    1
SK    1
EF    1
CD    1
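A possible variation that avoids the dummy column: since pandas 1.1, `DataFrame.value_counts` accepts a list of columns and counts unique combinations directly. This also sidesteps ambiguous concatenated keys (for example, stations "AB" + "C" and "A" + "BC" would both give "ABC"). The sketch below assumes the underscore column names used in this answer and rebuilds only the station columns from the sample data.

```python
import pandas as pd

# Station columns from the sample data, using this answer's column names
df = pd.DataFrame({
    "Start_Station": ["A", "C", "E", "B", "A", "E", "S", "A"],
    "End_Station":   ["B", "D", "F", "D", "B", "H", "K", "B"],
})

# Count each unique (start, end) combination; sorted by count, descending
counts = df.value_counts(["Start_Station", "End_Station"])

# The first index entry is the most common trip, as a tuple
most_common_trip = counts.index[0]
print(most_common_trip, counts.iloc[0])
```

The order of pairs with equal counts is not something to rely on, so only the top entry is meaningful here.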