How to group by split text and count occurrences using Python pandas?
Input:

```python
import pandas as pd

df = pd.DataFrame({
    'Station': ['001ABC006', '002ABD008', '005ABX009', '007ABY010', '001ABC006', '002ABD008'],
    'Trains Passing': [55, 56, 59, 96, 95, 96],
    'Destination': ['MRK', 'MRK', 'MRS', 'MTS', 'KPS', 'KPS'],
})
```
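I need to split the "Station" text so that "001ABC006" becomes "ABC", and build a list of these codes. Only the values present in that list should be counted, and the counts should be grouped by "Destination". How can I do this?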
Expected output:

```
StationId  ABC  ABD  ABX  ABY
MRK          1    1    0    0
MRS          0    0    1    0
MTS          0    0    0    1
KPS          1    1    0    0
```
Update

```
In [180]: pd.crosstab(df.Destination, df.Station.str[3:6])
Out[180]:
Station      ABC  ABD  ABX  ABY
Destination
KPS            1    1    0    0
MRK            1    1    0    0
MRS            0    0    1    0
MTS            0    0    0    1
```
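The question also asks to count only the codes that appear in a given list and shows the destinations in their original order, whereas the crosstab above sorts them alphabetically. A minimal sketch, assuming the list of allowed codes is the hypothetical `codes` below: filter the sliced codes with `isin`, cross-tabulate, then `reindex` both axes so every listed code gets a column and the rows follow first appearance.

```python
import pandas as pd

df = pd.DataFrame({
    'Station': ['001ABC006', '002ABD008', '005ABX009', '007ABY010', '001ABC006', '002ABD008'],
    'Trains Passing': [55, 56, 59, 96, 95, 96],
    'Destination': ['MRK', 'MRK', 'MRS', 'MTS', 'KPS', 'KPS'],
})

# Hypothetical list of codes to count; any other code is ignored.
codes = ['ABC', 'ABD', 'ABX', 'ABY']

letters = df['Station'].str[3:6]   # '001ABC006' -> 'ABC'
mask = letters.isin(codes)         # True only for codes that are in the list

table = (
    pd.crosstab(df.loc[mask, 'Destination'], letters[mask])
    # one column per listed code, zero-filled if a code never occurs
    .reindex(columns=codes, fill_value=0)
    # rows in order of first appearance (MRK, MRS, MTS, KPS) instead of sorted
    .reindex(index=df['Destination'].unique(), fill_value=0)
)
table.columns.name = 'StationId'
print(table)
```

Filtering before the crosstab (rather than dropping columns afterwards) keeps unlisted codes from ever being counted, and the column `reindex` still produces a zero column for a listed code that never occurs.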
You can use

```
In [160]: pd.DataFrame([df.Station.str[3:6].value_counts().to_dict()])
Out[160]:
   ABC  ABD  ABX  ABY
0    2    2    1    1
```
Or,

```
In [149]: df.Station.str[3:6].value_counts().to_frame().T
Out[149]:
         ABC  ABD  ABX  ABY
Station    2    2    1    1
```
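The row label "Station" in that output depends on the pandas version: since pandas 2.0, `value_counts` names its result `count` (and moves the original name to the index), so `.to_frame().T` would label the row `count` instead. A small sketch that gives the same label on either version by renaming the counts explicitly:

```python
import pandas as pd

df = pd.DataFrame({'Station': ['001ABC006', '002ABD008', '005ABX009',
                               '007ABY010', '001ABC006', '002ABD008']})

counts = df['Station'].str[3:6].value_counts()
# pandas >= 2.0 names this Series 'count'; older versions keep 'Station'.
# Renaming it explicitly makes the row label 'Station' in both cases.
wide = counts.rename('Station').to_frame().T
print(wide)
```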
Details

```
In [162]: df.Station.str[3:6]
Out[162]:
0    ABC
1    ABD
2    ABX
3    ABY
4    ABC
5    ABD
Name: Station, dtype: object

In [163]: df.Station.str[3:6].value_counts()
Out[163]:
ABC    2
ABD    2
ABX    1
ABY    1
Name: Station, dtype: int64
```
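`str[3:6]` only works because every ID has exactly three digits, three letters and three digits. If that layout is not guaranteed, a regex can pull out the first run of letters wherever it sits. A sketch using `Series.str.extract`; the second Series holds hypothetical IDs invented only to show where the fixed slice breaks:

```python
import pandas as pd

df = pd.DataFrame({'Station': ['001ABC006', '002ABD008', '005ABX009',
                               '007ABY010', '001ABC006', '002ABD008']})

# Take the first run of letters, wherever it starts and however long it is.
letters = df['Station'].str.extract(r'([A-Za-z]+)', expand=False)
print(letters.value_counts())

# Hypothetical IDs with a different layout, where a fixed slice goes wrong.
odd = pd.Series(['12QR7', '0001ABCD2'])
print(odd.str[3:6].tolist())                                    # ['R7', '1AB']
print(odd.str.extract(r'([A-Za-z]+)', expand=False).tolist())   # ['QR', 'ABCD']
```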
This is called a cross-tabulation; the link below shows several ways to do it.
See: How to pivot a dataframe
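Two of the other common ways to build the same cross-tabulation are a two-key `groupby` followed by `unstack`, and `pivot_table` with `aggfunc='size'`. A sketch of both on the question's data; naming the sliced codes `StationId` is an assumption made here to match the expected output's header:

```python
import pandas as pd

df = pd.DataFrame({
    'Station': ['001ABC006', '002ABD008', '005ABX009', '007ABY010', '001ABC006', '002ABD008'],
    'Trains Passing': [55, 56, 59, 96, 95, 96],
    'Destination': ['MRK', 'MRK', 'MRS', 'MTS', 'KPS', 'KPS'],
})

letters = df['Station'].str[3:6].rename('StationId')

# groupby on both keys, count rows per pair, then move the codes into columns.
via_groupby = df.groupby(['Destination', letters]).size().unstack(fill_value=0)

# pivot_table with aggfunc='size' counts the rows that fall into each cell.
via_pivot = df.assign(StationId=letters).pivot_table(
    index='Destination', columns='StationId', aggfunc='size', fill_value=0
)

print(via_groupby)
print(via_pivot)
```

Both sort destinations and codes alphabetically, so they match the crosstab output rather than the question's row order.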
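crosstab

```python
# regex=True is required on pandas >= 2.0, where str.replace matches the pattern literally by default
pd.crosstab(df.Destination, df.Station.str.replace(r'\d', '', regex=True))
```

```
Station      ABC  ABD  ABX  ABY
Destination
KPS            1    1    0    0
MRK            1    1    0    0
MRS            0    0    1    0
MTS            0    0    0    1
```

```python
df.Station.str.replace(r'\d', '', regex=True).value_counts()
```

```
ABC    2
ABD    2
ABY    1
ABX    1
Name: Station, dtype: int64
```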
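A further variant, not taken from the answers above, is to one-hot encode the codes with `pd.get_dummies` and sum the indicator columns within each destination. A sketch; `regex=True` is passed explicitly because pandas 2.0 made literal matching the default for `str.replace`, and `dtype=int` avoids the boolean columns `get_dummies` produces by default in recent versions:

```python
import pandas as pd

df = pd.DataFrame({
    'Station': ['001ABC006', '002ABD008', '005ABX009', '007ABY010', '001ABC006', '002ABD008'],
    'Trains Passing': [55, 56, 59, 96, 95, 96],
    'Destination': ['MRK', 'MRK', 'MRS', 'MTS', 'KPS', 'KPS'],
})

letters = df['Station'].str.replace(r'\d', '', regex=True)   # '001ABC006' -> 'ABC'
# One indicator column per code: 1 if the row has that code, else 0 ...
dummies = pd.get_dummies(letters, dtype=int)
# ... summed within each destination gives the same counts as the crosstab.
table = dummies.groupby(df['Destination']).sum()
print(table)
```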
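findall

```python
import re

import numpy as np
import pandas as pd

# Join the IDs and pull out every run of letters ('(?i)' makes the pattern case-insensitive)
letters = re.findall(r'(?i)([a-z]+)', '|'.join(df.Station))
# factorize: i holds an integer code per match, r the unique codes
# (the list is wrapped in an array because newer pandas warns on plain lists)
i, r = pd.factorize(np.asarray(letters))
# bincount counts how often each integer code occurs
pd.Series(np.bincount(i), r)
```

```
ABC    2
ABD    2
ABX    1
ABY    1
dtype: int64
```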
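The `factorize` + `bincount` idea also extends to the grouped table the question asks for: factorize both keys, encode each (destination, code) pair as one integer, count with `np.bincount`, and reshape into a grid. A sketch of that extension (not part of the original answer), again assuming the letters are everything left after removing digits:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Station': ['001ABC006', '002ABD008', '005ABX009', '007ABY010', '001ABC006', '002ABD008'],
    'Trains Passing': [55, 56, 59, 96, 95, 96],
    'Destination': ['MRK', 'MRK', 'MRS', 'MTS', 'KPS', 'KPS'],
})

# Integer codes (and sorted labels) for both keys.
dest_codes, dest_labels = pd.factorize(df['Destination'], sort=True)
stn_codes, stn_labels = pd.factorize(
    df['Station'].str.replace(r'\d', '', regex=True), sort=True
)

n_dest, n_stn = len(dest_labels), len(stn_labels)
# Encode each (destination, station) pair as a single integer ...
flat = dest_codes * n_stn + stn_codes
# ... count every pair with bincount, then fold the counts back into a grid.
counts = np.bincount(flat, minlength=n_dest * n_stn).reshape(n_dest, n_stn)

table = pd.DataFrame(counts, index=dest_labels, columns=stn_labels)
print(table)
```

`minlength` guarantees the flat count array has one slot per cell even when the last pairs never occur, so the `reshape` cannot fail on missing combinations.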
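Disclaimer: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.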