How do I classify or regroup a dataset by time of day in Python?
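I need to assign a number to each value according to the hour of the day it falls in. How can I add a new column in which each cell carries its hourly group? For example, all transactions from 00:00:00 to 00:59:59 should be filled with 1, those from 01:00:00 to 01:59:59 with 2, and so on up to 23:00:00 to 23:59:59, which should be filled with 24.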
```
Time_duration = df['period']
print(Time_duration)
0 23:59:56
1 23:59:56
2 23:59:55
3 23:59:53
4 23:59:52
...
74187 00:00:18
74188 00:00:09
74189 00:00:08
74190 00:00:03
74191 00:00:02
```
This is the result I want: a new column that assigns each row to its hourly group, so that all transactions from 00:00:00 to 00:59:59 are filled with 1, those from 01:00:00 to 01:59:59 with 2, and so on up to 23:00:00 to 23:59:59, which are filled with 24.
0 23:59:56 24
1 23:59:56 24
2 23:59:55 24
3 23:59:53 24
4 23:59:52 24
...
74187 00:00:18 1
74188 00:00:09 1
74189 00:00:08 1
74190 00:00:03 1
74191 00:00:02 1
You can use a regular expression with str.extract:
import pandas as pd

# Capture the hour digits (assumes 'period' holds strings such as '23:59:56')
pattern = r'^(\d{1,2}):'
# expand=False returns a Series; cast it to int so that you can add 1
df['hour'] = df['period'].str.extract(pattern, expand=False).astype(int) + 1
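To check the regex approach in isolation, here is a minimal sketch. The three-row sample is hypothetical and stands in for the asker's data, and it assumes `period` holds strings in HH:MM:SS form.

```python
import pandas as pd

# Hypothetical sample standing in for the asker's 'period' column
df = pd.DataFrame({"period": ["23:59:56", "01:30:00", "00:00:02"]})

# Capture the hour digits, cast to int, then shift 0-23 to the 1-24 labels
df["hour"] = df["period"].str.extract(r"^(\d{1,2}):", expand=False).astype(int) + 1

print(df)
```

If the column holds `datetime.time` objects instead of strings, the `.str` accessor will not work as shown, and the values would need converting first (for example with `.astype(str)`).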
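import pandas as pd

# sort_values returns a new DataFrame; assign it back only if you want the rows reordered
# df = df.sort_values(by=["period"])
timeStamp_list = pd.to_datetime(list(df['period']), format='%H:%M:%S')
df['Hour'] = timeStamp_list.hour + 1  # .hour is 0-23; add 1 for the 1-24 labels in the question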
Try this code; it worked for me.
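A minimal sketch of the datetime approach on a hypothetical sample, again assuming string times in HH:MM:SS form. The explicit `format` argument and the `groupby` step are illustrative additions, not part of the answer above; they show how the new column can then be used to regroup the rows by hour.

```python
import pandas as pd

# Hypothetical sample standing in for the asker's 'period' column
df = pd.DataFrame({"period": ["23:59:56", "23:10:00", "01:30:00", "00:00:02"]})

# Parse the strings as times; .dt.hour yields 0-23, so add 1 for labels 1-24
parsed = pd.to_datetime(df["period"], format="%H:%M:%S")
df["Hour"] = parsed.dt.hour + 1

# Regroup: count how many transactions fall in each hourly bin
counts = df.groupby("Hour").size()
print(counts)
```

Compared with the regex version, parsing also validates the values: a malformed time raises an error instead of silently producing a wrong label.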