Resampling, grouping, pivoting a pandas dataframe
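I have a log file with a timestamp and two columns. I now want to resample and "pivot" the DataFrame created from that log file.
Example original DataFrame / log file: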
timestamp colA colB
2015-01-01 00:10:01 a x
2014-01-01 00:10:01 b y
2015-01-01 00:10:03 a x
2015-01-01 00:10:03 a x
2015-01-01 00:10:03 a y
2015-01-01 00:10:04 b x
2014-01-01 00:10:04 b y
2014-01-01 00:10:04 b y
2014-01-01 00:10:04 a x
2014-01-01 00:10:05 a x
2014-01-01 00:10:05 a x
2014-01-01 00:10:07 a y
2014-01-01 00:10:08 a x
Example result when resampling by second:
a b
timestamp x y x y
2015-01-01 00:10:01 1 0 0 1
2015-01-01 00:10:02 0 0 0 0
2015-01-01 00:10:03 2 1 0 0
2015-01-01 00:10:04 1 0 1 2
2014-01-01 00:10:05 2 0 0 0
2014-01-01 00:10:06 0 0 0 0
2014-01-01 00:10:07 0 1 0 0
2014-01-01 00:10:08 1 0 0 0
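How would I achieve this? Resample first and then group/pivot, or the other way around? To be more specific: for each resampling interval, the cells should contain the count of each colA / colB combination. In the example the interval is one second, but it could be minutes, hours, etc.
I am not fixed on this format. I could also imagine getting the resampled result grouped by timestamp / colA, e.g.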
colB
timestamp colA x y
2015-01-01 00:10:01 a 1 0
b 0 1
2015-01-01 00:10:02 a 0 0
b 0 0
2015-01-01 00:10:03 a 2 1
b 0 0
2015-01-01 00:10:04 a 1 0
b 1 2
2014-01-01 00:10:05 a 2 0
b 0 0
2014-01-01 00:10:06 a 0 0
b 0 0
2014-01-01 00:10:07 a 0 1
b 0 0
2014-01-01 00:10:08 a 1 0
b 0 0
The end use is to plot the different count values.
Thanks.
You can use pd.crosstab:
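import pandas as pd
# Columns in the file are separated by two or more whitespace characters.
# A regex separator uses the Python parsing engine; naming it explicitly avoids a ParserWarning.
df = pd.read_table('data', sep=r'\s{2,}', engine='python', parse_dates=[0])
# Count each colA / colB combination per timestamp.
table = pd.crosstab(index=[df['timestamp']], columns=[df['colA'], df['colB']])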
yields
colA a b
colB x y x y
timestamp
2014-01-01 00:10:01 0 0 0 1
2014-01-01 00:10:04 1 0 0 2
2014-01-01 00:10:05 2 0 0 0
2014-01-01 00:10:07 0 1 0 0
2014-01-01 00:10:08 1 0 0 0
2015-01-01 00:10:01 1 0 0 0
2015-01-01 00:10:03 2 1 0 0
2015-01-01 00:10:04 0 0 1 0
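The crosstab above only has rows for timestamps that actually occur in the log, so seconds without any entries are missing. One possible way to get a row for every interval, sketched here as an assumption rather than as part of the original answer, is to resample the crosstab on its DatetimeIndex and sum within each bin. The small DataFrame below is hypothetical sample data limited to a single day, so that the one-second grid stays small.

```python
import pandas as pd

# Hypothetical sample data (one day only, so the 1-second grid stays small).
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2015-01-01 00:10:01",
        "2015-01-01 00:10:03",
        "2015-01-01 00:10:03",
        "2015-01-01 00:10:03",
        "2015-01-01 00:10:04",
        "2015-01-01 00:10:04",
    ]),
    "colA": ["a", "a", "a", "a", "b", "b"],
    "colB": ["x", "x", "x", "y", "x", "y"],
})

# Count colA / colB combinations per exact timestamp, as with pd.crosstab above.
table = pd.crosstab(index=df["timestamp"], columns=[df["colA"], df["colB"]])

# Resample the DatetimeIndex into 1-second bins and sum the counts per bin.
# Bins without any log entries sum to 0. Other frequencies such as "1min"
# can be passed instead of "1s".
per_second = table.resample("1s").sum()

print(per_second)
```

Doing the crosstab first and resampling second keeps the counting step independent of the interval, so the same table can be re-binned at several frequencies. If the timestamps span a long period (as in the 2014/2015 mix in the sample log), a one-second grid over the whole range can become very large.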