Resampling Hz in a pandas dataframe

I'm working on a project using pandas in Python. I receive a .csv file like this as input:

Name   Timestamp       Data
A1       259           [1.1,1.0,0.1]
A1       260           [-0.1,1.2,0.3]
A1       261           [0.1,0.2,-0.3]
...
A1       14895         [1.4,0.3,1.8]
...      
A2       278           [-1.1,1.2,0.4]
A2       353           [-0.1,1.2,0.3]
A2       409           [-0.1,1.2,0.3]
...
A2       14900         [-0.1,1.2,0.3]
...
A1140    107           [-0.5,-1.0,-1.0]
A1140    107           [0.6,0.1,0.3]
A1140    114           [-1.1,-1.2,0.3] 
... 
A1140    14995         [-1,1.2,0.4]

I have 1140+ names and hundreds or thousands of data points for each name. The data was recorded at 200 Hz, and I think the timestamp numbers indicate milliseconds, though I'm not sure; I don't have access to this information. I have to resample the data to 50 Hz.

How can I do this? Do I need to convert Timestamp into actual seconds and then use the .resample() function with 0.25s? And should I use .groupby("Name")? Thank you in advance!

I can't answer the question in full, since even you aren't sure what the timestamp represents, but I will try to give you some general guidelines.
What you have here is called panel data: many separate time series, one for each "name".
groupby(['Name']).apply(<func>) can indeed be a useful method, as it lets you manipulate each name separately, so that you work with the simpler data type of a time series.
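As a small sketch of this per-name manipulation, the snippet below uses groupby with apply to look at the typical gap between consecutive timestamps for each name. At 200 Hz, samples are 1/200 s = 5 ms apart, so a typical step near 5 would be consistent with the timestamps being milliseconds. The toy values are made up for illustration, and the check is only a heuristic.

```python
import pandas as pd

# Toy data in the question's layout (timestamps are made up).
df = pd.DataFrame({
    "Name": ["A1", "A1", "A1", "A2", "A2", "A2"],
    "Timestamp": [259, 264, 269, 278, 283, 293],
})

# For each name, take the differences between consecutive timestamps
# and summarise them with the median.
median_step = df.groupby("Name")["Timestamp"].apply(lambda s: s.diff().median())

print(median_step)
```

If the median step is far from 5, the unit is probably something other than milliseconds, or samples are missing.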
A time series is data of the type:

Date                  Value
2000-01-01 00:00:00   3
2000-01-01 00:03:00   12
2000-01-01 00:06:00   21

As you can see, the samples are taken 3 minutes apart. We could call resample() and convert the series to 10-minute intervals like this:

series.resample('10min').mean()

Note that instead of mean you could use .apply(<func>) to choose the downsampling method. Older pandas versions also accept '10T' as the frequency string, but newer versions deprecate it in favour of '10min'. For more info on the frequency, consider this question.
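A minimal runnable sketch of this, assuming a made-up series sampled every 3 minutes (the values and variable names are illustrative only):

```python
import pandas as pd

# Hypothetical series sampled every 3 minutes (values are made up).
index = pd.date_range("2000-01-01 00:00:00", periods=7, freq="3min")
series = pd.Series([3, 12, 21, 30, 39, 48, 57], index=index)

# Downsample to 10-minute bins, averaging the samples in each bin.
mean_10min = series.resample("10min").mean()

# Another built-in aggregation: keep the first sample of each bin.
first_10min = series.resample("10min").first()

# A custom function via .apply(): here the range (max - min) of each bin.
range_10min = series.resample("10min").apply(lambda s: s.max() - s.min())

print(mean_10min)
```

Each 10-minute bin collects the samples whose timestamps fall inside it, so the choice of aggregation decides how several original samples collapse into one.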


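To conclude, your best bet would be to find out what exactly the timestamp is, convert it to a datetime, set it as the index, and then use either

 df.groupby(['Name']).resample('20ms').mean()

or iterate over the names with a for loop and resample each series individually. Note that 50 Hz means one sample every 1/50 s = 20 ms, not 0.25 s, and that mean() needs numeric columns, so the lists in Data have to be split into separate columns first.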
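Putting the pieces together, here is a rough sketch under two assumptions that the question does not confirm: that Timestamp is in milliseconds, and that Data is stored as a string holding a list of three numbers. The toy rows and the column names x, y, z and Time are hypothetical; the real data would come from pd.read_csv.

```python
import ast
import pandas as pd

# Toy stand-in for the CSV (values are made up).
df = pd.DataFrame({
    "Name": ["A1", "A1", "A1", "A1", "A2", "A2"],
    "Timestamp": [0, 5, 10, 20, 0, 25],
    "Data": ["[1.0,2.0,3.0]", "[3.0,2.0,1.0]", "[2.0,2.0,2.0]",
             "[4.0,0.0,2.0]", "[1.0,1.0,1.0]", "[5.0,5.0,5.0]"],
})

# Split the list-like Data strings into numeric columns
# (the names x, y, z are hypothetical).
values = pd.DataFrame(df["Data"].map(ast.literal_eval).tolist(),
                      columns=["x", "y", "z"], index=df.index)
values["Name"] = df["Name"]

# ASSUMPTION: Timestamp is in milliseconds. The resulting datetimes
# start at 1970-01-01, which only serves as an arbitrary origin.
values["Time"] = pd.to_datetime(df["Timestamp"], unit="ms")

# 50 Hz means one sample every 20 ms; average the samples in each bin.
resampled = (
    values.set_index("Time")
          .groupby("Name")[["x", "y", "z"]]
          .resample("20ms")
          .mean()
)

print(resampled)
```

Rows that share a timestamp within a name are simply averaged into the same bin, and a bin with no samples comes out as NaN. If the timestamp unit turns out to be something else, only the unit argument and the bin width need to change.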


 