How to create a visualization from time-series data in a .txt file in Python

Question

I have a .txt file with three columns: time, ticker, price. The time is spaced in 15-second intervals. Loaded into a Jupyter notebook and put into a pandas DataFrame, it looks like this:

            time ticker    price
0       09:30:35     EV   33.860
1       00:00:00    AMG   60.430
2       09:30:35    AMG   60.750
3       00:00:00    BLK  455.350
4       09:30:35    BLK  451.514
...          ...    ...      ...
502596  13:00:55    TLT  166.450
502597  13:00:55    VXX   47.150
502598  13:00:55   TSLA  529.800
502599  13:00:55   BIDU  103.500
502600  13:00:55     ON   12.700

# NOTE: the first batch of data includes the market-open price as every
# other row; that is what the 00:00:00 timestamps are.
# This only happens in the 09:30:35 data.
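The question loads the .txt file into a DataFrame without showing how. A minimal sketch, assuming the file is whitespace-separated with a header row `time ticker price` (the real layout may differ); the `io.StringIO` text is a made-up stand-in for the file, so pass the file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Stand-in for the .txt file (assumed layout: header row, whitespace-separated columns).
raw = io.StringIO(
    "time ticker price\n"
    "09:30:35 EV 33.860\n"
    "00:00:00 AMG 60.430\n"
    "09:30:35 AMG 60.750\n"
    "13:00:55 TSLA 529.800\n"
)

# sep=r"\s+" splits on any run of spaces or tabs; replace `raw` with the file path.
df = pd.read_csv(raw, sep=r"\s+")

# Turn the hh:mm:ss strings into datetimes (the date part defaults to 1900-01-01)
# so that pandas time-series tools such as resample() can work on them.
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")
print(df)
```

Parsing the times up front is what makes the built-in resampling shown in the answer applicable to this data.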

I need to create a function that takes a ticker as input and then creates a bar chart that displays that ticker's data at 5-minute ticks (the data is every 20 seconds, so one tick for every 15 points in time).
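The arithmetic above (5 minutes at one row every 20 seconds is 300 / 20 = 15 rows) can be applied positionally with `iloc`. A sketch under the assumption that one ticker's rows are evenly spaced with no gaps and no extra rows; the 00:00:00 market-open rows, for example, would shift every later tick. The sample data is made up.

```python
import pandas as pd

# Hypothetical data: one ticker sampled every 20 seconds for 15 minutes (45 rows).
times = pd.date_range("1900-01-01 09:30:00", periods=45, freq="20s")
df = pd.DataFrame({"time": times, "ticker": "EV", "price": range(45)})

# 5 minutes = 300 s; at one row per 20 s that is 300 // 20 = 15 rows per tick.
rows_per_tick = 300 // 20

# Keep every 15th row of the chosen ticker (rows 0, 15 and 30 here).
ticks = df[df["ticker"] == "EV"].iloc[::rows_per_tick]
print(ticks)
```

Because this relies on row positions rather than timestamps, any missing or extra row breaks it; resampling on the timestamps avoids that.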

So far I've thought about splitting out the "mm" part of hh:mm:ss to get the minutes in another column and then writing a for loop that looks something like this:

# df['mm'] holds the minute part of each hh:mm:ss timestamp
for num in df['mm']:
    if num % 5 == 0:
        print('tick')

Then I would somehow append the "tick" to the "time" column for every 5 minutes of data (I'm not sure how I would do this), use the time column as the index, and keep only the rows with "tick" in the index (some kind of if statement). I'm not sure if this makes sense, but I'm drawing a blank on this.
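The "mm % 5" idea above can be written without a loop using a boolean mask. A minimal sketch with made-up data in the question's format; the `mm`, `tick` and `minute_start` column names are illustrative. Since every row inside a flagged minute gets marked (three rows per minute at 20-second spacing), only the first row of each flagged minute is kept.

```python
import pandas as pd

# Hypothetical data in the question's format: one ticker, one row every 20 seconds.
times = pd.date_range("1900-01-01 09:30:00", periods=36, freq="20s")
df = pd.DataFrame({"time": times.strftime("%H:%M:%S"), "ticker": "EV", "price": range(36)})

# Parse hh:mm:ss and pull out the minute, like the question's "mm" column.
t = pd.to_datetime(df["time"], format="%H:%M:%S")
df["mm"] = t.dt.minute

# Vectorized form of the `num % 5 == 0` loop: flag rows whose minute is a multiple of 5.
df["tick"] = df["mm"] % 5 == 0

# All rows inside a flagged minute are marked, so keep the first row per minute.
# Deduplicating on the floored timestamp (not just "mm") keeps 09:30 and 10:30 apart.
df["minute_start"] = t.dt.floor("min")
ticks = df[df["tick"]].drop_duplicates(subset=["ticker", "minute_start"])
print(ticks[["time", "ticker", "price"]])
```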

Answer 1

You should have a look at the built-in time-series functions in pandas. In the following example I'm using a date + time format, but it shouldn't be hard to convert from one to the other.

Generate data

%matplotlib inline
import pandas as pd
import numpy as np

dates = pd.date_range(start="2020-04-01", periods=150, freq="20s")
df1 = pd.DataFrame({"date":dates,
                    "price":np.random.rand(len(dates))})
df2 = df1.copy()
df1["ticker"] = "a"
df2["ticker"] = "b"

df =  pd.concat([df1,df2], ignore_index=True)
df = df.sample(frac=1).reset_index(drop=True)

Resample the time series every 5 minutes

Here you can inspect the output of

df1.set_index("date")\
   .resample("5min")\
   .first()\
   .reset_index()

Here we keep only the first row in each 5-minute bin (00:00:00, 00:05:00, 00:10:00 and so on). To do the same for every ticker, we need a groupby:

out = df.groupby("ticker")\
        .apply(lambda x: x.set_index("date")\
                          .resample("5min")\
                          .first()\
                          .reset_index())\
        .reset_index(drop=True)
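An alternative to `groupby(...).apply(...)` is to group by the ticker and a 5-minute `pd.Grouper` on the date column in one step. This is a swapped-in technique, not the answer's code; it also sidesteps the DeprecationWarning that pandas 2.2+ may raise when `apply` operates on a frame that still contains the grouping column. A sketch with made-up data for two tickers:

```python
import pandas as pd

# Hypothetical data: two tickers, one row every 20 seconds for 10 minutes.
dates = pd.date_range("2020-04-01", periods=30, freq="20s")
df = pd.concat(
    [
        pd.DataFrame({"date": dates, "price": range(30), "ticker": "a"}),
        pd.DataFrame({"date": dates, "price": range(100, 130), "ticker": "b"}),
    ],
    ignore_index=True,
)

# Group by ticker and by 5-minute bin of "date", keeping the first price in each bin.
out = (
    df.groupby(["ticker", pd.Grouper(key="date", freq="5min")])["price"]
    .first()
    .reset_index()
)
print(out)
```

The result has the columns `ticker`, `date` and `price`, the same ones the plotting function below filters and plots.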

Plot function

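def plot_tick(data, ticker):
    # keep only the rows of the requested ticker
    ts = data[data["ticker"] == ticker].reset_index(drop=True)
    # one bar per 5-minute sample, labelled by its timestamp
    ts.plot(x="date", y="price", kind="bar", title=ticker)

plot_tick(out, "a")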

From here you can improve the plot or, eventually, try plotly.
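To go straight from the question's format (hh:mm:ss strings in a "time" column) to a chart, parsing, resampling and plotting can be wrapped in one function. A minimal sketch, assuming the column names shown in the question; `plot_ticker` and its `freq` parameter are hypothetical names, and it calls matplotlib's `ax.bar` directly so the x labels read as HH:MM. If the 00:00:00 market-open rows are present, they produce an extra bar labelled 00:00, so filter them out first if that is unwanted.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def plot_ticker(df, ticker, freq="5min"):
    """Bar chart of one ticker's price, taking the first row of each `freq` bin.

    Assumes `df` has the question's columns: "time" (hh:mm:ss strings), "ticker", "price".
    """
    ts = df[df["ticker"] == ticker].copy()
    # Parse hh:mm:ss strings; the date part defaults to 1900-01-01 and is not shown.
    ts["time"] = pd.to_datetime(ts["time"], format="%H:%M:%S")
    # First price in each bin; dropna() discards bins that contained no rows.
    sampled = ts.set_index("time")["price"].resample(freq).first().dropna()
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.bar(list(sampled.index.strftime("%H:%M")), sampled.to_numpy())
    ax.set_title(ticker)
    ax.set_xlabel("time")
    ax.set_ylabel("price")
    return ax


# Hypothetical sample data in the question's format: one row every 20 seconds.
times = pd.date_range("1900-01-01 09:30:00", periods=45, freq="20s")
df = pd.DataFrame({"time": times.strftime("%H:%M:%S"), "ticker": "EV", "price": range(45)})
ax = plot_ticker(df, "EV")
```

Returning the Axes leaves room for further tweaks (rotated labels, a different `freq`) without changing the function.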

Answer 1: accepted, 2020-04-01 01:10:56
