Continuously reading and plotting a CSV file using Python

This is my first time asking something here, so I hope I am asking the following question the "correct way". If not, please let me know and I will provide more information.

I am using one Python script to read serial data at 4000 Hz and write it to a CSV file.

The structure of the CSV file is as follows (this example shows the beginning of the file):

Time of mSure Calibration: 24.10.2020 20:03:14.462654
Calibration Data - AICC: 833.95; AICERT: 2109; AVCC: 0.00; AVCERT: 0 
Sampling Frequency: 4000Hz
timestamp,instantaneousCurrentValue,instantaneousVoltageValue,activePowerValueCalculated,activePowerValue
24.10.2020 20:03:16.495828,-0.00032,7e-05,-0.0,0.0
24.10.2020 20:03:16.496078,0.001424,7e-05,0.0,0.0
24.10.2020 20:03:16.496328,9.6e-05,7e-05,0.0,0.0
24.10.2020 20:03:16.496578,-0.000912,7e-05,-0.0,0.0
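
The first three lines are free-text metadata rather than CSV, so any reader has to skip them (the plotting script below uses `skiprows=3`). A minimal sketch, assuming the preamble is always exactly three lines as in this sample, of consuming those lines first so the sampling frequency can be taken from the file instead of being hard-coded; `read_preamble` is a hypothetical helper name:

```python
import csv
import io
import re

# First rows of the sample file shown above.
SAMPLE = """Time of mSure Calibration: 24.10.2020 20:03:14.462654
Calibration Data - AICC: 833.95; AICERT: 2109; AVCC: 0.00; AVCERT: 0
Sampling Frequency: 4000Hz
timestamp,instantaneousCurrentValue,instantaneousVoltageValue,activePowerValueCalculated,activePowerValue
24.10.2020 20:03:16.495828,-0.00032,7e-05,-0.0,0.0
24.10.2020 20:03:16.496078,0.001424,7e-05,0.0,0.0
"""


def read_preamble(stream, n_lines=3):
    """Consume the metadata lines before the CSV header; return them and the sample rate."""
    lines = [stream.readline().rstrip("\n") for _ in range(n_lines)]
    rate = None
    for line in lines:
        match = re.match(r"Sampling Frequency:\s*([\d.]+)\s*Hz", line)
        if match:
            rate = float(match.group(1))
    return lines, rate


stream = io.StringIO(SAMPLE)
preamble, sample_rate = read_preamble(stream)
# The stream now sits on the real header row, so csv.DictReader can take over.
rows = list(csv.DictReader(stream))
print(sample_rate)                            # 4000.0
print(1e6 / sample_rate)                      # 250.0 (microseconds between samples)
print(rows[0]["instantaneousCurrentValue"])   # -0.00032
```

With a real file, the same function can be called on the open file object before handing it to `csv.DictReader` or `pd.read_csv`.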

Data is appended to this CSV for as long as the script reading the serial data is running, so the file may eventually become very large. (Data is written in chunks of 8000 rows, i.e. every two seconds.)
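Because the file keeps growing, a live plotter probably should not reload the whole CSV on every update. One option, sketched here under the assumption that only the most recent few seconds need to be shown, is a fixed-size buffer built on `collections.deque(maxlen=...)`, which discards the oldest samples automatically. `WINDOW_SECONDS` and `add_chunk` are illustrative names, not part of the original scripts:

```python
from collections import deque

SAMPLE_RATE_HZ = 4000   # from the question
WINDOW_SECONDS = 10     # assumption: how much history the live plot shows
window = deque(maxlen=SAMPLE_RATE_HZ * WINDOW_SECONDS)


def add_chunk(chunk):
    """Append a newly read chunk; samples older than the window drop off the left end."""
    window.extend(chunk)


# Simulate 30 s of data arriving in 2 s chunks of 8000 samples each.
for i in range(15):
    add_chunk(range(i * 8000, (i + 1) * 8000))

print(len(window))  # 40000 (only the last 10 s are kept)
print(window[0])    # 80000, the first sample of the last 10 s
```

Memory use then stays constant no matter how long the acquisition runs.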

Here is my problem: I want to plot this data live, for example by updating the plot each time a chunk is written to the CSV file. The plotting should be done by a separate script, not by the script that reads the serial data and writes the CSV.
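Because the writer only appends, the plotting script does not have to re-read the file: it can remember how many bytes it has already consumed and, on each poll, read only what was added since. A minimal sketch of that idea, assuming UTF-8 text and a writer that never rewrites earlier content (`CsvFollower` is a hypothetical name, not an existing library class):

```python
import os
import tempfile


class CsvFollower:
    """Return the complete lines appended to a file since the previous poll.

    Assumes the writer only appends. A last line without a trailing newline is
    treated as still being written and is held back until the next poll.
    """

    def __init__(self, path):
        self.path = path
        self.offset = 0  # byte position up to which lines have been returned

    def poll(self):
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        end = data.rfind(b"\n")
        if end == -1:
            return []  # nothing new, or only a partial line so far
        self.offset += end + 1
        return data[: end + 1].decode("utf-8").splitlines()


# Demo with a temporary file standing in for the live CSV.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(path, "w") as f:
    f.write("timestamp,value\n24.10.2020 20:03:16.495828,1\n24.10.2020 20:03:16.4960")

follower = CsvFollower(path)
first = follower.poll()   # ['timestamp,value', '24.10.2020 20:03:16.495828,1']

with open(path, "a") as f:
    f.write("78,2\n")     # the writer completes the partial line
second = follower.poll()  # ['24.10.2020 20:03:16.496078,2']
third = follower.poll()   # [] until more data arrives
print(first, second, third)
```

Each call costs time proportional to the new data only, so polling every few hundred milliseconds stays cheap even when the file is large.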

What is working:

  1. Creating the CSV file.
  2. Plotting a finished CSV file using another script, which actually works pretty well :-)

This is my plotting script:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""Data Computation Software for TeensyDAQ - Reads and computes CSV-File"""

# region imports
import getopt
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import pathlib
from scipy.signal import argrelextrema
import sys
# endregion

# region globals
inputfile = ''
outputfile = ''
# endregion

# region functions
def main(argv):
    """Main application"""

    # region define variables
    global inputfile
    global outputfile
    # Join with pathlib instead of a hard-coded "\\" so the paths also work outside Windows
    inputfile = str(pathlib.Path(__file__).resolve().parent / "noFilenameProvided.csv")
    outputfile = str(pathlib.Path(__file__).resolve().parent / "noFilenameProvidedOut.csv")
    # endregion

    # region read system arguments
    try:
        opts, args = getopt.getopt(
            argv, "hi:o:", ["infile=", "outfile="])
    except getopt.GetoptError:
        print('dataComputation.py -i <inputfile> -o <outputfile>')
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print('dataComputation.py -i <inputfile> -o <outputfile>')
            sys.exit()
        elif opt in ("-i", "--infile"):
            inputfile = str(pathlib.Path(__file__).resolve().parent / arg)
        elif opt in ("-o", "--outfile"):
            outputfile = str(pathlib.Path(__file__).resolve().parent / arg)
    # endregion

    # region read csv
    colTypes = {'timestamp': 'str',
                'instantaneousCurrent': 'float',
                'instantaneousVoltage': 'float',
                'activePowerCalculated': 'float',
                'activePower': 'float',
                'apparentPower': 'float',
                'fundReactivePower': 'float'
                }
    cols = list(colTypes.keys())
    # skiprows=3 skips the metadata lines above the header; the timestamp is parsed explicitly below
    df = pd.read_csv(inputfile, usecols=cols, dtype=colTypes, skiprows=3)
    df['timestamp'] = pd.to_datetime(
        df['timestamp'], utc=True, format='%d.%m.%Y %H:%M:%S.%f')
    df.insert(loc=0, column='tick', value=np.arange(len(df)))
    # endregion

    # region plot data
    # (column, colour, subplot title, y-axis unit) for each of the six stacked subplots
    channels = [
        ('instantaneousCurrent', 'red', 'Momentanstrom', 'A'),
        ('instantaneousVoltage', 'blue', 'Momentanspannung', 'V'),
        ('activePowerCalculated', 'green', 'Momentanleistung ungefiltert', 'W'),
        ('activePower', 'brown', 'Momentanleistung', 'W'),
        ('apparentPower', 'brown', 'Scheinleistung', 'VA'),
        ('fundReactivePower', 'brown', 'Blindleistung', 'VAr'),
    ]
    fig, axes = plt.subplots(nrows=len(channels), ncols=1, sharex=True, figsize=(16, 8))
    # set_window_title is provided by the figure manager in current Matplotlib versions
    fig.canvas.manager.set_window_title(str(df['timestamp'].iloc[0]))

    for ax, (column, color, title, unit) in zip(axes, channels):
        df[column].plot(ax=ax, color=color)
        ax.set_title(title)
        ax.set_ylabel(unit, rotation=0)
    # Series.plot uses the row index, so the x axis is the sample number
    axes[-1].set_xlabel('samples since start')
    fig.align_ylabels(axes)  # align the y labels of all six subplots

    plt.tight_layout()
    plt.show()
    # endregion

# endregion


if __name__ == "__main__":
    main(sys.argv[1:])
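
In the script above, `Series.plot` puts the row index, i.e. the sample number, on the x axis; at 4000 Hz consecutive rows are nominally 250 µs apart. If a real time axis is wanted instead, a small sketch (assuming the timestamp column has been parsed with `pd.to_datetime` as in `main`) is to derive an elapsed-time column from the timestamps; `elapsed_us` is an illustrative column name:

```python
import pandas as pd

# Three rows from the sample in the question (4000 Hz, so nominally 250 µs apart).
df = pd.DataFrame({
    "timestamp": ["24.10.2020 20:03:16.495828",
                  "24.10.2020 20:03:16.496078",
                  "24.10.2020 20:03:16.496328"],
    "instantaneousCurrent": [-0.00032, 0.001424, 9.6e-05],
})
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True, format="%d.%m.%Y %H:%M:%S.%f")

# Elapsed time since the first sample in microseconds, taken from the timestamps.
df["elapsed_us"] = (df["timestamp"] - df["timestamp"].iloc[0]) / pd.Timedelta(microseconds=1)
print(df["elapsed_us"].tolist())  # [0.0, 250.0, 500.0]
```

`df.plot(x='elapsed_us', y='instantaneousCurrent', ax=axes[0], color='red')` would then draw against elapsed time. If the timestamps are assigned on the PC rather than by the device they may jitter; in that case the sample index divided by the sampling frequency is an alternative.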

My thoughts on how to solve my problem:

  1. Modify my plotting script to continuously read the CSV file and plot it using matplotlib's animation functionality.
  2. Use some sort of streaming functionality to read the CSV as a stream. I have read about the streamz library, but I have no idea how to use it.
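Option 1 can be sketched with matplotlib's `FuncAnimation`, which calls an update function every `interval` milliseconds. In this sketch the update function reads only the bytes appended since the previous call and keeps a rolling window of one column, so the cost per frame does not grow with the file. It is a minimal single-subplot sketch under stated assumptions (append-only file, comma-separated fields, column 1 holds the current), not a drop-in replacement for the six-panel script; `make_live_plot` is a hypothetical name:

```python
from collections import deque

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation


def make_live_plot(path, column=1, window=8000, interval_ms=500):
    """Plot the newest `window` values of one CSV column, re-polling `path` periodically.

    Assumptions of this sketch: the file is only ever appended to, fields are
    comma-separated, and lines whose `column` field is not a number (the
    metadata preamble and the header row) are skipped.
    """
    state = {"offset": 0}          # bytes of the file already consumed
    values = deque(maxlen=window)  # rolling window of the most recent samples

    fig, ax = plt.subplots(figsize=(16, 4))
    (line,) = ax.plot([], [], color="red")
    ax.set_xlabel("sample within the current window")

    def update(_frame):
        with open(path, "rb") as f:
            f.seek(state["offset"])
            data = f.read()
        end = data.rfind(b"\n")  # hold back a last line that is still being written
        if end != -1:
            state["offset"] += end + 1
            for text in data[: end + 1].decode("utf-8").splitlines():
                fields = text.split(",")
                try:
                    values.append(float(fields[column]))
                except (IndexError, ValueError):
                    continue  # preamble or header line
        line.set_data(list(range(len(values))), list(values))
        ax.relim()
        ax.autoscale_view()
        return (line,)

    # The frame sequence is unbounded, so don't cache per-frame data
    anim = FuncAnimation(fig, update, interval=interval_ms, cache_frame_data=False)
    return fig, anim, update
```

Typical use would be `fig, anim, _ = make_live_plot(inputfile)` followed by `plt.show()`. Keep a reference to `anim` for as long as the window is open, because matplotlib stops an animation whose object has been garbage-collected.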

Any help is highly appreciated!

Kind regards, Sascha

EDIT 31.10.2020:

Since I don't know how long it usually takes to get an answer, I am adding more information, which will hopefully lead to helpful comments.

I wrote the following script, which continuously writes data to a CSV file. It emulates my real script without needing external hardware: a timer produces random, CSV-formatted rows, and every time 50 new rows have accumulated they are appended to the CSV file.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from random import randrange
import time
import threading
import pathlib
from datetime import datetime

datarows = list()
datarowsLock = threading.Lock()  # datarows is appended to by the timer threads and swapped by the main thread
outputfile = str(pathlib.Path(__file__).resolve().parent / "noFilenameProvided.csv")
sampleCount = 0

def takeBuffer():
    """Hand over the rows collected so far and start a new, empty buffer."""
    global datarows
    with datarowsLock:
        rows, datarows = datarows, list()
    return rows

def startBatchWriteThread():
    # Each writer thread gets its own list, so no later swap can clear rows it is still writing
    thread = threading.Thread(target=batchWriteData, args=(outputfile, takeBuffer()))
    thread.start()

def batchWriteData(file, data):
    print("Items to write: " + str(len(data)))
    with open(file, 'a+') as f:
        for item in data:
            f.write("%s\n" % item)

def generateDatarows():
    global sampleCount
    # Re-arm the timer first; a new row is produced nominally every millisecond
    timer1 = threading.Timer(0.001, generateDatarows)
    timer1.daemon = True
    timer1.start()
    # Timestamp followed by six random values, one per data column of the header
    datarow = datetime.now().strftime("%d.%m.%Y %H:%M:%S.%f") + "," + ",".join(str(randrange(10)) for _ in range(6))
    with datarowsLock:
        datarows.append(datarow)
        sampleCount += 1

try:
    # Three placeholder lines stand in for the metadata preamble of the real file
    datarows.append("row 1")
    datarows.append("row 2")
    datarows.append("row 3")
    datarows.append("timestamp,instantaneousCurrent,instantaneousVoltage,activePowerCalculated,activePower,apparentPower,fundReactivePower")
    startBatchWriteThread()
    generateDatarows()
    while True:
        # >= rather than ==: several rows may arrive between two checks
        if len(datarows) >= 50:
            startBatchWriteThread()
        time.sleep(0.001)  # avoid a 100 % CPU busy-wait
except KeyboardInterrupt:
    print("Shutting down, writing the rest of the buffer.")
    batchWriteData(outputfile, takeBuffer())
    print("Done, writing " + outputfile)
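
An alternative structure for the emulator, sketched here as a suggestion rather than a drop-in replacement, is a producer/consumer design: the generator puts each row on a thread-safe `queue.Queue`, and a single writer thread batches rows and appends them. Only the writer thread touches the file and the producer never shares a mutable list with it, so there is no buffer to copy and clear and no row count that has to be hit exactly. The name `csv_writer` and the `None` shutdown sentinel are choices made for this sketch:

```python
import os
import queue
import tempfile
import threading


def csv_writer(path, rows, batch_size=50):
    """Consumer thread: take rows from `rows` and append them to `path` in batches.

    A None item is this sketch's shutdown signal; whatever is still buffered
    is written before the function returns.
    """
    batch = []
    with open(path, "a") as f:
        while True:
            item = rows.get()
            if item is None:
                break
            batch.append(item)
            if len(batch) >= batch_size:
                f.write("\n".join(batch) + "\n")
                f.flush()  # make the chunk visible to a reader in another process
                batch.clear()
        if batch:
            f.write("\n".join(batch) + "\n")


# Demo: the producer here is a plain loop instead of the timer-driven generator.
path = os.path.join(tempfile.mkdtemp(), "emulated.csv")
rows = queue.Queue()
writer = threading.Thread(target=csv_writer, args=(path, rows))
writer.start()
rows.put("timestamp,instantaneousCurrent")
for i in range(120):
    rows.put(f"24.10.2020 20:03:16.{i:06d},{i % 10}")
rows.put(None)  # ask the writer to flush the remainder and stop
writer.join()

with open(path) as f:
    lines = f.read().splitlines()
print(len(lines))  # 121: the header plus 120 data rows
```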

The script from my initial post can then plot the data in this CSV file.

I need to plot the data while it is being written to the CSV file, so that I can see it more or less live.

I hope this makes my problem easier to understand.

For the Googlers: I could not find a way to achieve my goal as described in the question.

However, if you are trying to plot live data arriving at high speed over a serial link (4000 Hz in my case), I recommend designing your application as a single program with multiple processes.

The problem in my case was that when I tried to plot and compute the incoming data in the same thread/task/process/whatever as the serial reception, my serial receive rate dropped from 4 kHz to 100 Hz. I resolved this by using multiprocessing and passing data between the processes with the quick_queue module.
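The answer used the third-party quick_queue module; the sketch below swaps in the standard library's `multiprocessing.Queue` instead, purely to illustrate the structure: acquisition and plotting run in separate processes, so a slow plot update cannot throttle the serial reader. The functions are placeholders (no real serial I/O or plotting), and the `None` end-of-stream sentinel is an assumption of the sketch:

```python
import multiprocessing as mp

BLOCK_SIZE = 4000  # samples per block, i.e. one second at 4000 Hz


def acquisition(out_q, n_blocks):
    """Stand-in for the serial reader: emits blocks of samples, then a None sentinel."""
    for b in range(n_blocks):
        out_q.put([float(b)] * BLOCK_SIZE)  # placeholder data instead of real serial samples
    out_q.put(None)


def plotting(in_q, result_q):
    """Stand-in for the plotting process: consumes blocks until the sentinel arrives."""
    received = 0
    while True:
        block = in_q.get()
        if block is None:
            break
        received += len(block)  # a real implementation would update the plot here
    result_q.put(received)


if __name__ == "__main__":
    data_q, result_q = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=acquisition, args=(data_q, 5)),
             mp.Process(target=plotting, args=(data_q, result_q))]
    for p in procs:
        p.start()
    print(result_q.get())  # 20000 samples seen by the plotting process
    for p in procs:
        p.join()  # join only after the result has been read from the queue
```

Reading the result before calling `join()` matters: a process that has put data on a `multiprocessing.Queue` may not terminate until that data has been consumed.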

I ended up with a program that receives data from a Teensy over serial at 4 kHz. The incoming data is buffered into blocks of 4000 samples; each block is then pushed to the plotting process and, additionally, written to a CSV file in a separate thread.
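The block-wise pipeline described above can be sketched as a small buffer that collects samples and, whenever a block is full, hands it to several consumers at once. This is an illustration under assumptions, not the author's code: `BlockBuffer` is a hypothetical name, and any queue-like object with a `put(item)` method works as a target, e.g. a `queue.Queue` read by a CSV-writer thread or a `multiprocessing.Queue` read by the plotting process:

```python
import queue


class BlockBuffer:
    """Group incoming samples into fixed-size blocks and fan each block out.

    Every full block is put on each target queue. With in-process queues all
    targets receive the same list object, so consumers should treat it as read-only.
    """

    def __init__(self, block_size, *targets):
        self.block_size = block_size
        self.targets = targets
        self._pending = []

    def add(self, sample):
        self._pending.append(sample)
        if len(self._pending) == self.block_size:
            # Swap in a fresh list so the finished block is never modified again
            block, self._pending = self._pending, []
            for target in self.targets:
                target.put(block)


plot_q, csv_q = queue.Queue(), queue.Queue()
buffer = BlockBuffer(4000, plot_q, csv_q)
for i in range(10_000):  # 2.5 s of samples at 4000 Hz
    buffer.add(i)
print(plot_q.qsize(), csv_q.qsize())  # 2 2 (the last 2000 samples are still pending)
```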

Best, S
