Data from multiple sensors saved to txt file imported to pandas

Good day everyone.

I was hoping someone here could help me with a bit of a problem. I've run an experiment in which data was gathered from 6 separate sensors simultaneously. The data was then exported to a single shared txt file. Now I need to import the data into Python to analyze it.

I know I can do this by copying and pasting the output of each sensor into a separate document and then importing those in a loop, but that is a lot of work and carries a high risk of human error.

Is there a way of using readlines to read specific lines and port them to a pandas DataFrame? The header spacing and the line spacing between sensors are fixed.

I tried:

import pandas as pd

f=open('OR0024622_auto3200.txt')
lines = f.readlines()

base = 83
sensorlines = 6400

Sensor=[]
Sensor = lines[base:sensorlines+base]

df_sens = pd.DataFrame(Sensor)
df_sens

but the output isn't very useful: [Snip of output]

Here's the file I am importing: link.

Any suggestions?

Looks like tab-separated data.

Use:

>>> df = pd.read_csv('OR0024622_auto3200.txt', delimiter=r'\t', skiprows=83, header=None, nrows=38955-84)
>>> df.tail()
          0                   1              2
38686  6397   3.1980000000e+003   9.28819e-009
38687  6398   3.1985000000e+003   9.41507e-009
38688  6399   3.1990000000e+003   1.11703e-008
38689  6400   3.1995000000e+003   9.64276e-009
38690  6401   3.2000000000e+003   8.92203e-009
>>> df.head()
   0                   1              2
0  1   0.0000000000e+000   6.62579e+000
1  2   5.0000000000e-001   3.31289e+000
2  3   1.0000000000e+000   2.62362e-011
3  4   1.5000000000e+000   1.51130e-011
4  5   2.0000000000e+000   8.35723e-012
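To get one DataFrame per sensor rather than one long table, the file can be cut into blocks first. The following is only a sketch, assuming every sensor block consists of the same number of header lines followed by the same number of tab-separated data rows. `read_sensor_blocks`, `header_lines` and `rows_per_sensor` are hypothetical names, and the actual counts have to be checked against the file:

```python
import io
import pandas as pd

def read_sensor_blocks(path, n_sensors, header_lines, rows_per_sensor):
    """Return a list with one DataFrame per sensor.

    Assumes the file is a sequence of identical blocks:
    `header_lines` lines of header, then `rows_per_sensor` data rows.
    """
    with open(path, "r") as fh:
        lines = fh.read().splitlines()

    block = header_lines + rows_per_sensor
    frames = []
    for i in range(n_sensors):
        # first data row of sensor i
        start = i * block + header_lines
        chunk = lines[start:start + rows_per_sensor]
        # let pandas parse only the data rows of this block
        frames.append(
            pd.read_csv(io.StringIO("\n".join(chunk)), sep="\t", header=None)
        )
    return frames
```

With the numbers from the question this might be called as `read_sensor_blocks('OR0024622_auto3200.txt', n_sensors=6, header_lines=83, rows_per_sensor=6400)`, provided those counts really hold for every block.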

abhilb's answer is to the point and correct, but there is a lot more to be said about loading and reading files. A quick browser search will take you a long way (I encourage you to read up on this!), but I'll add a few details here:

If you want to load multiple files that match a pattern, you can do so iteratively via glob:

import pandas as pd
from glob import glob as gg
filePattern = "/path/to/file/*.txt"

for fileName in gg(filePattern):
    # read the file matched by the pattern, not a hard-coded name
    df = pd.read_csv(fileName, delimiter="\t")

This will load each file one by one. What if you want to put all the data into a single DataFrame? Do this:

masterDF = pd.DataFrame()

for fileName in gg(filePattern):
    # read the file matched by the pattern, not a hard-coded name
    df = pd.read_csv(fileName, delimiter="\t")
    masterDF = pd.concat([masterDF, df], axis=0)
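Once everything sits in one DataFrame, it is often useful to know which file each row came from. A possible variant, assuming tab-separated files without a header row; `load_all` and the `source` column are illustrative names, not part of pandas:

```python
import os
from glob import glob

import pandas as pd

def load_all(file_pattern):
    """Read every file matching `file_pattern` into one DataFrame.

    A 'source' column records the originating file name.
    """
    frames = []
    for file_name in sorted(glob(file_pattern)):
        df = pd.read_csv(file_name, sep="\t", header=None)
        df["source"] = os.path.basename(file_name)
        frames.append(df)

    if not frames:
        # nothing matched the pattern
        return pd.DataFrame()

    # concatenating once at the end avoids copying the growing
    # DataFrame on every iteration
    return pd.concat(frames, ignore_index=True)
```

Collecting the pieces in a list and calling `pd.concat` a single time tends to scale better than concatenating inside the loop when there are many files.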

This works great for pandas, but what if you want to read into a numpy array?

import numpy as np

# using previous imports
base = 83
sensorlines = 6400

# create an empty array that has three columns    
masterArray = np.full((0, 3), np.nan)

for fileName in gg(filePattern):
    # open the file (NOTE: this does not read the file yet, it only returns a file handle)
    with open(fileName, "r") as tmp:
        # now read the file and split it into lines on the newline character
        # (in text mode Python already translates "\r\n" to "\n")
        # you now have a list of strings
        data = tmp.read().split("\n")

        # keep only the "data" portion of the file
        data = data[base:sensorlines + base]

        # convert list of strings to an array of floats
        # here, I use a "list comprehension" for speed and simplicity
        data = np.array([r.split("\t") for r in data]).astype(float)

        # stack your new data onto your master array
        masterArray = np.vstack([masterArray, data])
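NumPy can also do the slicing and the float conversion itself. A short sketch using `np.loadtxt`, assuming the rows of interest contain only numeric columns separated by whitespace (tabs included); `load_block` is a hypothetical helper name:

```python
import numpy as np

def load_block(file_name, base, sensorlines):
    """Read `sensorlines` numeric rows that follow the first `base` lines."""
    # With no delimiter given, loadtxt splits on any whitespace,
    # so tabs and padding spaces are both handled.
    return np.loadtxt(file_name, skiprows=base, max_rows=sensorlines)
```

The returned array can be stacked in the same way as `data` in the loop above, e.g. `np.vstack([masterArray, load_block(fileName, base, sensorlines)])`.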

Opening a file via the `with open(fileName, "r")` syntax is handy because Python automatically closes the file when the block ends. If you don't use `with`, you must close the file manually (e.g. tmp.close()).
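The two styles can be compared side by side. This is a small self-contained illustration that creates its own throwaway file; the variable names are made up for the example:

```python
import os
import tempfile

# create a small temporary file so the example runs anywhere
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("1\t0.0\t6.6\n")

# context manager: the file is closed when the block exits,
# even if an exception is raised inside it
with open(path, "r") as tmp:
    first = tmp.readline()
closed_after_with = tmp.closed

# manual equivalent: try/finally guarantees that close() is called
tmp = open(path, "r")
try:
    second = tmp.readline()
finally:
    tmp.close()
closed_after_manual = tmp.closed

os.remove(path)
```

In both cases `tmp.closed` is `True` afterwards; the `with` form simply needs less code to get there.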

These are just some starting points to get you on your way. Feel free to ask for clarification.
