
Reformat a text file so it can be used as a numpy array in Python?

I have a small chunk of code that I'm using to find the confidence interval from a data set.

from scipy import stats
import numpy as np

a = np.loadtxt("test1.txt")
mean, sigma = np.mean(a), np.std(a)

conf_int = stats.norm.interval(0.95, loc=mean,
    scale=sigma)

print(conf_int)

However, my text file (test1.txt) is a list of numbers that a) has a square bracket at the start and end, and b) is not in equal columns.

"[-10.197663 -22.970129 -15.678419 -15.306197
-12.09961 -11.845362 -18.10553 -25.370747
-19.34831 -22.45586]

np.loadtxt really doesn't seem to like this, so is there a function I can use to either read the data as is or reformat it?

Any help would be greatly appreciated!

Update: I managed to remove the brackets with the code below.

with open('test1.txt', 'r') as my_file:
    text = my_file.read()
    text = text.replace("[", "")
    text = text.replace("]", "")


with open('clean.txt', 'w') as my_file:
    my_file.write(text)


a = np.loadtxt("clean.txt")
mean, sigma = np.mean(a), np.std(a)

conf_int = stats.norm.interval(0.95, loc=mean,
   scale=sigma)

print(conf_int)

Now I just need to reformat clean.txt so that it is in a single column and can be read into a numpy array.
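For reference, the single-column reformatting described here might look like the sketch below. It is only an illustration: the sample text is the snippet from the question with the brackets already removed, and the temporary directory is an assumption standing in for the real working directory.

import os
import tempfile

import numpy as np

# Sample text shaped like the cleaned file: no brackets, uneven line lengths
text = """-10.197663 -22.970129 -15.678419 -15.306197
-12.09961 -11.845362 -18.10553 -25.370747
-19.34831 -22.45586"""

# A temporary directory stands in for the real working directory
with tempfile.TemporaryDirectory() as tmp_dir:
    column_path = os.path.join(tmp_dir, "clean.txt")

    # split() with no arguments breaks on any run of whitespace, newlines
    # included, so joining with "\n" puts exactly one value on each line
    with open(column_path, "w") as out_file:
        out_file.write("\n".join(text.split()) + "\n")

    a = np.loadtxt(column_path)

print(a.shape)  # (10,)

With one value per line, np.loadtxt sees the same number of columns on every row, which is what it was objecting to before.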

Final update

I managed to get it working using the code suggested by @David Hoffman together with my long workaround from above; see below.

from scipy import stats
import numpy as np

with open('test1.txt', 'r') as my_file:
    text = my_file.read()
    text = text.replace("[", "")
    text = text.replace("]", "")


with open('clean.txt', 'w') as my_file:
    my_file.write(text)


a = np.array(list(map(float, text.strip("[]").split())))
mean, sigma = np.mean(a), np.std(a)

conf_int = stats.norm.interval(0.95, loc=mean,
   scale=sigma)

print(conf_int)

Thank you to everyone for taking the time to help. It was very much appreciated, especially by a new coder like me.

You can read the file as a string, replace the whitespace between the numbers with commas so that it looks like a Python list, use eval to convert that string to a list, and finally convert the list to a numpy array.
For your given dummy input:

li = """[-10.197663 -22.970129 -15.678419 -15.306197
-12.09961 -11.845362 -18.10553 -25.370747
-19.34831 -22.45586]"""

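# split() breaks on spaces and newlines alike; replacing only ' ' would leave
# the newlines in place, and eval would then subtract the values on either side
np.array(eval(','.join(li.split())))
array([-10.197663, -22.970129, -15.678419, -15.306197, -12.09961 ,
       -11.845362, -18.10553 , -25.370747, -19.34831 , -22.45586 ])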

For the given input file, the solution would be:

import re
li = open('test1.txt', 'r').read()

np.array(eval(re.sub(r'(\n +| +)',',',li)))
array([-10.197663  , -22.970129  , -15.678419  , -15.306197  ,
        -0.38851437, -12.09961   , -11.845362  , -18.10553   ,
       -25.370747  , -20.575884  , -19.34831   , -22.45586   ,
       -31.209     , -19.68507   , -31.07194   , -28.4792    ,
        ...])
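Because eval executes whatever expression the file happens to contain, a more cautious variant of the same idea could use ast.literal_eval from the standard library instead. This is a swapped-in technique, not what the answer above uses, and the sample string is just the snippet from the question:

import ast

import numpy as np

li = """[-10.197663 -22.970129 -15.678419 -15.306197
-12.09961 -11.845362 -18.10553 -25.370747
-19.34831 -22.45586]"""

# Collapse every run of whitespace (spaces and newlines) into a single comma
list_literal = ",".join(li.split())

# literal_eval only accepts Python literals, not arbitrary expressions
values = np.array(ast.literal_eval(list_literal))

print(values.size)  # 10

The parsing logic is otherwise the same: turn the text into something that looks like a Python list, then hand the result to np.array.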

This is what I would do:

import numpy as np
from scipy import stats
import requests

link = "https://pastebin.pl/view/raw/929f5228"

response = requests.get(link)
text = response.text

# with open("test1.txt", "r") as my_file:
#     text = my_file.read()

a = np.array(list(map(float, text.strip("[]").split())))

mean, sigma = np.mean(a), np.std(a)

conf_int = stats.norm.interval(0.95, loc=mean, scale=sigma)

print(conf_int)

The commented-out lines are for when you are reading from a file instead.

There's a lot packed into the string handling line:

  1. The text string is cleaned (the brackets are removed)
  2. The clean text is split on whitespace (any run of consecutive whitespace characters is treated as one delimiter)
  3. Each token is converted to a float (this is the map part)
  4. The map iterator is converted to a list and passed to the numpy array function
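Unrolled into one statement per step, the same line might look like the sketch below. The intermediate variable names are made up for illustration, and the sample text is the snippet from the question:

import numpy as np

text = """[-10.197663 -22.970129 -15.678419 -15.306197
-12.09961 -11.845362 -18.10553 -25.370747
-19.34831 -22.45586]"""

cleaned = text.strip("[]")    # 1. drop the leading "[" and the trailing "]"
tokens = cleaned.split()      # 2. split on any run of whitespace
floats = map(float, tokens)   # 3. lazily convert each token to a float
a = np.array(list(floats))    # 4. build a list, then the numpy array

print(a.size)  # 10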

As @Dishin said, there's some weirdness in how your input file is formatted. If you have any control over how the file is written (say, from a LabVIEW program or another Python script), it might be worth writing the data in a more widely accepted format like .csv so that functions like np.loadtxt (or programs like Excel) can read it more easily.
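If the writer is another Python script, a minimal sketch of that idea could use np.savetxt. The file name test1.csv and the shortened data array are assumptions for illustration only:

import os
import tempfile

import numpy as np

data = np.array([-10.197663, -22.970129, -15.678419, -15.306197, -12.09961])

# A temporary directory stands in for wherever the data would really be saved
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "test1.csv")  # hypothetical file name

    # A 1-D array is written as a single column, one value per row
    np.savetxt(csv_path, data, delimiter=",")

    # np.loadtxt can read the file back without any string cleaning
    loaded = np.loadtxt(csv_path, delimiter=",")

print(np.allclose(data, loaded))  # True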

If you're stuck with the files as they are, you can just write a little utility function like:

def loader(filename):
    with open(filename, "r") as my_file:
        text = my_file.read()

    return np.array(list(map(float, text.strip("[]").split())))

to reuse in your scripts.
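One caveat: strip("[]") only removes brackets that sit at the very ends of the string, so a file that ends with a newline after the closing bracket would leave the "]" attached to the last number and float() would raise a ValueError. A slightly more tolerant variant, sketched here under the hypothetical name load_bracketed and exercised on a made-up temporary file, strips surrounding whitespace first:

import os
import tempfile

import numpy as np


def load_bracketed(filename):
    """Read whitespace-separated floats wrapped in one pair of square brackets."""
    with open(filename, "r") as my_file:
        text = my_file.read()

    # Strip surrounding whitespace first so a trailing newline does not
    # shield the closing bracket from strip("[]")
    return np.array(list(map(float, text.strip().strip("[]").split())))


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test1.txt")
    with open(path, "w") as sample:
        sample.write("[-10.197663 -22.970129\n -15.678419 -15.306197]\n")

    a = load_bracketed(path)

print(a.size)  # 4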
