Reformatting a text file so it can be used as a NumPy array in Python?

Question

I have a short piece of code that finds a confidence interval from a data set.

from scipy import stats
import numpy as np

a = np.loadtxt("test1.txt")
mean, sigma = np.mean(a), np.std(a)

conf_int = stats.norm.interval(0.95, loc=mean,
    scale=sigma)

print(conf_int)

However, my text file (test1.txt) is a list of numbers that a) has a square bracket at the start and at the end, and b) is not arranged in equal columns.

[-10.197663 -22.970129 -15.678419 -15.306197
-12.09961 -11.845362 -18.10553 -25.370747
-19.34831 -22.45586]

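np.loadtxt really doesn't seem to like this, so is there a function I can use to read and use the data, or a way to reformat it?

Any help would be greatly appreciated!

Update: I managed to remove the brackets with the code below.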

with open('test1.txt', 'r') as my_file:
    text = my_file.read()
    text = text.replace("[", "")
    text = text.replace("]", "")


with open('clean.txt', 'w') as my_file:
    my_file.write(text)


a = np.loadtxt("clean.txt")
mean, sigma = np.mean(a), np.std(a)

conf_int = stats.norm.interval(0.95, loc=mean,
   scale=sigma)

print(conf_int)

Now I just need to reformat clean.txt into a single column so that it can be read into an np.array.
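A minimal sketch of that reformatting step, assuming the file contains nothing but whitespace-separated numbers and the two brackets. The helper name `to_single_column` and the temporary demo files are hypothetical; the idea is simply to write one value per line so that `np.loadtxt` returns a 1-D array.

```python
import os
import tempfile

import numpy as np


def to_single_column(src, dst):
    """Rewrite a bracketed, ragged list of numbers as one value per line."""
    with open(src, "r") as f:
        # Turn the brackets into spaces, then split on any run of whitespace.
        tokens = f.read().replace("[", " ").replace("]", " ").split()
    with open(dst, "w") as f:
        f.write("\n".join(tokens) + "\n")


# Demo on a throwaway copy of the sample data from the question.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "test1.txt")
dst = os.path.join(workdir, "clean.txt")
with open(src, "w") as f:
    f.write("[-10.197663 -22.970129 -15.678419 -15.306197\n"
            "-12.09961 -11.845362 -18.10553 -25.370747\n"
            "-19.34831 -22.45586]\n")

to_single_column(src, dst)
a = np.loadtxt(dst)
print(a.shape)
```

Because every line of the rewritten file holds exactly one number, the unequal row lengths that `np.loadtxt` objected to are gone.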

Final update

I got it working by combining the code suggested by @David Hoffman with my long-winded workaround from above; see below.

from scipy import stats
import numpy as np

with open('test1.txt', 'r') as my_file:
    text = my_file.read()
    text = text.replace("[", "")
    text = text.replace("]", "")


with open('clean.txt', 'w') as my_file:
    my_file.write(text)


a = np.array(list(map(float, text.strip("[]").split())))
mean, sigma = np.mean(a), np.std(a)

conf_int = stats.norm.interval(0.95, loc=mean,
   scale=sigma)

print(conf_int)

Thanks, everyone, for taking the time to help. It is much appreciated, especially by a new coder like me.

Answer 1

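You can read it in as a string, replace the whitespace between the numbers with commas so that it looks like a list, use eval to turn that string into a list, and finally convert the list to a NumPy array.
For the dummy input you gave: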

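import numpy as np

li = """[-10.197663 -22.970129 -15.678419 -15.306197
-12.09961 -11.845362 -18.10553 -25.370747
-19.34831 -22.45586]"""

# Join on any whitespace, not just spaces: a newline left between two
# numbers would make eval subtract them instead of separating them.
np.array(eval(','.join(li.split())))
array([-10.197663, -22.970129, -15.678419, -15.306197, -12.09961 ,
       -11.845362, -18.10553 , -25.370747, -19.34831 , -22.45586 ])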

For the given input file, the solution is:

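import re
import numpy as np

with open('test1.txt', 'r') as f:
    li = f.read()

# Replace each run of spaces (and a newline directly before it) with a comma.
np.array(eval(re.sub(r'(\n +| +)',',',li)))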
array([-10.197663  , -22.970129  , -15.678419  , -15.306197  ,
        -0.38851437, -12.09961   , -11.845362  , -18.10553   ,
       -25.370747  , -20.575884  , -19.34831   , -22.45586   ,
       -31.209     , -19.68507   , -31.07194   , -28.4792    ,
        ...])
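Since `eval` executes whatever Python expression the string contains, a more cautious variant for files you do not fully control is `ast.literal_eval`, which only accepts literals. This swaps in a different standard-library function than the one used above, and the helper name `parse_with_literal_eval` is hypothetical. A minimal sketch, assuming the text is a single bracketed list of whitespace-separated numbers:

```python
import ast

import numpy as np


def parse_with_literal_eval(text):
    """Parse '[x y z ...]' without executing arbitrary code."""
    # Drop surrounding whitespace and brackets, then rebuild a proper
    # comma-separated list literal from the whitespace-separated tokens.
    body = text.strip().strip("[]")
    literal = "[" + ",".join(body.split()) + "]"
    return np.array(ast.literal_eval(literal), dtype=float)


li = """[-10.197663 -22.970129 -15.678419 -15.306197
-12.09961 -11.845362 -18.10553 -25.370747
-19.34831 -22.45586]"""

arr = parse_with_literal_eval(li)
print(arr.size)
```

Splitting on any whitespace treats line breaks and runs of spaces the same way, so every number ends up as its own list element.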

Answer 2

This is what I would do:

import numpy as np
from scipy import stats
import requests

link = "https://pastebin.pl/view/raw/929f5228"

response = requests.get(link)
text = response.text

# with open("test1.txt", "r") as my_file:
#     text = my_file.read()

a = np.array(list(map(float, text.strip("[]").split())))

mean, sigma = np.mean(a), np.std(a)

conf_int = stats.norm.interval(0.95, loc=mean, scale=sigma)

print(conf_int)

If you already have the file, uncomment the commented-out lines instead.

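The string-processing line does a lot at once:

The text string is cleaned (the brackets are removed)
The cleaned text is split on whitespace (any run of consecutive whitespace characters counts as one separator)
Each resulting token is converted to a float (this is the map part)
The map iterator is converted to a list and passed to the NumPy array function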
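The same steps can be written out one at a time, which may make the one-liner easier to follow. This is only an unrolled sketch using the sample data from the question; the intermediate variable names are made up for illustration.

```python
import numpy as np

text = """[-10.197663 -22.970129 -15.678419 -15.306197
-12.09961 -11.845362 -18.10553 -25.370747
-19.34831 -22.45586]"""

cleaned = text.strip("[]")          # remove the leading "[" and trailing "]"
tokens = cleaned.split()            # split on runs of spaces and newlines
values = list(map(float, tokens))   # convert each token to a float
a = np.array(values)                # build the NumPy array

print(len(tokens), a.dtype)
```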

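As @Dishin said, your input file is formatted somewhat oddly. If you control how the file is written (for example by a LabVIEW program or another Python script), it may be worth writing the data in a more widely accepted format such as .csv, so that functions like np.loadtxt (or programs like Excel) can read it more easily.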
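If the file is produced by a Python script, one possible way to do that is `np.savetxt`. The sketch below is an illustration only: the file name `data.csv` and the temporary directory are assumptions, and the array is a few values from the question.

```python
import os
import tempfile

import numpy as np

a = np.array([-10.197663, -22.970129, -15.678419, -15.306197])

# Hypothetical output location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "data.csv")

# A 1-D array is written one value per line, which np.loadtxt reads back directly.
np.savetxt(path, a, delimiter=",")
b = np.loadtxt(path, delimiter=",")

print(np.allclose(a, b))
```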

If you are stuck with these files, you can write a small utility function such as:

def loader(filename):
    with open(filename, "r") as my_file:
        text = my_file.read()

    return np.array(list(map(float, text.strip("[]").split())))

to reuse across your scripts.
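One caveat worth sketching: `str.strip("[]")` only removes brackets that sit at the very start or end of the string. If the file ends with a newline after the `]`, the last token keeps its bracket and `float` raises a `ValueError`. A slightly more defensive variant, assuming that is the only irregularity, strips whitespace first; the name `load_bracketed` and the demo file are hypothetical.

```python
import os
import tempfile

import numpy as np


def load_bracketed(filename):
    with open(filename, "r") as my_file:
        text = my_file.read()
    # Strip whitespace first so a trailing newline does not shield the "]".
    return np.array(list(map(float, text.strip().strip("[]").split())))


# Demo file that ends with a newline after the closing bracket.
path = os.path.join(tempfile.mkdtemp(), "test1.txt")
with open(path, "w") as f:
    f.write("[-10.197663 -22.970129\n-19.34831 -22.45586]\n")

a = load_bracketed(path)
print(a.size)
```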


2 answers

Answer 1
1 2020-08-07 16:05:05

Answer 2
1 Accepted 2020-08-08 18:42:00
