
Previously working python script now stops halfway through

I've been working on a python script to do Probit analysis to find the Lower Limit of Detection (LLoD) for assays run in our lab. I had a script that worked perfectly last week, but was messy and lacked any kind of input-checking to make sure the user's input was valid.

On running the script, the user was prompted for a few questions (the column headers for the relevant data in the .csv file containing the data to analyze, and whether to use the data in said columns as-is or in log_10 form). The script would then:

- perform the necessary data cleanup and calculations
- print out a table of the relevant data along with the equation and R^2 value for the linear regression
- display a graph of the data along with the linear regression and calculated LLoD
- print out "The lower limit of detection at 95% CI is [whatever]".

Now, on running the script, the program stops after printing the data table and displaying the graph (the regression equation and R^2 value are not printed, nor is anything afterwards). Furthermore, Python doesn't return to the standard >>> prompt for input, and I have to exit and reopen Python. Does anyone have any idea what's going on? Full code is at the bottom of the post; sample data can be found here. Note: This is the exact data I've been using, which was working last week.

(PS any misc. tips for cleaning up the code would be appreciated as well, so long as the function is unchanged. I come from a C background and adapting to Python is still a work in progress...)

Code: (FWIW I'm running 3.8)

import os
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy
from scipy.stats import norm
from numpy.polynomial import Polynomial
from tkinter import filedialog
from tkinter import *

# Initialize tkinter
root = Tk()
root.withdraw()

# Prompt user for data file and column headers, ask whether to use log(qty)
print("In the directory prompt, select the .csv file containing data for analysis")
path = filedialog.askopenfilename()
#data = pd.read_csv(path, usecols=[conc, detect])
data = pd.read_csv(path)


while True:
    conc = input("Enter the column header for concentration/number of copies: ")
    if conc in data.columns:
        break
    else:
        print('Invalid input. Column \''+str(conc)+'\' does not exist. Try again')
        continue

while True:
    detect = input("Enter the column header for target detection: ")
    if detect in data.columns:
        break
    else:
        print('Invalid input. Column \''+str(detect)+'\' does not exist. Try again')
        continue

while True:
    logans = input("Analyze using log10(concentration/number of copies)? (y/n): ")
    if logans == 'y':
        break
    elif logans == 'n':
        break
    else:
        print('Invalid input. Please enter either y or n.')
        continue

# Read the columns of data specified by the user and rename them for consistency
data = data.rename(columns={conc:"qty", detect:"result"})

# Create list of unique values for RNA quantity, initialize vectors of same length
# to store probabilities and probit scores for each
qtys = data['qty'].unique()
log_qtys = [0] * len(qtys)
prop = [0] * len(qtys)
probit = [0] * len(qtys)

# Function to get the hitrate/probability of detection for a given quantity
# Note: any values in df.result that cannot be parsed as a number will be converted to NaN
def hitrate(qty, df):
    t_s = df[df.qty == qty].result
    t_s = t_s.apply(pd.to_numeric, args=('coerce',)).isna()
    return (len(t_s)-t_s.sum())/len(t_s)

# Iterate over quantities to calculate log10(quantity), the corresponding probability
# of detection, and its associated probit score
for idx, val in enumerate(qtys):
    log_qtys[idx] = math.log10(val)
    prop[idx] = hitrate(val, data)
    probit[idx] = 5 + norm.ppf(prop[idx])

# Create a dataframe (with headers) composed of the quantities and their associated
# probabilities and probit scores, then drop rows with probability of 0 or 1
hitTable = pd.DataFrame(np.vstack([qtys,log_qtys,prop,probit]).T, columns=['qty','log_qty','probability','probit'])
hitTable.probit.replace([np.inf,-np.inf],np.nan, inplace=True)
hitTable.dropna(inplace=True)

def regPlot(x, y, log):
    # Update parameters, set y95 to probit score corresponding to 95% CI
    params = {'mathtext.default': 'regular'}
    plt.rcParams.update(params)
    y95 = 6.6448536269514722

    # Define lambda function for a line, run regression, and find the coefficient of determination
    regFun = lambda m, x, b : (m*x) + b
    regression = scipy.stats.linregress(x,y)
    r_2 = regression.rvalue*regression.rvalue

    # Solve y=mx+b for x at 95% CI
    log_llod = (y95 - regression.intercept) / regression.slope
    xmax = log_llod * 1.2

    # Start plotting all the things!
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.set_ylabel('Probit score\n$(\\sigma + 5)$')  # escape the backslash so '\s' is not an invalid escape sequence

    if log == 'y':
        ax.set_xlabel('$log_{10}$(input quantity)')

    elif log == 'n':
        ax.set_xlabel('input quantity')

    else:
        raise ValueError('Error when calling regPlot(x,y,log) - User input invalid.')

    x_r = [0, xmax]
    y_r = [regression.intercept, regFun(regression.slope,x_r[1],regression.intercept)]
    ax.plot(x_r, y_r, '--k') # linear regression
    ax.plot(log_llod, y95, color='red', marker='o', markersize=8) # LLOD point
    ax.plot([0,xmax], [y95,y95], color='red', linestyle=':') # horiz. red line
    ax.plot([log_llod,log_llod], [regFun(regression.slope,x_r[0],regression.intercept),7.1], color='red', linestyle=':') # vert. red line
    ax.plot(x, y, 'bx') # actual (qty, probit) data points
    ax.grid() # grid
    plt.show()
    print('\n Linear regression using least-squares method yields:\n')
    print('\t\ty = '+str("%.3f"%regression.slope)+'x + '+str("%.3f"%regression.intercept)+'\n')
    print('\twith a corresponding R-squared value of', str("%.5f"%r_2)+"\n")

    return regression.slope, regression.intercept, r_2, regression.stderr, regression.intercept_stderr, log_llod

print('\n', hitTable, '\n')

if logans == 'y':
    m, b, r_2, stderr, int_stderr, log_llod = regPlot(hitTable.log_qty, hitTable.probit, logans)
    llod_95 = 10**log_llod
    if r_2 < 0.9:
        print('WARNING: low r-squared value for linear regression. Try re-analyzing without using log10.')

elif logans == 'n':
    m, b, r_2, stderr, int_stderr, log_llod = regPlot(hitTable.qty, hitTable.probit, logans)
    llod_95 = log_llod
    if r_2 < 0.9:
        print('WARNING: low r-squared value for linear regression. Try re-analyzing using log10.')

else:
    raise ValueError('Error when attempting to evaluate llod_95 - User input invalid.')

print("\nThe lower limit of detection (LLoD) at 95% CI is " + str("%.4f"%llod_95) + ".\n")

This seems like it's because plt.show() blocks when it is called. It displays the graph in a window and waits for you to close it before continuing with the code execution. In your regPlot function, the print statements for the regression equation and R-squared value come after plt.show(), so they only run once the window has been closed.

This is the default behaviour in matplotlib, but you can make it non-blocking: https://stackoverflow.com/a/33050617/15332448
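As a rough sketch of the non-blocking approach (the helper name plot_without_blocking and the sample points are made up for illustration, they are not from your script), plt.show(block=False) returns right away, and a short plt.pause() gives the GUI event loop a chance to draw the window:

```python
import matplotlib.pyplot as plt


def plot_without_blocking(x, y):
    """Plot y against x and return without waiting for the window to be closed."""
    fig, ax = plt.subplots()
    ax.plot(x, y, "bx")  # data points, same marker style as in the question
    ax.grid()
    plt.show(block=False)  # do not wait for the figure window to be closed
    plt.pause(0.001)       # briefly run the GUI event loop so the window is drawn
    return fig


if __name__ == "__main__":
    # Hypothetical sample points, only here to make the sketch runnable
    fig = plot_without_blocking([1, 2, 3], [4.2, 5.1, 6.3])
    print("This line is printed while the figure window is still open.")
```

With this pattern the figure closes when the script exits, so you may want a plain plt.show() as the very last line to keep the window open. In your script, simply moving the print calls in regPlot ahead of plt.show() would probably also get the output printed before the window appears.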
