Piping a pipe-delimited flat file into python for use in Pandas and Stats

I have searched a lot, but haven't found an answer to this.

I am trying to pipe in a flat file of data and load it into something Python can read, so that I can run an analysis on it (for instance, perform a t-test).

First, I created a simple pipe-delimited flat file:

1|2
3|4
4|5
1|6
2|7
3|8
8|9

and saved it as "simpledata".

Then I created a bash script in nano as follows:

#!/usr/bin/env python

import sys
from scipy import stats 

A = sys.stdin.read()
print A
paired_sample = stats.ttest_rel(A[:,0],A[:,1])
print "The t-statistic is %.3f and the p-value is %.3f." % paired_sample

Then I saved the script as pairedttest.sh and ran it as:

 cat simpledata | pairedttest.sh

The error I get is:

TypeError: string indices must be integers, not tuple

Thanks in advance for your help.

Are you trying to call this?

paired_sample = stats.ttest_rel([1,3,4,1,2,3,8], [2,4,5,6,7,8,9])

If so, you can't do it the way you're trying. A is just a string when you read it from stdin, so you can't index it like a two-dimensional array with A[:,0]. You need to build the two lists from the string. The most obvious way is like this:

left = []
right = []
for line in A.splitlines():
    l, r = line.split("|")
    left.append(int(l))
    right.append(int(r))
print left
print right

This will output:

[1, 3, 4, 1, 2, 3, 8]
[2, 4, 5, 6, 7, 8, 9]

So you can call stats.ttest_rel(left, right).
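The snippets in this thread use Python 2 print statements. For anyone on Python 3, the loop approach might be sketched roughly as follows. The function names parse_pairs, paired_ttest and main are hypothetical, values are parsed as floats rather than ints as an assumption about what other data files may contain, and SciPy is imported inside paired_ttest only so that the parsing part can be tried without it.

```python
import sys


def parse_pairs(text):
    """Split pipe-delimited lines such as "1|2" into two lists of floats."""
    left, right = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. one left by a trailing newline
        l, r = line.split("|")
        left.append(float(l))
        right.append(float(r))
    return left, right


def paired_ttest(left, right):
    """Run a paired t-test on two equal-length sequences using SciPy."""
    from scipy import stats  # imported here so parse_pairs works without SciPy
    result = stats.ttest_rel(left, right)
    return result.statistic, result.pvalue


def main():
    # Read everything piped in on stdin, e.g. from: cat simpledata | ./script
    left, right = parse_pairs(sys.stdin.read())
    t_stat, p_value = paired_ttest(left, right)
    print("The t-statistic is %.3f and the p-value is %.3f." % (t_stat, p_value))
```

To use this as a script, add a final line that calls main(), put a #!/usr/bin/env python3 line at the top, and make the file executable.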

Or, to be really clever, you can make a (nearly impossible to read) one-liner out of it:

z = zip(*[map(int, line.split("|")) for line in A.splitlines()])

This will output:

[(1, 3, 4, 1, 2, 3, 8), (2, 4, 5, 6, 7, 8, 9)]

So you can call stats.ttest_rel(*z).
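Two notes that may help here, offered as a sketch rather than a definitive recipe. On Python 3, zip returns a lazy iterator, so printing it no longer shows a list of tuples unless it is wrapped in list(). And since the question mentions Pandas, the same text can probably be loaded with pandas.read_csv using sep="|". The function names and the column names "left" and "right" below are made up for illustration; pandas is imported inside the function so the first helper works without it.

```python
import io


def columns_from_text(text):
    """Transpose "a|b" lines into one tuple per column."""
    rows = [tuple(int(v) for v in line.split("|"))
            for line in text.splitlines() if line.strip()]
    # zip(*rows) pairs up the first items, then the second items;
    # list() materializes the iterator that Python 3's zip returns.
    return list(zip(*rows))


def frame_from_text(text):
    """Load the same pipe-delimited text into a pandas DataFrame."""
    import pandas as pd  # imported here so columns_from_text works without pandas
    # header=None because the file has no header row; names are illustrative.
    return pd.read_csv(io.StringIO(text), sep="|", header=None,
                       names=["left", "right"])
```

With a DataFrame df from frame_from_text(sys.stdin.read()), the test would then be stats.ttest_rel(df["left"], df["right"]).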
