Reading in columns from .txt files

I have two datasets containing 4 .txt files, each of which has 9 columns. I am tasked with making a very simple bar graph that compares the 6th and 7th columns for each file in the two datasets. I will of course accept columns 1, 2 and 3 as default. I wrote some code in Python, but I am having difficulty reading multiple files and then selecting columns. My code so far is as follows:

    # This script will plot the comparison between the BodyMap Gencode & BodyMap RefSeq paired end data.

    import matplotlib.pyplot as plt
    import numpy as np


    # Reading in the files

    with open("Illumina_Heart_Gencode_Aligned_Novel_Junctions.txt") as f:
        data = f.read()

    data = data.split('\n')

    x = [row.split(' ')[0] for row in data]
    y = [row.split(' ')[1] for row in data]

    fig = plt.figure()

    ax1 = fig.add_subplot(111)

    ax1.set_title("BodyMap Gencode Vs. RefSeq")
    ax1.set_xlabel("Novel & Splice Junctions")
    ax1.set_ylabel("Something")

    ax1.plot(x, y, c='r', label='the data')

    leg = ax1.legend()

    plt.show()

I would like some suggestions on how to move forward.

Thank you for your time.

This is what my data looks like.

File 1:

    chr1    1718493 1718764 2   2   0   12  0   24
    chr1    8928117 8930883 2   2   0   56  0   24
    chr1    8930943 8931949 2   2   0   48  0   25
    chr1    9616316 9627341 1   1   0   12  0   24
    chr1    10166642    10167279    1   1   0   31  1   24
    chr1    10338187    10342379    1   1   0   11  0   23
    chr1    12040542    12042030    1   1   0   61  0   25
    chr1    12395885    12401839    1   1   0   33  0   24
    chr1    13814327    13815190    1   1   0   16  0   23
    chr1    13815294    13815911    1   1   0   17  0   21
    chr1    15978391    15986331    1   1   0   12  0   22
    chr1    20386186    20411313    1   1   0   11  0   22
    chr1    20412721    20417060    1   1   0   50  0   25
    chr1    22159100    22159367    2   2   0   62  0   19
    chr1    22159386    22159760    2   2   0   15  0   19
    chr1    22192303    22195377    2   2   0   18  0   25
    chr1    22196157    22196705    2   2   0   20  0   25
    chr1    22197366    22198678    2   2   0   12  0   23
    chr1    22217188    22220081    2   2   0   12  0   23
    chr1    29064851    29095440    1   1   0   15  0   17
    chr1    29391671    29395244    1   1   0   14  0   23
    chr1    31833678    31840239    2   2   0   191 1   25
    chr1    31840300    31842231    2   2   0   20  0   23
    chr1    31840342    31845788    2   2   0   18  0   23
    chr1    32051087    32052310    1   1   0   11  0   25
    chr1    33800961    33815197    2   2   0   14  0   21
    chr1    36766686    36767156    1   1   0   45  0   24
    chr1    46379552    46383010    1   1   0   22  0   20

File 2:

    chr1    880181  880421  2   2   0   15  0   21
    chr1    1718493 1718764 2   2   0   12  0   24
    chr1    8568735 8585817 2   2   0   12  0   21
    chr1    8617583 8684368 2   2   0   14  0   23
    chr1    8928117 8930883 2   2   0   56  0   24
    chr1    8930943 8931949 2   2   0   48  0   25
    chr1    9616316 9627341 1   1   0   12  0   24
    chr1    9982417 9991948 2   2   0   18  0   23
    chr1    10002841    10003306    2   2   0   17  0   20
    chr1    10002841    10003406    2   2   0   21  0   25
    chr1    10166642    10167279    1   1   0   31  1   24
    chr1    10167433    10177516    1   1   0   96  0   24
    chr1    10338187    10339154    1   1   0   29  0   23
    chr1    10338187    10342379    1   1   0   11  0   23

I want to compare the 6th and 7th columns in both files, and there are multiple files like this in my dataset.

numpy.loadtxt can certainly save you a number of lines of code, but only if the files have no missing values (e.g. incomplete rows). If they do have missing values, then the manual approach (as you did) is best.
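As a rough sketch of both routes, something like the following might work. It assumes the columns are separated by whitespace (spaces or tabs) and that the "6th and 7th columns" are the 0-based indices 5 and 6. The names `read_cols_loadtxt`, `read_cols_manual` and `plot_column_sums` are made up for illustration, the sample rows are copied from File 1 in the question, and summing each column per file for the bar chart is only one possible way to compare them.

```python
import io
import numpy as np

# A few rows copied from "File 1" in the question, used here so the
# example runs without the real files. Real code would pass a filename.
SAMPLE = """\
chr1    1718493 1718764 2   2   0   12  0   24
chr1    8928117 8930883 2   2   0   56  0   24
chr1    8930943 8931949 2   2   0   48  0   25
"""


def read_cols_loadtxt(source, cols=(5, 6)):
    """Read the given 0-based columns with numpy.loadtxt (needs complete rows)."""
    # usecols skips the non-numeric 'chr1' column; ndmin=2 keeps the result
    # two-dimensional even when the file has a single row.
    return np.loadtxt(source, usecols=cols, ndmin=2)


def read_cols_manual(lines, cols=(5, 6)):
    """Manual fallback: skip blank or incomplete rows instead of failing."""
    rows = []
    for line in lines:
        fields = line.split()            # split on any run of whitespace
        if len(fields) <= max(cols):     # blank line or too few columns
            continue
        rows.append([float(fields[c]) for c in cols])
    return np.array(rows)


def plot_column_sums(labels, sums6, sums7):
    """Grouped bar chart of per-file totals (summing is an assumption)."""
    import matplotlib.pyplot as plt      # imported lazily, only for plotting
    x = np.arange(len(labels))
    width = 0.4
    fig, ax = plt.subplots()
    ax.bar(x - width / 2, sums6, width, label="column 6")
    ax.bar(x + width / 2, sums7, width, label="column 7")
    ax.set_xticks(x)
    ax.set_xticklabels(labels)
    ax.legend()
    plt.show()


# Transposing turns the (rows, 2) array into one array per column.
col6, col7 = read_cols_loadtxt(io.StringIO(SAMPLE)).T
```

With real data you would loop over your filenames, pass each one to `read_cols_loadtxt` (or pass `open(filename)` to `read_cols_manual` if some rows are incomplete), collect `col6.sum()` and `col7.sum()` per file, and hand the lists to `plot_column_sums`.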
