Reading in columns from .txt files
I have two datasets containing 4 .txt files, which have 9 columns each. I am tasked with making a very simple bar graph that displays the comparison between the 6th and 7th columns for each file in the two datasets. I will of course accept cols 1, 2 & 3 as default. I wrote some code in Python, but I am having difficulty reading multiple files and then selecting columns. My code so far is as follows:
# This script will plot the comparison between the BodyMap Gencode & BodyMap RefSeq paired end data.
import matplotlib.pyplot as plt
import numpy as np

# Reading in the files
with open("Illumina_Heart_Gencode_Aligned_Novel_Junctions.txt") as f:
    data = f.read()
    data = data.split('\n')

x = [row.split(' ')[0] for row in data]
y = [row.split(' ')[1] for row in data]

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_title("BodyMap Gencode Vs. RefSeq")
ax1.set_xlabel("Novel & Splice Junctions")
ax1.set_ylabel("Something")
ax1.plot(x, y, c='r', label='the data')
leg = ax1.legend()
plt.show()
I would like to get some suggestions on how to move forward.
Thank you for your time.
This is what my data looks like.
File 1:
chr1 1718493 1718764 2 2 0 12 0 24
chr1 8928117 8930883 2 2 0 56 0 24
chr1 8930943 8931949 2 2 0 48 0 25
chr1 9616316 9627341 1 1 0 12 0 24
chr1 10166642 10167279 1 1 0 31 1 24
chr1 10338187 10342379 1 1 0 11 0 23
chr1 12040542 12042030 1 1 0 61 0 25
chr1 12395885 12401839 1 1 0 33 0 24
chr1 13814327 13815190 1 1 0 16 0 23
chr1 13815294 13815911 1 1 0 17 0 21
chr1 15978391 15986331 1 1 0 12 0 22
chr1 20386186 20411313 1 1 0 11 0 22
chr1 20412721 20417060 1 1 0 50 0 25
chr1 22159100 22159367 2 2 0 62 0 19
chr1 22159386 22159760 2 2 0 15 0 19
chr1 22192303 22195377 2 2 0 18 0 25
chr1 22196157 22196705 2 2 0 20 0 25
chr1 22197366 22198678 2 2 0 12 0 23
chr1 22217188 22220081 2 2 0 12 0 23
chr1 29064851 29095440 1 1 0 15 0 17
chr1 29391671 29395244 1 1 0 14 0 23
chr1 31833678 31840239 2 2 0 191 1 25
chr1 31840300 31842231 2 2 0 20 0 23
chr1 31840342 31845788 2 2 0 18 0 23
chr1 32051087 32052310 1 1 0 11 0 25
chr1 33800961 33815197 2 2 0 14 0 21
chr1 36766686 36767156 1 1 0 45 0 24
chr1 46379552 46383010 1 1 0 22 0 20
File 2:
chr1 880181 880421 2 2 0 15 0 21
chr1 1718493 1718764 2 2 0 12 0 24
chr1 8568735 8585817 2 2 0 12 0 21
chr1 8617583 8684368 2 2 0 14 0 23
chr1 8928117 8930883 2 2 0 56 0 24
chr1 8930943 8931949 2 2 0 48 0 25
chr1 9616316 9627341 1 1 0 12 0 24
chr1 9982417 9991948 2 2 0 18 0 23
chr1 10002841 10003306 2 2 0 17 0 20
chr1 10002841 10003406 2 2 0 21 0 25
chr1 10166642 10167279 1 1 0 31 1 24
chr1 10167433 10177516 1 1 0 96 0 24
chr1 10338187 10339154 1 1 0 29 0 23
chr1 10338187 10342379 1 1 0 11 0 23
I want to compare the 6th and 7th columns in both files, and there are multiple files like this in my data set.
numpy.loadtxt can certainly save you a number of lines of code, but only if you have no missing values (e.g. incomplete rows). If you do have missing values, then the manual approach (as you did) is best.
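As a rough sketch of the numpy.loadtxt route: the helper names below (load_columns, column_totals, plot_totals) are made up for illustration. The sketch assumes that "6th and 7th columns" is counted from 1 (so the 0-based indices are 5 and 6) and that comparing the per-file column totals is what you are after; the question does not say how the columns should be compared, so adjust as needed.

```python
import numpy as np


def load_columns(path, cols=(5, 6)):
    """Read only the given 0-based columns of a whitespace-delimited file."""
    # usecols skips the text column ("chr1"), so the default float dtype works.
    # ndmin=2 keeps the result 2-D even if the file has a single row.
    return np.loadtxt(path, usecols=cols, ndmin=2)


def column_totals(paths, cols=(5, 6)):
    """Return one row per file holding the sum of each selected column."""
    # Assumption: the per-file total is the quantity to compare.
    return np.array([load_columns(p, cols).sum(axis=0) for p in paths])


def plot_totals(paths, totals, labels=("column 6", "column 7")):
    """Draw a grouped bar chart: one group per file, one bar per column."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    positions = np.arange(len(paths))
    width = 0.4
    fig, ax = plt.subplots()
    for i, label in enumerate(labels):
        ax.bar(positions + i * width, totals[:, i], width, label=label)
    ax.set_xticks(positions + width / 2)
    ax.set_xticklabels(paths, rotation=45, ha="right")
    ax.set_ylabel("Column total")
    ax.legend()
    fig.tight_layout()
    plt.show()
```

With a list of file names in paths, calling totals = column_totals(paths) followed by plot_totals(paths, totals) would produce one pair of bars per file.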
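If some rows are incomplete, the manual approach could look something like the sketch below. The function name read_columns and its expected_fields parameter are hypothetical; the sketch assumes that a complete row has 9 whitespace-separated fields, as in the sample data, and that short rows should simply be skipped.

```python
def read_columns(path, cols=(5, 6), expected_fields=9):
    """Return a list of float tuples, one per complete row of the file."""
    rows = []
    with open(path) as handle:
        for line in handle:
            # split() with no argument splits on any run of whitespace,
            # so tabs, repeated spaces and blank lines are all handled.
            fields = line.split()
            if len(fields) < expected_fields:
                continue  # skip blank or incomplete rows
            rows.append(tuple(float(fields[c]) for c in cols))
    return rows


def read_many(paths, cols=(5, 6)):
    """Map each file name to the rows read from it."""
    return {path: read_columns(path, cols) for path in paths}
```

Iterating over the file line by line and calling split() without an argument also avoids the IndexError that row.split(' ')[1] raises on the empty string left after a trailing newline.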