Indexing in Python to start right after a particular string

Question

I have a tab-separated file with values as follows:

12  6814296 2   192 C:0.911458  T:0.0885417
12  6814328 2   192 C:1 T:0
12  6814345 2   192 C:1 T:0
12  6814360 2   192 C:1 T:0
12  6814381 2   192 G:1 A:0
12  6814396 2   192 C:1 A:0
12  6814397 2   192 G:0.989583  A:0.0104167
12  6814464 2   192 T:1 C:0
12  6814468 2   192 C:0.927083  TCCC:0.0729167
12  6814486 2   192 C:1 T:0
12  6814551 2   192 G:1 C:0
12  6814567 2   192 A:1 G:0
12  6814589 2   192 C:0.989583  T:0.0104167
12  6814619 2   192 G:1 A:0
12  6814663 2   192 A:1 G:0
12  6814732 2   192 C:1 T:0
12  6814752 4   192 CTTT:0.979167   CTTTTT:0    CT:0.015625 C:0.00520833
12  6814786 2   192 C:1 <CN0>:0
12  6814798 2   192 C:0.984375  T:0.015625
12  6814828 2   192 C:0.989583  G:0.0104167
12  6814951 2   192 G:1 C:0

From this file, I have to create a CSV file with 3 comma-separated values in each row.

Below is my code:

file1 = open('/home/aahm/Documents/gene1.frq', 'r')
input_data = file1.readlines()
for line in input_data:
    rm_newline = line.strip('\n')
    comma_separated = rm_newline.split('\t')
    a = comma_separated[0]
    b = comma_separated[1]
    c = comma_separated[-1]
    d = c[2:]
    if comma_separated [2] == '2':
        e = a + ','+ b +',' + d
        print (e)
    elif comma_separated [2] == '3':
        f = comma_separated[-1]
        g = f[2:]
        h = comma_separated[-2]
        i = h[2:]
        if g > i:
            j = a + ','+ b +',' + g
            print (j)
        else:
            k = a + ','+ b +',' + i
            print (k)
    elif comma_separated [2] == '4':
        l = comma_separated[-1]
        m = l[2:]
        n = comma_separated[-2]
        o = n[2:]
        p = comma_separated[-3]
        q = p[2:]
        if m > o and m > p:
            r = a + ','+ b +',' + m
            print (r)
            
        elif o > m and o > p:
            s = a + ','+ b +',' + o
            print (s)
            
        elif p > m and p > o:
            t =  a + ','+ b +',' + p
            print (t)

The code works well, except that for indexing I have used these:

d = c[2:]
g = f[2:]
i = h[2:]

etc.

For columns 6, 7 and 8 in the input file, I need only the numbers as output. However, for some rows my indexing gives me character strings as well as numbers, because the character string preceding ':' is longer than one character. An example is given below.

The value in the last column is TCCC:0.0729167 for one row. When 'd = c[2:]' is used for indexing, I get CC:0.0729167 as output, whereas I need only 0.0729167.

I am stuck with this and do not have any hint about how to proceed. I would be very grateful for any help. Thanks!

Answer 1

You are slicing the string from the third character (inclusive) to the end, which gives you 'CC:0.0729167' in your example. As other people said in the comments, you could just use yourstring.split(":")[1] to split the string at the colon and then retrieve the second half by specifying its index with [1].
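
To make the difference concrete, here is a minimal sketch using the value from the question. The variable names are only illustrative, and str.partition is shown as an optional alternative to split, not as something the original code used.

```python
field = "TCCC:0.0729167"

# Fixed-offset slicing only works when the allele is exactly one character long.
print(field[2:])             # CC:0.0729167

# Splitting at the colon works for an allele of any length.
print(field.split(":")[1])   # 0.0729167

# str.partition splits at the first colon and always returns three parts.
allele, _, frequency = field.partition(":")
print(allele, float(frequency))   # TCCC 0.0729167
```

Converting the result with float() is worth doing as soon as the values need to be compared, because comparing the strings themselves is character by character rather than numeric.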

Answer 2

As per the comments others have made, where a ":" remains in the column data you need to split it out. However, the code you have here is already rather opaque: all the single-letter variables make it quite difficult to see what a simple piece of code is actually trying to do. To avoid making it worse, in the example below I've defined a simple function, getnum, which you feed a field and which does the split for you if needed. Of course, this won't work if the field has more than one ":" character, but it would be easy enough to modify getnum. I've then altered your code to run every field through this getnum function.

To make life easier for yourself, I would encourage you to use more meaningful variable names than a, b, c and so on. Also, a little explanatory comment here and there would go a long way. I think with these in place you would probably have been able to crack the problem yourself!

file1 = open('/home/aahm/Documents/gene1.frq', 'r')
input_data = file1.readlines()
file1.close()

# process a field to only use numbers after a :
def getnum(src):
    if ":" in src:
        return src.split(":")[1]
    else:
        return src

for line in input_data:
    rm_newline = line.strip('\n')
    comma_separated = rm_newline.split('\t')
    a = getnum(comma_separated[0])
    b = getnum(comma_separated[1])
    # getnum already drops everything up to the ':', so no [2:] slice is needed
    d = getnum(comma_separated[-1])
    if comma_separated[2] == '2':
        e = a + ',' + b + ',' + d
        print(e)
    elif comma_separated[2] == '3':
        g = getnum(comma_separated[-1])
        i = getnum(comma_separated[-2])
        # compare as numbers; comparing the strings would go character by character
        if float(g) > float(i):
            j = a + ',' + b + ',' + g
            print(j)
        else:
            k = a + ',' + b + ',' + i
            print(k)
    elif comma_separated[2] == '4':
        m = getnum(comma_separated[-1])
        o = getnum(comma_separated[-2])
        q = getnum(comma_separated[-3])
        if float(m) > float(o) and float(m) > float(q):
            r = a + ',' + b + ',' + m
            print(r)

        elif float(o) > float(m) and float(o) > float(q):
            s = a + ',' + b + ',' + o
            print(s)

        elif float(q) > float(m) and float(q) > float(o):
            t = a + ',' + b + ',' + q
            print(t)
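
As an illustration of the advice about meaningful names, the same idea might be sketched as below. This is not the asker's code. The names frequency and summarise are hypothetical. The sketch also assumes the rule implied by the question's branches: for each row, output the largest frequency among the alleles after the first one. It uses rpartition so that a field with more than one ":" still yields the text after the last colon, and it runs on two sample rows taken from the question instead of opening the real file.

```python
import csv
import io


def frequency(field):
    """Return the number after the last ':' in a field such as 'TCCC:0.0729167'."""
    return float(field.rpartition(":")[2])


def summarise(lines, out):
    """Write chromosome, position and the largest frequency after the first allele."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or incomplete rows
        chromosome, position = fields[0], fields[1]
        # fields[4] is the first allele; everything after it is a candidate
        other_fields = fields[5:]
        best = max(other_fields, key=frequency)  # numeric, not string, comparison
        writer.writerow([chromosome, position, best.rpartition(":")[2]])


# Two rows copied from the question, with explicit tab separators.
sample = [
    "12\t6814468\t2\t192\tC:0.927083\tTCCC:0.0729167\n",
    "12\t6814752\t4\t192\tCTTT:0.979167\tCTTTTT:0\tCT:0.015625\tC:0.00520833\n",
]
out = io.StringIO()
summarise(sample, out)
print(out.getvalue(), end="")
```

Because max works over however many fields follow the first allele, the separate branches for 2, 3 and 4 alleles are no longer needed. To process the real file, the list of sample rows could be replaced by an open file object, and out by a file opened for writing.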


Answer 1
1 2021-02-25 09:12:47

Answer 2
1 2021-02-25 09:58:11
