
Copy number file format issue

I have a problem with a .csv file of copy number data. The original looks like this:

genes               Log2
PIK3CA,TET2          -0.35
MLH2,NRAS            0.54

What I need is:

genes               Log2
PIK3CA              -0.35
TET2                -0.35
MLH2                0.54
NRAS                0.54

I have tried many things by now, and none of them has worked. The file was created with CNVkit from gastric cancer samples. The real file is much bigger and the list of genes is longer, but this is essentially what I need to do in order to analyze our CNV data.

I have tried this:

awk -F , -v OFS='\t' 'NR == 1 || $0 > 0 {print $4}' copynumber.csv | less

This is the closest I have got.

I use Linux, Ubuntu 16.04. I would appreciate help with an R or Python script, but at this point any solution would be good.

If you are using R, we can use separate_rows from the tidyr package. By default it splits on non-alphanumeric delimiters such as the comma, and it repeats the Log2 value for each gene.

library(tidyr)

dat2 <- dat %>% separate_rows(genes)
dat2
#    genes  Log2
# 1 PIK3CA -0.35
# 2   TET2 -0.35
# 3   MLH2  0.54
# 4   NRAS  0.54

DATA

dat <- read.table(text = "genes               Log2
PIK3CA,TET2          -0.35
                  MLH2,NRAS            0.54",
                  header = TRUE, stringsAsFactors = FALSE)

This is easy to do in Python.
Split each line on whitespace first, then iterate over the comma-separated gene names in the first field.

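filename = 'copynumber.csv'
with open(filename, 'r') as fp:
    # Print the header without its trailing newline (print adds its own)
    header = fp.readline()
    print(header.rstrip())
    for line in fp:
        if not line.strip():
            continue  # skip blank lines
        # First field: comma-separated genes; second field: the Log2 value
        keys, value = line.split()
        for key in keys.split(','):
            print(key + " " + value)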
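The same idea can be wrapped in a small reusable function. The following is only a sketch: the name split_gene_rows is made up for illustration, and it assumes the layout shown in the question (two whitespace-separated columns, with the genes comma-joined in the first). The real CNVkit output may well have more columns or a different delimiter, in which case the field handling would need adjusting.

```python
import io


def split_gene_rows(lines):
    """Yield one (gene, log2) pair per gene from whitespace-delimited lines.

    Hypothetical helper: assumes exactly two columns per line, as in the
    sample from the question. Lines that do not match are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or unexpected line
        genes, value = fields
        for gene in genes.split(','):
            if gene:  # ignore empty names caused by stray commas
                yield gene, value


# In-memory stand-in for the sample file shown in the question
sample = io.StringIO(
    "genes               Log2\n"
    "PIK3CA,TET2          -0.35\n"
    "MLH2,NRAS            0.54\n"
)

header = sample.readline().split()
rows = list(split_gene_rows(sample))

# Print as tab-separated text, one gene per row
print("\t".join(header))
for gene, value in rows:
    print(f"{gene}\t{value}")
```

To run it on a real file, pass an open file object instead of the StringIO after reading the header line. The Log2 values are kept as strings so that they are written back exactly as they were read.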
