I have a problem with a .csv file of copy number data. The original looks like this:
genes Log2
PIK3CA,TET2 -0.35
MLH2,NRAS 0.54
And, what I need is:
genes Log2
PIK3CA -0.35
TET2 -0.35
MLH2 0.54
NRAS 0.54
I have tried many things so far, without success. The file was created with CNVkit from gastric cancer samples. The real file is much bigger and the list of genes is longer, but this is essentially what I need to do in order to analyze our CNV data.
I have tried this:
awk -F , -v OFS='\t' 'NR == 1 || $0 > 0 {print $4}' copynumber.csv | less
This is the closest I've got.
I use Linux, Ubuntu 16.04. I would appreciate it if you could help me with an R or Python script, but at this point any solution would be good.
We can use separate_rows from the tidyr package if you are using R.
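library(tidyr)
# Split only on commas; the default separator would also split gene names that contain "-"
dat2 <- dat %>% separate_rows(genes, sep = ",")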
dat2
# genes Log2
# 1 PIK3CA -0.35
# 2 TET2 -0.35
# 3 MLH2 0.54
# 4 NRAS 0.54
DATA
dat <- read.table(text = "genes Log2
PIK3CA,TET2 -0.35
MLH2,NRAS 0.54",
header = TRUE, stringsAsFactors = FALSE)
This is easy to do in Python.
You can split each line on whitespace first, then iterate over the comma-separated gene names in the first field.
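filename = 'copynumber.csv'
with open(filename, 'r') as fp:
    # Print the header line unchanged (strip its trailing newline first)
    header = fp.readline()
    print(header.rstrip())
    for line in fp:
        if not line.strip():
            continue  # skip blank lines
        # Each data line is "<comma-separated genes> <value>"
        keys, value = line.split()
        for key in keys.split(','):
            print(key + " " + value)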
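If you want the result as tab-separated text rather than printed lines, the same idea can be wrapped in a small function. This is only a sketch under the assumption that every data line has exactly two whitespace-separated columns, as in the sample from the question; real CNVkit output may have more columns and would need adjusting. The names split_gene_rows and expand_table are made up for this example.

```python
import csv
import io


def split_gene_rows(lines):
    """Yield (gene, value) pairs, one per gene, from whitespace-separated lines."""
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines (assumes two columns)
        genes, value = fields
        for gene in genes.split(","):
            if gene:  # ignore empty names caused by stray commas
                yield gene, value


def expand_table(text):
    """Return the table as tab-separated text with one gene per row."""
    lines = text.splitlines()
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(lines[0].split())  # header row
    writer.writerows(split_gene_rows(lines[1:]))
    return out.getvalue()


# Sample data copied from the question
sample = "genes Log2\nPIK3CA,TET2 -0.35\nMLH2,NRAS 0.54\n"
result = expand_table(sample)
print(result, end="")
```

To run it on the real file, read it with open('copynumber.csv').read(), pass the text to expand_table, and write the returned string to a new file.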