
How to split one column into two columns in python?

I have a contig file loaded in pandas like this:

    >NODE_1_length_4014_cov_1.97676
1       AATTAATGAAATAAAGCAAGAAGACAAGGTTAGACAAAAAAAAGAG...
2       CAAAGCCTCCAAGAAATATGGGACTATGTGAAAAGACCAAATCTAC...
3       CCTGAAAGTGACGGGGAGAATGGAACCAAGTTGGAAAACACTCTGC...
4       GAGAACTTCCCCAATCTAGCAAGGCAGGCCAACATTCAAATTCAGG...
5       CCACAAAGATACTCCTCGAGAAGAGCAACTCCAAGACACATAATTG...
6       GTTGAAATGAAGGAAAAAATGTTAAGGGCAGCCAGAGAGAAAGGTC...
7       GGGAAGCCCATCAGACTAACAGCGGATCTCTCGGCAGAAACCCTAC...
8       TGGGGGCCAATATTCAACATTCTTAAAGAAAAGAATTTTCAACCCA...
9       GCCAAACTAAGCTTCATAAGCAAAGGAGAAATAAAATCCTTTACAG...
10      AGAGATTTTGTCACCACCAGGCCTGCCTTACAAGAGCTCCTGAAGG...
11      GAAAGGAAAAACCGGTACCAGCCACTGCAAAATCATGCCAAACTGT...
12      CTAGGAAGAAACTGCATCAACTAATGAGCAAAATAACCAGCTAACA...
13      TCAAATTCACACATAACAATATTAACCTTAAATGTAAATGGGCTAA...
14      AGACACAGACTGGCAAATTGGATAAAGAGTCAAGACCCATCAGTGT...
15      ACCCATCTCAAATGCAGAGACACACATAGGCTCAAAATAAAGGGAT...
16      CAAGCAAATGGAAAACAAAAAAAGGCAGGGGTTGCAATCCTAGTCT...
17      TTTAAACCAACAAAGATCAAAAGAGACAAAGAAGGCCATTACATAA...
18      ATTCAACAAGAAGAGCTAACTATCCTAAATATATATGCACCCAATA...
19      TTCATAAAGCAAGTCCTCAGTGACCTACAAAGAGACTTAGACTCCC...
20      GGAGACTTTAACACCCCACTGTCAACATTAGACAGATCAACGAGAC...
21      GATATCCAGGAATTGAACTCAGCTCTGCACCAAGCGGACCTAATAG...
22      CTCCACCCCAAATCAACAGAATATACATTCTTTTCAGCACCACACC...
23      ATTGACCACATAGTTGGAAGTAAAGCTCTCCTCAGCAAATGTAAAA...
24      ACAAACTGTCTCTCAGACCACAGTGCAATCAAATTAGAACTCAGGA...
25      CAAAACTGCTCAACTACATGAAAACTGAACAACCTGCTCCTGAATG...
26      AACAAAATGAAGGCAGAAATAAAGATGTTCTTTGAAACCAATGAGA...
27      TACCAGAATCTCTGGGACGCATTCAAAGCAGTGTGTAGAGGGAAAT...
28      GCCCACAAGAGAAAGCAGGAAAGATCTAAAATTGACACCCTAACAT...
29      CTAGAGAAGCAAGAGCAAACACATTCAAAAGCTAGCAGAAGGCAAG...
                              ...                        
8540                         >NODE_2518_length_56_cov_219
8541    CCCTTGTTGGTGTTACAAAGCCCTTGAACTACATCAGCAAAGACAA...
8542                         >NODE_2519_length_56_cov_174
8543    CCGACTACTATCGAATTCCGCTCGACTACTATCGAATTCCGCTCGA...
8544                         >NODE_2520_length_56_cov_131
8545    CCCAGGAGACTTGTCTTTGCTGATGTAGTTCAAGAGCTTTGTAACA...
8546                         >NODE_2521_length_56_cov_118
8547    GGCTCCCTATCGGCTCGAATTCCGCTCGACTATTATCGAATTCCGC...
8548                          >NODE_2522_length_56_cov_96
8549    CCCGCCCCCAGGAGACTTGTCTTTGCTGATAGTAGTCGAGCGGAAT...
8550                          >NODE_2523_length_56_cov_74
8551    AGAGACTTGTCTTTGCTGATGTAGTTCAAGGGCTTTGTAACACCGA...
8552                          >NODE_2524_length_56_cov_70
8553    TGCTCGACTACTATCGAATTCCGCTCGACTACTATCGAATTCCGCT...
8554                          >NODE_2525_length_56_cov_59
8555    GAGACCCTTGTCGGTGTTACAAAGCCCTTTAACTACATCAGCAAAG...
8556                          >NODE_2526_length_56_cov_48
8557    CCGACTACTATCGAATTCCGCTCGACTACTATCGAATTCCGCTCGA...
8558                          >NODE_2527_length_56_cov_44
8559    CCAAGGGCTTTGTAACACCGACAAGGGTCTCGAAAACATCGGCATT...
8560                          >NODE_2528_length_56_cov_42
8561    GAGACCCTTGTAGGTGTTACAAAGCCCTTGAACTACATCAGCAAAG...
8562                          >NODE_2529_length_56_cov_38
8563    GAGACCCTTGTCGGTGTCACAAAGCCCTTGAACTACATCAGCAAAG...
8564                          >NODE_2530_length_56_cov_29
8565    GAGGGCTTTGTAACACCGACAAGGGTCTCGAAAACATCGGCATTCT...
8566                          >NODE_2531_length_56_cov_26
8567    AGGTTCAAGGGCTTTGTAACACCGACAAGGGTCTCGAAAACATCGG...
8568                          >NODE_2532_length_56_cov_25
8569    GAGATGTGTATAAGAGACTTGTCTTTGCTGATGTAGTTCAAGGGCT...

How can I split this one column into two columns, with the >NODE_... header in one column and the corresponding sequence in the other? Another issue is that each sequence spans multiple lines; how can I join them into one string? The expected result looks like this:

    contig                                  sequence
    NODE_1_length_4014_cov_1.97676         AAAAAAAAAAAAAAA
    NODE_........                          TTTTTTTTTTTTTTT

Thank you very much.

I can't reproduce your example, but my guess is that you are loading a file with pandas that is not in a tabular format. From your example, it looks like your file is formatted like this:

>Identifier
    sequence
>Identifier
    sequence

You would have to parse the file before you can put the information into a pandas DataFrame. The logic is to loop through each line of the file: if the line starts with '>' (as in '>NODE_...'), store it as an identifier; otherwise, append it to the current sequence value. Something like this:

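import pandas as pd

# Stand-in for the lines of the real file
testfile = '>NODE_1_length_4014_cov_1.97676\nAAAAAAAATTTTTTCCCCCCCGGGGGG\n>NODE_2518_length_56_cov_219\nAAAAAAAAGCCCTTTTT'.split('\n')
identifiers = []
sequences = []
current_sequence = ''
for line in testfile:
    if line.startswith('>'):
        # A header closes the previous record (if any) and opens a new one
        if identifiers:
            sequences.append(current_sequence)
        identifiers.append(line[1:].strip())  # drop the leading '>'
        current_sequence = ''
    else:
        current_sequence += line.strip()
# Store the sequence of the last record, which no header follows
if identifiers:
    sequences.append(current_sequence)

df = pd.DataFrame({'identifiers': identifiers,
                   'sequences': sequences})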
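
The same loop can be wrapped in a function that reads from a file handle. This is only a sketch: the function name `read_contigs` and the demo data are made up for illustration, and the column names `contig` and `sequence` are taken from the expected result in the question. It assumes every record starts with a '>' line and ignores anything before the first header.

```python
import io
import pandas as pd

def read_contigs(handle):
    """Parse '>'-headed records from an iterable of lines into a DataFrame."""
    records = []              # list of (contig, sequence) tuples
    name, chunks = None, []
    for raw_line in handle:
        line = raw_line.strip()
        if not line:
            continue          # skip blank lines
        if line.startswith('>'):
            # A new header closes the previous record, if there was one
            if name is not None:
                records.append((name, ''.join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    # The last record is not followed by a header, so store it here
    if name is not None:
        records.append((name, ''.join(chunks)))
    return pd.DataFrame(records, columns=['contig', 'sequence'])

# Demo on an in-memory file; for a real file, pass an open file object instead
demo = io.StringIO(
    '>NODE_1_length_4014_cov_1.97676\nAATTAATG\nCAAAGCCT\n'
    '>NODE_2518_length_56_cov_219\nCCCTTGTT\n'
)
contigs = read_contigs(demo)
print(contigs)
```

Collecting the pieces in a list and joining once per record avoids repeated string concatenation, which may matter for long contigs.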

Whether this code works depends on the details of your input, which you didn't provide, but it might get you started.
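
If the data is already sitting in a one-column DataFrame, as in the question, the same grouping can probably be done inside pandas. The sketch below is an alternative to the loop, not a replacement for it. The frame `raw` and its column name `line` are hypothetical stand-ins for the real data, and it assumes the column holds only strings (no missing values) and that the first row is a header.

```python
import pandas as pd

# Hypothetical one-column frame mimicking the layout shown in the question
raw = pd.DataFrame({'line': [
    '>NODE_1_length_4014_cov_1.97676',
    'AATTAATG',
    'CAAAGCCT',
    '>NODE_2518_length_56_cov_219',
    'CCCTTGTT',
]})

is_header = raw['line'].str.startswith('>')
group_id = is_header.cumsum()        # increases by one at every header row

# Header text without the leading '>', indexed by its group number
headers = raw.loc[is_header, 'line'].str[1:]
headers.index = group_id[is_header].to_numpy()

# Join all sequence rows that belong to the same group into one string
sequences = raw.loc[~is_header, 'line'].groupby(group_id[~is_header]).agg(''.join)

result = pd.DataFrame({'contig': headers, 'sequence': sequences}).reset_index(drop=True)
print(result)
```

The cumulative sum over the header mask gives every row the number of the most recent header above it, so grouping on that number collects each contig's lines together.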
