Python pandas read_csv: merge every two columns and read them as a dataframe

I'm a beginner in Python and pandas, and I'm trying to figure out how to read a CSV file in a particular way.

My data file:

01 AAA1234 AAA32452 AAA123123 0 -9 C C A A T G A G .......
01 AAA1334 AAA12452 AAA125123 1 -9 C A T G T G T G .......
...
...
...

The file has 100,000 columns, and I want to merge every two columns into one, but only after the first 6 columns. If possible, I would prefer to do this while reading the file rather than manipulating the huge data file afterwards.

Desired outcome:

01 AAA1234 AAA32452 AAA123123 0 -9 CC AA TG AG .......
01 AAA1334 AAA12452 AAA125123 1 -9 CA TG TG TG .......
...
...
...

This would give a dataframe with roughly half as many columns. My data file has no column names; the names live in a separate CSV, but that is another subject.

I'd appreciate a solution. Thanks in advance!

First, separate the part of the data frame that needs merging from the rest. I created a small data frame for experimental purposes.
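A minimal sketch of that initial split, assuming the file is whitespace-delimited like the question's sample and that the first six columns are the ones to leave alone. The names `sample`, `fixed`, and `to_merge` are illustrative and not from the original post, and only the columns visible in the question are included.

```python
import io

import pandas as pd

# The two visible rows from the question; the real file is far wider.
sample = io.StringIO(
    "01 AAA1234 AAA32452 AAA123123 0 -9 C C A A T G A G\n"
    "01 AAA1334 AAA12452 AAA125123 1 -9 C A T G T G T G\n"
)

# header=None because the file has no column names;
# dtype=str keeps "01" from being parsed as the integer 1.
df = pd.read_csv(sample, sep=r"\s+", header=None, dtype=str)

fixed = df.iloc[:, :6]     # leading columns that stay as they are
to_merge = df.iloc[:, 6:]  # trailing columns to be joined in pairs
print(fixed.shape, to_merge.shape)
```

Reading everything as strings also means the later pairwise join is plain string concatenation rather than numeric addition.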

Then I defined a function and passed in the data frame that needed manipulation as its argument:

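import pandas as pd

def columns_joiner(data):
    # Join each pair of adjacent columns (0+1, 2+3, ...) by string concatenation.
    # Assumes an even number of string-valued columns.
    merged = []
    # The original hard-coded range(0, 11, 2); this derives the stop from the data.
    for i in range(0, data.shape[1] - 1, 2):
        merged.append(data.iloc[:, i] + data.iloc[:, i + 1])
    # Concatenate once at the end instead of growing a DataFrame inside the loop.
    return pd.concat(merged, axis=1)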
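The answer does not show how the merged columns get back next to the first six. One way this might look is sketched below; the tiny `df` is a hypothetical stand-in for the experimental frame, and `join_pairs` is a compact restatement of the pairwise join so that the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in data: six fixed columns followed by four single-letter columns.
df = pd.DataFrame(
    [
        ["01", "AAA1234", "AAA32452", "AAA123123", "0", "-9", "C", "C", "A", "A"],
        ["01", "AAA1334", "AAA12452", "AAA125123", "1", "-9", "C", "A", "T", "G"],
    ]
)


def join_pairs(data):
    """Concatenate columns 0+1, 2+3, ... of an even-width, string-valued frame."""
    merged = [
        data.iloc[:, i] + data.iloc[:, i + 1]
        for i in range(0, data.shape[1] - 1, 2)
    ]
    return pd.concat(merged, axis=1)


fixed = df.iloc[:, :6]
pairs = join_pairs(df.iloc[:, 6:])
# Relabel the merged columns so they do not collide with the fixed labels 0..5.
pairs.columns = range(6, 6 + pairs.shape[1])
result = pd.concat([fixed, pairs], axis=1)
print(result)
```

Relabelling before the final `pd.concat` matters because the joined columns come back numbered from 0, which would otherwise duplicate the labels of the fixed block.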

I don't think this is an efficient solution, but it worked for me.
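If efficiency matters, a loop-free variant may be worth trying. The sketch below is an alternative to the function above, not the answerer's method: it pairs columns with strided slices and reads the file in row chunks via `read_csv`'s `chunksize`, so each chunk is merged as soon as it is read. `SAMPLE`, `merge_pairs`, and `n_fixed` are hypothetical names, and the code assumes whitespace-delimited input with an even number of columns after the first six.

```python
import io

import pandas as pd

# Two rows shaped like the question's sample (only the visible columns).
SAMPLE = (
    "01 AAA1234 AAA32452 AAA123123 0 -9 C C A A T G A G\n"
    "01 AAA1334 AAA12452 AAA125123 1 -9 C A T G T G T G\n"
)


def merge_pairs(df, n_fixed=6):
    """Keep the first n_fixed columns and join the remaining ones pairwise."""
    fixed = df.iloc[:, :n_fixed]
    # Strided slices pick the 1st, 3rd, ... and the 2nd, 4th, ... trailing columns.
    left = df.iloc[:, n_fixed::2].to_numpy(dtype=object)
    right = df.iloc[:, n_fixed + 1::2].to_numpy(dtype=object)
    # Adding object arrays concatenates the strings element by element.
    pairs = pd.DataFrame(left + right, index=df.index)
    # Continue the integer column labels after the fixed block.
    pairs.columns = range(n_fixed, n_fixed + pairs.shape[1])
    return pd.concat([fixed, pairs], axis=1)


# With chunksize set, read_csv yields DataFrames of at most that many rows.
reader = pd.read_csv(
    io.StringIO(SAMPLE), sep=r"\s+", header=None, dtype=str, chunksize=1
)
result = pd.concat([merge_pairs(chunk) for chunk in reader], ignore_index=True)
print(result)
```

A `chunksize` of 1 is only there to exercise the chunked path on two rows; for a real file a much larger value would be the natural choice. Chunking limits how many unmerged rows are held at once, but the merged result is still assembled in memory at the end.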
