
How to convert DataFrame rows into columns?

I have a dataset/DataFrame in this format:

    gene : ABC
    sample: XYX
    input:23
    ...
    gene : DEF
    sample: ERT
    input :24
    ...

It goes on and on.

How do I get it into this format?

    gene  sample  input
    abc   xyx     23
    def   ert     24
    ...

Either Python or shell commands will do.

I tried pandas transpose (df.transpose()), but it doesn't give me the output I'm looking for.

I'm not 100% sure what you're looking for, so I'll give a couple of examples of potential solutions. If these don't match what you need, please update your question or add a comment.

Setup (following your example data):

    import pandas as pd
    dict1 = {"gene": "ABC", "sample": "XYZ", "input": 23}
    dict2 = {"gene": "DEF", "sample": "ERT", "input": 24}
    columns = ["gene", "sample", "input"]
    df = pd.DataFrame([dict1, dict2], columns=columns)

The output of df looks like this:

      gene sample  input
    0  ABC    XYZ     23
    1  DEF    ERT     24

That looks like what you're asking for in your question. If so, you can use a similar setup (like the code block above) to build this DataFrame.

If you mean that you already have data in that format and want to transpose it, I would recommend the following:

    # columns will be the index from 0 to n-1:
    df.transpose()
    # output:
    #           0    1
    # gene    ABC  DEF
    # sample  XYZ  ERT
    # input    23   24

    # try this instead
    list_that_contains_n_items_to_be_columns = ["a", "b"]
    df.index = pd.Index(list_that_contains_n_items_to_be_columns)
    df.transpose()
    # output:
    #           a    b
    # gene    ABC  DEF
    # sample  XYZ  ERT
    # input    23   24
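
If your data actually arrives in the "fields as rows" shape from the question (one row per field, one column per record), transposing in the other direction gives the wide layout. A minimal sketch, assuming a hypothetical DataFrame named tall whose index holds the field names:

```python
import pandas as pd

# Hypothetical "tall" layout: one row per field, one column per record
tall = pd.DataFrame(
    {"a": ["ABC", "XYZ", 23], "b": ["DEF", "ERT", 24]},
    index=["gene", "sample", "input"],
)

# .T swaps the axes: records become rows and fields become columns;
# reset_index(drop=True) replaces the record labels "a", "b" with 0..n-1
wide = tall.T.reset_index(drop=True)
print(wide)
```

Because each column of tall mixes strings and numbers, every column of wide comes out with object dtype; convert with something like wide["input"].astype(int) if you need numbers.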

If you meant that the data you posted is in a text file like this:

    gene : ABC
    sample: XYX
    input:23
    gene : DEF
    sample: ERT
    input :24

you would need to read it in and put it into a DataFrame (similar to reading a CSV file). You could do that with:

    import pandas as pd

    list_of_dicts = []
    number_columns = 3  # number of fields per record; change this as necessary
    dict_row = {}
    with open("data.txt") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            # split on the first ":" only and strip the spaces around key and value
            key, val = line.split(":", 1)
            dict_row[key.strip()] = val.strip()
            # once a record has all its fields, store it and start the next one
            if len(dict_row) == number_columns:
                list_of_dicts.append(dict_row)
                dict_row = {}

    # add your columns to that list
    df = pd.DataFrame(list_of_dicts, columns=["gene", "sample", "input"])
    print(df)

This reads your file line by line and builds a list of dictionaries, which is easy to turn into a pandas DataFrame. If you want an actual CSV file, you can run df.to_csv("name_of_file.csv") (add index=False to leave out the row index).
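
As an alternative to the hand-written loop, pandas can do the reshaping itself: read every line as a (key, value) pair, number the repeats of each key, and pivot. This is a sketch that assumes every record contains each key exactly once; the inline string raw is a stand-in for data.txt.

```python
import io

import pandas as pd

# Stand-in for the contents of data.txt (the "." lines removed)
raw = """gene : ABC
sample: XYX
input:23
gene : DEF
sample: ERT
input :24
"""

# Split each line on ":" and any spaces around it; a multi-character regex
# separator requires engine="python". header=None: the file has no header row.
long_df = pd.read_csv(io.StringIO(raw), sep=r"\s*:\s*", engine="python",
                      header=None, names=["key", "value"])

# Number the repeats of each key: the n-th gene, sample and input form record n
long_df["record"] = long_df.groupby("key").cumcount()

# One row per record, one column per key, restored to the original order
wide = long_df.pivot(index="record", columns="key", values="value")
wide = wide[["gene", "sample", "input"]].reset_index(drop=True)
wide.columns.name = None  # drop the leftover "key" label on the columns
wide["input"] = wide["input"].astype(int)  # values were read as strings
print(wide)
```

For a real file, pass the path (for example "data.txt") to read_csv instead of io.StringIO(raw).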

Hope one of these helps!

EDIT: To process all files in a directory, you can wrap the reading code in a loop like this (and open filename instead of "data.txt"):

    import glob
    for filename in glob.glob("/your/path/here/*.txt"):
        with open(filename) as f:
            ...  # the per-file parsing code from above goes here
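
Put together, reading every matching file and stacking the results could look like the sketch below. The field names, the *.txt pattern and the helper names read_records and read_directory are assumptions for illustration:

```python
import glob
import os

import pandas as pd

FIELDS = ["gene", "sample", "input"]  # assumed fields, one "key: value" per line


def read_records(path):
    """Parse one file of repeating 'key: value' lines into a DataFrame."""
    rows, row = [], {}
    with open(path) as f:
        for line in f:
            if ":" not in line:
                continue  # skip blank lines and anything that is not key: value
            key, val = line.split(":", 1)
            row[key.strip()] = val.strip()
            if len(row) == len(FIELDS):  # record complete
                rows.append(row)
                row = {}
    return pd.DataFrame(rows, columns=FIELDS)


def read_directory(directory):
    """Read every *.txt file in directory and stack them into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(directory, "*.txt")))
    frames = [read_records(p) for p in paths]
    if not frames:  # pd.concat raises ValueError on an empty list
        return pd.DataFrame(columns=FIELDS)
    # ignore_index gives one 0..n-1 index instead of repeating each file's index
    return pd.concat(frames, ignore_index=True)
```

Sorting the paths makes the row order reproducible, since glob returns files in arbitrary order.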

EDIT EDIT:

The question as posted does not seem to match what the author actually needs (see the comments on this answer). It seems the author has .tsv files whose contents are already close to a DataFrame layout and wants them read in as DataFrames. The sample file given is:

    Sample Name:    1234
    Index:  IB04
    Input DNA:  100

    Detected ITD Variants:
    Size    READS   VRF



    Sample Name:    1235
    Index:  IB05
    Input DNA:  100

    Detected Variants:
    Size    READS   VRF
    27  112995  4.44e-01
    Total   112995  4.44e-01

Example code to read these files and create a "Sample" DataFrame:

    #!/usr/bin/python
    import glob
    import pandas as pd

    COLUMNS = ["Sample Name", "Index", "Input DNA"]


    def get_df(pattern="*.tsv", start_key="Sample", columns=COLUMNS):
        list_of_dfs = []
        for filepath in glob.glob(pattern):
            list_of_dicts = []
            dict_row = {}
            part_of_df = False
            with open(filepath) as file:
                for line in file:
                    # a line containing start_key opens a header block
                    if start_key in line:
                        part_of_df = True
                    elif line.strip() == "":
                        # an empty line closes the block
                        part_of_df = False
                        continue
                    # only "key: value" lines inside a header block become cells
                    if part_of_df and ":" in line:
                        key, val = line.split(":", 1)
                        dict_row[key.strip()] = val.strip()
                        # once every column is filled, store the row and start a new one
                        if len(dict_row) == len(columns):
                            list_of_dicts.append(dict_row)
                            dict_row = {}
            list_of_dfs.append(pd.DataFrame(list_of_dicts, columns=columns))
        # concatenate all the files together with a fresh 0..n-1 index
        return pd.concat(list_of_dfs, ignore_index=True)


    df_samples = get_df(pattern="*.tsv", start_key="Sample")
    print(df_samples)

This creates a DataFrame with one row per sample block (Sample Name, Index, Input DNA) across all the .tsv files in the current directory. If this creates the dataset you are looking for, please mark this answer as accepted. If you have further questions, please ask a new question (posting a data file in the question is very helpful).
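
If the header block always consists of the same three lines in the same order, a regular expression over the whole file text is a compact alternative to the line-by-line state machine. A sketch, assuming exactly the "Sample Name", "Index" and "Input DNA" lines shown in the sample file (the helper name parse_sample_headers is hypothetical):

```python
import re

import pandas as pd

# Assumed layout: three consecutive "key: value" header lines per sample;
# [ \t]* (not \s*) keeps an empty value from swallowing the next line
HEADER_RE = re.compile(
    r"^Sample Name:[ \t]*(?P<name>\S+)[ \t]*\r?\n"
    r"Index:[ \t]*(?P<index>\S+)[ \t]*\r?\n"
    r"Input DNA:[ \t]*(?P<dna>\S+)",
    re.MULTILINE,
)


def parse_sample_headers(text):
    """Return one row per header block found in text."""
    rows = [
        {"Sample Name": m["name"], "Index": m["index"], "Input DNA": m["dna"]}
        for m in HEADER_RE.finditer(text)
    ]
    return pd.DataFrame(rows, columns=["Sample Name", "Index", "Input DNA"])


sample_text = """Sample Name:    1234
Index:  IB04
Input DNA:  100

Detected ITD Variants:
Size    READS   VRF

Sample Name:    1235
Index:  IB05
Input DNA:  100
"""
print(parse_sample_headers(sample_text))
```

To process the .tsv files, read each one with open(path).read() and pass the text in; the "Detected ... Variants" tables are simply skipped because they never match the pattern.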
