
How to iterate over pandas index?

I set the Hugo_Symbol column of my dataframe as the index. In the result variable, I want to iterate over this index column.

import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
import csv

class DataProcessing:

    def __init__(self, data):
        self.df = pd.read_csv(data, sep="\t").drop("Entrez_Gene_Id", axis=1, errors="ignore")
        self.df = self.df.loc[:, ~self.df.columns.duplicated()]
        self.df = self.df.set_index("Hugo_Symbol")
        self.df = self.df.sort_index()

    def split_data(self):
        X = self.df.iloc[:, :-1]
        y = self.df.iloc[:, -1]
        return X, y

    def pca(self):
        pca = PCA()
        if np.any(np.isnan(self.df)):
            pass
        elif np.all(np.isfinite(self.df)):
            pass
        else:
            pca.fit(self.df.iloc[1:, 3:])
            self.pca_components = pca.components_
            return self.pca_components

def main():

    
    cna = DataProcessing(directory + "data_linear_cna.txt")
    result = [[analysis.identifiers(ids=",".join(d)) for d, (index, row) in enumerate(cna.df.iterrows())]]

main()

Traceback:

Traceback (most recent call last):
  File "/home/main.py", line 87, in <module>
    main()
  File "/home/main.py", line 83, in main
    result = [[analysis.identifiers(ids=",".join(d)) for d, (index, row) in enumerate(cna.df.iterrows())]]
  File "/home/main.py", line 83, in <listcomp>
    result = [[analysis.identifiers(ids=",".join(d)) for d, (index, row) in enumerate(cna.df.iterrows())]]
TypeError: can only join an iterable

Example dataframe

Hugo_Symbol  TCGA-1  TCGA-2  TCGA-3
First        0.123   0.234   0.345
Second       0.123   0.234   0.478
Third        0.456   0.678   0.789
Fourth       0.789   0.456   0.321

As pointed out in the comments, the question is difficult to answer because it lacks a minimal reproducible example. I think your issue lies in understanding what iterrows and to_dict return, so maybe this helps.
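To make the return values concrete, here is a minimal sketch. The two-row dataframe is an assumption built from the example values in the question, not the real data. iterrows() yields (index_label, row) pairs, and wrapping it in enumerate puts an integer counter in front of each pair, so in `for d, (index, row) in enumerate(...)` the name d is an int. Passing that int to ",".join is what raises the TypeError in the traceback.

```python
import pandas as pd

# Assumed stand-in for the real data: two rows from the question's example.
df = pd.DataFrame(
    {"TCGA-1": [0.123, 0.123], "TCGA-2": [0.234, 0.234]},
    index=pd.Index(["First", "Second"], name="Hugo_Symbol"),
)

# enumerate(df.iterrows()) yields (counter, (index_label, row)) tuples.
unpacked = [
    (d, index, type(row).__name__)
    for d, (index, row) in enumerate(df.iterrows())
]
print(unpacked)  # [(0, 'First', 'Series'), (1, 'Second', 'Series')]

# Joining the integer counter reproduces the error from the traceback.
try:
    ",".join(unpacked[0][0])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

So the index label is the variable called index in the original comprehension, not d.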

Using this minimal dataframe for simplicity:

df = pd.DataFrame({"a": list("asdf"), "b": [1, 2, 3, 4]})
Out[11]: 
   a  b
0  a  1
1  s  2
2  d  3
3  f  4

When you convert the dataframe to "records", you lose the information about the index. If the index is simply a monotonic one starting at 0, enumerate can recover it:

# A plain loop avoids building a throwaway list of None values.
for i, d in enumerate(df.astype(str).to_dict(orient='records')):
    print(f'index: {i}, vals: {", ".join(d.values())}')

index: 0, vals: a, 1
index: 1, vals: s, 2
index: 2, vals: d, 3
index: 3, vals: f, 4

If the original index is important, you can use zip instead (the index is modified here so that you can see the difference):

df.index = [-10, 3, 15, 33]

# Pair each record with its real index label instead of a running counter.
for i, d in zip(df.index, df.astype(str).to_dict(orient='records')):
    print(f'index: {i}, vals: {", ".join(d.values())}')

index: -10, vals: a, 1
index: 3, vals: s, 2
index: 15, vals: d, 3
index: 33, vals: f, 4
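If only the index labels are needed, as the question title suggests, looping over df.index directly may be enough. The following is a sketch under assumptions: the dataframe is rebuilt from the example in the question, and lookup is a hypothetical stand-in for analysis.identifiers, whose real signature is not shown there.

```python
import pandas as pd

# Rebuilt from the example dataframe in the question.
df = pd.DataFrame(
    {
        "TCGA-1": [0.123, 0.123, 0.456, 0.789],
        "TCGA-2": [0.234, 0.234, 0.678, 0.456],
        "TCGA-3": [0.345, 0.478, 0.789, 0.321],
    },
    index=pd.Index(["First", "Second", "Third", "Fourth"], name="Hugo_Symbol"),
)


def lookup(ids):
    """Hypothetical placeholder for analysis.identifiers; it only echoes its input."""
    return f"queried: {ids}"


# Iterating over the index yields the labels themselves.
symbols = [symbol for symbol in df.index]

# One call per label ...
per_symbol = [lookup(ids=symbol) for symbol in df.index]

# ... or a single call with all labels joined into one comma-separated string.
joined = lookup(ids=",".join(df.index))

print(symbols)
print(joined)
```

Whether the real function expects one identifier per call or a comma-separated string cannot be told from the question, so both variants are shown.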
