How to iterate over a pandas index?

Question

I set the Hugo_Symbol column of my dataframe as the index. In the result variable, I want to iterate over this index column.

import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
import csv

class DataProcessing:

    def __init__(self, data):
        self.df = pd.read_csv(data, sep="\t").drop("Entrez_Gene_Id", axis=1, errors="ignore")
        self.df = self.df.loc[:, ~self.df.columns.duplicated()]
        self.df = self.df.set_index("Hugo_Symbol")
        self.df = self.df.sort_index()

    def split_data(self):
        X = self.df.iloc[:, :-1]
        y = self.df.iloc[:, -1]
        return X, y

    def pca(self):
        pca = PCA()
        if np.any(np.isnan(self.df)):
            pass
        elif np.all(np.isfinite(self.df)):
            pass
        else:
            pca.fit(self.df.iloc[1:, 3:])
            self.pca_components = pca.components_
            return self.pca_components

def main():

    
    cna = DataProcessing(directory + "data_linear_cna.txt")
    result = [[analysis.identifiers(ids=",".join(d)) for d, (index, row) in enumerate(cna.df.iterrows())]]

main()

Traceback:

Traceback (most recent call last):
  File "/home/main.py", line 87, in <module>
    main()
  File "/home/main.py", line 83, in main
    result = [[analysis.identifiers(ids=",".join(d)) for d, (index, row) in enumerate(cna.df.iterrows())]]
  File "/home/main.py", line 83, in <listcomp>
    result = [[analysis.identifiers(ids=",".join(d)) for d, (index, row) in enumerate(cna.df.iterrows())]]
TypeError: can only join an iterable

Example dataframe

Hugo_Symbol	TCGA-1	TCGA-2	TCGA-3
First	0.123	0.234	0.345
Second	0.123	0.234	0.478
Third	0.456	0.678	0.789
Fourth	0.789	0.456	0.321

Answer 1

As pointed out in the comments, it is difficult to answer the question because it lacks a minimal reproducible example. I think your issue is with understanding what iterrows or to_dict returns. But maybe this can help you.
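To make the point about iterrows concrete, here is a minimal sketch. The small dataframe is made up for illustration and is not the asker's data. It shows that enumerate(df.iterrows()) yields an integer counter followed by an (index, row) tuple, so the d in the question's comprehension is an int, which would explain why ",".join(d) raises the TypeError.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data, indexed by Hugo_Symbol.
df = pd.DataFrame(
    {"TCGA-1": [0.123, 0.456], "TCGA-2": [0.234, 0.678]},
    index=pd.Index(["First", "Third"], name="Hugo_Symbol"),
)

seen = []
for counter, (index, row) in enumerate(df.iterrows()):
    # counter is a plain int, index is the row label, row is a Series.
    seen.append((type(counter).__name__, index, type(row).__name__))

print(seen)

try:
    ",".join(0)  # same mistake as joining the enumerate counter
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The row label is the second element of the unpacking (index here), not the first one coming from enumerate.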

Using this minimal dataframe for simplicity:

df = pd.DataFrame({"a": list("asdf"), "b": [1, 2, 3, 4]})
Out[11]: 
   a  b
0  a  1
1  s  2
2  d  3
3  f  4

When you convert it to "records" you lose information about the index. If the index is simply a monotonic one starting at 0, then enumerate can recover it:

for i, d in enumerate(df.astype(str).to_dict(orient='records')):
    print(f'index: {i}, vals: {", ".join(d.values())}')

index: 0, vals: a, 1
index: 1, vals: s, 2
index: 2, vals: d, 3
index: 3, vals: f, 4

If the original index is important, you can use zip instead (the index is modified here so that you can see the difference):

df.index = [-10, 3, 15, 33]

for i, d in zip(df.index, df.astype(str).to_dict(orient='records')):
    print(f'index: {i}, vals: {", ".join(d.values())}')

index: -10, vals: a, 1
index: 3, vals: s, 2
index: 15, vals: d, 3
index: 33, vals: f, 4
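If only the index labels are needed, as the question suggests, the index can also be iterated directly, without iterrows or to_dict. This is a minimal sketch under assumptions: the dataframe is made up to resemble the example in the question, and analysis.identifiers is left out because its signature is not shown there.

```python
import pandas as pd

# Hypothetical data shaped like the example dataframe in the question.
df = pd.DataFrame(
    {"TCGA-1": [0.123, 0.123, 0.456], "TCGA-2": [0.234, 0.234, 0.678]},
    index=pd.Index(["First", "Second", "Third"], name="Hugo_Symbol"),
)

# A pandas Index is iterable: each element is one row label.
symbols = [symbol for symbol in df.index]

# str.join needs an iterable of strings, so cast the labels in case they are not.
joined = ",".join(df.index.astype(str))

print(symbols)
print(joined)
```

Whether the identifiers should be passed one at a time or as a single comma-separated string depends on what the called function expects, which the question does not state.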
