How to iterate over pandas index?
I set the Hugo_Symbol column of my dataframe as the index. In the result variable, I want to iterate over this index column.
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
import csv

class DataProcessing:
    def __init__(self, data):
        self.df = pd.read_csv(data, sep="\t").drop("Entrez_Gene_Id", axis=1, errors="ignore")
        self.df = self.df.loc[:, ~self.df.columns.duplicated()]
        self.df = self.df.set_index("Hugo_Symbol")
        self.df = self.df.sort_index()

    def split_data(self):
        X = self.df.iloc[:, :-1]
        y = self.df.iloc[:, -1]
        return X, y

    def pca(self):
        pca = PCA()
        if np.any(np.isnan(self.df)):
            pass
        elif np.all(np.isfinite(self.df)):
            pass
        else:
            pca.fit(self.df.iloc[1:, 3:])
            self.pca_components = pca.components
        return self.pca_components

def main():
    cna = DataProcessing(directory + "data_linear_cna.txt")
    result = [[analysis.identifiers(ids=",".join(d)) for d, (index, row) in enumerate(cna.df.iterrows())]]

main()
Traceback:
Traceback (most recent call last):
  File "/home/main.py", line 87, in <module>
    main()
  File "/home/main.py", line 83, in main
    result = [[analysis.identifiers(ids=",".join(d)) for d, (index, row) in enumerate(cna.df.iterrows())]]
  File "/home/main.py", line 83, in <listcomp>
    result = [[analysis.identifiers(ids=",".join(d)) for d, (index, row) in enumerate(cna.df.iterrows())]]
TypeError: can only join an iterable
Example dataframe

Hugo_Symbol | TCGA-1 | TCGA-2 | TCGA-3 |
---|---|---|---|
First | 0.123 | 0.234 | 0.345 |
Second | 0.123 | 0.234 | 0.478 |
Third | 0.456 | 0.678 | 0.789 |
Fourth | 0.789 | 0.456 | 0.321 |
As pointed out in the comments, it is difficult to answer the question because it lacks a minimal reproducible example. I think your issue is with understanding what is returned from iterrows or to_dict. But maybe this could help you.
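To make the point about return values concrete, here is a minimal sketch of what enumerate(df.iterrows()) yields. The small frame below is a hypothetical stand-in for the question's data (only the Hugo_Symbol and TCGA-* names come from the question). It suggests that the first unpacked variable is an integer counter rather than an index label, which would explain the TypeError in the traceback.

```python
import pandas as pd

# Hypothetical stand-in for the question's data, indexed by Hugo_Symbol.
df = pd.DataFrame(
    {"TCGA-1": [0.123, 0.456], "TCGA-2": [0.234, 0.678]},
    index=pd.Index(["First", "Third"], name="Hugo_Symbol"),
)

# iterrows() yields (index_label, row_as_Series) pairs;
# enumerate() wraps each pair with an integer counter.
pairs = list(enumerate(df.iterrows()))
counter, (label, row) = pairs[0]

print(type(counter).__name__)  # the counter is a plain int
print(label)                   # the index label of the first row
print(type(row).__name__)      # the row itself is a pandas Series

# Joining an int reproduces the kind of error shown in the question.
try:
    ",".join(counter)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

In the question's comprehension, d plays the role of counter here, so ",".join(d) is called on an int. The index label is the variable named index in that comprehension.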
Using this minimal dataframe for simplicity:
df = pd.DataFrame({"a": list("asdf"), "b": [1, 2, 3, 4]})
Out[11]:
   a  b
0  a  1
1  s  2
2  d  3
3  f  4
When you convert it to "records" you lose the information about the index. If the index is simply a monotonic one starting at 0, then enumerate can stand in for it:
[print(f'index: {i}, vals: {", ".join(list(d.values()))}') for i, d in enumerate(df.astype(str).to_dict(orient='records'))]
index: 0, vals: a, 1
index: 1, vals: s, 2
index: 2, vals: d, 3
index: 3, vals: f, 4
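As a small illustration of the information loss described above, the sketch below compares two to_dict orientations on the same frame with a non-default index (the index values are chosen here only for illustration). With orient="records" the labels are gone, while orient="index" keeps them as dictionary keys.

```python
import pandas as pd

# Same minimal frame, but with a non-default index chosen for illustration.
df = pd.DataFrame(
    {"a": list("asdf"), "b": [1, 2, 3, 4]},
    index=[-10, 3, 15, 33],
)

# A list of per-row dicts: the index labels do not appear anywhere.
records = df.to_dict(orient="records")

# A dict keyed by index label, each value being that row as a dict.
by_index = df.to_dict(orient="index")

print(records[0])
print(list(by_index))
```

Which orientation fits depends on whether the labels are needed downstream; if they are, orient="index" avoids having to re-attach them later.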
If the original index is important, you can use zip instead (the index is modified here so that you can see the difference):
df.index = [-10, 3, 15, 33]
[print(f'index: {i}, vals: {", ".join(list(d.values()))}') for i, d in zip(df.index, df.astype(str).to_dict(orient='records'))]
index: -10, vals: a, 1
index: 3, vals: s, 2
index: 15, vals: d, 3
index: 33, vals: f, 4
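If only the index labels are needed, as the question's title suggests, it may be simpler to iterate over df.index directly. The following is a minimal sketch using the example dataframe from the question; the variable names symbols, ids, and labels_from_rows are made up for illustration, and what analysis.identifiers expects for ids is an assumption, since that function is not shown in the question.

```python
import pandas as pd

# Example data from the question, with Hugo_Symbol as the index.
df = pd.DataFrame(
    {
        "TCGA-1": [0.123, 0.123, 0.456, 0.789],
        "TCGA-2": [0.234, 0.234, 0.678, 0.456],
        "TCGA-3": [0.345, 0.478, 0.789, 0.321],
    },
    index=pd.Index(["First", "Second", "Third", "Fourth"], name="Hugo_Symbol"),
)

# The index is iterable on its own; no iterrows() or enumerate() required.
symbols = [symbol for symbol in df.index]

# A single comma-separated string of all labels (cast to str to be safe).
ids = ",".join(df.index.astype(str))

# With iterrows(), the label is the first element of each yielded pair.
labels_from_rows = [label for label, row in df.iterrows()]

print(symbols)
print(ids)
```

Iterating over df.index avoids building a Series for every row, which iterrows() does, so it is usually the lighter option when the row values are not used.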