Iterating over columns and rows in pandas dataframe
I am trying to iterate through a DataFrame and use the values inside the cells, but I also need the names of the column and row that each cell comes from. Because of that, I am currently doing something like the following:
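import pandas

df = pandas.DataFrame(data={"C1": [1, 2, 3, 4, 5], "C2": [1, 2, 3, 4, 5]},
                      index=["R1", "R2", "R3", "R4", "R5"])

# df2 is another DataFrame with related information (not shown in the question)
for row in df.index.values:
    for column in df.columns.values:
        # .loc takes the row label first, then the column label
        if df.loc[row, column] > 3:
            if row in df2[column]:
                print("data is present")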
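As a hedged illustration that is not part of the original question, one way to get the row and column labels of every cell that passes the test, without a Python-level double loop over all cells, is to build a boolean mask, ask NumPy for the positions of its True cells with np.nonzero, and map those positions back to labels through df.index and df.columns. The names mask, row_pos, col_pos and pairs are illustrative only, and the lookup in the second DataFrame is left out because its layout is not shown.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={"C1": [1, 2, 3, 4, 5], "C2": [1, 2, 3, 4, 5]},
                  index=["R1", "R2", "R3", "R4", "R5"])

# Boolean mask of the cells to keep, as a plain NumPy array
mask = df.to_numpy() > 3

# np.nonzero returns the positional row and column indices of the True cells
row_pos, col_pos = np.nonzero(mask)

# Translate positions back to labels, giving (row label, column label) pairs
pairs = list(zip(df.index[row_pos], df.columns[col_pos]))

for row, column in pairs:
    # Each label pair could now be used to look things up in another DataFrame
    print(row, column, df.loc[row, column])
```

The final loop only visits the matching cells rather than every cell, which helps most when few cells pass the test.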
I need the row and column names because I use them to look up values in another DataFrame that holds related information. I know that for loops are very slow in pandas, but I haven't been able to find any examples of how to iterate over both the rows and the columns at the same time. This:
df.applymap()
won't work because it only gives the value in the cell, without keeping a reference to the row and column the cell was in, and this:
df.apply(lambda row: row["column"])
won't work because I need to get the name of the column without knowing it beforehand. Also this:
df.apply(lambda row: someFunction(row))
won't work because apply passes a Series object, which only has the row name rather than both the row and column names.
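For what it's worth, a small sketch (not from the original post) suggests this restriction may be less strict than it looks: when apply is called with axis=1, each row Series carries the row label in its name attribute and the column labels in its index, so both are available inside the function. The function describe_hits below is a hypothetical example that just reports, for each row, the columns whose value is greater than 3.

```python
import pandas as pd

df = pd.DataFrame(data={"C1": [1, 2, 3, 4, 5], "C2": [1, 2, 3, 4, 5]},
                  index=["R1", "R2", "R3", "R4", "R5"])


def describe_hits(row):
    # With axis=1, row.name is the row label and row.index holds the column labels
    hit_columns = [str(column) for column, value in row.items() if value > 3]
    # Return a single string so apply produces one scalar per row
    return f"{row.name}:{','.join(hit_columns)}"


result = df.apply(describe_hits, axis=1)
print(result)
```

Returning a plain string keeps the result a simple Series indexed by row label; returning a list instead can interact with apply's result_type handling in ways that are easy to get wrong.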
Any insight would be helpful! I am currently running the for loop version, but it takes forever and also hogs CPU cores.
import pandas as pd

df = pd.DataFrame(data={"C1": [1, 2, 3, 4, 5],
                        "C2": [1, 2, 3, 4, 5]},
                  index=["R1", "R2", "R3", "R4", "R5"])
df2 = pd.DataFrame({'R3': [1], 'R5': [1], 'R6': [1]})
To get the columns of df2 whose names match rows of df that contain a value greater than 3, you can use a conditional list comprehension:
>>> [idx for idx in df[df.gt(3).any(axis=1)].index if idx in df2]
['R5']
To see how this works:
>>> df.gt(3)
       C1     C2
R1  False  False
R2  False  False
R3  False  False
R4   True   True
R5   True   True
Then we want the index of any row that has a value greater than 3:
>>> df.gt(3).any(axis=1)
R1 False
R2 False
R3 False
R4 True
R5 True
dtype: bool
>>> df[df.gt(3).any(axis=1)]
    C1  C2
R4   4   4
R5   5   5
>>> [i for i in df[df.gt(3).any(axis=1)].index]
['R4', 'R5']
>>> [i for i in df[df.gt(3).any(axis=1)].index if i in df2]
['R5']
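Statement: the technical posts on this site are published under the CC BY-SA 4.0 license; if you repost one, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.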