TypeError：遍历每一行以获得特定列值时，字符串索引必须是整数

Question

I want to perform linear regression analysis on time for each gene taking all the variables present in the model, hence using all the genes.我想对每个基因进行时间线性回归分析，采用 model 中存在的所有变量，因此使用所有基因。

In df5, the x-axis represents "Gene Symbol" and y-axis represent "Time".在df5中，x轴代表“基因符号”，y轴代表“时间”。

import pandas as pd
import numpy as np
import plotly.express as px
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Split data into training and test splits
train_idx, test_idx = train_test_split(df5.index, test_size=.25, random_state=0)
df5["split"] = "train"
df5.loc[test_idx, "split"] = "test"

# Inputs and targets
X = df5.iloc[:, 1:-1]
y = df5.iloc[:, 0]

X_train = df5.loc[train_idx, ["4", "8", "12", "24", "48"]]
y_train = df5.loc[train_idx, "0"]

# Linear regression prediction
model = LinearRegression()
model.fit(X_train, y_train)
df5['prediction'] = model.predict(X)

I get a typeerror when I want to set y variable as the prediction column value for each row using y=i["prediction"] .当我想使用y=i["prediction"]将y变量设置为每一行的prediction列值时，我得到一个类型错误。

# Scatter plot
for i, j in df5.iterrows():
  for col in df5.columns:
    fig = px.scatter(df5[col], x=df5.iloc[:,0], y=i["prediction"], marginal_x='histogram', marginal_y='histogram', color='split', trendline='ols')
    fig.update_traces(histnorm='probability', selector={'type':'histogram'})
    fig.add_shape(type="line", line=dict(dash='dash'), x0=y.min(), y0=y.min(), x1=y.max(), y1=y.max())
    fig.show()

Traceback:追溯：

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-395-6ef08290c83a> in <module>()
      2 for i, j in df5.iterrows():
      3   for col in df5.columns:
----> 4     fig = px.scatter(df5[col], x=df5.iloc[:,0], y=i["prediction"], marginal_x='histogram', marginal_y='histogram', color='split', trendline='ols')
      5     fig.update_traces(histnorm='probability', selector={'type':'histogram'})
      6     fig.add_shape(type="line", line=dict(dash='dash'), x0=y.min(), y0=y.min(), x1=y.max(), y1=y.max())

TypeError: string indices must be integers

Data:数据：

df5.head().to_dict()

{'0': {'DNAJB6 /// TMEM135': 0.30131649339447103,
  'DNAJC14': 0.2255444383216058,
  'DNAJC15': 0.25789169794229455,
  'DNAJC30': 0.11388797858763917,
  'DNAJC9': 0.11205541676885071},
 '12': {'DNAJB6 /// TMEM135': 0.28354614480145346,
  'DNAJC14': 0.2343653660720247,
  'DNAJC15': 0.2406210529534205,
  'DNAJC30': 0.11229754447748205,
  'DNAJC9': 0.12045170255898871},
 '24': {'DNAJB6 /// TMEM135': 0.27395808285292367,
  'DNAJC14': 0.2246018336027369,
  'DNAJC15': 0.22347959865906092,
  'DNAJC30': 0.11379897713291527,
  'DNAJC9': 0.10622530623273815},
 '4': {'DNAJB6 /// TMEM135': 0.2949284643966144,
  'DNAJC14': 0.22905481299223704,
  'DNAJC15': 0.22312009403152122,
  'DNAJC30': 0.13114878202076288,
  'DNAJC9': 0.12991396178392187},
 '48': {'DNAJB6 /// TMEM135': 0.289873135093664,
  'DNAJC14': 0.2349502215468218,
  'DNAJC15': 0.17706771640592167,
  'DNAJC30': 0.10857074282633467,
  'DNAJC9': 0.13001391250069522},
 '8': {'DNAJB6 /// TMEM135': 0.2794865791356734,
  'DNAJC14': 0.22228815371920396,
  'DNAJC15': 0.22912018863353348,
  'DNAJC30': 0.11799998627920205,
  'DNAJC9': 0.10520854728987451}}

Answer 1

First: If error shows you which line makes problem then first you could use print(), print(type(...)), etc to check what you have in variables in this line.第一：如果错误告诉你哪一行出了问题，那么首先你可以使用 print()、print(type(...)) 等来检查你在这一行的变量中有什么。

It seems you use wrong variable.看来您使用了错误的变量。 I think wrong is i["prediction"] because i should be index of row , not row with data .我认为错误是i["prediction"]因为i应该是index of row ，而不是row with data 。 Maybe if you would use more readable variables for index, row in df5.iterrow() instead of for i,j in df.iterrow() then you would see that you run index["prediction"] instead of row["prediction"]也许如果您for index, row in df5.iterrow()使用更具可读性的变量，而不是for i,j in df.iterrow()那么您会看到运行index["prediction"]而不是row["prediction"]

But frankly I don't understand what you try to plot.但坦率地说，我不明白你对 plot 的尝试。

x=df5.iloc[:,0] should give all data in column, not in row, but y=row["prediction"] should give single value from one row. x=df5.iloc[:,0]应该在列中给出所有数据，而不是在行中，但是y=row["prediction"]应该给出一行中的单个值。 It makes no sense.这没有道理。 You should rather use y=df5["prediction"] and run it without df5.iterrows() - or even use only columns names instead of data px.scatter(df5, x=col, y="prediction", ...)您应该使用y=df5["prediction"]并在没有df5.iterrows()的情况下运行它 - 甚至只使用列名而不是数据px.scatter(df5, x=col, y="prediction", ...)

for col in ["4", "8", "12", "24", "48"]:  # without "0"
    fig = px.scatter(df5, x=col, y="prediction", marginal_x='histogram', marginal_y='histogram', color='split')#, trendline='ols')
    fig.update_traces(histnorm='probability', selector={'type':'histogram'})
    fig.add_shape(type="line", line=dict(dash='dash'), x0=y.min(), y0=y.min(), x1=y.max(), y1=y.max())
    fig.show()

Full working code with example data in code - so everyone can simply copy and run it完整的工作代码和代码中的示例数据 - 所以每个人都可以简单地复制和运行它

BTW: it opens every plot on separated page.顺便说一句：它在单独的页面上打开每个 plot。 And I had to skip trendline='ols' in scatter because it gives me error ImportError: cannot import name '_centered' from 'scipy.signal.signaltools' (/usr/local/lib/python3.8/dist-packages/scipy/signal/signaltools.py)我不得不在scatter中跳过trendline='ols' ，因为它给了我错误ImportError: cannot import name '_centered' from 'scipy.signal.signaltools' (/usr/local/lib/python3.8/dist-packages/scipy/signal/signaltools.py)

data = {'0': {'DNAJB6 /// TMEM135': 0.30131649339447103,
  'DNAJC14': 0.2255444383216058,
  'DNAJC15': 0.25789169794229455,
  'DNAJC30': 0.11388797858763917,
  'DNAJC9': 0.11205541676885071},
 '12': {'DNAJB6 /// TMEM135': 0.28354614480145346,
  'DNAJC14': 0.2343653660720247,
  'DNAJC15': 0.2406210529534205,
  'DNAJC30': 0.11229754447748205,
  'DNAJC9': 0.12045170255898871},
 '24': {'DNAJB6 /// TMEM135': 0.27395808285292367,
  'DNAJC14': 0.2246018336027369,
  'DNAJC15': 0.22347959865906092,
  'DNAJC30': 0.11379897713291527,
  'DNAJC9': 0.10622530623273815},
 '4': {'DNAJB6 /// TMEM135': 0.2949284643966144,
  'DNAJC14': 0.22905481299223704,
  'DNAJC15': 0.22312009403152122,
  'DNAJC30': 0.13114878202076288,
  'DNAJC9': 0.12991396178392187},
 '48': {'DNAJB6 /// TMEM135': 0.289873135093664,
  'DNAJC14': 0.2349502215468218,
  'DNAJC15': 0.17706771640592167,
  'DNAJC30': 0.10857074282633467,
  'DNAJC9': 0.13001391250069522},
 '8': {'DNAJB6 /// TMEM135': 0.2794865791356734,
  'DNAJC14': 0.22228815371920396,
  'DNAJC15': 0.22912018863353348,
  'DNAJC30': 0.11799998627920205,
  'DNAJC9': 0.10520854728987451}
}

import pandas as pd
import numpy as np
import plotly.express as px
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df5 = pd.DataFrame(data)

# Split data into training and test splits
train_idx, test_idx = train_test_split(df5.index, test_size=.25, random_state=0)
df5["split"] = "train"
df5.loc[test_idx, "split"] = "test"

# Inputs and targets
X = df5.iloc[:, 1:-1]
y = df5.iloc[:, 0]

X_train = df5.loc[train_idx, ["4", "8", "12", "24", "48"]]
y_train = df5.loc[train_idx, "0"]

# Linear regression prediction
model = LinearRegression()
model.fit(X_train, y_train)
df5['prediction'] = model.predict(X)

for col in ["4", "8", "12", "24", "48"]:  # without "0"
    fig = px.scatter(df5, x=col, y="prediction", marginal_x='histogram', marginal_y='histogram', color='split')#, trendline='ols')
    fig.update_traces(histnorm='probability', selector={'type':'histogram'})
    fig.add_shape(type="line", line=dict(dash='dash'), x0=y.min(), y0=y.min(), x1=y.max(), y1=y.max())
    fig.show()

Plot for column "4" Plot 用于"4"列

TypeError：遍历每一行以获得特定列值时，字符串索引必须是整数

问题描述

1 个解决方案

解决方案1
1 2022-08-10 07:44:31

TypeError：遍历每一行以获得特定列值时，字符串索引必须是整数

问题描述

1 个解决方案

解决方案1 1 2022-08-10 07:44:31

解决方案1
1 2022-08-10 07:44:31