
TypeError: string indices must be integers when iterating over each row to get a specific column value

I want to perform a linear regression analysis over time for each gene, taking all the variables present in the model, hence using all the genes.

In df5, the rows (the index) are labelled by "Gene Symbol" and the columns represent "Time".

import pandas as pd
import numpy as np
import plotly.express as px
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Split data into training and test splits
train_idx, test_idx = train_test_split(df5.index, test_size=.25, random_state=0)
df5["split"] = "train"
df5.loc[test_idx, "split"] = "test"

# Inputs and targets (select the feature columns by name, so X has the same column order as X_train)
features = ["4", "8", "12", "24", "48"]
X = df5[features]
y = df5["0"]

X_train = df5.loc[train_idx, features]
y_train = df5.loc[train_idx, "0"]

# Linear regression prediction
model = LinearRegression()
model.fit(X_train, y_train)
df5['prediction'] = model.predict(X)

I get a TypeError when I try to set the y variable to the prediction column value of each row using y=i["prediction"].

# Scatter plot
for i, j in df5.iterrows():
  for col in df5.columns:
    fig = px.scatter(df5[col], x=df5.iloc[:,0], y=i["prediction"], marginal_x='histogram', marginal_y='histogram', color='split', trendline='ols')
    fig.update_traces(histnorm='probability', selector={'type':'histogram'})
    fig.add_shape(type="line", line=dict(dash='dash'), x0=y.min(), y0=y.min(), x1=y.max(), y1=y.max())
    fig.show()

Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-395-6ef08290c83a> in <module>()
      2 for i, j in df5.iterrows():
      3   for col in df5.columns:
----> 4     fig = px.scatter(df5[col], x=df5.iloc[:,0], y=i["prediction"], marginal_x='histogram', marginal_y='histogram', color='split', trendline='ols')
      5     fig.update_traces(histnorm='probability', selector={'type':'histogram'})
      6     fig.add_shape(type="line", line=dict(dash='dash'), x0=y.min(), y0=y.min(), x1=y.max(), y1=y.max())

TypeError: string indices must be integers

Data:

df5.head().to_dict()

{'0': {'DNAJB6 /// TMEM135': 0.30131649339447103,
  'DNAJC14': 0.2255444383216058,
  'DNAJC15': 0.25789169794229455,
  'DNAJC30': 0.11388797858763917,
  'DNAJC9': 0.11205541676885071},
 '12': {'DNAJB6 /// TMEM135': 0.28354614480145346,
  'DNAJC14': 0.2343653660720247,
  'DNAJC15': 0.2406210529534205,
  'DNAJC30': 0.11229754447748205,
  'DNAJC9': 0.12045170255898871},
 '24': {'DNAJB6 /// TMEM135': 0.27395808285292367,
  'DNAJC14': 0.2246018336027369,
  'DNAJC15': 0.22347959865906092,
  'DNAJC30': 0.11379897713291527,
  'DNAJC9': 0.10622530623273815},
 '4': {'DNAJB6 /// TMEM135': 0.2949284643966144,
  'DNAJC14': 0.22905481299223704,
  'DNAJC15': 0.22312009403152122,
  'DNAJC30': 0.13114878202076288,
  'DNAJC9': 0.12991396178392187},
 '48': {'DNAJB6 /// TMEM135': 0.289873135093664,
  'DNAJC14': 0.2349502215468218,
  'DNAJC15': 0.17706771640592167,
  'DNAJC30': 0.10857074282633467,
  'DNAJC9': 0.13001391250069522},
 '8': {'DNAJB6 /// TMEM135': 0.2794865791356734,
  'DNAJC14': 0.22228815371920396,
  'DNAJC15': 0.22912018863353348,
  'DNAJC30': 0.11799998627920205,
  'DNAJC9': 0.10520854728987451}}

First: if the error shows which line causes the problem, you can use print(), print(type(...)), etc. to check what the variables in that line actually contain.
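For example, a minimal sketch of that kind of check, using a two-gene subset of the question's data (only columns "0" and "4"). Printing the type of both loop variables shows that the first one is just the index label:

```python
import pandas as pd

# Two genes taken from the question's data (columns "0" and "4" only)
df = pd.DataFrame({
    "0": {"DNAJC14": 0.2255444383216058, "DNAJC9": 0.11205541676885071},
    "4": {"DNAJC14": 0.22905481299223704, "DNAJC9": 0.12991396178392187},
})

for i, j in df.iterrows():
    # Inspect what each loop variable actually holds
    print(repr(i), type(i))  # the index label: a str such as 'DNAJC14'
    print(type(j))           # the row itself: a pandas Series
    break                    # one row is enough to see the types
```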

It seems you are using the wrong variable. I think the problem is i["prediction"], because i is the index of the row, not the row with data. In df5 the index holds gene symbols, so i is a string such as 'DNAJC14', and indexing a string with another string raises "string indices must be integers". If you used more readable names, for index, row in df5.iterrows() instead of for i, j in df5.iterrows(), you would see that you run index["prediction"] instead of row["prediction"].
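A small reproduction of the error and of the fix, again assuming a two-gene subset of the question's data in which column "0" stands in for "prediction" (the real "prediction" column only exists after the model has been fitted):

```python
import pandas as pd

# Two genes from the question's data; column "0" stands in for "prediction"
df = pd.DataFrame({
    "0": {"DNAJC14": 0.2255444383216058, "DNAJC9": 0.11205541676885071},
    "4": {"DNAJC14": 0.22905481299223704, "DNAJC9": 0.12991396178392187},
})

# What the original loop effectively does: index a str (the gene symbol) with a str
index = "DNAJC14"
try:
    index["0"]
except TypeError as exc:
    print("TypeError:", exc)  # exact wording depends on the Python version

# Fix: name the loop variables clearly and read the value from the row
values = {}
for index, row in df.iterrows():
    values[index] = row["0"]
print(values)
```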


But frankly, I don't understand what you are trying to plot.
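If the goal is literally "a linear regression on time for each gene" (one possible reading of the question, and a different model from the one in the question's code, which predicts column "0" from the other time columns across genes), each row can be fitted on its own against the numeric time points taken from the column names. A sketch under that assumption, using numpy.polyfit with deg=1 as an ordinary least-squares line; the helper name fit_per_gene is hypothetical:

```python
from typing import Sequence

import numpy as np
import pandas as pd


def fit_per_gene(df: pd.DataFrame, time_cols: Sequence[str]) -> pd.DataFrame:
    """Fit value = slope * time + intercept separately for each gene (row)."""
    t = np.array([float(c) for c in time_cols])  # column labels are the time points
    fits = {}
    for gene, values in df[list(time_cols)].iterrows():
        # np.polyfit with deg=1 returns [slope, intercept] of the least-squares line
        slope, intercept = np.polyfit(t, values.to_numpy(dtype=float), deg=1)
        fits[gene] = {"slope": slope, "intercept": intercept}
    return pd.DataFrame.from_dict(fits, orient="index")


# Two genes copied from the question's data
sample = pd.DataFrame({
    "0": {"DNAJC15": 0.25789169794229455, "DNAJC9": 0.11205541676885071},
    "4": {"DNAJC15": 0.22312009403152122, "DNAJC9": 0.12991396178392187},
    "8": {"DNAJC15": 0.22912018863353348, "DNAJC9": 0.10520854728987451},
    "12": {"DNAJC15": 0.2406210529534205, "DNAJC9": 0.12045170255898871},
    "24": {"DNAJC15": 0.22347959865906092, "DNAJC9": 0.10622530623273815},
    "48": {"DNAJC15": 0.17706771640592167, "DNAJC9": 0.13001391250069522},
})
print(fit_per_gene(sample, ["0", "4", "8", "12", "24", "48"]))
```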

x=df5.iloc[:,0] gives all the data in a column, not in a row, but y=row["prediction"] gives a single value from one row, so x and y don't match. It makes no sense. You should rather use y=df5["prediction"] and run it without df5.iterrows(), or even pass only column names instead of data: px.scatter(df5, x=col, y="prediction", ...)

for col in ["4", "8", "12", "24", "48"]:  # without "0"
    fig = px.scatter(df5, x=col, y="prediction", marginal_x='histogram', marginal_y='histogram', color='split')#, trendline='ols')
    fig.update_traces(histnorm='probability', selector={'type':'histogram'})
    fig.add_shape(type="line", line=dict(dash='dash'), x0=y.min(), y0=y.min(), x1=y.max(), y1=y.max())
    fig.show()
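The loop creates one figure per time column. If one figure with a panel per time column is preferred, a sketch of one option (my suggestion, not part of the code above) is to reshape df5 into long format with DataFrame.melt and let px.scatter facet on the time column. It assumes df5 already has the "prediction" and "split" columns; the helper name to_long is hypothetical:

```python
import pandas as pd


def to_long(df: pd.DataFrame, time_cols) -> pd.DataFrame:
    """Turn one column per time point into one row per (gene, time) pair."""
    return (
        df.rename_axis("gene")          # name the gene-symbol index
          .reset_index()                # ...and turn it into a regular column
          .melt(
              id_vars=["gene", "prediction", "split"],  # columns kept on every row
              value_vars=list(time_cols),               # columns stacked into rows
              var_name="time",
              value_name="value",
          )
    )

# Possible usage with df5 (requires plotly):
# long_df = to_long(df5, ["4", "8", "12", "24", "48"])
# fig = px.scatter(long_df, x="value", y="prediction", color="split", facet_col="time")
# fig.show()
```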

Full working code with example data included, so everyone can simply copy and run it:

BTW: it opens every plot on a separate page. And I had to skip trendline='ols' in scatter because it gives me the error ImportError: cannot import name '_centered' from 'scipy.signal.signaltools' (/usr/local/lib/python3.8/dist-packages/scipy/signal/signaltools.py)
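plotly's trendline='ols' relies on statsmodels, and this ImportError is commonly reported when an older statsmodels release is used with a newer SciPy, so upgrading statsmodels may be enough. If that is not an option, one workaround (an assumption on my part, not plotly's own implementation) is to compute the least-squares line with numpy.polyfit and add it as a line trace. The helper name ols_line is hypothetical:

```python
import numpy as np


def ols_line(x, y, num=50):
    """Return points on the least-squares line y = slope * x + intercept over the range of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # degree-1 least-squares fit
    xs = np.linspace(x.min(), x.max(), num)
    return xs, slope * xs + intercept

# Possible usage inside the plotting loop (requires plotly):
# xs, ys = ols_line(df5[col], df5["prediction"])
# fig.add_scatter(x=xs, y=ys, mode="lines", name="OLS fit")
```

Note that with color='split', plotly fits one trendline per color group, while this sketch fits all points together unless it is called once per group.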

data = {'0': {'DNAJB6 /// TMEM135': 0.30131649339447103,
  'DNAJC14': 0.2255444383216058,
  'DNAJC15': 0.25789169794229455,
  'DNAJC30': 0.11388797858763917,
  'DNAJC9': 0.11205541676885071},
 '12': {'DNAJB6 /// TMEM135': 0.28354614480145346,
  'DNAJC14': 0.2343653660720247,
  'DNAJC15': 0.2406210529534205,
  'DNAJC30': 0.11229754447748205,
  'DNAJC9': 0.12045170255898871},
 '24': {'DNAJB6 /// TMEM135': 0.27395808285292367,
  'DNAJC14': 0.2246018336027369,
  'DNAJC15': 0.22347959865906092,
  'DNAJC30': 0.11379897713291527,
  'DNAJC9': 0.10622530623273815},
 '4': {'DNAJB6 /// TMEM135': 0.2949284643966144,
  'DNAJC14': 0.22905481299223704,
  'DNAJC15': 0.22312009403152122,
  'DNAJC30': 0.13114878202076288,
  'DNAJC9': 0.12991396178392187},
 '48': {'DNAJB6 /// TMEM135': 0.289873135093664,
  'DNAJC14': 0.2349502215468218,
  'DNAJC15': 0.17706771640592167,
  'DNAJC30': 0.10857074282633467,
  'DNAJC9': 0.13001391250069522},
 '8': {'DNAJB6 /// TMEM135': 0.2794865791356734,
  'DNAJC14': 0.22228815371920396,
  'DNAJC15': 0.22912018863353348,
  'DNAJC30': 0.11799998627920205,
  'DNAJC9': 0.10520854728987451}
}

import pandas as pd
import numpy as np
import plotly.express as px
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df5 = pd.DataFrame(data)

# Split data into training and test splits
train_idx, test_idx = train_test_split(df5.index, test_size=.25, random_state=0)
df5["split"] = "train"
df5.loc[test_idx, "split"] = "test"

# Inputs and targets (select the feature columns by name, so X has the same column order as X_train)
features = ["4", "8", "12", "24", "48"]
X = df5[features]
y = df5["0"]

X_train = df5.loc[train_idx, features]
y_train = df5.loc[train_idx, "0"]

# Linear regression prediction
model = LinearRegression()
model.fit(X_train, y_train)
df5['prediction'] = model.predict(X)

for col in ["4", "8", "12", "24", "48"]:  # without "0"
    fig = px.scatter(df5, x=col, y="prediction", marginal_x='histogram', marginal_y='histogram', color='split')#, trendline='ols')
    fig.update_traces(histnorm='probability', selector={'type':'histogram'})
    fig.add_shape(type="line", line=dict(dash='dash'), x0=y.min(), y0=y.min(), x1=y.max(), y1=y.max())
    fig.show()

[Figure: plot for column "4"]
