Error: string indices must be integers; using pandas columns as dictionary values

Question

I am practicing coding on dataquest.io, where they provide this code as a way to do random sampling.

wnba['Pts_per_game'] = wnba['PTS'] / wnba['Games Played']

# Stratifying the data in five strata
stratum_G = wnba[wnba.Pos == 'G']
stratum_F = wnba[wnba.Pos == 'F']
stratum_C = wnba[wnba.Pos == 'C']
stratum_GF = wnba[wnba.Pos == 'G/F']
stratum_FC = wnba[wnba.Pos == 'F/C']

points_per_position = {}
for stratum, position in [(stratum_G, 'G'), (stratum_F, 'F'), (stratum_C, 'C'),
                (stratum_GF, 'G/F'), (stratum_FC, 'F/C')]:

    sample = stratum['Pts_per_game'].sample(10, random_state = 0) # simple random sampling on each stratum
    points_per_position[position] = sample.mean()

position_most_points = max(points_per_position, key = points_per_position.get)

I have tried to modify it by grouping the strata into a dictionary, as follows.

wnba['Pts_per_game'] = wnba['PTS']/wnba['Games Played']

strata = {'stratum_F': wnba[wnba.Pos == 'F'],
'stratum_G': wnba[wnba.Pos == 'G'] , 
'stratum_C': wnba[wnba.Pos == 'C'] ,
'stratum_GF': wnba[wnba.Pos == 'G/F'] ,
'stratum_FC': wnba[wnba.Pos == 'F/C'] }

points_per_position = {}
for stratum, position in strata.items():
    sample = stratum['Pts_per_game'].sample(10,random_state=0)
    points_per_position[position]=sample.mean()

position_most_points=max(points_per_position,key= points_per_position.get)

However, I get TypeError: string indices must be integers. I have tried to work around the stratum['Pts_per_game'] part but could not find the problem.

Answer 1

for stratum, position in strata.items(): means stratum will be the key and position will be the value. The keys in your dict are strings, so stratum is a string, and indexing a string with 'Pts_per_game' raises the TypeError.
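A minimal sketch of that behaviour, using small made-up dicts in place of the DataFrames so it runs without pandas (the values are placeholders, not real data):

```python
# Hypothetical stand-in for the strata dict: string keys, dict values
# playing the role of the DataFrames in the question.
strata = {
    'stratum_F': {'Pts_per_game': [10.0, 12.0]},
    'stratum_G': {'Pts_per_game': [8.0, 9.0]},
}

caught = None
for stratum, position in strata.items():
    # .items() yields (key, value), so stratum is the key (a str)
    # and position is the value, despite the names.
    try:
        stratum['Pts_per_game']  # indexing a str with a str
    except TypeError as exc:
        caught = exc
    break

print(type(stratum).__name__, type(caught).__name__)

# Unpacking in (key, value) order makes the lookup work.
values = {key: value['Pts_per_game'] for key, value in strata.items()}
```

The names on the left of `in` are filled purely by position, so swapping them in the loop header is enough to remove the error.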

Try this:

wnba['Pts_per_game'] = wnba['PTS']/wnba['Games Played']

positions = ['F', 'G', 'C', 'G/F', 'F/C']

strata = {position: wnba[wnba.Pos == position] for position in positions}

points_per_position = {}
for position, stratum in strata.items():
    sample = stratum['Pts_per_game'].sample(10,random_state=0)
    points_per_position[position]=sample.mean()

position_most_points=max(points_per_position,key= points_per_position.get)

What I changed:

the keys of the dict are now the positions themselves
when iterating over .items(), I unpack the position first and the stratum second
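To run the corrected loop without the dataquest.io dataset, here is a small sketch on a made-up DataFrame. The column names match the question, but the numbers, the choice of only two positions, and the sample size of 2 are assumptions chosen to keep the example short. It assumes pandas is installed.

```python
import pandas as pd

# Made-up data standing in for the wnba DataFrame from the question.
wnba = pd.DataFrame({
    'Pos': ['G', 'G', 'G', 'F', 'F', 'F'],
    'PTS': [100, 200, 300, 60, 120, 180],
    'Games Played': [10, 10, 10, 20, 20, 20],
})
wnba['Pts_per_game'] = wnba['PTS'] / wnba['Games Played']

# Keys are the positions themselves; values are the filtered DataFrames.
positions = ['G', 'F']
strata = {position: wnba[wnba.Pos == position] for position in positions}

points_per_position = {}
for position, stratum in strata.items():
    # Sample size 2 (not 10) because each made-up stratum has only 3 rows.
    sample = stratum['Pts_per_game'].sample(2, random_state=0)
    points_per_position[position] = sample.mean()

position_most_points = max(points_per_position, key=points_per_position.get)
print(position_most_points)
```

In this toy data every G player scores more per game than every F player, so the result is 'G' whichever rows the sample happens to draw.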

Answer 1 (accepted 2019-06-28 13:49:11)
