Why does a column from pandas DataFrame not work in this loop?

I have a DataFrame of player names that I scraped from basketball-reference. The code below is how I built it. It has 5 columns of player names, but each name also includes the player's position.

import pandas as pd

url = "http://www.basketball-reference.com/awards/all_league.html"
dframe_list = pd.read_html(url)  # returns one DataFrame per HTML table
df = dframe_list[0]
df.drop(df.columns[[0,1,2]], inplace=True, axis=1)
column_names = ['name1', 'name2', 'name3', 'name4', 'name5']
df.columns = column_names
df = df[df.name1.notnull()]

I am trying to split off the position. To do so, I had planned to make a DataFrame for each name column:

name1 = pd.DataFrame(df.name1.str.split().tolist()).ix[:,0:1]
name1[0] = name1[0] + " " + name1[1]
name1.drop(name1.columns[[1]], inplace=True, axis=1)

Since I have five columns, I thought I would do this with a loop:

column_names = ['name1', 'name2', 'name3', 'name4', 'name5']
for column in column_names:
    column = pd.DataFrame(df.column.str.split().tolist()).ix[:,0:1]
    column[0] = column[0] + " " + column[1]
    column.drop(column.columns[[1]], inplace=True, axis=1)
    column.columns = column

And then I'd join all these DataFrames back together:

df_NBA = [name1, name2, name3, name4, name5]
df_NBA = pd.concat(df_NBA, axis=1)

I'm new to Python, so I'm sure I'm doing this in a pretty cumbersome fashion and would love suggestions as to how I might do this faster. But my main question is: when I run the code on individual columns it works fine, but when I run the loop I get the error:

AttributeError: 'DataFrame' object has no attribute 'column'

It seems that the df.column.str part of the loop is causing the problem? I've fiddled around with the list, with bracketing column (I still don't understand why sometimes I bracket a DataFrame column and sometimes it's .column, but that's a bigger issue), and with other random things.

When I try @BrenBarn's suggestion:

df.apply(lambda c: c.str[:-2])

The following pops up in the Jupyter notebook:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation:    http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

Looking at the DataFrame, nothing has actually changed. If I understand the documentation correctly, this method creates a copy of the DataFrame with the edits, but it is a temporary copy that gets thrown out afterward, so the actual DataFrame doesn't change.

If the position labels are always only one character, the simple solution is this:

>>> df.apply(lambda c: c.str[:-2])
           name1         name2
0     Marc Gasol  Lebron James
1      Pau Gasol  Kevin Durant
2  Dwight Howard  Kyrie Irving

The str attribute of a Series lets you do string operations, including indexing, so this just trims the last two characters (the space and the one-letter position) off each value.
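Note that apply returns a new DataFrame rather than changing df in place, so the result has to be assigned to a name to keep it. Here is a minimal sketch of that. The sample values are made up and only assume that each entry looks like "First Last P". The rsplit variant at the end is an alternative I am suggesting in case the position label is not always a single character, as long as it is the last whitespace-separated word.

```python
import pandas as pd

# Hypothetical sample data: a name followed by a space and a position code.
raw = pd.DataFrame({
    "name1": ["Marc Gasol C", "Pau Gasol F", "Dwight Howard C"],
    "name2": ["Lebron James F", "Kevin Durant F", "Kyrie Irving G"],
})

# apply() builds and returns a new DataFrame; raw itself is left untouched.
trimmed = raw.apply(lambda c: c.str[:-2])

# Alternative (an assumption about the data, not part of the original answer):
# split once from the right and keep everything before the last word.
trimmed_by_split = raw.apply(lambda c: c.str.rsplit(n=1).str[0])

print(trimmed)
```

To overwrite the original, assign the result back, for example df = df.apply(lambda c: c.str[:-2]).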

As for your question about df.column, this issue is more general than pandas. These two things are not the same:

# works
obj.attr

# doesn't work
attrName = 'attr'
obj.attrName

You can't use the dot notation when you want to access an attribute whose name is stored in a variable. In general, you can use the getattr function instead. However, pandas provides the bracket notation for accessing a column by specifying the name as a string (rather than as a source-code identifier). So these two are equivalent:

df.some_column

columnName = "some_column"
df[columnName]
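As a small self-contained check of the above, the sketch below accesses the same column three ways and then shows the failing form. The DataFrame contents and the name column_name are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"some_column": ["a", "b", "c"]})
column_name = "some_column"

by_attribute = df.some_column          # name written literally in the source
by_bracket = df[column_name]           # name looked up from the string
by_getattr = getattr(df, column_name)  # general Python equivalent of the dot

print(by_attribute.equals(by_bracket), by_attribute.equals(by_getattr))

# df.column_name looks for something literally called "column_name",
# not for the string stored in the variable, so it raises AttributeError.
try:
    df.column_name
except AttributeError as exc:
    print(exc)
```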

In your example, changing your reference from df.column to df[column] should resolve that issue. However, as I mentioned in a comment, your code has other problems too. As far as solving the task at hand, the string-indexing approach I showed at the beginning of my answer is much simpler.
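If you do want to keep the loop, one way it could look is sketched below. It uses bracket access and collects the results in a dict instead of rebinding the loop variable; it also avoids .ix, which was removed in pandas 1.0. The sample data and the names cleaned and parts are made up for illustration, and the sketch assumes every name is exactly two words followed by the position.

```python
import pandas as pd

# Hypothetical sample data standing in for the scraped table.
df = pd.DataFrame({
    "name1": ["Marc Gasol C", "Pau Gasol F", "Dwight Howard C"],
    "name2": ["Lebron James F", "Kevin Durant F", "Kyrie Irving G"],
})

column_names = ["name1", "name2"]
cleaned = {}
for column in column_names:
    # df[column] looks the column up by the string held in the variable.
    parts = df[column].str.split()
    # Keep the first two words (first and last name), dropping the position.
    cleaned[column] = parts.str[0] + " " + parts.str[1]

# Build the combined DataFrame from the per-column results.
df_NBA = pd.DataFrame(cleaned)
print(df_NBA)
```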
