
Set first and last row of a column in a dataframe

I've been reading over this and still find the subject a little confusing: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Say I have a pandas DataFrame and want to set the first and last elements of a single column to given values in one step. I can do this:

df.iloc[[0, -1]].mycol = [1, 2]

which warns that "A value is trying to be set on a copy of a slice from a DataFrame" and that this is potentially dangerous.

I could use .loc instead, but then I need to know the index labels of the first and last rows (in contrast, .iloc lets me access by position).

What's the safest, most idiomatic pandas way to do this?

For context, this is how I build the DataFrame:

# Django queryset
query = market.stats_set.annotate(distance=F("end_date") - query_date)

# Generate a dataframe from this queryset, and order by distance
df = pd.DataFrame.from_records(query.values("distance", *fields), coerce_float=True)
df = df.sort_values("distance").reset_index(drop=True)

Then I try calling df.distance.iloc[[0, -1]] = [1, 2]. This raises the warning.

The issue isn't with iloc itself: df.iloc[[0, -1]] already returns a copy, so the assignment through .mycol writes to that copy rather than to df. You can do the whole thing in a single iloc call:

df.iloc[[0, -1], df.columns.get_loc('mycol')] = [1, 2]
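As a minimal sketch of the single-call approach above, on a small made-up frame (the column names and values are illustrative assumptions, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data; "mycol" stands in for the column from the question.
df = pd.DataFrame({"mycol": [10, 20, 30, 40], "other": [1.0, 2.0, 3.0, 4.0]})

# Translate the column label into its integer position so that both
# the row and column indexers are positional, as iloc requires.
col = df.columns.get_loc("mycol")

# One indexing operation on df itself: no intermediate copy is involved.
df.iloc[[0, -1], col] = [1, 2]

result = df["mycol"].tolist()
print(result)
```

Because rows and column are selected in one iloc call, the assignment targets df directly and no SettingWithCopyWarning should appear. This assumes the column label is unique, so that get_loc returns a single integer.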

Usually ix is used for mixed integer- and label-based access, but it doesn't work here: -1 isn't actually a label in the index, and apparently ix isn't smart enough to treat it as the last position. (Note that ix was deprecated in pandas 0.20 and removed in 1.0, so it is not available in current versions.)
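If label-based access is preferred, one possible alternative (not from the original answer) is to look up the first and last labels through df.index and pass them to .loc. This is a sketch that assumes the index labels are unique; the frame below is made up for illustration:

```python
import pandas as pd

# Hypothetical frame with a non-integer index, to show that no labels
# need to be known in advance.
df = pd.DataFrame({"mycol": [10, 20, 30]}, index=["x", "y", "z"])

# Positional lookup on the index yields the first and last labels.
first_last = df.index[[0, -1]]

# A single .loc call with a row indexer and a column label.
df.loc[first_last, "mycol"] = [1, 2]

result = df["mycol"].tolist()
print(result)
```

With duplicate index labels, .loc would match every row carrying those labels, so the positional iloc form is the safer choice in that case.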

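What you're doing is called chained indexing. You can apply iloc to just that column to avoid the warning, although this is still a chained assignment: it relies on df['a'] returning a view, and it will not update df when Copy-on-Write is enabled in newer pandas versions.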

In [24]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3), columns=list('abc'))
df

Out[24]:
          a         b         c
0  1.589940  0.735713 -1.158907
1  0.485653  0.044611  0.070907
2  1.123221 -0.862393 -0.807051
3  0.338653 -0.734169 -0.070471
4  0.344794  1.095861 -1.300339

In [25]:
df['a'].iloc[[0, -1]] = 'foo'
df

Out[25]:
          a         b         c
0       foo  0.735713 -1.158907
1  0.485653  0.044611  0.070907
2   1.12322 -0.862393 -0.807051
3  0.338653 -0.734169 -0.070471
4       foo  1.095861 -1.300339

If you chain in the other order (rows first, then the column), it raises the warning:

In [27]:
df.iloc[[0, -1]]['a'] = 'foo'

C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\IPython\kernel\__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
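To see why the warning matters, here is a small sketch (with made-up data) that checks whether the rows-first chained assignment changes the original frame. The warning is silenced only to keep the output clean; its exact class differs between pandas versions:

```python
import warnings

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
before = df.copy()

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # df.iloc[[0, -1]] builds a new DataFrame (a copy), so this
    # assignment lands on that temporary object, not on df.
    df.iloc[[0, -1]]["a"] = 99.0

unchanged = df.equals(before)
print(unchanged)
```

Here df is left untouched, which is the silent failure the warning is meant to flag.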
