What does the subset argument do in pandas.io.formats.style.Styler.format?

Question

The public documentation for pandas.io.formats.style.Styler.format says

subset : IndexSlice
An argument to DataFrame.loc that restricts which elements formatter is applied to.

But looking at the code, that isn't quite right... what is this _non_reducing_slice thing?

    if subset is None:
        row_locs = range(len(self.data))
        col_locs = range(len(self.data.columns))
    else:
        subset = _non_reducing_slice(subset)
        if len(subset) == 1:
            subset = subset, self.data.columns

        sub_df = self.data.loc[subset]

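Use case: I want to format one particular row, but when I naively follow the documentation and pass something that works fine with .loc[], I get an error: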

>>> import pandas as pd
>>>
>>> df = pd.DataFrame([dict(a=1,b=2,c=3),dict(a=3,b=5,c=4)])
>>> df = df.set_index('a')
>>> print df
   b  c
a
1  2  3
3  5  4
>>> def J(x):
...     return '!!!%s!!!' % x
...
>>> df.style.format(J, subset=[3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\io\formats\style.py", line 372, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1325, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 841, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 189, in _has_valid_tuple
    if not self._has_valid_type(k, i):
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1418, in _has_valid_type
    (key, self.obj._get_axis_name(axis)))
KeyError: 'None of [[3]] are in the [columns]'
>>> df.loc[3]
b    5
c    4
Name: 3, dtype: int64
>>> df.loc[[3]]
   b  c
a
3  5  4

OK, I tried using an IndexSlice and it seems inconsistent: it works in some cases and not in others, at least in pandas 0.20.3:

Python 2.7.14 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:34:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> idx = pd.IndexSlice
>>> r = np.arange(16).astype(int)
>>> colors = 'red green blue yellow'.split()
>>> df = pd.DataFrame(dict(a=[colors[i] for i in r//4], b=r%4, c=r*100)).set_index(['a','b'])
>>> print df
             c
a      b
red    0     0
       1   100
       2   200
       3   300
green  0   400
       1   500
       2   600
       3   700
blue   0   800
       1   900
       2  1000
       3  1100
yellow 0  1200
       1  1300
       2  1400
       3  1500
>>> df.loc[idx['yellow']]
      c
b
0  1200
1  1300
2  1400
3  1500
>>> def J(x):
...     return '!!!%s!!!' % x
...
>>> df.style.format(J,idx['yellow'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\io\formats\style.py", line 372, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1325, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 836, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 948, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1023, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1541, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1081, in _getitem_iterable
    self._has_valid_type(key, axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1418, in _has_valid_type
    (key, self.obj._get_axis_name(axis)))
KeyError: "None of [['yellow']] are in the [columns]"
>>> pd.__version__
u'0.20.3'

In pandas 0.24.2 I get a similar but slightly different error:

>>> df.style.format(J,idx['yellow'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\io\formats\style.py", line 401, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1494, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 868, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 969, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1048, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1902, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1205, in _getitem_iterable
    raise_missing=False)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1246, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: u"None of [Index([u'yellow'], dtype='object')] are in the [columns]"
>>> pd.__version__
u'0.24.2'

Oh wait, I wasn't specifying enough index information; this works:

df.style.format(J,idx['yellow',:])
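A minimal sketch of why the two calls differ, assuming only standard pandas behavior: with a single key, IndexSlice returns the key itself (a plain string here), while with two keys it returns a (rows, columns) tuple. The variable names single and pair are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice
r = np.arange(16)
colors = "red green blue yellow".split()
df = pd.DataFrame(
    dict(a=[colors[i] for i in r // 4], b=r % 4, c=r * 100)
).set_index(["a", "b"])

# With one key, IndexSlice hands back the key unchanged: just a string.
single = idx["yellow"]
# With two keys, it hands back a (rows, columns) tuple.
pair = idx["yellow", :]

print(repr(single))
print(repr(pair))

# The tuple form addresses rows explicitly, so .loc finds the four rows.
print(len(df.loc[pair]))
```

A bare string carries no information about which axis it refers to, which is why the row has to be spelled out together with a column selector.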

Answer 1

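It does do what it is supposed to do.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4,4))

df.style.background_gradient(subset=[0,1])

df.style.background_gradient()

give, respectively, a table with the gradient applied only to columns 0 and 1 and a table with the gradient applied to every column (the rendered output is not reproduced here).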

Answer 2

I agree that the behavior you've shown is not ideal.

>>> df = (pandas.DataFrame([dict(a=1,b=2,c=3),
                            dict(a=3,b=5,c=4)])
            .set_index('a'))
>>> df.loc[[3]]
   b  c
a      
3  5  4
>>> df.style.format('{:.2f}', subset=[3])
Traceback (most recent call last)
...
KeyError: "None of [Int64Index([3], dtype='int64')] are in the [columns]"

You can work around this by passing a full pandas.IndexSlice as the subset argument:

>>> df.style.format('{:.2f}', subset=pandas.IndexSlice[[3], :])
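As a rough check of that workaround, the sketch below passes the same key to .loc first and then to Styler.format. The names row_3, selected, and styled are illustrative, and the try/except is there only because Styler relies on the optional jinja2 dependency.

```python
import pandas as pd

df = pd.DataFrame([dict(a=1, b=2, c=3), dict(a=3, b=5, c=4)]).set_index("a")

# A full (rows, columns) key; wrapping 3 in a list keeps the result 2-D.
row_3 = pd.IndexSlice[[3], :]

# Handed to .loc, the key selects the single row as a DataFrame.
selected = df.loc[row_3]
print(selected)

try:
    # The same key restricts the formatter to row 3, all columns.
    styled = df.style.format("{:.2f}", subset=row_3)
except ImportError:
    # jinja2 is not installed, so no Styler is available.
    styled = None
```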

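Since you asked what _non_reducing_slice() is doing: its goal is reasonable (making sure the subset does not reduce the dimensionality to a Series). Its implementation treats a list as a sequence of column names:

From pandas/core/indexing.py: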

    def _non_reducing_slice(slice_):
        """
        Ensurse that a slice doesn't reduce to a Series or Scalar.
        Any user-paseed `subset` should have this called on it
        to make sure we're always working with DataFrames.
        """
        # default to column slice, like DataFrame
        # ['A', 'B'] -> IndexSlices[:, ['A', 'B']]
        kinds = (ABCSeries, np.ndarray, Index, list, str)
        if isinstance(slice_, kinds):
            slice_ = IndexSlice[:, slice_]
        ...
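To make the effect of that fragment concrete, here is a small stand-in that mimics only the quoted lines. The function normalize_subset is hypothetical, not part of pandas, and it uses the public pd.Series and pd.Index classes in place of the internal ABCSeries and Index names.

```python
import numpy as np
import pandas as pd


def normalize_subset(subset):
    """Hypothetical stand-in for the quoted part of _non_reducing_slice."""
    # Bare lists, strings, arrays, Index and Series objects are read as
    # column selectors: subset -> IndexSlice[:, subset].
    kinds = (pd.Series, np.ndarray, pd.Index, list, str)
    if isinstance(subset, kinds):
        subset = pd.IndexSlice[:, subset]
    return subset


df = pd.DataFrame([dict(a=1, b=2, c=3), dict(a=3, b=5, c=4)]).set_index("a")

# A bare list becomes "all rows, these columns".
key = normalize_subset([3])
print(key)

# 3 is a row label, not a column, so the lookup fails on the columns axis.
try:
    df.loc[key]
    raised = False
except KeyError:
    raised = True

# A tuple is not one of the listed kinds, so it passes through untouched.
full = normalize_subset(pd.IndexSlice[[3], :])
print(df.loc[full])
```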

I wonder whether the documentation could be improved: in this case, the exception raised by subset=[3] matches the behavior of df[[3]] rather than df.loc[[3]].
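The contrast between the two lookups can be reproduced directly on the example frame. This is only an illustration of the two indexing styles; by_row and column_lookup_failed are made-up names.

```python
import pandas as pd

df = pd.DataFrame([dict(a=1, b=2, c=3), dict(a=3, b=5, c=4)]).set_index("a")

# .loc with a list of labels selects rows by index label.
by_row = df.loc[[3]]
print(by_row)

# Plain [] with a list selects columns, so 3 is looked up among 'b' and 'c'.
try:
    df[[3]]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

print(column_lookup_failed)
```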

Answer 1: 1, 2019-12-05 20:53:57

Answer 2: 1, accepted, 2019-12-05 21:01:03
