How to iterate over column values for the unique rows of a DataFrame with a sorted numeric index containing duplicates in pandas?

Question

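I have a pandas DataFrame with a sorted numeric index that contains duplicates. For a given column, rows that share an index value also share the column value. I want to iterate over the values of that column once per unique index value.

Example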

import pandas as pd
df = pd.DataFrame({'a': [3, 3, 5], 'b': [4, 6, 8]}, index=[1, 1, 2])

   a  b
1  3  4
1  3  6
2  5  8

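I want to iterate over the values in column a for the unique index entries, i.e. [3, 5].

When I iterate over the default index and print the type of each value from column a, I get a Series for the duplicated index entries.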

for i in df.index:
    cell_value = df['a'].loc[i]
    print(type(cell_value))

Output:

<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'numpy.int64'>

Answer 1

First remove the duplicated index entries with a mask, taking the positions from arange, then select with iloc:

import numpy as np
arr = np.arange(len(df.index))
a = arr[~df.index.duplicated()]
print (a)
[0 2]

for i in a:
    cell_value = df['a'].iloc[i]
    print(type(cell_value))

<class 'numpy.int64'>
<class 'numpy.int64'>
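
The positional approach above can probably be condensed with np.flatnonzero, which returns the positions where a mask is true. This is only a minimal sketch on the example frame from the question; the names first_pos and values are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3, 3, 5], 'b': [4, 6, 8]}, index=[1, 1, 2])

# Positions of the first occurrence of each index label.
first_pos = np.flatnonzero(~df.index.duplicated())

# Positional access with iloc returns a scalar even when the label is duplicated.
values = [df['a'].iloc[i] for i in first_pos]
for v in values:
    print(type(v))
```

Because the lookup is by position rather than by label, the duplicated label 1 never triggers the Series result seen in the question.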

Solution without a loop: use boolean indexing with duplicated, inverting the mask with ~:

a = df.loc[~df.index.duplicated(), 'a']
print (a)
1    3
2    5
Name: a, dtype: int64

b = df.loc[~df.index.duplicated(), 'a'].tolist()
print (b)
[3, 5]

print (~df.index.duplicated())
[ True False  True]
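
As a side note, duplicated takes a keep parameter that decides which occurrence of a repeated label counts as the original. The sketch below uses column b instead of a, only because its values differ between the two rows with index 1, so the effect is visible; the names first and last are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 3, 5], 'b': [4, 6, 8]}, index=[1, 1, 2])

# keep='first' (the default) flags every repeat after the first occurrence.
first = df.loc[~df.index.duplicated(keep='first'), 'b'].tolist()

# keep='last' flags every occurrence except the last one.
last = df.loc[~df.index.duplicated(keep='last'), 'b'].tolist()

print(first, last)
```

For column a the two choices give the same result, since the question states that rows sharing an index value also share the value in a.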

Answer 2

Try np.unique:

_, i = np.unique(df.index, return_index=True)
df.iloc[i, df.columns.get_loc('a')].tolist() 

[3, 5]
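
To make the discarded first return value visible: with return_index=True, np.unique returns the sorted unique labels together with the position of the first occurrence of each. A small sketch on the example frame, with labels, first_pos and pairs as illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3, 3, 5], 'b': [4, 6, 8]}, index=[1, 1, 2])

# labels: sorted unique index values; first_pos: position of each first occurrence.
labels, first_pos = np.unique(df.index, return_index=True)

# Pair every unique label with the value of column a at its first occurrence.
pairs = list(zip(labels.tolist(), df['a'].iloc[first_pos].tolist()))
print(pairs)
```

Note that np.unique sorts the labels, which matches the row order here only because the index is already sorted, as stated in the question.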

Answer 3

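If, as per your comment, the same index means the same data, then this looks like an XY problem.

You also don't need a loop for this.

Assuming you want to drop the duplicated rows and extract only the first column (i.e. 3, 5), the following is enough. Note that drop_duplicates() compares whole rows by default, and in the example the two rows with index 1 differ in column b, so the comparison is restricted to column a with subset (this assumes different index values never share the same value in a).

res = df.drop_duplicates(subset='a').loc[:, 'a']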

# 1    3
# 2    5
# Name: a, dtype: int64

To get the types:

types = list(map(type, res))

print(types)
# [<class 'numpy.int64'>, <class 'numpy.int64'>]
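
Since this answer rests on the premise that the same index means the same data, it may be worth checking that premise before deduplicating. A hedged sketch, assuming the example frame from the question; distinct, a_consistent and b_consistent are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 3, 5], 'b': [4, 6, 8]}, index=[1, 1, 2])

# Number of distinct values per index label, computed for every column.
distinct = df.groupby(level=0).nunique()

# A column is consistent when each index label maps to exactly one value.
a_consistent = bool((distinct['a'] == 1).all())
b_consistent = bool((distinct['b'] == 1).all())
print(a_consistent, b_consistent)
```

In the example, column a passes this check while column b does not, because the two rows with index 1 hold different values in b.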

Answer 4

Another solution using groupby and apply:

df.groupby(level=0).apply(lambda x: type(x.a.iloc[0]))
Out[330]: 
1    <class 'numpy.int64'>
2    <class 'numpy.int64'>
dtype: object

To make your loop solution work, create a temporary df:

df_new = df.groupby(level=0).first()
for i in df_new.index:
    cell_value = df_new['a'].loc[i]
    print(type(cell_value))

<class 'numpy.int64'>
<class 'numpy.int64'>

Or use drop_duplicates(), restricted to column a so that rows differing only in b are dropped as well:

deduped = df.drop_duplicates(subset='a')
for i in deduped.index:
    cell_value = deduped['a'].loc[i]
    print(type(cell_value))

<class 'numpy.int64'>
<class 'numpy.int64'>
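
As a variant of the groupby approach in this answer, the loop can probably skip the label lookup entirely by iterating over label and value pairs with Series.items(). A minimal sketch on the example frame; first_a and seen are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 3, 5], 'b': [4, 6, 8]}, index=[1, 1, 2])

# One value of column a per unique index label (the first one in each group).
first_a = df.groupby(level=0)['a'].first()

seen = []
for label, value in first_a.items():
    # value is a scalar here because the grouped index is unique.
    seen.append((label, value))
    print(label, type(value))
```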

4 answers

Answer 1
2, accepted, 2018-03-06 12:32:15

Answer 2
2, 2018-03-06 12:40:06

Answer 3
0, 2018-03-06 12:33:00

Answer 4
0, 2018-03-06 12:43:55
