简体   繁体   English

2 行或更多行中累积值的唯一计数 pandas

[英]unique count for accumulative values in 2 or more rows pandas

Is it possible to do a unique count for the values of 2(or multiple) rows in a dataframe?是否可以对 dataframe 中的 2(或多)行的值进行唯一计数? I was able to do unique count with df['count'] = df.iloc[:, 0:6].nunique(axis=1) of the first 6 columns of individual.我能够使用df['count'] = df.iloc[:, 0:6].nunique(axis=1)个人的前 6 列进行唯一计数。 However, i can't figure out (or find) how to get the unique count of the 6 columns in both(or multiple) rows.但是,我无法弄清楚(或找到)如何获取两行(或多行)中 6 列的唯一计数。

original df: there are 3 unique values in each row: 7,4,2 and 8,5,6原始df:每行有3个唯一值:7、4、2和8、5、6

╔═════╦══════╦══════╦══════╦═════╦═════════╦════════╦═══════╗
║ hf0 ║ hf1  ║ hf2  ║ hf3  ║ hf4 ║ hf5     ║ sample ║ count ║
╠═════╬══════╬══════╬══════╬═════╬═════════╬════════╬═══════╣
║   7 ║    4 ║    2 ║    2 ║   7 ║       2 ║ 7yr    ║     3 ║
║   8 ║    5 ║    5 ║    6 ║   5 ║       6 ║ 7yr    ║     3 ║
╚═════╩══════╩══════╩══════╩═════╩═════════╩════════╩═══════╝

df trying to get: there are 6 unique values for both rows: 7,4,2,8,5,6 df 试图获取:两行都有 6 个唯一值:7,4,2,8,5,6

╔═════╦══════╦══════╦══════╦═════╦═════════╦════════╦════════╦════════════╗
║ hf0 ║ hf1  ║ hf2  ║ hf3  ║ hf4 ║ hf5     ║ sample ║ count  ║ count2rows ║
╠═════╬══════╬══════╬══════╬═════╬═════════╬════════╬════════╬════════════╣
║   7 ║    4 ║    2 ║    2 ║   7 ║       2 ║ 7yr    ║      3 ║          6 ║
║   8 ║    5 ║    5 ║    6 ║   5 ║       6 ║ 7yr    ║      3 ║          6 ║
╚═════╩══════╩══════╩══════╩═════╩═════════╩════════╩════════╩════════════╝

code for sample df:示例df的代码:

import pandas as pd
import numpy as np
data = {'hf0':[7,8],'hf1':[4,5], 'hf2':[2,5],'hf3':[2,6],'hf4':[7,5],'hf5':[2,6],'sample':['7yr','7yr']}
df = pd.DataFrame(data)

df['count'] = df.iloc[:, 0:6].nunique(axis=1)
df

Thanks in advance提前致谢

You could use Numpy to get that done.您可以使用Numpy来完成这项工作。

import numpy as np

df['count2rows'] = len(np.unique(df.filter(like='hf').values))
print(df)

Result结果

   hf0  hf1  hf2  hf3  hf4  hf5 sample  count  count3rows
0    7    4    2    2    7    2    7yr      3           6
1    8    5    5    6    5    6    7yr      3           6
df
###
   hf0  hf1  hf2  hf3  hf4  hf5 sample
0    7    4    2    2    7    2    7yr
1    8    5    5    6    5    6    7yr
2    9    6    8   10    7   10    7yr
3   10    7   11   14    5   14    7yr
4   11    8   14   18    7   18    7yr



rolling window rolling=2滚动 window rolling=2

ar = df.loc[:,'hf0':'hf5'].to_numpy()
rolling = 2
length = ar.shape[1]

cubic = np.lib.stride_tricks.sliding_window_view(ar, (rolling,length))
plan = cubic.reshape(-1,rolling*length).astype(float)

head_arr = plan[0,:length]
head_arr = np.pad(head_arr.astype(float), (0,length), 'constant', constant_values=np.nan)

plan = np.insert(plan, 0, head_arr, axis=0)
df['count2rows'] = pd.DataFrame(plan).nunique(axis=1)
df
###
   hf0  hf1  hf2  hf3  hf4  hf5 sample  count2rows
0    7    4    2    2    7    2    7yr           3
1    8    5    5    6    5    6    7yr           6
2    9    6    8   10    7   10    7yr           6
3   10    7   11   14    5   14    7yr           8
4   11    8   14   18    7   18    7yr           7

You could use stack()你可以使用stack()

df['count_2 rows'] = df.iloc[:, 0:6].stack().nunique()

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM