Number of unique values in columns across two pandas DataFrames

Question

I have two pd.DataFrame objects (read from .csv files), say,

1, 2
1, 3
2, 4

and

2, 1
1, 2
3, 3

Suppose the DataFrames are named data1 and data2. I can easily count the number of unique values in each column of data1 and data2 individually using

 uniques = data.apply(pd.Series.nunique)

where data is replaced by data1 and data2 respectively. So I get 2, 3 for data1 and 3, 3 for data2. Is there a way (other than concatenating the DataFrames) to get the number of unique values when these two DataFrames are combined? I want to get 3, 4.

Answer 1

I don't think so. You need to concat first:

df = pd.concat([df1,df2]).apply(pd.Series.nunique)
print (df)
a    3
b    4
dtype: int64
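For anyone who wants to reproduce this, here is a minimal self-contained sketch of the concat approach. The column names a and b are an assumption taken from the output above (the question does not name the columns), and DataFrame.nunique is used as a shorthand for apply(pd.Series.nunique).

```python
import pandas as pd

# Sample frames from the question; the column names "a" and "b" are assumed.
df1 = pd.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]})
df2 = pd.DataFrame({"a": [2, 1, 3], "b": [1, 2, 3]})

# Stack the rows of both frames, then count distinct values per column.
combined_uniques = pd.concat([df1, df2], ignore_index=True).nunique()
print(combined_uniques)
```

Note that concat builds a new frame holding the rows of both inputs, so this needs enough memory for a full copy.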

Answer 2

# Use numpy's unique to count the unique values after combining the same column from both DataFrames.

len(np.unique(np.c_[df1.iloc[:,0],df2.iloc[:,0]]))
Out[1398]: 3

len(np.unique(np.c_[df1.iloc[:,1],df2.iloc[:,1]]))
Out[1399]: 4
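The same idea can be written as a loop over all columns instead of one call per column. This is only a sketch: the column names a and b are assumed, and np.concatenate is swapped in for np.c_ because np.c_ needs both columns to have the same length, while np.concatenate joins 1-D arrays of any length.

```python
import numpy as np
import pandas as pd

# Sample frames from the question; the column names "a" and "b" are assumed.
df1 = pd.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]})
df2 = pd.DataFrame({"a": [2, 1, 3], "b": [1, 2, 3]})

# For each column, join the values from both frames end to end
# and count the distinct values with np.unique.
counts = {
    col: len(np.unique(np.concatenate([df1[col].to_numpy(), df2[col].to_numpy()])))
    for col in df1.columns
}
print(counts)
```

This still assumes both frames share the same column names.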

Answer 3

Another alternative that will work for any number of data frames:

dfs = [df1, df2]
print([
    len(set(np.concatenate([df[colname].unique() for df in dfs])))
    for colname in dfs[0]
])
[3, 4]

Note that this will only work if all the data frames have the same column names.
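If it helps, the snippet above could be wrapped in a small helper that returns a Series like nunique does. This is a sketch rather than a drop-in replacement: combined_nunique is a hypothetical name, the column names a and b are assumed, and dropna() is added so that missing values are ignored the way nunique ignores them by default.

```python
import pandas as pd

def combined_nunique(dfs):
    """Count distinct non-null values per column across several frames.

    Hypothetical helper; assumes every frame has the columns of dfs[0].
    """
    seen = {col: set() for col in dfs[0].columns}
    for df in dfs:
        for col in seen:
            # Only the per-frame unique values are collected,
            # so the full columns are never concatenated.
            seen[col].update(df[col].dropna().unique())
    return pd.Series({col: len(values) for col, values in seen.items()})

# Sample frames from the question; the column names "a" and "b" are assumed.
df1 = pd.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]})
df2 = pd.DataFrame({"a": [2, 1, 3], "b": [1, 2, 3]})

result = combined_nunique([df1, df2])
print(result)
```

The sets only ever hold the distinct values of each column, which is usually far smaller than a full copy of the data.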

I think that concat is the best option, unless your data frames already fill your local memory: concatenating will copy

3 answers

Answer 1
1 2017-05-19 05:52:31

Answer 2
1 Accepted 2017-05-19 06:01:06

Answer 3
1 2017-05-19 09:56:52
