Number of unique values in columns of two pandas DataFrames
I have two pd.DataFrame objects (read from .csv files), say,
1, 2
1, 3
2, 4
and
2, 1
1, 2
3, 3
Suppose the DataFrames are named data1 and data2. I can easily count the number of unique values in each column of data1 and data2 individually using
uniques = data.apply(pd.Series.nunique)
where data is replaced by data1 and data2 respectively. So I get 2, 3 for data1 and 3, 3 for data2. Is there a way (other than concatenating the DataFrames) to get the number of unique values when these two DataFrames are combined? I want to get 3, 4.
# Use numpy unique to count the unique values after combining the same column from both DataFrames.
len(np.unique(np.c_[df1.iloc[:,0],df2.iloc[:,0]]))
Out[1398]: 3
len(np.unique(np.c_[df1.iloc[:,1],df2.iloc[:,1]]))
Out[1399]: 4
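For anyone who wants to run this end to end, here is a minimal sketch. It assumes df1 and df2 hold the sample data from the question with default integer column labels, and it swaps in np.union1d (which returns the sorted unique values of two arrays) in place of np.c_ plus np.unique, so it loops over every column position instead of repeating the line per column.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed: no header row, default column labels).
df1 = pd.DataFrame([[1, 2], [1, 3], [2, 4]])
df2 = pd.DataFrame([[2, 1], [1, 2], [3, 3]])

# For each column position, take the union of the values in both frames
# and count how many distinct values it contains.
counts = [
    len(np.union1d(df1.iloc[:, i].to_numpy(), df2.iloc[:, i].to_numpy()))
    for i in range(df1.shape[1])
]
print(counts)
```

This matches columns by position, so it only makes sense when both frames have the same number of columns in the same order.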
Another alternative that will work for any number of data frames:
dfs = [df1, df2]
print([
len(set(np.concatenate([df[colname].unique() for df in dfs])))
for colname in dfs[0]
])
[3, 4]
Note that this will only work if all the data frames have the same column names.
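If the column names might differ, one possible workaround is to restrict the count to the columns that every frame shares. The sketch below is only an illustration: combined_nunique is a hypothetical helper name, the column labels "a" and "b" are made up for the example, and dropping NaN before counting is an assumption chosen to mirror the default behaviour of nunique.

```python
import pandas as pd

def combined_nunique(dfs):
    """Count distinct values per column across several DataFrames (hypothetical helper)."""
    # Assumption: only columns present in every frame are counted.
    common = [c for c in dfs[0].columns if all(c in df.columns for df in dfs[1:])]
    counts = {}
    for col in common:
        seen = set()
        for df in dfs:
            # Drop NaN first, mirroring the default of Series.nunique.
            seen.update(df[col].dropna().unique())
        counts[col] = len(seen)
    return pd.Series(counts)

# Sample data from the question, with illustrative column names.
df1 = pd.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]})
df2 = pd.DataFrame({"a": [2, 1, 3], "b": [1, 2, 3]})
result = combined_nunique([df1, df2])
print(result)
```

Because each column is reduced to its unique values before being added to the set, this never builds a full concatenated copy of the data.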
I think that concat is the best option, unless your data frames already fill your local memory: concatenating will copy
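For comparison, the concat route would look roughly like this. It is a sketch using the sample data from the question, assuming both frames share the same column labels; it is the approach the question wanted to avoid, shown only as a baseline.

```python
import pandas as pd

# Sample data from the question (assumed: default integer column labels).
data1 = pd.DataFrame([[1, 2], [1, 3], [2, 4]])
data2 = pd.DataFrame([[2, 1], [1, 2], [3, 3]])

# Stack the rows of both frames, then count unique values per column.
combined = pd.concat([data1, data2], ignore_index=True)
uniques = combined.nunique()
print(uniques.tolist())
```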