I have two pd.DataFrame objects (read from .csv files), say,

1, 2
1, 3
2, 4

and

2, 1
1, 2
3, 3

Suppose the DataFrames are named data1 and data2. I can easily count the number of unique values in each column of data1 and data2 individually using

uniques = data.apply(pd.Series.nunique)

where data is replaced by data1 and data2 respectively. This gives 2, 3 for data1 and 3, 3 for data2. Is there a way (other than concatenating the DataFrames) to get the number of unique values per column when the two DataFrames are combined? I want to get 3, 4.
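For reference, the setup can be reproduced directly; the column names a and b are an assumption, matching the labels shown in the answers below:

```python
import pandas as pd

# Reconstruct the two example frames (column names a, b are assumed).
data1 = pd.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]})
data2 = pd.DataFrame({"a": [2, 1, 3], "b": [1, 2, 3]})

# Per-frame unique counts, as described in the question.
print(data1.apply(pd.Series.nunique))  # a 2, b 3
print(data2.apply(pd.Series.nunique))  # a 3, b 3
```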
I don't think so. You need concat first:
df = pd.concat([df1,df2]).apply(pd.Series.nunique)
print (df)
a 3
b 4
dtype: int64
# use numpy's unique to count uniques after combining the same column from both DataFrames
len(np.unique(np.c_[df1.iloc[:, 0], df2.iloc[:, 0]]))  # 3
len(np.unique(np.c_[df1.iloc[:, 1], df2.iloc[:, 1]]))  # 4
Another alternative that will work for any number of data frames:
dfs = [df1, df2]
print([
len(set(np.concatenate([df[colname].unique() for df in dfs])))
for colname in dfs[0]
])
[3, 4]
Note that this will only work if all the data frames have the same column names.
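If the column names differ between frames, one possible variation (a sketch, not from the original answer) is to take the union of all column names and skip any frame that lacks a given column:

```python
import numpy as np
import pandas as pd

# Example frames from the question (column names a, b are assumed).
df1 = pd.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]})
df2 = pd.DataFrame({"a": [2, 1, 3], "b": [1, 2, 3]})
dfs = [df1, df2]

# Union of all column names across the frames.
all_cols = sorted(set().union(*(df.columns for df in dfs)))

# Count combined uniques per column, skipping frames missing that column.
counts = {
    col: len(set(np.concatenate([df[col].unique() for df in dfs if col in df.columns])))
    for col in all_cols
}
print(counts)  # {'a': 3, 'b': 4}
```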
I think that concat is the best option, unless your data frames already fill your local memory: concatenating will copy the data.