
Is there a function (or a better way) to count values across two columns that hold the same kind of data in pandas?

As shown in the image,

  • Two of the columns hold the same kind of data.
  • type1 is a Pokemon's primary type and type2 is its secondary type.
  • The same type (grass, ground, poison, and so on) can appear in the type1 column as well as the type2 column.
  • For example, ground is type1 in the first row but type2 in the second and third rows.

What I am trying to do is count all the Pokemon of each type, irrespective of whether that type appears in type1 or type2. For example, here the count for ground would be 5 and for poison 4, and so on (even though ground appears 4 times in type2 and once in type1).

(Image: dataset)

# type1 and type2 are assumed to be the two columns, e.g. df['type1'] and df['type2']
type_count = {}
for i in type1:
    type_count[i] = type_count.get(i, 0) + 1
for i in type2:
    type_count[i] = type_count.get(i, 0) + 1
print(type_count)

I am expecting the count for each type of Pokemon (irrespective of type1 or type2).

If I understand correctly, you can use:

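# with numpy (assumes neither column contains missing values)
import numpy as np

all_types = np.hstack(df[['type1', 'type2']].values)  # flatten both columns into one array
type_counts = dict(zip(*np.unique(all_types, return_counts=True)))
print(type_counts)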

# using pandas (Series.append was removed in pandas 2.0, so use pd.concat)
import pandas as pd

print(pd.concat([df['type1'], df['type2']]).value_counts().to_dict())

{'ground': 5, 'poison': 5, ....}
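To try the pandas approach without the original dataset, here is a minimal self-contained sketch using `pd.concat`. The small DataFrame is made up for illustration and is not the asker's data. It includes one missing secondary type to show that `value_counts()` skips missing values by default.

```python
import pandas as pd

# Hypothetical sample data, not the asker's dataset
df = pd.DataFrame({
    "type1": ["ground", "grass", "poison", "rock"],
    "type2": ["poison", "ground", None, "ground"],
})

# Put both columns end to end in a single Series, then count each type.
# value_counts() ignores missing values (None/NaN) by default.
type_counts = pd.concat([df["type1"], df["type2"]]).value_counts().to_dict()
print(type_counts)
```

With this sample, ground is counted three times (once from type1, twice from type2) and poison twice.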

You could try this:

pd.Series(df.type1.to_list() + df.type2.to_list()).value_counts()
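Another option, sketched here under the same assumptions, is to reshape the two columns into one with `DataFrame.melt` and count the resulting `value` column. The sample DataFrame is again hypothetical. This form may be handy if you later want to know which column each type came from, since `melt` keeps that in its `variable` column.

```python
import pandas as pd

# Hypothetical sample data, not the asker's dataset
df = pd.DataFrame({
    "type1": ["ground", "grass", "poison", "rock"],
    "type2": ["poison", "ground", None, "ground"],
})

# melt() stacks type1 and type2 into long format:
# 'variable' holds the source column name, 'value' holds the type.
melted = df.melt(value_vars=["type1", "type2"])

# Count every type regardless of which column it came from.
counts = melted["value"].value_counts()
print(counts)
```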

