I have a .csv file (600 lines) with several fields: commit id, smell type, and more.
I would like to count the occurrences of each type of smell for each commit id.
Example of the output I would like:
commit dfbu3u4498fbbefi: [dense structure: 1, cyclic dependency: 4, unstable dependency: 67, feature concentration: 6, god component: 8]
commit bifueifyuwefbvwr: [dense structure: 34, cyclic dependency: 43, unstable dependency: 97, feature concentration: 43, god component: 10]
I tried this, but I think I need another loop (maybe?). Sorry, I have never used Python before.
import csv
import collections

smell = collections.Counter()
with open('Ref.csv') as file:
    reader = csv.reader(file, delimiter=';')
    for row in reader:
        smell[row[0]] += 1

print(smell.most_common(5))
OUTPUT:
[('9b0dd5dc979bd490ae34f6d790c466b47c84c920', 96), ('6431099fe7d5d90da678a78051f12894da82c68d', 96), ('44fdfa7ea93c15bb116a25e0675d98469deafaa6', 96), ('b2c40612a2c60685555f35af71f5801391a58b4b', 96), ('aa6cbb78cca17a9de339b2d060c00352e8beedde', 96)]
or if I change the row index to 2 I get:
[('Unstable Dependency', 315), ('Feature Concentration', 238), ('God Component', 84), ('Cyclic Dependency', 28), ('Dense Structure', 7)]
You can use pandas to do it:
import pandas as pd

# Read the CSV into a DataFrame.
df = pd.read_csv('Ref.csv', sep=';')

# Group by commit and smell, then count the rows in each group.
df_grouped = df.groupby(by=['commit', 'smell']).size()
df_grouped is now a pandas.Series; if you want it to be a DataFrame again, you should do this:
df_grouped = df_grouped.reset_index()
df_grouped = df_grouped.rename(columns={0: "counts"})
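If you want output shaped like the desired example in the question (one row per commit, with one count per smell type), you can also pivot the grouped counts into columns. This is a minimal sketch, assuming the CSV has a header row with columns named commit and smell:

import pandas as pd

df = pd.read_csv('Ref.csv', sep=';')
# Count (commit, smell) pairs, then pivot the smell types into columns;
# fill_value=0 puts a 0 where a commit has no occurrence of a smell.
counts = df.groupby(['commit', 'smell']).size().unstack(fill_value=0)
print(counts.head())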
I highly recommend having a look at the documentation: https://pandas.pydata.org/pandas-docs/stable/index.html
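Alternatively, if you want to stay with the standard library, your original Counter approach works once you count (commit, smell) pairs instead of a single column. A sketch, assuming the commit id is in column 0 and the smell type in column 2, as your two outputs suggest:

import csv
import collections

# Count each (commit, smell) pair.
pairs = collections.Counter()
with open('Ref.csv') as file:
    reader = csv.reader(file, delimiter=';')
    next(reader, None)  # skip the header row (drop this line if the file has none)
    for row in reader:
        pairs[(row[0], row[2])] += 1

# Regroup into {commit: {smell: count}} for per-commit output.
per_commit = collections.defaultdict(dict)
for (commit, smell), count in pairs.items():
    per_commit[commit][smell] = count

for commit, smells in per_commit.items():
    print(commit, smells)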