I am trying to match identical rows and aggregate them into a single row.
For example, in the table below I want to aggregate the first three rows because they are identical. The fourth row isn't identical (Col 3 differs), so it stays on its own. In my check, I do nothing for any row that has B in Col 1. The final two rows are then aggregated again:
|---------------------|------------------|------------------|
| Col 1 | Col 2 | Col 3 |
|---------------------|------------------|------------------|
| A | 12st | 13 |
|---------------------|------------------|------------------|
| A | 12st | 13 |
|---------------------|------------------|------------------|
| A | 12st | 13 |
|---------------------|------------------|------------------|
| A | 12st | 17 |
|---------------------|------------------|------------------|
| B | 11aa | 10 |
|---------------------|------------------|------------------|
| C | 10ee | 10 |
|---------------------|------------------|------------------|
| C | 10ee | 10 |
|---------------------|------------------|------------------|
import pandas as pd
df = pd.DataFrame({'Col 1': ['A', 'A', 'A', 'A', 'B', 'C', 'C'], 'Col 2': ['12st', '12st', '12st', '12st', '11aa', '10ee', '10ee'], 'Col 3': [13, 13, 13, 17, 10, 10, 10]})
I want to get the following output:
|---------------------|------------------|------------------|---------------|
| Col 1 | Col 2 | Col 3 | Col 4 |
|---------------------|------------------|------------------|---------------|
| A | 12st | 13 | 3 |
|---------------------|------------------|------------------|---------------|
| A | 12st | 17 | 1 |
|---------------------|------------------|------------------|---------------|
| B | 11aa | 10 | 1 |
|---------------------|------------------|------------------|---------------|
| C | 10ee | 10 | 2 |
|---------------------|------------------|------------------|---------------|
I have tried simpler things like df.shift(), but that only seems to work on a specific column and not on whole rows. I also want to do this iteratively: row i should keep being merged for as long as the following rows match it (i == i+1 == i+2).
Thanks
I think groupby.size can do it, like this:
print (df.groupby(['Col 1','Col 2', 'Col 3']).size().reset_index(name='Col 4'))
Col 1 Col 2 Col 3 Col 4
0 A 12st 13 3
1 A 12st 17 1
2 B 11aa 10 1
3 C 10ee 10 2
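Note that groupby collects identical rows from anywhere in the frame, not only neighbouring ones. If only consecutive runs should be merged (the i == i+1 == i+2 idea from the question), a minimal sketch could look like the one below. The names cols, changed, run_id and result are just illustrative, and it assumes the three columns contain no missing values (NaN never compares equal, so every NaN row would start its own run).

```python
import pandas as pd

df = pd.DataFrame({
    'Col 1': ['A', 'A', 'A', 'A', 'B', 'C', 'C'],
    'Col 2': ['12st', '12st', '12st', '12st', '11aa', '10ee', '10ee'],
    'Col 3': [13, 13, 13, 17, 10, 10, 10],
})

cols = ['Col 1', 'Col 2', 'Col 3']

# True wherever a row differs from the row directly above it
# (the first row is always True because it is compared with NaN)
changed = df[cols].ne(df[cols].shift()).any(axis=1)

# A running count of the changes gives every consecutive run its own id
run_id = changed.cumsum()

# Keep the first row of each run and attach the length of the run
result = df.groupby(run_id)[cols].first()
result['Col 4'] = df.groupby(run_id).size()
result = result.reset_index(drop=True)

print(result)
```

For the sample data this gives the same four rows as groupby.size. The two approaches only differ when identical rows are separated by other rows.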
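I think you could probably do something like this:
output_data = []  # unique rows, in order of first appearance
counts = []       # how many times each unique row occurs
for i in range(len(df)):
    # a tuple compares cleanly with ==, unlike a pandas Series
    current_row = tuple(df.iloc[i])
    try:
        # check if row is already in output_data
        pos = output_data.index(current_row)
        counts[pos] += 1
    except ValueError:
        # first time this row is seen
        output_data.append(current_row)
        counts.append(1)
# Create a new dataframe
new_df = pd.DataFrame(output_data, columns=df.columns)
new_df['Col 4'] = counts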
Please let me know if this helps :D Thanks!