I have a table that looks like below
Loc | ID | filter | P1 |
---|---|---|---|
A | ABC1 | GHY | 55.6 |
A | DFT1 | FGH | 67.8 |
B | HJH5 | GHY | 67 |
C | HKL | BHY | 78 |
B | GTY | FGH | 60 |
I want the output as below. Basically, I want the records with the same Filter to be one row
Filter | ID | Loc | P1 | m_ID | m_Loc | m_p1 | total |
---|---|---|---|---|---|---|---|
GHY | ABC1 | A | 55.6 | HJH5 | B | 67 | 122.6 |
FGH | DFT1 | A | 67.8 | GTY | B | 60 | 127.8 |
Is this achievable using itertools i python. If yes can someone please suggest how can we do it?
Here's a solution using lead
and row_number
that I think is a little nicer.
select filter
,id
,loc
,p1
,m_id
,m_loc
,m_p1
from
(with t2 as
(select row_number () over( partition by filter order by filter desc) as rn
,*
from t)
select rn,filter, id, loc, p1
,lead(id) over( partition by filter order by filter) as m_id
,lead(loc) over( partition by filter order by filter) as m_loc
,lead(p1) over( partition by filter order by filter) as m_p1
from t2) t
where rn=1
filter | id | loc | p1 | m_id | m_loc | m_p1 |
---|---|---|---|---|---|---|
BHY | HKL | C | 78 | null | null | null |
FGH | DFT1 | A | 67.8 | GTY | B | 60 |
GHY | ABC1 | A | 55.6 | HJH5 | B | 67 |
There should be a better solution to this question, but here is a solution that's based on what you did. I used left join
to not lose filters that only appear once and then I used group by
to consolidate the results.
select t1.filter
,max(t1.id) as id
,max(t1.loc) as loc
,max(t1.p1) as p1
,min(t2.id) as m_id
,min(t2.loc) as m_loc
,min(t2.p1) as m_p1
from t as t1 left join t as t2 on t2.filter = t1.filter and t2.id <> (t1.id)
group by t1.filter
filter | id | loc | p1 | m_id | m_loc | m_p1 |
---|---|---|---|---|---|---|
BHY | HKL | C | 78 | null | null | null |
FGH | GTY | B | 67.8 | DFT1 | A | 60 |
GHY | HJH5 | B | 67 | ABC1 | A | 55.6 |
If the usage of pandas is possible, you can achive a flexible solutiion with the following:
Definition of the data:
df=pd.DataFrame({'Loc': {0: 'A', 1: 'A', 2: 'B ', 3: 'C', 4: 'B'},
'ID': {0: 'ABC1', 1: 'DFT1', 2: 'HJH5', 3: 'HKL', 4: 'GTY'},
'filter': {0: 'GHY', 1: 'FGH', 2: 'GHY', 3: 'BHY', 4: 'FGH'},
'P1': {0: 55.6, 1: 67.8, 2: 67.0, 3: 78.0, 4: 60.0}})
Creation of the repetive columns:
cols=["{}_{}".format(N, c) for N in range(0,df.groupby('filter').count()['ID'].max()) for c in df.columns]
Here, I first find the maximum required repitions by looking for the max occurences of each filter df.groupby('filter').count()['ID'].max()
. The remaining code is just formating by adding a leading number.
Creation of new dataframe with filter
as index and the generated columns cols
as columns
df_new=pd.DataFrame(index=set(df['filter']), columns=cols)
Now we have to fill in the data:
for fil in df_new.index:
values=[val for row in df[df['filter']==fil].values for val in row]
df_new.loc[fil,df_new.columns[:len(values)]]=values
Here two things are done: First, the selected values based on the filter name fil
are flattend by [val for row in df[df['filter']==fil].values for val in row]
. Then, these values are filled into the dataframe starting at the left.
The result is as expected:
0_Loc 0_ID 0_filter 0_P1 1_Loc 1_ID 1_filter 1_P1
GHY A ABC1 GHY 55.6 B HJH5 GHY 67.0
BHY C HKL BHY 78.0 NaN NaN NaN NaN
FGH A DFT1 FGH 67.8 B GTY FGH 60.0
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.