How to generate permutations without repetition

I have a table that looks like this:

| Loc | ID | filter | P1 |
|---|---|---|---|
| A | ABC1 | GHY | 55.6 |
| A | DFT1 | FGH | 67.8 |
| B | HJH5 | GHY | 67 |
| C | HKL | BHY | 78 |
| B | GTY | FGH | 60 |

I want the output shown below. Basically, I want the records with the same filter to end up in one row:

| Filter | ID | Loc | P1 | m_ID | m_Loc | m_p1 | total |
|---|---|---|---|---|---|---|---|
| GHY | ABC1 | A | 55.6 | HJH5 | B | 67 | 122.6 |
| FGH | DFT1 | A | 67.8 | GTY | B | 60 | 127.8 |

Is this achievable using itertools in Python? If so, can someone please suggest how to do it?

Here's a solution using `lead` and `row_number` that I think is a little nicer.

    -- Ordering by id inside each partition makes row_number() and lead()
    -- agree on which record comes first; ordering by filter alone is
    -- not deterministic because every row in a partition has the same filter.
    select filter
          ,id
          ,loc
          ,p1
          ,m_id
          ,m_loc
          ,m_p1
    from
    (with t2 as
      (select row_number() over (partition by filter order by id) as rn
             ,*
       from t)
     select rn, filter, id, loc, p1
           ,lead(id)  over (partition by filter order by id) as m_id
           ,lead(loc) over (partition by filter order by id) as m_loc
           ,lead(p1)  over (partition by filter order by id) as m_p1
     from t2) t
    where rn = 1

| filter | id | loc | p1 | m_id | m_loc | m_p1 |
|---|---|---|---|---|---|---|
| BHY | HKL | C | 78 | null | null | null |
| FGH | DFT1 | A | 67.8 | GTY | B | 60 |
| GHY | ABC1 | A | 55.6 | HJH5 | B | 67 |
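
Since the question asks about Python, the same `lead` / `row_number` idea can be sketched in pandas. This is only an illustration under assumptions: the sample data is retyped from the question, `groupby(...).shift(-1)` stands in for `lead`, `cumcount() == 0` stands in for `rn = 1`, and the names `lead`, `first_rows` and `paired` are made up here. Unlike the SQL above, rows keep their original order instead of being ordered by `id`.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Loc": ["A", "A", "B", "C", "B"],
    "ID": ["ABC1", "DFT1", "HJH5", "HKL", "GTY"],
    "filter": ["GHY", "FGH", "GHY", "BHY", "FGH"],
    "P1": [55.6, 67.8, 67.0, 78.0, 60.0],
})

grouped = df.groupby("filter", sort=False)

# Analogue of lead(...): the next row's values within the same filter group.
lead = grouped[["ID", "Loc", "P1"]].shift(-1)
lead.columns = ["m_ID", "m_Loc", "m_p1"]

# Analogue of "where rn = 1": keep only the first row of each group.
first_rows = grouped.cumcount() == 0
paired = pd.concat([df, lead], axis=1)[first_rows].reset_index(drop=True)
paired = paired[["filter", "ID", "Loc", "P1", "m_ID", "m_Loc", "m_p1"]]

print(paired)
```

As in the SQL result, a filter that occurs only once (BHY) keeps its row with empty `m_` columns.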

There should be a better solution to this question, but here is one based on what you did. I used `left join` so that filters that appear only once are not lost, and then `group by` to consolidate the results.

    select t1.filter
          ,max(t1.id)  as id
          ,max(t1.loc) as loc
          ,max(t1.p1)  as p1
          ,min(t2.id)  as m_id
          ,min(t2.loc) as m_loc
          ,min(t2.p1)  as m_p1
    from t as t1
    left join t as t2
           on t2.filter = t1.filter
          and t2.id <> t1.id
    group by t1.filter

| filter | id | loc | p1 | m_id | m_loc | m_p1 |
|---|---|---|---|---|---|---|
| BHY | HKL | C | 78 | null | null | null |
| FGH | GTY | B | 67.8 | DFT1 | A | 60 |
| GHY | HJH5 | B | 67 | ABC1 | A | 55.6 |

Note that `max` and `min` are evaluated per column, so the values in one output row can come from different source records: in the FGH row, ID GTY and Loc B appear next to P1 67.8, which belongs to DFT1.
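
The self-join idea can also be sketched in pandas in a way that keeps each record's columns together. This is a rough sketch, not the answer's method: it assumes the sample data from the question, uses `DataFrame.merge` as the analogue of the self-join, and replaces the per-column `max`/`min` with the condition `ID < ID_m` so each pair appears once. The names `pairs`, `matched`, `singles` and `result` are invented for this example.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Loc": ["A", "A", "B", "C", "B"],
    "ID": ["ABC1", "DFT1", "HJH5", "HKL", "GTY"],
    "filter": ["GHY", "FGH", "GHY", "BHY", "FGH"],
    "P1": [55.6, 67.8, 67.0, 78.0, 60.0],
})

# Self-join on filter; columns from the right-hand copy get the "_m" suffix.
pairs = df.merge(df, on="filter", how="left", suffixes=("", "_m"))

# Keep each unordered pair once and drop self-pairs.
matched = pairs[pairs["ID"] < pairs["ID_m"]]

# Filters with a single record have no partner; keep them with empty "_m" columns.
singles = df[~df["filter"].isin(matched["filter"])]
result = pd.concat([matched, singles], ignore_index=True)

print(result)
```

Because whole rows are paired rather than aggregated column by column, `P1` always belongs to the `ID` shown next to it.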

If using pandas is an option, you can achieve a flexible solution as follows.

Definition of the data:

    import pandas as pd

    # 'B' at index 2 had a stray trailing space in the original snippet.
    df = pd.DataFrame({'Loc': {0: 'A', 1: 'A', 2: 'B', 3: 'C', 4: 'B'},
                       'ID': {0: 'ABC1', 1: 'DFT1', 2: 'HJH5', 3: 'HKL', 4: 'GTY'},
                       'filter': {0: 'GHY', 1: 'FGH', 2: 'GHY', 3: 'BHY', 4: 'FGH'},
                       'P1': {0: 55.6, 1: 67.8, 2: 67.0, 3: 78.0, 4: 60.0}})

Creation of the repeated columns:

    # One copy of every original column name per repetition, prefixed with its index.
    cols = ["{}_{}".format(N, c)
            for N in range(0, df.groupby('filter').count()['ID'].max())
            for c in df.columns]

Here, I first find the maximum number of repetitions required by taking the largest number of occurrences of any filter value, `df.groupby('filter').count()['ID'].max()`. The rest of the code is just formatting: it prefixes each column name with the repetition number.
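
As a small illustration of that step, the same count can be obtained with `value_counts`, which some readers may find easier to follow. This is only a sketch on the sample data from the question; `max_repeats` is a name introduced here.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Loc": ["A", "A", "B", "C", "B"],
    "ID": ["ABC1", "DFT1", "HJH5", "HKL", "GTY"],
    "filter": ["GHY", "FGH", "GHY", "BHY", "FGH"],
    "P1": [55.6, 67.8, 67.0, 78.0, 60.0],
})

# Largest number of rows that share one filter value.
max_repeats = df["filter"].value_counts().max()

# One block of the original column names per repetition, prefixed by its index.
cols = [f"{n}_{c}" for n in range(max_repeats) for c in df.columns]

print(max_repeats)
print(cols)
```

For this data the most frequent filter occurs twice, so two blocks of four column names are generated.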

Creation of a new dataframe with `filter` as the index and the generated `cols` as columns:

    # The index comes from a set, so the row order of df_new is arbitrary.
    df_new = pd.DataFrame(index=set(df['filter']), columns=cols)

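Now we have to fill in the data:

    for fil in df_new.index:
        # Flatten all rows that share this filter into one list of values.
        values = [val for row in df[df['filter'] == fil].values for val in row]
        # Write them into the leftmost columns; unused columns stay NaN.
        df_new.loc[fil, df_new.columns[:len(values)]] = values
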
Two things happen here. First, the rows selected by the filter name `fil` are flattened into a single list by `[val for row in df[df['filter']==fil].values for val in row]`. Then these values are written into the dataframe starting from the left.
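
To make the flattening step concrete, here is a minimal sketch for a single filter value. It assumes the sample data from the question and uses `to_numpy().ravel()` as an alternative to the list comprehension; `subset` and `values` are illustrative names.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Loc": ["A", "A", "B", "C", "B"],
    "ID": ["ABC1", "DFT1", "HJH5", "HKL", "GTY"],
    "filter": ["GHY", "FGH", "GHY", "BHY", "FGH"],
    "P1": [55.6, 67.8, 67.0, 78.0, 60.0],
})

fil = "GHY"
subset = df[df["filter"] == fil]

# Row-major flattening: all columns of the first match, then of the second, and so on.
values = subset.to_numpy().ravel().tolist()

print(values)
```

The list holds one block of four values per matching row, which is why it lines up with the generated column names.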

The result is as expected:

        0_Loc  0_ID 0_filter  0_P1 1_Loc  1_ID 1_filter  1_P1
    GHY     A  ABC1      GHY  55.6     B  HJH5      GHY  67.0
    BHY     C   HKL      BHY  78.0   NaN   NaN      NaN   NaN
    FGH     A  DFT1      FGH  67.8     B   GTY      FGH  60.0
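
Since the question asks specifically about itertools, here is one possible sketch with `itertools.combinations`, which yields each unordered pair once and never pairs a record with itself. It assumes the sample data and the column names of the desired output from the question; `rows` and `out` are names made up for this example. Filters with only one record (BHY) produce no pair and are therefore omitted, as in the desired output.

```python
from itertools import combinations

import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Loc": ["A", "A", "B", "C", "B"],
    "ID": ["ABC1", "DFT1", "HJH5", "HKL", "GTY"],
    "filter": ["GHY", "FGH", "GHY", "BHY", "FGH"],
    "P1": [55.6, 67.8, 67.0, 78.0, 60.0],
})

rows = []
for name, group in df.groupby("filter", sort=False):
    records = group.to_dict("records")
    # Every unordered pair of records that share this filter value.
    for first, second in combinations(records, 2):
        rows.append({
            "Filter": name,
            "ID": first["ID"], "Loc": first["Loc"], "P1": first["P1"],
            "m_ID": second["ID"], "m_Loc": second["Loc"], "m_p1": second["P1"],
            "total": first["P1"] + second["P1"],
        })

out = pd.DataFrame(rows)
print(out)
```

If a filter had three or more records, this would emit one row per pair rather than one row per filter, so the pandas approach above may be the better fit for that case.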