Pandas Group Data and Aggregate Variables and Find Duplicates
I have the following data:
plate,part,posX,posY,rotation
1,FSHN01-R-58.stl,14.5,9.5,180
1,FSHN01-R-58.stl,14.5,9.5,180
1,FSHN01-E-2.stl,44.5,6.5,270
1,FSHN01-N-3.stl,88,7,0
2,FSHN01-N-7.stl,70.5,70.5,90
2,FSHN01-N-1.stl,128.5,64.5,180
2,FSHN01-N-1.stl,113.5,69.5,90
7,FSHN01-R-58.stl,14.5,9.5,180
7,FSHN01-R-58.stl,14.5,9.5,180
7,FSHN01-E-2.stl,44.5,6.5,270
7,FSHN01-N-3.stl,88,7,0
I want to group the plates by part and find plates that are identical. For example, in this data plates 1 and 7 contain the same parts. posX, posY and rotation are not important to me; I only need the part counts of each plate.
For this, I wrote this code:
import pandas as pd
df = pd.read_csv("public/33/stls/plates.csv")
result = df.groupby(['plate', 'part']).size()
print(result)
and I got:
plate  part
1      FSHN01-E-2.stl     1
       FSHN01-N-3.stl     1
       FSHN01-R-58.stl    2
2      FSHN01-N-1.stl     2
       FSHN01-N-7.stl     1
7      FSHN01-E-2.stl     1
       FSHN01-N-3.stl     1
       FSHN01-R-58.stl    2
So plates 1 and 7 are the same. How can I drop plate 7 from the table and increase the plate count of plate 1?
I need this result:
plate  part               part count  plate count
1      FSHN01-E-2.stl     1           2
       FSHN01-N-3.stl     1           2
       FSHN01-R-58.stl    2           2
2      FSHN01-N-1.stl     2           1
       FSHN01-N-7.stl     1           1
IIUC, you need two groupby operations: one on the default plate/part, and one in which the plate value 7 has been replaced by 1:
synonyms = {7: 1}
(df.groupby(['plate', 'part']).size().to_frame(name='part_count')
   .join(df.assign(plate2=df['plate'].replace(synonyms))
           .groupby(['plate2', 'part'])['plate'].agg(plate_count='nunique')
           .rename_axis(['plate', 'part']),
         how='inner'
        )
   .reset_index()
)
or in a more linear form:
synonyms = {7: 1}
df2 = df.groupby(['plate', 'part']).size().to_frame(name='part_count')
df2['plate_count'] = (df.assign(plate2=df['plate'].replace(synonyms))
                        .groupby(['plate2', 'part'])['plate'].nunique()
                     )
df2 = df2.dropna().reset_index()
Output:
   plate             part  part_count  plate_count
0      1   FSHN01-E-2.stl           1            2
1      1   FSHN01-N-3.stl           1            2
2      1  FSHN01-R-58.stl           2            2
3      2   FSHN01-N-1.stl           2            1
4      2   FSHN01-N-7.stl           1            1
NB. Note that you deliberately lose the information on the part counts of plate 7 here. If you want to keep it, map 7 to 1 as shown above and perform a single GroupBy.agg.
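The single GroupBy.agg mentioned in the note could look roughly like the sketch below. It is only one reading of that suggestion: the `group` column name is made up for illustration, and the question's data is inlined through `io.StringIO` so the snippet runs without the CSV file. Part counts of plates 1 and 7 are summed here instead of plate 7 being discarded.

```python
import io
import pandas as pd

# Data copied from the question, inlined so the sketch is self-contained.
CSV = """plate,part,posX,posY,rotation
1,FSHN01-R-58.stl,14.5,9.5,180
1,FSHN01-R-58.stl,14.5,9.5,180
1,FSHN01-E-2.stl,44.5,6.5,270
1,FSHN01-N-3.stl,88,7,0
2,FSHN01-N-7.stl,70.5,70.5,90
2,FSHN01-N-1.stl,128.5,64.5,180
2,FSHN01-N-1.stl,113.5,69.5,90
7,FSHN01-R-58.stl,14.5,9.5,180
7,FSHN01-R-58.stl,14.5,9.5,180
7,FSHN01-E-2.stl,44.5,6.5,270
7,FSHN01-N-3.stl,88,7,0
"""
df = pd.read_csv(io.StringIO(CSV))

synonyms = {7: 1}  # plate 7 is treated as a copy of plate 1

merged = (
    # 'group' is a hypothetical helper column holding the remapped plate id
    df.assign(group=df["plate"].replace(synonyms))
      .groupby(["group", "part"])
      .agg(
          part_count=("plate", "size"),      # rows over all plates in the group
          plate_count=("plate", "nunique"),  # distinct original plates in the group
      )
      .rename_axis(["plate", "part"])
      .reset_index()
)
print(merged)
```

With this variant the part counts for plate 1 include those of plate 7 (for example, FSHN01-R-58.stl is counted 4 times), which is the information the two-groupby version drops.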
import pandas as pd
df = pd.read_csv("plates.csv")
df = df.groupby(['plate', 'part']).size().reset_index(name='count')
df_dict = df.groupby('plate')[['part', 'count']].apply(lambda x: x.to_dict('records')).to_dict()
result, plate_count = {}, {}
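for key, value in df_dict.items():
    if value not in result.values():
        # First plate seen with this combination of parts and counts: keep it
        result[key] = value
        plate_count[key] = 1
    else:
        # Duplicate: increment the count of the plate already stored with the same contents
        plate_count[list(result.keys())[list(result.values()).index(value)]] += 1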
print(result)
print(plate_count)
and the output I got:
{1: [{'part': 'FSHN01-E-2.stl', 'count': 1}, {'part': 'FSHN01-N-3.stl', 'count': 1}, {'part': 'FSHN01-R-58.stl', 'count': 2}], 2: [{'part': 'FSHN01-N-1.stl', 'count': 2}, {'part': 'FSHN01-N-7.stl', 'count': 1}]}
{1: 2, 2: 1}
I found a way to solve this, but I could not do it with pandas alone, so I used a dict. I wonder whether there is a better way; can anyone find one?
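One possible pandas-only approach, offered as a sketch and not as the definitive answer: pivot the data into a plates x parts count matrix with `pd.crosstab`, so that identical plates become identical rows, then group on those rows to find duplicates without hard-coding which plates match. The names `wide`, `by_contents`, `summary` and `keep` are illustrative, and the question's data is inlined so the snippet runs on its own.

```python
import io
import pandas as pd

# Data copied from the question, inlined so the sketch is self-contained.
CSV = """plate,part,posX,posY,rotation
1,FSHN01-R-58.stl,14.5,9.5,180
1,FSHN01-R-58.stl,14.5,9.5,180
1,FSHN01-E-2.stl,44.5,6.5,270
1,FSHN01-N-3.stl,88,7,0
2,FSHN01-N-7.stl,70.5,70.5,90
2,FSHN01-N-1.stl,128.5,64.5,180
2,FSHN01-N-1.stl,113.5,69.5,90
7,FSHN01-R-58.stl,14.5,9.5,180
7,FSHN01-R-58.stl,14.5,9.5,180
7,FSHN01-E-2.stl,44.5,6.5,270
7,FSHN01-N-3.stl,88,7,0
"""
df = pd.read_csv(io.StringIO(CSV))

# One row per plate, one column per part, values = part counts.
# Two plates with the same contents produce identical rows.
wide = pd.crosstab(df["plate"], df["part"]).reset_index()
part_cols = [c for c in wide.columns if c != "plate"]

# Group plates by their full row of part counts.
by_contents = wide.groupby(part_cols)["plate"]
summary = pd.DataFrame({
    "plate": wide["plate"],
    "canonical": by_contents.transform("min"),      # lowest plate id in each group
    "plate_count": by_contents.transform("count"),  # number of identical plates
})

# Keep only the representative (lowest-numbered) plate of each group.
keep = summary.loc[summary["plate"] == summary["canonical"], ["plate", "plate_count"]]

part_counts = df.groupby(["plate", "part"]).size().reset_index(name="part_count")
result = part_counts.merge(keep, on="plate", how="inner")
print(result)
```

Choosing the lowest plate number as the representative is an assumption that matches the desired output in the question (plate 7 folded into plate 1); any other rule, such as the first plate in file order, would only change the `transform("min")` line.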