
Pandas find row with closest profile

I have a file full of profiles that look like this:

 profile_id  colA  colB  colC  colD
 1           1     20    50    63
 2           1     20    65    38
 3           8     5     3     4
 4           98    1     878   4
 ...

I have another CSV with results, from which I want to find the profile:

col    value    score
colA   1        85
colA   1        856
colA   8        200000
colB   1        2356
colC   878      99999
colD   4        2
...

For each colX, I want to extract the value with the best score and find which profile_id it is associated with in the previous file.

What I have so far works:

import pandas as pd

# Both files are tab-separated.
profiles = pd.read_csv("profiles.csv", sep="\t", index_col=False)
df = pd.read_csv("results.csv", sep="\t", index_col=False)

found_col = set(df["col"])
good_profile = profiles.copy()
for col in profiles.columns:
    if col == "profile_id":
        continue
    elif col not in found_col:
        print(f"{col} not found")
    else:
        # Value with the highest score for this column
        value = int(df.loc[df.loc[df["col"] == col, "score"].idxmax(), "value"])
        # Narrow the remaining candidates to profiles with that value
        good_profile = good_profile[good_profile[col] == value]
print(good_profile)

This gives me the result I want, but I first extract a subset for the first column, then a subset of that subset for the second one, and so on.

A nice side effect is that I still get a result when some columns are missing from the results, which is great.

I was wondering whether there is a better way to do this, without having to create each subset from the previous one.
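As a rough illustration of what "without subsets of subsets" could look like, here is a minimal sketch that builds a single boolean mask over all columns at once. It assumes the sample rows shown above, typed in by hand instead of read from the files; the names `results`, `best`, `cols` and `mask` are introduced here for illustration and are not from the original code.

```python
import pandas as pd

# Sample data copied from the tables above (first four profiles only).
profiles = pd.DataFrame({
    "profile_id": [1, 2, 3, 4],
    "colA": [1, 1, 8, 98],
    "colB": [20, 20, 5, 1],
    "colC": [50, 65, 3, 878],
    "colD": [63, 38, 4, 4],
})
results = pd.DataFrame({
    "col": ["colA", "colA", "colA", "colB", "colC", "colD"],
    "value": [1, 1, 8, 1, 878, 4],
    "score": [85, 856, 200000, 2356, 99999, 2],
})

# Best-scoring value per column, as a Series indexed by column name.
best_rows = results.loc[results.groupby("col")["score"].idxmax()]
best = best_rows.set_index("col")["value"]

# Only compare columns present in both tables; missing ones are skipped,
# which mirrors the "not found" branch of the loop.
cols = [c for c in profiles.columns if c != "profile_id" and c in best.index]

# One mask over all columns instead of repeatedly narrowing a subset.
mask = profiles[cols].eq(best[cols], axis=1).all(axis=1)
good_profile = profiles[mask]
print(good_profile)
```

On these four sample rows no single profile matches the best value in every column, so `good_profile` comes out empty, just as the loop would on the same data.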

Here's my attempt:

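# df1 is the profiles table and df2 the results table
# (called `profiles` and `df` in the question)

# for each col, keep the row with the max score
new_df = df2.loc[df2.groupby('col').score.idxmax(), ['col','value']]

# reshape the profiles to long form and merge on (col, value)
new_df.merge(df1.melt(id_vars='profile_id', var_name='col'),
             on=['col','value'],
             how='left')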

Output:

    col  value  profile_id
0  colA      8           3
1  colB      1           4
2  colC    878           4
3  colD      4           3
4  colD      4           4
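The merged table lists every profile that matches each best value, but it does not pick a single "closest" profile. One possible reading, which is an assumption and not something the question states, is that the closest profile is the one sharing the most best-scoring values. A minimal sketch under that assumption, using the sample rows from the question typed in by hand (`long_profiles`, `matches` and `match_counts` are names made up for this example):

```python
import pandas as pd

# Sample data copied from the question (first four profiles only).
profiles = pd.DataFrame({
    "profile_id": [1, 2, 3, 4],
    "colA": [1, 1, 8, 98],
    "colB": [20, 20, 5, 1],
    "colC": [50, 65, 3, 878],
    "colD": [63, 38, 4, 4],
})
results = pd.DataFrame({
    "col": ["colA", "colA", "colA", "colB", "colC", "colD"],
    "value": [1, 1, 8, 1, 878, 4],
    "score": [85, 856, 200000, 2356, 99999, 2],
})

# Best-scoring (col, value) pairs, as in the answer above.
best = results.loc[results.groupby("col")["score"].idxmax(), ["col", "value"]]

# Long-form profiles: one row per (profile_id, col, value).
long_profiles = profiles.melt(id_vars="profile_id", var_name="col")

# Inner merge keeps only the (col, value) pairs that some profile has.
matches = best.merge(long_profiles, on=["col", "value"], how="inner")

# Assumption: "closest" means the most matching columns.
match_counts = matches.groupby("profile_id").size().sort_values(ascending=False)
print(match_counts)
```

With the sample data, profile 4 matches three of the best values (colB, colC, colD) and profile 3 matches two (colA, colD), so profile 4 ranks first. Ties would need an extra rule, for example summing the scores of the matched columns.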
