
Get the mean of a column after appending one column to the end of another in pandas

I have a dataset that looks like this:

    Interactor A    Interactor B    Interaction Score   score2
0   P02574  P39205  0.928736    0.375000
1   P02574  Q6NR18  0.297354    0.166667
2   P02574  Q7KML4  0.297354    0.142857
3   P02574  Q9BP34  0.297354    0.166667
4   P02574  Q9BP35  0.297354    0.16666

data.shape = (112049, 5)

I want to append the Interactor B column to the end of the Interactor A column, keep each ID only once, and add a column that shows its Rank. I did this as follows:

cols = [data[col] for col in ['Interactor A', 'Interactor B']]
n = pd.concat(cols, ignore_index=True).to_frame('AB')   # Interactor B stacked under Interactor A

To make the column unique:

t = pd.unique(n['AB'])
t = pd.DataFrame(t, columns=['AB'])

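Then, to count how many times each ID occurs:

t2 = n.groupby('AB', sort=False).size()      # counts in first-seen order, the same order as t
t2 = pd.DataFrame({'Rank': t2.to_numpy()})    # drop the ID index so rows line up with t by position

Finally, I concatenated t and t2 side by side:

data_1 = pd.concat([t, t2], axis=1)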


AB  Rank
0   P02574  4


data_1.shape = (13631, 2)
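The Rank table built in the steps above can be sketched more compactly. This is a minimal sketch, assuming "Rank" means the number of rows in which an ID appears in either interactor column (which is what size() computes); the five sample rows stand in for the real 112049-row data.

```python
import pandas as pd

# Stand-in for the question's `data`: the five sample rows shown above.
data = pd.DataFrame({
    "Interactor A": ["P02574"] * 5,
    "Interactor B": ["P39205", "Q6NR18", "Q7KML4", "Q9BP34", "Q9BP35"],
    "Interaction Score": [0.928736, 0.297354, 0.297354, 0.297354, 0.297354],
    "score2": [0.375000, 0.166667, 0.142857, 0.166667, 0.16666],
})

# Stack Interactor B underneath Interactor A as one Series named "AB".
ab = pd.concat([data["Interactor A"], data["Interactor B"]],
               ignore_index=True).rename("AB")

# Count rows per ID; sort=False keeps first-seen order (the same order as pd.unique).
data_1 = ab.groupby(ab, sort=False).size().reset_index(name="Rank")
print(data_1)
```

Because the grouped result already carries the IDs, reset_index turns it straight into the AB / Rank table, so the separate pd.unique and concat steps are not needed.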

Now I want to add the Interaction Score and score2 columns to this DataFrame. If an ID appears more than once, I want to take the mean of its Interaction Score, drop the duplicates, and replace the Interaction Score with that mean.

I used:

score2 = data.groupby(['Interactor A', 'Interactor B'])['score2'].mean()
score2 = pd.DataFrame(score2, columns=['score2'])

The output in this case looks like this:

                          score2
Interactor A Interactor B
A0A023GPK8   Q9VQW1       0.200000
A0A076NAB7   Q9VYN8       0.000000
A0A0B4JD97   Q400N2       0.000000
             Q9VC64       0.090909
             Q9VNE4       0.307692

112049 rows × 1 columns

But what I want is to add columns with the mean of 'score2' and 'Interaction Score' for the 13631 unique IDs I created. How can I achieve this? Please help. The final DataFrame should look like this:

Interactor  Rank  Interaction Score  score2
P02574      5     0.928736           0.44

i.e., score2 is the average of all the 'P02574' scores in the dataset.
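A minimal sketch of one possible reading of this request (an assumption, not something the question pins down): every row's Interaction Score and score2 count toward both of its interactors, so each row is used once for Interactor A and once for Interactor B before grouping. The toy frame reuses the five sample rows above.

```python
import pandas as pd

# The five sample rows from the question.
data = pd.DataFrame({
    "Interactor A": ["P02574"] * 5,
    "Interactor B": ["P39205", "Q6NR18", "Q7KML4", "Q9BP34", "Q9BP35"],
    "Interaction Score": [0.928736, 0.297354, 0.297354, 0.297354, 0.297354],
    "score2": [0.375000, 0.166667, 0.142857, 0.166667, 0.16666],
})
score_cols = ["Interaction Score", "score2"]

# Assumed interpretation: each row contributes its scores to both interactors.
long_rows = pd.concat(
    [
        data[["Interactor A", *score_cols]].rename(columns={"Interactor A": "Interactor"}),
        data[["Interactor B", *score_cols]].rename(columns={"Interactor B": "Interactor"}),
    ],
    ignore_index=True,
)

# Rank = number of rows an ID appears in; both score columns are averaged per ID.
final = (
    long_rows.groupby("Interactor", sort=False)
    .agg(
        Rank=("Interaction Score", "size"),
        **{"Interaction Score": ("Interaction Score", "mean"),
           "score2": ("score2", "mean")},
    )
    .reset_index()
)
print(final)
```

On the five sample rows alone this gives P02574 a Rank of 5 with means of about 0.4236 and 0.2036; the full dataset would give different numbers.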

IIUC, you simply need to reshape your data from wide to long and then run an aggregation, assuming the scores pair with the interactors one for one (Interaction Score with Interactor A, score2 with Interactor B). Consider wide_to_long for the reshape after setting up the stub names and an id field. Then run groupby().agg() for counts and means.

Data

from io import StringIO
import pandas as pd

txt = '''    "Interactor A"    "Interactor B"    "Interaction Score"   "score2"
0   P02574  P39205  0.928736    0.375000
1   P02574  Q6NR18  0.297354    0.166667
2   P02574  Q7KML4  0.297354    0.142857
3   P02574  Q9BP34  0.297354    0.166667
4   P02574  Q9BP35  0.297354    0.16666'''

# Raw string so "\s" is not treated as an (invalid) string escape
data = pd.read_csv(StringIO(txt), sep=r"\s+")

Reshape

# FOR id FIELD
data["id"] = data.index

# FOR STUB NAMES
data = data.rename(columns={"Interaction Score": "score A",
                            "score2": "score B"})

df_long = pd.wide_to_long(data, ["Interactor", "score"], i="id", 
                           j="score_type", sep=" ", suffix="(A|B)")

df_long
#               Interactor     score
# id score_type                     
# 0  A              P02574  0.928736
# 1  A              P02574  0.297354
# 2  A              P02574  0.297354
# 3  A              P02574  0.297354
# 4  A              P02574  0.297354
# 0  B              P39205  0.375000
# 1  B              Q6NR18  0.166667
# 2  B              Q7KML4  0.142857
# 3  B              Q9BP34  0.166667
# 4  B              Q9BP35  0.166660
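If wide_to_long feels opaque, the same long frame can be sketched with two renamed slices and pd.concat. This swaps in a different technique from the one above; it assumes the same pairing (Interaction Score with Interactor A, score2 with Interactor B), and the score_type labels are added by hand.

```python
import pandas as pd

# The five sample rows from the question.
data = pd.DataFrame({
    "Interactor A": ["P02574"] * 5,
    "Interactor B": ["P39205", "Q6NR18", "Q7KML4", "Q9BP34", "Q9BP35"],
    "Interaction Score": [0.928736, 0.297354, 0.297354, 0.297354, 0.297354],
    "score2": [0.375000, 0.166667, 0.142857, 0.166667, 0.16666],
})

# Interactor A is paired with Interaction Score, Interactor B with score2.
part_a = (data[["Interactor A", "Interaction Score"]]
          .rename(columns={"Interactor A": "Interactor", "Interaction Score": "score"})
          .assign(score_type="A"))
part_b = (data[["Interactor B", "score2"]]
          .rename(columns={"Interactor B": "Interactor", "score2": "score"})
          .assign(score_type="B"))

# Keep the original row number as "id" and add score_type as a second index level.
df_long = (pd.concat([part_a, part_b])
           .rename_axis("id")
           .set_index("score_type", append=True))
print(df_long)
```

The groupby and pivot_table calls in this answer should work unchanged on a frame built this way.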

Interactor Aggregation

df_long.groupby(["Interactor"])["score"].agg(["count", "mean"])

#            count      mean
# Interactor
# P02574         5  0.423630
# P39205         1  0.375000
# Q6NR18         1  0.166667
# Q7KML4         1  0.142857
# Q9BP34         1  0.166667
# Q9BP35         1  0.166660
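To get a flat table closer to the one in the question (an Interactor column plus a count and a mean), a minimal sketch under the same one-for-one pairing assumption uses named aggregation and reset_index. The output names Rank and mean_score are chosen here for illustration.

```python
import pandas as pd

# Long-format sample equivalent to df_long above (one score per interactor per row).
df_long = pd.DataFrame({
    "Interactor": ["P02574"] * 5 + ["P39205", "Q6NR18", "Q7KML4", "Q9BP34", "Q9BP35"],
    "score": [0.928736, 0.297354, 0.297354, 0.297354, 0.297354,
              0.375000, 0.166667, 0.142857, 0.166667, 0.16666],
})

# Named aggregation: keyword = output column name, value = aggregation function.
summary = (
    df_long.groupby("Interactor")["score"]
    .agg(Rank="count", mean_score="mean")
    .reset_index()
)
print(summary)
```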

Interactor + Score Groupby Aggregation

df_long.groupby(["Interactor", "score_type"])['score'].agg(["count", "mean"])

#                        count      mean
# Interactor score_type                 
# P02574     A               5  0.423630
# P39205     B               1  0.375000
# Q6NR18     B               1  0.166667
# Q7KML4     B               1  0.142857
# Q9BP34     B               1  0.166667
# Q9BP35     B               1  0.166660

Interactor + Score Pivot Aggregation

df_long.pivot_table(index="Interactor", columns="score_type", values='score',
                    aggfunc = ["count", "mean"])

#            count          mean          
# score_type     A    B        A         B
# Interactor                              
# P02574       5.0  NaN  0.42363       NaN
# P39205       NaN  1.0      NaN  0.375000
# Q6NR18       NaN  1.0      NaN  0.166667
# Q7KML4       NaN  1.0      NaN  0.142857
# Q9BP34       NaN  1.0      NaN  0.166667
# Q9BP35       NaN  1.0      NaN  0.166660
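The pivot result has two-level columns, such as (count, A) and (mean, B). If single-level names are easier to work with, one option, sketched here with a hypothetical "<stat>_<type>" naming scheme, is to join the two levels:

```python
import pandas as pd

# Long-format sample equivalent to df_long above.
df_long = pd.DataFrame({
    "Interactor": ["P02574"] * 5 + ["P39205", "Q6NR18", "Q7KML4", "Q9BP34", "Q9BP35"],
    "score_type": ["A"] * 5 + ["B"] * 5,
    "score": [0.928736, 0.297354, 0.297354, 0.297354, 0.297354,
              0.375000, 0.166667, 0.142857, 0.166667, 0.16666],
})

wide = df_long.pivot_table(index="Interactor", columns="score_type",
                           values="score", aggfunc=["count", "mean"])

# Join each (statistic, score_type) column pair into one label, e.g. "mean_A".
wide.columns = [f"{stat}_{stype}" for stat, stype in wide.columns]
wide = wide.reset_index()
print(wide)
```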
