
Is there any way to select certain rows in a table based on which ones have similar values in one or more columns in Python?

Sr. No.      A      B             C
      0   84.3   18.3  1.138420e+00
      1   84.3   95.8  8.501307e
      2   84.3  192.7  2.262742e-02
      3   84.3  617.0  5.395847e-01
      4   84.3   54.0  1.484681
      5   18.3   95.8  9.612692e-01
      6   18.3  192.7  9.600000e-01
      7   18.3  617.0  1.706984e
      8   18.3  544.0  1.128933e+00
      9   95.8   52.7  6.157143e-01
     10   95.8  617.0  8.880000e+00
     11   95.8   54.0  4.533847e-01
     12  192.7  617.0  5.048742e
     13  192.7  544.0  1.838478e-02
     14  617.0  544.0  7.360492e

For example, in the table above, I want to take the average of the C values from rows 0, 5, 6, 7 and 8, because all of these rows have 18.3 in either column A or column B. I then want to store this average in another data frame, in a row corresponding to 18.3. Next, I want to take the average of the C values from rows 1, 5, 9, 10 and 11, because all of these rows have 95.8 in one of those columns, and store it in the row corresponding to 95.8. I want to repeat this operation for each unique value that appears in columns A and B, but I can't figure out a way to do it. Any hints would be helpful!

I believe I understand what you're asking: you want to store the mean of column C for each unique value in A and B as a row in a new DataFrame.

The code below reads in the data, which I saved as data.csv, then finds the unique values across columns A and B and, for each one, calculates the mean of C over the rows where either A or B matches that value.

We then create a new data frame holding each unique value and its mean.

    import pandas as pd

    df = pd.read_csv("data.csv")

    # Collect the distinct values of A, then append values that occur only in B.
    unique_a = df.A.unique().tolist()
    unique_b = df.B.unique().tolist()
    b_uniques = [i for i in unique_b if i not in unique_a]
    unique_a += b_uniques

    output = []
    value = []
    for i in unique_a:
        # Mean of C over the rows where either A or B equals the current value.
        output.append(df[(df['A'] == i) | (df['B'] == i)]['C'].mean())
        value.append(i)

    out_df = pd.DataFrame({"mean": output, "Group Value": value})


Output:

           mean  Group Value
    0  2.336000         84.3
    1  1.180000         18.3
    2  3.882000         95.8
    3  1.512500        192.7
    4  4.708000        617.0
    5  0.965000         54.0
    6  2.836667        544.0
    7  0.620000         52.7
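As an alternative to the explicit loop, the same grouping can probably be expressed with `melt` and `groupby`. This is only a sketch under stated assumptions: A and B are copied from the question's table, but C is filled with placeholder values (10, 20, ..., 150) instead of the question's data, and the name `long_df` is made up for illustration.

```python
import pandas as pd

# A and B come from the question's table; C holds placeholder values
# (an assumption for this sketch, not the question's data).
df = pd.DataFrame({
    "A": [84.3, 84.3, 84.3, 84.3, 84.3, 18.3, 18.3, 18.3, 18.3,
          95.8, 95.8, 95.8, 192.7, 192.7, 617.0],
    "B": [18.3, 95.8, 192.7, 617.0, 54.0, 95.8, 192.7, 617.0, 544.0,
          52.7, 617.0, 54.0, 617.0, 544.0, 544.0],
    "C": list(range(10, 160, 10)),
})

# Stack A and B into a single "Group Value" column, so every row of the
# original table appears once for its A value and once for its B value.
long_df = df.melt(id_vars="C", value_vars=["A", "B"], value_name="Group Value")

# Average C within each distinct value.
out_df = (
    long_df.groupby("Group Value")["C"]
    .mean()
    .rename("mean")
    .reset_index()
)
print(out_df)
```

One difference from the loop is worth keeping in mind: if a row had the same value in both A and B, this version would count its C twice, whereas the `|` mask counts it once. No row in the question's table has A equal to B, so the two approaches should agree here.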

Try this:

import pandas as pd

# Rebuild the question's table. C holds simple sample values here,
# not the values from the question.
s = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
a = [84.3, 84.3, 84.3, 84.3, 84.3, 18.3, 18.3, 18.3, 18.3, 95.8, 95.8, 95.8, 192.7, 192.7, 617.0]
b = [18.3, 95.8, 192.7, 617.0, 54.0, 95.8, 192.7, 617.0, 544.0, 52.7, 617.0, 54.0, 617.0, 544.0, 544.0]
c = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150]

df = pd.DataFrame(s, columns=['Sr. No.'])
df['A'] = a
df['B'] = b
df['C'] = c

# Every distinct value that appears in A or B.
completeSet = set(list(df['B']) + list(df['A']))
list_df_num = []
list_df_avg = []

for num in completeSet:
    list_df_num.append(num)
    # Rows where either A or B equals the current value.
    tmp = df[(df['A'] == num) | (df['B'] == num)]
    if len(tmp) > 0:
        list_df_avg.append(tmp['C'].mean())
    else:
        list_df_avg.append(0)

result = pd.DataFrame(list_df_num, columns=['Number'])
result['Average'] = list_df_avg

print(result)
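Because a `set` has no guaranteed iteration order, the rows of `result` may come out in an arbitrary order. The question also asks for the average to sit in "a row corresponding to" each value, so a variant that indexes the result by the value itself might be convenient. The following is a minimal sketch, assuming the same sample data as above; the names `values` and `averages` are made up for illustration.

```python
import pandas as pd

# Same sample data as above (C holds sample values, not the question's data).
df = pd.DataFrame({
    "A": [84.3, 84.3, 84.3, 84.3, 84.3, 18.3, 18.3, 18.3, 18.3,
          95.8, 95.8, 95.8, 192.7, 192.7, 617.0],
    "B": [18.3, 95.8, 192.7, 617.0, 54.0, 95.8, 192.7, 617.0, 544.0,
          52.7, 617.0, 54.0, 617.0, 544.0, 544.0],
    "C": list(range(10, 160, 10)),
})

# pd.unique keeps values in order of first appearance (A first, then B),
# unlike set(), whose order is arbitrary.
values = pd.unique(pd.concat([df["A"], df["B"]]))

# One mean per value, stored under the value itself as the row label.
averages = pd.Series(
    [df.loc[(df["A"] == v) | (df["B"] == v), "C"].mean() for v in values],
    index=values,
    name="Average",
)
result = averages.to_frame()

# Look up the row corresponding to 18.3 (rows 0, 5, 6, 7, 8 of the table).
print(result.loc[18.3, "Average"])
```

Looking rows up by a floating-point label relies on the label being exactly the stored value, which holds here because the values are taken straight from the columns.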


