Is there any way to select certain rows in a table based on which ones have similar values in one or more columns in Python?
Sr. No. | A | B | C
---|---|---|---
0 | 84.3 | 18.3 | 1.138420e+00
1 | 84.3 | 95.8 | 8.501307e
2 | 84.3 | 192.7 | 2.262742e-02
3 | 84.3 | 617.0 | 5.395847e-01
4 | 84.3 | 54.0 | 1.484681
5 | 18.3 | 95.8 | 9.612692e-01
6 | 18.3 | 192.7 | 9.600000e-01
7 | 18.3 | 617.0 | 1.706984e
8 | 18.3 | 544.0 | 1.128933e+00
9 | 95.8 | 52.7 | 6.157143e-01
10 | 95.8 | 617.0 | 8.880000e+00
11 | 95.8 | 54.0 | 4.533847e-01
12 | 192.7 | 617.0 | 5.048742e
13 | 192.7 | 544.0 | 1.838478e-02
14 | 617.0 | 544.0 | 7.360492e
For example, in the table above, I want to take the average of the C values from rows 0, 5, 6, 7, and 8, because each of these rows has 18.3 in column A or column B. I then want to store this average in another data frame, in a row corresponding to 18.3.
Next, I want to take the average of the C values from rows 1, 5, 9, 10, and 11, because each of these rows has 95.8 in column A or column B, and store that average in the row corresponding to 95.8.
Similarly, I want to repeat this operation for each unique value that appears in columns A and B. I am unable to figure out a way to do this. Any hints will be helpful!
I believe I understand what you're asking: you want to store the mean of column C for each unique value in A and B as a row in a new df.
The code below reads in the dataframe, which I created as data.csv, then finds the unique values across the A and B columns and, for each one, calculates the mean of C over the rows where either A or B matches that value.
We then create a new data frame with the means and the unique values.
import pandas as pd

df = pd.read_csv("data.csv")

# Collect the unique values of A, then append the values of B not already seen.
unique_a = df.A.unique().tolist()
unique_b = df.B.unique().tolist()
b_uniques = [i for i in unique_b if i not in unique_a]
unique_a += b_uniques

output = []
value = []
for i in unique_a:
    # Mean of C over the rows where either A or B equals the current value.
    output.append(df[(df['A'] == i) | (df['B'] == i)]['C'].mean())
    value.append(i)

out_df = pd.DataFrame({"mean": output, "Group Value": value})
print(out_df)
Output:
       mean  Group Value
0  2.336000         84.3
1  1.180000         18.3
2  3.882000         95.8
3  1.512500        192.7
4  4.708000        617.0
5  0.965000         54.0
6  2.836667        544.0
7  0.620000         52.7
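The same result can probably be obtained without an explicit loop by reshaping A and B into a single column and grouping on it. The following is only a sketch on a small made-up frame (the five rows and the name `long_df` are illustrative assumptions, not the data from the question). Deduplicating on the original row index is an extra step so that a row with the same value in both A and B is counted once, as it is with the boolean mask above.

```python
import pandas as pd

# Small illustrative frame (assumed data, not the table from the question).
df = pd.DataFrame({
    "A": [84.3, 84.3, 18.3, 18.3, 95.8],
    "B": [18.3, 95.8, 95.8, 617.0, 617.0],
    "C": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Stack A and B into one "Group Value" column, keeping the original row index.
long_df = df.melt(
    id_vars="C",
    value_vars=["A", "B"],
    value_name="Group Value",
    ignore_index=False,
)

# A row with A == B would appear twice for the same value; keep it once.
long_df = long_df.reset_index().drop_duplicates(subset=["index", "Group Value"])

# One mean of C per distinct value found in A or B.
out_df = (
    long_df.groupby("Group Value", as_index=False)["C"]
    .mean()
    .rename(columns={"C": "mean"})
)
print(out_df)
```

Note that `groupby` sorts the group values, so the rows come out in ascending order of "Group Value" rather than in order of first appearance.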
Try this:
import pandas as pd

# Rebuild the table from the question, with placeholder values for C.
s = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
a = [84.3, 84.3, 84.3, 84.3, 84.3, 18.3, 18.3, 18.3, 18.3, 95.8, 95.8, 95.8, 192.7, 192.7, 617.0]
b = [18.3, 95.8, 192.7, 617.0, 54.0, 95.8, 192.7, 617.0, 544.0, 52.7, 617.0, 54.0, 617.0, 544.0, 544.0]
c = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150]
df = pd.DataFrame(s, columns=['Sr. No.'])
df['A'] = a
df['B'] = b
df['C'] = c

# Every distinct value that appears in column A or column B.
completeSet = set(list(df['B']) + list(df['A']))

list_df_num = []
list_df_avg = []
for num in completeSet:
    list_df_num.append(num)
    # Rows where either A or B equals the current value.
    tmp = df[(df['A'] == num) | (df['B'] == num)]
    if len(tmp) > 0:
        avg = sum(list(tmp['C'])) / len(list(tmp['C']))
        list_df_avg.append(avg)
    else:
        list_df_avg.append(0)

result = pd.DataFrame(list_df_num, columns=['Number'])
result['Average'] = list_df_avg
print(result)
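Since the question mentions "one or more columns", the row selection can be written so that it extends beyond A and B. This is a sketch only: the helper `mean_for_value` and its parameters are hypothetical names introduced here, and the C values are the placeholders from the answer above, not the values in the question's table.

```python
import pandas as pd

# Same placeholder data as in the answer above.
df = pd.DataFrame({
    "A": [84.3, 84.3, 84.3, 84.3, 84.3, 18.3, 18.3, 18.3, 18.3,
          95.8, 95.8, 95.8, 192.7, 192.7, 617.0],
    "B": [18.3, 95.8, 192.7, 617.0, 54.0, 95.8, 192.7, 617.0, 544.0,
          52.7, 617.0, 54.0, 617.0, 544.0, 544.0],
    "C": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150],
})


def mean_for_value(frame, value, key_cols=("A", "B"), target="C"):
    """Hypothetical helper: mean of `target` over rows where any key column equals `value`."""
    mask = frame[list(key_cols)].eq(value).any(axis=1)
    return frame.loc[mask, target].mean()


# Distinct values across A and B, in order of first appearance.
values = pd.unique(df[["A", "B"]].to_numpy().ravel())

result = pd.DataFrame({
    "Number": values,
    "Average": [mean_for_value(df, v) for v in values],
})
print(result)

# Rows 0, 5, 6, 7, 8 contain 18.3, so the mean is (10 + 60 + 70 + 80 + 90) / 5.
print(mean_for_value(df, 18.3))  # 62.0
```

Adding a third key column would only require passing a longer `key_cols` tuple; the mask logic stays the same.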