Python - Comparing rows of a .csv file based on column conditions

Question

personId	age	height
1	18	191
2	35	187
3	52	165
4	20	172
5	49	188
6	62	174

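I have this .csv file. My task is to compare each row with every other row, filter the pairs by their age and height differences, and output the corresponding results.

The filters I am going to use are the following:
Filter (1): check whether the age difference between two people is less than 5.
Filter (2): if filter (1) holds, check whether the height difference is < 20 (only for the people for whom filter (1) holds, not for everyone).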
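
To make the two filters concrete, here is a minimal sketch of them as a single check on one pair of people. The function name `passes_filters` and its keyword arguments are illustrative assumptions, not part of the original code; the thresholds are the ones stated above.

```python
def passes_filters(age1, height1, age2, height2, max_age_diff=5, max_height_diff=20):
    """Return True if a pair passes filter (1) and then filter (2)."""
    # Filter (1): the age difference must be below the threshold
    if abs(age1 - age2) >= max_age_diff:
        return False
    # Filter (2): only checked for pairs that already passed filter (1)
    return abs(height1 - height2) < max_height_diff


# Persons 1 and 4: age difference 2, height difference 19
print(passes_filters(18, 191, 20, 172))  # True
# Persons 3 and 5: age difference 3, height difference 23
print(passes_filters(52, 165, 49, 188))  # False
```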

Goal: print

ID_Person_1, ID_Person_2, Age_Diff, Height_Diff

I tried using itertools.combinations to access these columns.

Is there a better, simpler, or more Pythonic way to do something like this? Here is what I have coded so far:


    import pandas as pd
    import os
    import itertools
    from operator import itemgetter
    import math

    # assign and open dataset
    file = "C:/Users/User/Desktop/ages.csv"
    data = pd.read_csv(file, index_col="personId")

    pwd = os.getcwd()
    os.chdir(os.path.dirname(file))
    trainData = pd.read_csv(os.path.basename(file))
    os.chdir(pwd)

    # displaying data
    print("\nMy Data is:")
    print(data)

    # Since we do a pairwise comparison Age of 1st Person we check = a1, Age of 2nd Person we check = a2
    for a1, a2 in itertools.combinations(data['age'], 2):
        if abs(a1 - a2) < 5:
            new_data = (a1, a2, abs(a1 - a2))
            #print(new_data)
            df1 = data.loc[(data['age'] == new_data[0])]
            # Create a new dataframe of filtered ages
            # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
            new_df1 = pd.concat([df1, data.loc[(data['age'] == new_data[1])]])
            print("my first dataframe filtered by age difference is \n", new_df1)
            # Access the new dataframe to check height differences
            # Since we do a pairwise comparison Height of 1st Person we check = h1, Height of 2nd Person we check = h2
            for h1, h2 in itertools.combinations(new_df1['height'], 2):
                if abs(h1 - h2) < 20:
                    new_data_h = (h1, h2, abs(h1 - h2))
                    print(new_data_h)
                    df2 = data.loc[(data['height'] == new_data_h[0])]
                    # Create a new dataframe of filtered heights
                    new_df2 = pd.concat([df2, data.loc[(data['height'] == new_data_h[1])]])
                    print("my second dataframe filtered by height difference is \n", new_df2)

Answer 1

You can create a new dataframe for the selected ages and apply filter 2 to it:

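import io
import itertools

import pandas as pd

c = """
PersonId    Age Height
1   18  191
2   35  187
3   52  165
4   20  172
5   49  188
6   62  174
"""

# The columns are separated by runs of whitespace, so split on a regex
data = pd.read_csv(io.StringIO(c), sep=r'\s+', engine='python')

# Filter 1: collect the rows of every pair whose age difference is < 5
frames = []
for a1, a2 in itertools.combinations(data['Age'].tolist(), 2):
    if abs(a1 - a2) < 5:
        frames.append(data.loc[data['Age'] == a1])
        frames.append(data.loc[data['Age'] == a2])

# pd.concat is used because DataFrame.append was removed in pandas 2.0;
# fall back to an empty frame when no pair passed filter 1
selected_data = pd.concat(frames) if frames else data.iloc[0:0]

# Filter 2: among the selected rows, keep height pairs that differ by < 20
for h1, h2 in itertools.combinations(selected_data['Height'].tolist(), 2):
    if abs(h1 - h2) < 20:
        new_data_h = (h1, h2, abs(h1 - h2))
        print(new_data_h)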

Output:

(191, 172, 19)
(191, 188, 3)
(172, 165, 7)
(172, 188, 16)

I did not do anything after filter 2, but you could create a new dataframe or modify the selected one.
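
Since the stated goal is to print both IDs together with the two differences, one possible sketch is to iterate over pairs of whole rows rather than pairs of single column values, so that each person's ID, age, and height stay together. The column names follow the sample data; the helper name `compare_pairs`, its keyword arguments, and the comma-separated `csv_text` literal are assumptions made for illustration.

```python
import io
import itertools

import pandas as pd

# Sample data from the question, written here as comma-separated text
csv_text = """PersonId,Age,Height
1,18,191
2,35,187
3,52,165
4,20,172
5,49,188
6,62,174
"""


def compare_pairs(data, max_age_diff=5, max_height_diff=20):
    """Return one row per pair of people that passes both filters."""
    rows = []
    # itertuples yields one namedtuple per row, so the ID travels with the values
    for p1, p2 in itertools.combinations(data.itertuples(index=False), 2):
        age_diff = abs(p1.Age - p2.Age)
        height_diff = abs(p1.Height - p2.Height)
        # Filter (1) on age, then filter (2) on height for the same pair
        if age_diff < max_age_diff and height_diff < max_height_diff:
            rows.append((p1.PersonId, p2.PersonId, age_diff, height_diff))
    return pd.DataFrame(
        rows, columns=["ID_Person_1", "ID_Person_2", "Age_Diff", "Height_Diff"]
    )


data = pd.read_csv(io.StringIO(csv_text))
result = compare_pairs(data)
print(result.to_string(index=False))
```

With the sample data only persons 1 and 4 pass both filters (age difference 2, height difference 19). Persons 3 and 5 pass the age filter but differ in height by 23. Note that this differs from looping over the selected heights alone, which also compares people who were never paired by age.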

Solution 1
0 2021-12-22 14:04:00
