
Python - Compare rows of .csv file based on column conditions

personId age height
1 18 191
2 35 187
3 52 165
4 20 172
5 49 188
6 62 174

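I have this .csv file. My task is to compare each row with every other row, filter the pairs by their age and height differences, and output the matching results.

The filters I want to use are:
Filter (1): check whether the age difference between two people is less than 5.
Filter (2): if filter (1) holds, check whether the height difference is < 20 (only for the people for whom filter (1) holds, not for everyone).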

Goal: print

ID_Person_1, ID_Person_2, Age_Diff, Height_Diff

I tried to use itertools.combinations to access these columns.

Is there a better, simpler, or more Pythonic way to do something like this? Here is what I have coded so far:


    import pandas as pd
    import os
    import itertools
    from operator import itemgetter
    import math

    # assign and open dataset
    file = "C:/Users/User/Desktop/ages.csv"
    data = pd.read_csv(file, index_col="personId")

    pwd = os.getcwd()
    os.chdir(os.path.dirname(file))
    trainData = pd.read_csv(os.path.basename(file))
    os.chdir(pwd)

    # displaying data
    print("\nMy Data is:")
    print(data)

    # Since we do a pairwise comparison Age of 1st Person we check = a1, Age of 2nd Person we check = a2
    for a1, a2 in itertools.combinations(data['age'], 2):
        if abs(a1 - a2) < 5:
            new_data = (a1, a2, abs(a1 - a2))
            #print(new_data)
            df1 = data.loc[data['age'] == new_data[0]]
            # Create a new dataframe of filtered ages
            # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
            new_df1 = pd.concat([df1, data.loc[data['age'] == new_data[1]]])
            print("my first dataframe filtered by age difference is \n", new_df1)
            # Access the new dataframe to check height differences
            # Since we do a pairwise comparison Height of 1st Person we check = h1, Height of 2nd Person we check = h2
            for h1, h2 in itertools.combinations(new_df1['height'], 2):
                if abs(h1 - h2) < 20:
                    new_data_h = (h1, h2, abs(h1 - h2))
                    print(new_data_h)
                    df2 = data.loc[data['height'] == new_data_h[0]]
                    # Create a new dataframe of filtered heights
                    new_df2 = pd.concat([df2, data.loc[data['height'] == new_data_h[1]]])
                    print("my second dataframe filtered by height difference is \n", new_df2)

You can create a new dataframe for the selected ages and apply filter 2 to it:

import pandas as pd
import io
import itertools

c = """
PersonId    Age Height
1   18  191
2   35  187
3   52  165
4   20  172
5   49  188
6   62  174
"""

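# The columns are separated by runs of whitespace, so split on r'\s+'
data = pd.read_csv(io.StringIO(c), sep=r'\s+')

# Collect the rows of every pair whose age difference is below 5
# (DataFrame.append was removed in pandas 2.0, so build a list and concatenate once)
selected = []
for a1,a2 in itertools.combinations(data['Age'],2):
    if abs(a1-a2) < 5:
        selected.append(data.loc[data['Age'] == a1])
        selected.append(data.loc[data['Age'] == a2])
selected_data = pd.concat(selected)

# Apply filter 2 to the heights of the selected people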
for h1,h2 in itertools.combinations(selected_data['Height'],2):
    if abs(h1-h2) < 20:
        new_data_h = (h1,h2,abs(h1-h2))
        print(new_data_h)

output:

(191, 172, 19)
(191, 188, 3)
(172, 165, 7)
(172, 188, 16)

I did not do anything after filter 2, but you could create a new dataframe or modify the selected one.
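
One way to get the four columns the question asks for is to iterate over pairs of whole rows instead of pairs of ages, so each person's ID stays attached to the values being compared. This also keeps filter 2 limited to pairs that passed filter 1; pooling the selected rows, as above, compares heights across people who were never an age match (for example, (191, 188, 3) pairs persons 1 and 5, whose ages are 18 and 49). The following is only a minimal sketch: the function name `matching_pairs`, its parameters, and the comma-separated column names `personId`, `age`, `height` are assumptions for illustration, not something given in the question.

```python
import io
import itertools

import pandas as pd

# Sample data from the question; the exact column names are an assumption.
CSV_TEXT = """personId,age,height
1,18,191
2,35,187
3,52,165
4,20,172
5,49,188
6,62,174
"""


def matching_pairs(data, max_age_diff=5, max_height_diff=20):
    """Return one row per pair of people that passes both filters.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    rows = []
    # Each item is a namedtuple for one person, so ID, age and height stay together.
    for p1, p2 in itertools.combinations(data.itertuples(index=False), 2):
        age_diff = abs(p1.age - p2.age)
        if age_diff >= max_age_diff:
            continue  # filter (1) failed, so filter (2) is not checked
        height_diff = abs(p1.height - p2.height)
        if height_diff < max_height_diff:
            rows.append((p1.personId, p2.personId, age_diff, height_diff))
    return pd.DataFrame(
        rows, columns=["ID_Person_1", "ID_Person_2", "Age_Diff", "Height_Diff"]
    )


data = pd.read_csv(io.StringIO(CSV_TEXT))
result = matching_pairs(data)
print(result.to_string(index=False))
```

With the sample data, only persons 1 and 4 pass both filters (age difference 2, height difference 19); persons 3 and 5 pass the age filter but differ in height by 23. For large files the nested pairing grows quadratically with the number of rows, so a self cross join (`DataFrame.merge` with `how="cross"`, available in pandas 1.2 and later) followed by boolean filtering may be worth trying instead.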

