Python - Compare rows of .csv file based on column conditions
personId | age | height |
---|---|---|
1 | 18 | 191 |
2 | 35 | 187 |
3 | 52 | 165 |
4 | 20 | 172 |
5 | 49 | 188 |
6 | 62 | 174 |
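I have this .csv file. My task is to compare every row with every other row, filter the pairs by their age and height differences, and report the results.
The filters I will use are as follows:
Filter (1): check whether the age difference between two people is less than 5.
Filter (2): if filter (1) holds, check whether the height difference is less than 20 (only for the people for whom filter (1) holds, not in general).
Goal: print
ID_Person_1, ID_Person_2, Age_Diff, Height_Diff
I tried using itertools.combinations to access these columns.
Is there a better, simpler, or more Pythonic way to do something like this? Here is what I have coded so far: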
import pandas as pd
import os
import itertools

# assign and open dataset
file = "C:/Users/User/Desktop/ages.csv"
data = pd.read_csv(file, index_col="personId")
pwd = os.getcwd()
os.chdir(os.path.dirname(file))
trainData = pd.read_csv(os.path.basename(file))
os.chdir(pwd)

# displaying data
print("\nMy Data is:")
print(data)

# Pairwise comparison: a1 = age of the 1st person we check, a2 = age of the 2nd person
for a1, a2 in itertools.combinations(data['age'], 2):
    if abs(a1 - a2) < 5:
        new_data = (a1, a2, abs(a1 - a2))
        # print(new_data)
        df1 = data.loc[data['age'] == new_data[0]]
        # Create a new dataframe of filtered ages
        # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
        new_df1 = pd.concat([df1, data.loc[data['age'] == new_data[1]]])
        print("my first dataframe filtered by age difference is \n", new_df1)

        # Access the new dataframe to check height differences
        # Pairwise comparison: h1 = height of the 1st person we check, h2 = height of the 2nd person
        for h1, h2 in itertools.combinations(new_df1['height'], 2):
            if abs(h1 - h2) < 20:
                new_data_h = (h1, h2, abs(h1 - h2))
                print(new_data_h)
                df2 = data.loc[data['height'] == new_data_h[0]]
                # Create a new dataframe of filtered heights
                new_df2 = pd.concat([df2, data.loc[data['height'] == new_data_h[1]]])
                print("my second dataframe filtered by height difference is \n", new_df2)
You can create a new dataframe for the selected ages and apply filter 2 to it:
import pandas as pd
import io
import itertools

c = """
PersonId Age Height
1 18 191
2 35 187
3 52 165
4 20 172
5 49 188
6 62 174
"""
data = pd.read_csv(io.StringIO(c), sep=' ', engine='python')

# Filter 1: collect the rows of every person whose age is within 5 of someone else's
selected = []
for a1, a2 in itertools.combinations(data['Age'], 2):
    if abs(a1 - a2) < 5:
        selected.append(data.loc[data['Age'] == a1])
        selected.append(data.loc[data['Age'] == a2])
# pd.concat is used because DataFrame.append was removed in pandas 2.0
selected_data = pd.concat(selected) if selected else data.iloc[:0]

# Filter 2: compare the heights of the selected people pairwise
for h1, h2 in itertools.combinations(selected_data['Height'], 2):
    if abs(h1 - h2) < 20:
        new_data_h = (h1, h2, abs(h1 - h2))
        print(new_data_h)
output:
(191, 172, 19)
(191, 188, 3)
(172, 165, 7)
(172, 188, 16)
I didn't do anything after filter 2, but you can create a new dataframe or modify the selected one.
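For the stated goal of printing both IDs together with their age and height differences, one option is to iterate over pairs of whole rows rather than pairs of column values. Pooling all age-selected people and then comparing their heights can also pair people who were never matched by age: (191, 188, 3) in the output above comes from persons 1 and 5, whose ages are 18 and 49. The following is only a sketch under assumptions: the columns are named personId, age, and height as in the question's code, the sample data is inlined, and matching_pairs and the output column names are hypothetical names chosen for illustration.

```python
import io
import itertools

import pandas as pd

# Sample data from the question, inlined so the sketch runs on its own.
# Column names (personId, age, height) are assumed from the question's code.
csv_text = """personId,age,height
1,18,191
2,35,187
3,52,165
4,20,172
5,49,188
6,62,174
"""
data = pd.read_csv(io.StringIO(csv_text))


def matching_pairs(df, max_age_diff=5, max_height_diff=20):
    """Hypothetical helper: one output row per pair of people passing both filters."""
    rows = []
    # Combine whole rows, so each difference stays tied to the two IDs it came from.
    for p1, p2 in itertools.combinations(df.itertuples(index=False), 2):
        age_diff = abs(p1.age - p2.age)
        if age_diff >= max_age_diff:
            continue  # filter 1 failed, so the height check is skipped
        height_diff = abs(p1.height - p2.height)
        if height_diff < max_height_diff:  # filter 2
            rows.append((p1.personId, p2.personId, age_diff, height_diff))
    return pd.DataFrame(
        rows, columns=["ID_Person_1", "ID_Person_2", "Age_Diff", "Height_Diff"]
    )


result = matching_pairs(data)
print(result.to_string(index=False))
```

With the sample data this keeps only persons 1 and 4 (age difference 2, height difference 19). Persons 3 and 5 pass the age filter with a difference of 3, but their heights differ by 23, so filter 2 drops them.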