
Trying to filter a CSV file with multiple variables using pandas in Python

import pandas as pd
import numpy as np
df = pd.read_csv("adult.data.csv")

print("data shape: "+str(data.shape))
print("number of rows: "+str(data.shape[0]))
print("number of cols: "+str(data.shape[1]))
print(data.columns.values)

datahist = {}
for index, row in data.iterrows():
    k = (str(row['age']) + str(row['sex']) +
         str(row['workclass']) + str(row['education']) +
         str(row['marital-status']) + str(row['race']))
    if k in datahist:
        datahist[k] += 1
    else:
        datahist[k] = 1
uniquerows = 0
for key, value in datahist.items():
    if value == 1:
        uniquerows += 1
print(uniquerows)

for key, value in datahist.items():
    if value == 1: 
        print(key)

df.loc[data['age'] == 58] & df.loc[data['sex'] == Male]

I have been trying to get the above code to work.

I have limited coding experience, but the issue seems to lie with some of the columns being of object dtype. The int64 columns work just fine when it comes to filtering.

Any assistance will be much appreciated!

df.loc[data['age'] == 58] & df.loc[data['sex'] == Male]

First, you are using Male as a variable name; you probably meant the string 'Male'. Second, look at where the [ and ] are placed: you extract the part of the DataFrame where age equals 58, then the part where sex equals Male, and then try to apply bitwise AND to the two resulting DataFrames. You should instead apply & to the conditions rather than to pieces of the DataFrame, that is:

df.loc[(data['age'] == 58) & (data['sex'] == 'Male')]
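A minimal, self-contained sketch of the same idea, using a small made-up frame in place of adult.data.csv. The rows are invented for illustration; only the column names age and sex come from the question, and a single frame name (df) is used throughout.

```python
import pandas as pd

# Hypothetical stand-in for adult.data.csv; the rows are invented.
df = pd.DataFrame({
    "age": [58, 58, 39, 58],
    "sex": ["Male", "Female", "Male", "Male"],
})

# Each comparison yields a boolean Series aligned with df's index.
is_58 = df["age"] == 58
is_male = df["sex"] == "Male"

# & combines the two masks element-wise. The parentheses are required when
# the comparisons are written inline, because & binds more tightly than ==.
matches = df.loc[(df["age"] == 58) & (df["sex"] == "Male")]

print(matches)
```

Building the masks as named variables first (is_58, is_male) is optional, but it can make it easier to check each condition on its own before combining them.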

The int64 columns work just fine because you've specified the condition correctly as:

data['age'] == 58

However, the object column condition data['sex'] == Male should compare against a string:

data['sex'] == 'Male'
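If the quoted comparison still returns no rows, it may help to inspect the dtypes and the values actually stored in the column. The sketch below uses an invented frame; the leading space in the sex values is an assumption about how the file might have been parsed (it can happen when a CSV has a space after each comma), not something stated in the question.

```python
import pandas as pd

# Hypothetical data; the leading spaces are an assumed parsing artifact.
data = pd.DataFrame({
    "age": [58, 39],
    "sex": [" Male", " Female"],
})

# age is an integer dtype; sex holds Python strings
# (shown as object, or as a string dtype in newer pandas versions).
print(data.dtypes)

# Shows the raw stored values, including any stray whitespace.
print(data["sex"].unique())

# An exact string comparison misses values that carry extra whitespace.
raw_hits = int((data["sex"] == "Male").sum())

# Stripping whitespace first makes the comparison robust to that.
clean_hits = int((data["sex"].str.strip() == "Male").sum())

print(raw_hits, clean_hits)
```

If stray whitespace does turn out to be the cause, passing skipinitialspace=True to pd.read_csv is another way to handle it at load time.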

Also, I noticed that you loaded the dataframe as df = pd.read_csv("adult.data.csv"). Did you mean this instead?

data = pd.read_csv("adult.data.csv")

The query at the end includes two conditions, and each should be enclosed in parentheses inside the square brackets [ ] filter. If the dataframe name is data (instead of df), it should be:

data.loc[(data['age'] == 58) & (data['sex'] == 'Male')]
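As an alternative that sidesteps the parenthesis placement, the same filter can be written with DataFrame.query. This is a different technique from the boolean-mask form discussed in the answers, shown here only for comparison; the frame below is invented and stands in for the real file.

```python
import pandas as pd

# Hypothetical stand-in for adult.data.csv; the rows are invented.
data = pd.DataFrame({
    "age": [58, 58, 39],
    "sex": ["Male", "Female", "Male"],
})

# Boolean-mask form: each condition is wrapped in its own parentheses.
by_mask = data.loc[(data["age"] == 58) & (data["sex"] == "Male")]

# Query-string form: the string literal is quoted inside the expression.
by_query = data.query("age == 58 and sex == 'Male'")

# Both forms should select the same rows.
print(by_mask.equals(by_query))
```

Note that query refers to columns by bare name, so a column such as marital-status (which contains a hyphen) would need backticks inside the expression.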

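Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.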

 