How to filter and sort specific csv using python

Question

Please help me with the python script to filter the below CSV. Below is the example of the CSV dump for which I have done the initial filtration.

Last_name	Gender	Name	Phone	city
Ford	Male	Tom	123	NY
Rich	Male	Robert	21312	LA
Ford	Female	Jessica	123123	NY
Ford	Male	John	3412	NY
Rich	Other	Linda	12312	LA
Ford	Other	James	4321	NY
Smith	Male	David	123123	TX
Rich	Female	Mary	98689	LA
Rich	Female	Jennifer	86860	LA
Ford	Male	Richard	12123	NY
Smith	Other	Daniel	897097	TX
Ford	Other	Lisa	123123123	NY

import re

def gather_info (L_name):
    dump_filename = "~/Documents/name_report.csv"
    LN = []
    with open(dump_filename, "r") as FH:
        for var in FH.readlines():
            if L_name in var
                final = var.split(",")
                print(final[1], final[2], final[3])
    return LN

if __name__ == "__main__":
    L_name = input("Enter the Last name: ")

    la_name = gather_info(L_name)

By this, I am able to filter by the last name. for example, if I choose L_name as Ford, then I have my output as

Gender	Name	Phone
Male	Tom	123
Female	Jessica	123123
Male	John	3412
Other	James	4321
Male	Richard	12123
Other	Lisa	22412

I need help extending the script by selecting each gender and the values in the list to perform other functions, then calling the following gender and the values to achieve the same functions. for example, first, it selects the gender Male [Tom, John] and performs other functions. then selects the next gender Female [Jessica] and performs the same functions and then selects the gender Other [James, Lisa] and performs the same functions.

Answer 1

I would recomend using the pandas module which allows for easy filtering and grouping of data

import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv('name_reports.csv')
    L_name = input("Enter the last name: ")
    by_last_name = data[data['Last_name'] == L_name]
    groups = by_last_name.groupby(['Gender'])
    for group_name, group_data in groups:
        print(group_name)
        print(group_data)

Breaking this down into its pieces the first part is

data = pd.read_csv('name_reports.csv')

This reads the data from the csv and places it into a dataframe Second we have

by_last_name = data[data['Last_name'] == L_name]

This filters the dataframe to only have results with Last_name equal to L_name Next we group the data.

groups = by_last_name.groupby(['Gender'])

this groups the filtered data frames by gender then we iterate over this. It returns a tuple with the group name and the dataframe associated with that group.

for group_name, group_data in groups:
    print(group_name)
    print(group_data)

This loop just prints out the data to access fields from it you can use the iterrows function

for index,row in group_data.iterrows():
    print(row['city']
    print(row['Phone']
    print(row['Name']

And then you can use those for whatever function you want. I would recommend reading on the documentation for pandas since depending on the function you plan on using there may be a better way to do it using the library. Here is the link to the library https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

Answer 2

Since you cannot use the pandas module then a method using only the csv module would look like this

import csv
def has_last_name(row,last_name):
    return row['Last_name'] == last_name
def has_gender(row,current_gender):
    return row['Gender'] == current_gender
if __name__ == '__main__':
    data = None
    genders = ['Male','Female','Other']
    with open('name_reports.csv') as csvfile:
        data = list(csv.DictReader(csvfile,delimiter=','))
    L_name = input('Enter the Last name: ')
    get_by_last_name = lambda row: has_last_name(row,L_name)
    filtered_by_last_name = list(filter(get_by_last_name,data))
    for gender in genders:
        get_by_gender = lambda row: has_gender(row,gender)
        filtered_by_gender = list(filter(get_by_gender,filtered_by_last_name))
        print(filtered_by_gender)

The important part is the filter built in function. This takes in a function that takes in an item from a list and returns a bool. filter takes this function and an iterable and returns a generator of items that return true for that function. The other important part is the csv.DictReader which returns your csv file as a dictionary which makes allows you to access attributes by key instead of by index.

How to filter and sort specific csv using python

Question

2 answers

solution1
0 2022-05-16 18:25:34

solution2
0 2022-05-17 09:49:28

How to filter and sort specific csv using python

Question

2 answers

solution1 0 2022-05-16 18:25:34

solution2 0 2022-05-17 09:49:28

solution1
0 2022-05-16 18:25:34

solution2
0 2022-05-17 09:49:28