简体   繁体   中英

Reading CSV file from stdin in Python and modifying it

I need to read csv file from stdin and output the rows only the rows which values are equal to those specified in the columns. My input is like this:

 2
 Kashiwa
 Name,Campus,LabName
 Shinichi MORISHITA,Kashiwa,Laboratory of Omics
 Kenta Naai,Shirogane,Laboratory of Functional Analysis in Silico
 Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics
 Yukihide Tomari,Yayoi,Laboratory of RNA Function

My output should be like this:

 Name,Campus,LabName
 Shinichi MORISHITA,Kashiwa,Laboratory of Omics
 Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics

I need to sort out the people whose values in column#2 == Kashiwa and not output first 2 lines of stdin in stdout.

So far I just tried to read from stdin into csv but I am getting each row as a list of strings (as expected from csv documentation). Can I change this?

 #!usr/bin/env python3

 import sys
 import csv

 data = sys.stdin.readlines()

 for line in csv.reader(data):

      print(line)

Output:

 ['2']
 ['Kashiwa']
 ['Name', 'Campus', 'LabName']
 ['Shinichi MORISHITA', 'Kashiwa', 'Laboratory of Omics']
 ['Kenta Naai', 'Shirogane', 'Laboratory of Functional Analysis in 
 Silico']
 ['Kiyoshi ASAI', 'Kashiwa', 'Laboratory of Genome Informatics']
 ['Yukihide Tomari', 'Yayoi', 'Laboratory of RNA Function']

Can someone give me some advice on reading stdin into CSV and manipulating the data later (outputting only needed values of columns, swapping the columns, etc.,)?

This is one approach.

Ex:

import csv

with open(filename) as csv_file:
    reader = csv.reader(csv_file)
    next(reader) #Skip First Line
    next(reader) #Skip Second Line
    print(next(reader)) #print Header
    for row in reader:
        if row[1] == 'Kashiwa':   #Filter By 'Kashiwa'
            print(row)

Output:

['Name', 'Campus', 'LabName']
['Shinichi MORISHITA', 'Kashiwa', 'Laboratory of Omics']
['Kiyoshi ASAI', 'Kashiwa', 'Laboratory of Genome Informatics']

Use Pandas to read your and manage your data in a DataFrame

import pandas as pd
# File location
infile = r'path/file'
# Load file and skip first two rows
df = pd.read_csv(infile, skiprows=2)
# Refresh your Dataframe en throw out the rows that contain Kashiwa in the campus column
df = df[df['campus'] != 'Kashiwa']

You can perform all kinds edits for example sort your DataFrame simply by:

df.sort(columns='your column')

Check the Pandas documentation for all the possibilities.

 #!usr/bin/env python3
 import sys
 import csv

 data = sys.stdin.readlines()  # to read the file
 column_to_be_matched = int(data.pop(0)) # to get the column number to match
 word_to_be_matched = data.pop(0) # to get the word to be matched in said column
 col_headers = data.pop(0) # to get the column names
 print(", ".join(col_headers)) # to print the column names
 for line in csv.reader(data):
     if line[column_to_be_matched-1] == word_to_be_matched: #while it matched
        print(", ".join(line)) #print it

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM