I need to read a CSV file from stdin and output only the rows whose value in a given column equals a given value. The column number and the value are specified in the first two lines of the input. My input looks like this:
2
Kashiwa
Name,Campus,LabName
Shinichi MORISHITA,Kashiwa,Laboratory of Omics
Kenta Naai,Shirogane,Laboratory of Functional Analysis in Silico
Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics
Yukihide Tomari,Yayoi,Laboratory of RNA Function
My output should be like this:
Name,Campus,LabName
Shinichi MORISHITA,Kashiwa,Laboratory of Omics
Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics
I need to select the people whose value in column 2 equals Kashiwa, and the first 2 lines of stdin must not appear in stdout.
So far I have only tried reading stdin with csv, but I get each row as a list of strings (as expected from the csv documentation). Can I change this?
#!/usr/bin/env python3
import sys
import csv
data = sys.stdin.readlines()
for line in csv.reader(data):
    print(line)
Output:
['2']
['Kashiwa']
['Name', 'Campus', 'LabName']
['Shinichi MORISHITA', 'Kashiwa', 'Laboratory of Omics']
['Kenta Naai', 'Shirogane', 'Laboratory of Functional Analysis in Silico']
['Kiyoshi ASAI', 'Kashiwa', 'Laboratory of Genome Informatics']
['Yukihide Tomari', 'Yayoi', 'Laboratory of RNA Function']
Can someone give me some advice on reading stdin as CSV and manipulating the data afterwards (outputting only the needed columns, swapping columns, etc.)?
This is one approach.
Ex:
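import csv

# filename is the path to your CSV file; pass sys.stdin to csv.reader instead to read standard input
with open(filename, newline='') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # skip the first line (column number)
    next(reader)  # skip the second line (value to match)
    print(next(reader))  # print the header
    for row in reader:
        if row[1] == 'Kashiwa':  # keep only rows whose second column is 'Kashiwa'
            print(row)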
Output:
['Name', 'Campus', 'LabName']
['Shinichi MORISHITA', 'Kashiwa', 'Laboratory of Omics']
['Kiyoshi ASAI', 'Kashiwa', 'Laboratory of Genome Informatics']
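The approach above reads from a file and prints each row as a Python list. A minimal sketch of the same idea that reads the column number and the value from the first two lines of the stream and writes real CSV lines could look like this. The function name `filter_rows` and the in-memory sample are only for illustration, not part of the original answer.

```python
import csv
import io
import sys


def filter_rows(in_stream, out_stream):
    """Copy the header and the matching rows from in_stream to out_stream."""
    column = int(in_stream.readline())     # line 1: 1-based column number
    wanted = in_stream.readline().strip()  # line 2: value to match
    reader = csv.reader(in_stream)         # the remaining lines are ordinary CSV
    writer = csv.writer(out_stream, lineterminator="\n")
    writer.writerow(next(reader))          # the header row is always written
    for row in reader:
        # Skip blank or short rows, keep rows whose chosen column matches.
        if len(row) >= column and row[column - 1] == wanted:
            writer.writerow(row)


# Demonstration with the sample input from the question.
sample = io.StringIO(
    "2\n"
    "Kashiwa\n"
    "Name,Campus,LabName\n"
    "Shinichi MORISHITA,Kashiwa,Laboratory of Omics\n"
    "Kenta Naai,Shirogane,Laboratory of Functional Analysis in Silico\n"
    "Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics\n"
    "Yukihide Tomari,Yayoi,Laboratory of RNA Function\n"
)
filter_rows(sample, sys.stdout)
```

To use it on real input, call `filter_rows(sys.stdin, sys.stdout)` instead of passing the sample.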
Use pandas to read and manage your data in a DataFrame:
import pandas as pd
# File location
infile = r'path/file'
# Load the file and skip the first two rows
df = pd.read_csv(infile, skiprows=2)
# Keep only the rows whose Campus column equals Kashiwa
df = df[df['Campus'] == 'Kashiwa']
You can perform all kinds of edits. For example, sort your DataFrame with:
df = df.sort_values(by='your column')
Check the Pandas documentation for all the possibilities.
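Since the question reads from stdin rather than from a file, here is a hedged sketch of how the pandas approach might be adapted. It assumes pandas is installed, that the first two lines hold the 1-based column number and the value to match, and that reading every column as a string is acceptable. The function name `filter_with_pandas` is hypothetical.

```python
import io
import sys

import pandas as pd


def filter_with_pandas(in_stream):
    """Return a DataFrame holding only the rows that match the request."""
    column = int(in_stream.readline())      # line 1: 1-based column number
    wanted = in_stream.readline().strip()   # line 2: value to match
    # The rest of the stream is plain CSV; dtype=str keeps every cell as text
    # so that the comparison below is a string comparison.
    df = pd.read_csv(in_stream, dtype=str)
    return df[df.iloc[:, column - 1] == wanted]


# Demonstration with part of the sample input from the question.
sample = io.StringIO(
    "2\n"
    "Kashiwa\n"
    "Name,Campus,LabName\n"
    "Shinichi MORISHITA,Kashiwa,Laboratory of Omics\n"
    "Kenta Naai,Shirogane,Laboratory of Functional Analysis in Silico\n"
    "Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics\n"
)
result = filter_with_pandas(sample)
result.to_csv(sys.stdout, index=False)  # index=False omits the row numbers
```

With real input, pass `sys.stdin` in place of the sample.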
#!/usr/bin/env python3
import sys
import csv

data = sys.stdin.readlines()  # read all of stdin
column_to_be_matched = int(data.pop(0))  # 1-based number of the column to match
word_to_be_matched = data.pop(0).strip()  # value to match, without the trailing newline
reader = csv.reader(data)
col_headers = next(reader)  # header row, parsed into a list of column names
print(",".join(col_headers))  # print the column names
for line in reader:
    if line[column_to_be_matched - 1] == word_to_be_matched:  # row matches
        print(",".join(line))  # print it as a CSV line
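For the other part of the question (outputting only some columns or swapping them), one possible sketch uses `csv.DictReader` and `csv.DictWriter`, which give each row as a dict keyed by the header instead of a list. The function name `reorder_columns` is hypothetical, and the stream is assumed to start at the header row, that is, the two control lines have already been consumed.

```python
import csv
import io
import sys


def reorder_columns(in_stream, out_stream, columns):
    """Write only the named columns, in the order given by `columns`."""
    reader = csv.DictReader(in_stream)  # the first line supplies the keys
    writer = csv.DictWriter(
        out_stream,
        fieldnames=columns,
        extrasaction="ignore",          # silently drop columns not requested
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:                  # each row is a dict keyed by header
        writer.writerow(row)


# Demonstration: keep two columns of the sample data and swap their order.
sample = io.StringIO(
    "Name,Campus,LabName\n"
    "Shinichi MORISHITA,Kashiwa,Laboratory of Omics\n"
    "Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics\n"
)
reorder_columns(sample, sys.stdout, ["Campus", "Name"])
```

Addressing columns by name rather than by index keeps the code readable when the column order of the input changes.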