
Reading CSV file from stdin in Python and modifying it

I need to read a CSV file from stdin and output only the rows whose value in a given column equals a specified value. My input looks like this:

 2
 Kashiwa
 Name,Campus,LabName
 Shinichi MORISHITA,Kashiwa,Laboratory of Omics
 Kenta Naai,Shirogane,Laboratory of Functional Analysis in Silico
 Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics
 Yukihide Tomari,Yayoi,Laboratory of RNA Function

My output should look like this:

 Name,Campus,LabName
 Shinichi MORISHITA,Kashiwa,Laboratory of Omics
 Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics

I need to pick out the people whose value in column 2 is Kashiwa, and the first two lines of stdin must not appear in stdout.

So far I have only tried reading stdin with the csv module, but I get each row as a list of strings (as expected from the csv documentation). Can I change this?

 #!/usr/bin/env python3

 import sys
 import csv

 data = sys.stdin.readlines()

 for line in csv.reader(data):
     print(line)

Output:

 ['2']
 ['Kashiwa']
 ['Name', 'Campus', 'LabName']
 ['Shinichi MORISHITA', 'Kashiwa', 'Laboratory of Omics']
 ['Kenta Naai', 'Shirogane', 'Laboratory of Functional Analysis in Silico']
 ['Kiyoshi ASAI', 'Kashiwa', 'Laboratory of Genome Informatics']
 ['Yukihide Tomari', 'Yayoi', 'Laboratory of RNA Function']

Can someone give me some advice on reading stdin as CSV and manipulating the data afterwards (outputting only the needed columns, swapping columns, etc.)?

Here is one approach using the csv module:

import csv
import sys

reader = csv.reader(sys.stdin)  # read CSV rows directly from stdin
next(reader)  # skip the first line (column number)
next(reader)  # skip the second line (value to match)
print(next(reader))  # print the header row
for row in reader:
    if row[1] == 'Kashiwa':  # keep only rows whose second column is 'Kashiwa'
        print(row)

Output:

['Name', 'Campus', 'LabName']
['Shinichi MORISHITA', 'Kashiwa', 'Laboratory of Omics']
['Kiyoshi ASAI', 'Kashiwa', 'Laboratory of Genome Informatics']
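
If lists of strings are awkward to work with, `csv.DictReader` yields each row as a dict keyed by the header, and `csv.DictWriter` can select and reorder columns on output. The following is only a sketch: it assumes the two-line preamble is always present, and the function name `filter_campus` and its `columns` parameter are made up for illustration. The sample data is copied from the question.

```python
import csv
import io


def filter_campus(infile, outfile, campus, columns=None):
    """Copy rows whose Campus equals `campus` from infile to outfile.

    `columns` (hypothetical parameter) optionally selects and reorders
    the output columns; by default all columns are kept in input order.
    """
    # Skip the two preamble lines that precede the header row.
    next(infile)
    next(infile)
    reader = csv.DictReader(infile)  # header row becomes the dict keys
    fieldnames = columns or reader.fieldnames
    writer = csv.DictWriter(outfile, fieldnames=fieldnames,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["Campus"] == campus:
            writer.writerow(row)  # keys not in fieldnames are dropped


sample = """2
Kashiwa
Name,Campus,LabName
Shinichi MORISHITA,Kashiwa,Laboratory of Omics
Kenta Naai,Shirogane,Laboratory of Functional Analysis in Silico
Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics
Yukihide Tomari,Yayoi,Laboratory of RNA Function
"""

out = io.StringIO()
filter_campus(io.StringIO(sample), out, "Kashiwa", columns=["LabName", "Name"])
print(out.getvalue(), end="")
```

To use it as a command-line filter, pass `sys.stdin` and `sys.stdout` in place of the `io.StringIO` objects.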

Use pandas to read and manage your data in a DataFrame:

import pandas as pd
# File location (pass sys.stdin instead to read from standard input)
infile = r'path/file'
# Load the file, skipping the first two rows
df = pd.read_csv(infile, skiprows=2)
# Keep only the rows whose Campus column equals 'Kashiwa'
df = df[df['Campus'] == 'Kashiwa']

You can then perform all kinds of edits. For example, to sort your DataFrame:

df = df.sort_values(by='your column')

Check the pandas documentation for all the possibilities.
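
As a rough end-to-end illustration of the pandas route, the sketch below filters, sorts, and swaps columns on the question's sample data. It assumes pandas is installed; an in-memory `io.StringIO` stands in for the input here, and `sys.stdin` could be passed to `read_csv` in its place. The variable names are illustrative.

```python
import io

import pandas as pd

sample = """2
Kashiwa
Name,Campus,LabName
Shinichi MORISHITA,Kashiwa,Laboratory of Omics
Kenta Naai,Shirogane,Laboratory of Functional Analysis in Silico
Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics
Yukihide Tomari,Yayoi,Laboratory of RNA Function
"""

# skiprows=2 drops the "2" and "Kashiwa" preamble lines before the header.
df = pd.read_csv(io.StringIO(sample), skiprows=2)

kashiwa = df[df["Campus"] == "Kashiwa"]            # keep only matching rows
kashiwa = kashiwa.sort_values(by="Name")           # sort by a column
swapped = kashiwa[["LabName", "Name", "Campus"]]   # select and reorder columns

# index=False omits the DataFrame's row index from the CSV text.
result = swapped.to_csv(index=False)
print(result, end="")
```

Calling `swapped.to_csv(sys.stdout, index=False)` would write straight to standard output instead of building a string.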

 #!/usr/bin/env python3
 import sys
 import csv

 data = sys.stdin.read().splitlines()  # read all lines, without trailing newlines
 column_to_be_matched = int(data.pop(0))  # 1-based number of the column to match
 word_to_be_matched = data.pop(0).strip()  # word to be matched in that column
 col_headers = data.pop(0)  # header line with the column names
 print(col_headers)  # print the column names unchanged
 for line in csv.reader(data):
     if line[column_to_be_matched - 1] == word_to_be_matched:  # row matches
         print(",".join(line))  # print it (assumes no field contains a comma or quote)
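
A variation on the same idea, sketched under the assumption that the first two input lines always hold the 1-based column number and the value to match: it streams the input instead of loading it all at once, and writes rows with `csv.writer` so that fields containing commas stay correctly quoted. The function name `filter_by_preamble` is made up for illustration.

```python
import csv
import io


def filter_by_preamble(infile, outfile):
    """Filter CSV rows using the column number and value on the first two lines."""
    column_index = int(next(infile).strip()) - 1  # preamble is 1-based
    wanted = next(infile).strip()                 # drop the trailing newline
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    writer.writerow(next(reader))                 # header row passes through
    for row in reader:
        # Ignore blank or short rows rather than raising IndexError.
        if len(row) > column_index and row[column_index] == wanted:
            writer.writerow(row)


sample = """2
Kashiwa
Name,Campus,LabName
Shinichi MORISHITA,Kashiwa,Laboratory of Omics
Kenta Naai,Shirogane,Laboratory of Functional Analysis in Silico
Kiyoshi ASAI,Kashiwa,Laboratory of Genome Informatics
Yukihide Tomari,Yayoi,Laboratory of RNA Function
"""

filtered = io.StringIO()
filter_by_preamble(io.StringIO(sample), filtered)
print(filtered.getvalue(), end="")
```

In a real script the call would be `filter_by_preamble(sys.stdin, sys.stdout)`.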


 