
Filter CSV File Program using Pandas and Python

I currently have a task that involves downloading a CSV master file and removing any lines where Column A - Column B <= 0, then filtering on whether Column C equals a given phrase. I'm looking to create a program that will:

  • Import a CSV file
  • Remove all lines where Column A - Column B <= 0
  • Ask for input to filter Column C on one or more phrases
  • Export the result to a new CSV file

So far, I have determined that the best way to do this is to use Pandas' DataFrame functionality, as I've used it previously to perform other operations on CSV files:

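import pandas as pd

file = pd.read_csv("sourcefile.csv")
# Keep only rows where A - B is strictly positive
file['NewColumn'] = file['A'] - file['B']
file = file[file.NewColumn > 0]
# Drop the columns that are not needed in the export
columns = ['ColumnsIWantToRemove']
file = file.drop(columns, axis=1)
phrases = input('What phrases are you filtering for? ')
# Works for a single phrase only; note == (comparison), not = (assignment)
file = file[file.C == phrases]
file.to_csv('export.csv')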

My question is, how do I filter Column C for multiple phrases? I want the program to take one or more phrases and only show rows where Column C's value equals one of those values. Any guidance would be amazing. Thank you!!

I would just ask for the input to be comma-separated, split it into a list, and filter with isin:

phrases = phrases.split(",")
file = file[file.C.isin(phrases)]
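
Putting the question's steps and the isin filter together, a minimal sketch of the whole program might look like the following. The function names `filter_rows` and `filter_csv` are hypothetical, introduced only for illustration; the column names `A`, `B`, and `C` come from the question. Stripping whitespace around each phrase is an added assumption, so that "foo, bar" and "foo,bar" behave the same.

```python
import pandas as pd


def filter_rows(df, phrases):
    """Keep rows where A - B > 0 and C equals one of the given phrases.

    `phrases` is a single comma-separated string, as typed by the user.
    """
    # Split on commas and drop surrounding whitespace and empty entries.
    wanted = [p.strip() for p in phrases.split(",") if p.strip()]
    positive = (df["A"] - df["B"]) > 0
    return df[positive & df["C"].isin(wanted)]


def filter_csv(source_path, target_path, phrases):
    """Read a CSV, apply filter_rows, and write the result to a new CSV."""
    df = pd.read_csv(source_path)
    # index=False avoids writing the DataFrame index as an extra column.
    filter_rows(df, phrases).to_csv(target_path, index=False)
```

It could then be called as, for example, filter_csv("sourcefile.csv", "export.csv", input("What phrases are you filtering for? ")).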

Maybe this can help you. It uses the csv module and skips rows whose Column C value is one of the phrases you do not want:

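import csv

# Placeholder values: phrases that column C must NOT equal; add as many as needed
excluded = {"phrase1", "phrase2"}
out_sourcefile = "out_sourcefile.csv"  # placeholder output path

# Text mode with newline="" is what the csv module expects in Python 3
with open("sourcefile.csv", newline="") as infile, \
        open(out_sourcefile, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        if row[2] in excluded:  # column C is the third column
            continue
        writer.writerow(row)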
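
The question asks to keep the rows that match rather than skip them, so the same row-by-row idea can be inverted. The sketch below is one way to do that with csv.DictReader, assuming the file has a header row with a column literally named `C`; the helper name `keep_matching` and the sample data are made up for illustration.

```python
import csv
import io


def keep_matching(src, dst, wanted, column="C"):
    """Copy the header and every row whose `column` value is in `wanted`."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column] in wanted:
            writer.writerow(row)


# In-memory sample standing in for real files opened with newline=""
sample = io.StringIO("A,B,C\n5,1,x\n7,2,y\n9,3,z\n")
result = io.StringIO()
keep_matching(sample, result, {"x", "z"})
print(result.getvalue())
```

With the sample data above, only the rows whose `C` value is x or z are written after the header.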
