Deleting multiple columns of CSV files based on certain conditions in Python

I have a CSV file containing many columns (almost 100). How can I filter multiple columns at once using certain criteria in Python? To be more precise, many of the columns are of no use to me. How can the file be filtered?

PS: I am a beginner.

Let's say you have the following content in a CSV file:

col1,col2,col3
1,a,0
2,b,0
3,d,1

Read it into a pandas DataFrame using the following script:

import pandas as pd

df = pd.read_csv(file)  # file is the path to your CSV file

To see the columns in the DataFrame df, use:

print(df.columns)

This gives you the column names in df as a pandas Index, in this case containing 'col1', 'col2' and 'col3'. Use list(df.columns) if you need a plain list.

To retain only specific columns (for example col1 and col3), you can use:

df = df[["col1", "col3"]]

Now if you run print(df.columns), it will only contain 'col1' and 'col3'.
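
If the file is wide, the unwanted columns can also be skipped while reading instead of being dropped afterwards. The following is a minimal sketch, assuming the sample data above; `csv_text` and `wanted` are hypothetical names used only for illustration, and `io.StringIO` stands in for a real file path.

```python
import io

import pandas as pd

# Sample data from the example above; in practice pass the path to your CSV file.
csv_text = "col1,col2,col3\n1,a,0\n2,b,0\n3,d,1\n"

# Hypothetical list of the columns worth keeping.
wanted = ["col1", "col3"]

# usecols tells read_csv to parse only the listed columns.
df = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(list(df.columns))
```

Reading only the needed columns can save memory on large files, since the other columns are never loaded into the DataFrame.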

Edited in reply to the comment:

If you want to delete the columns that fulfil a certain condition, you can use the following script:

for column in df.columns:
    # Check whether 0 is among the values of this column; you can put any condition you want here
    if 0 in df[column].values:
        print('Deleting column', column)  # the column fulfils the condition, so it is deleted
        df = df.drop(columns=column)  # remove the column from the DataFrame
print("df after deleting columns:")
print(df)

It will print the following (the DataFrame is shown here in simplified comma-separated form):

Deleting column col3
df after deleting columns:
col1,col2
1,a
2,b
3,d
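
The same idea can be written without an explicit loop by first collecting the names of the matching columns and then dropping them in one call. This is only a sketch, assuming the condition is "the column contains a 0" and that only numeric columns need to be checked; `csv_text`, `has_zero` and `to_drop` are hypothetical names.

```python
import io

import pandas as pd

csv_text = "col1,col2,col3\n1,a,0\n2,b,0\n3,d,1\n"
df = pd.read_csv(io.StringIO(csv_text))

# Only numeric columns can contain the number 0, so restrict the check to them.
numeric = df.select_dtypes(include="number")

# Boolean Series indexed by column name: True where the column has at least one 0.
has_zero = (numeric == 0).any()

# Names of the columns that fulfil the condition.
to_drop = has_zero[has_zero].index.tolist()

df = df.drop(columns=to_drop)
print(to_drop)
print(list(df.columns))
```

Building the list first also makes it easy to inspect which columns would be removed before actually dropping them.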

If you want to delete the columns of your DataFrame that contain only zero values, follow these steps (suppose your DataFrame is named df):

  1. Replace all the zero values with NaN first
import numpy as np
import pandas as pd

df = df.replace(0, np.nan)

  2. Drop the NaN values using the dropna method in pandas
df = df.dropna(axis=1, how='all')

The parameter axis=1 applies the drop rule column-wise, and how='all' drops a column only if all of its values are NaN.

Put together, the single-line answer is:

df = df.replace(0, np.nan).dropna(axis=1, how='all')
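
An alternative sketch that avoids the detour through NaN is a boolean mask over the columns. One behavioural difference is worth noting: the replace and dropna version also removes columns that were entirely NaN to begin with, while the mask below keeps them. The column names and data here are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 0],  # all zeros: should be removed
    "b": [0, 5, 0],  # has a non-zero value: should stay
    "c": [1, 2, 3],  # no zeros at all: should stay
})

# True for every column that has at least one non-zero entry.
keep = (df != 0).any(axis=0)

# Select only the columns flagged True.
df = df.loc[:, keep]
print(list(df.columns))
```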

You can parse the CSV file as a pandas DataFrame and then experiment with it. Please have a look at the pandas documentation on how to read CSV files. You can extract the columns you want based on their header names, and you can also apply mathematical operations quickly. Note, though, that Python may not be suitable for large-scale calculations, since your libraries are imported each time.

For example, if you have a DataFrame df with columns col1, col2, col3 and col4 and you want only col1 and col2, you could do:

new_df = df[['col1', 'col2']]
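
With close to 100 columns, listing every name by hand gets tedious. One option is to select columns by a pattern in their header names, or to drop a known list of unwanted ones. A sketch, assuming hypothetical headers where the useful columns share the prefix `temp_`:

```python
import pandas as pd

df = pd.DataFrame({
    "temp_in": [20.5, 21.0],
    "temp_out": [11.2, 10.8],
    "notes": ["x", "y"],
    "id": [1, 2],
})

# Boolean mask over the header names: True where the name starts with "temp_".
mask = df.columns.str.startswith("temp_")
new_df = df.loc[:, mask]

# The inverse approach: drop a list of unwanted columns.
# errors="ignore" skips names that are not present instead of raising.
trimmed = df.drop(columns=["notes", "missing_col"], errors="ignore")

print(list(new_df.columns))
print(list(trimmed.columns))
```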
