df.drop if it exists
Below is a function that takes a file and drops the columns 'row_num', 'start_date', and 'end_date'.
The problem is that not every file has each of these columns, so the function raises an error.
My goal is to alter the code so that it removes these columns if they exist but does not raise an error if a column does not exist.
import pandas as pd

def read_df(file):
    df = pd.read_csv(file, na_values=['', ' '])
    # Drop useless junk and fill empty values with zero
    df = df.drop(['row_num','start_date','end_date','symbol'], axis=1).fillna(0)
    df = df[df!=0][:-1].dropna().append(df.iloc[-1])
    return df
Add the parameter errors to DataFrame.drop:
errors : {'ignore', 'raise'}, default 'raise'
If 'ignore', suppress error and only existing labels are dropped.
df = df.drop(['row_num','start_date','end_date','symbol'], axis=1, errors='ignore')
Sample:
df = pd.DataFrame({'row_num':[1,2], 'w':[3,4]})
df = df.drop(['row_num','start_date','end_date','symbol'], axis=1, errors='ignore')
print(df)
   w
0  3
1  4
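Applied to the function from the question, this could look roughly like the sketch below. The in-memory CSV and the JUNK_COLUMNS name are made up for illustration, and the row-filtering line from the question is left out to keep the focus on the drop call.

```python
import io
import pandas as pd

# Hypothetical name for the list of columns the question wants removed
JUNK_COLUMNS = ['row_num', 'start_date', 'end_date', 'symbol']

def read_df(file):
    df = pd.read_csv(file, na_values=['', ' '])
    # errors='ignore' skips labels that are absent instead of raising KeyError
    return df.drop(columns=JUNK_COLUMNS, errors='ignore').fillna(0)

# Made-up file contents: only one of the four junk columns is present
csv_text = "row_num,price,volume\n1,10.5,\n2,,300\n"
cleaned = read_df(io.StringIO(csv_text))
print(cleaned.columns.tolist())
```

Here only row_num is present in the input, so it is the only column removed; the three missing names are silently skipped.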
In my tests the following was at least as fast as any of the given answers:
candidates=['row_num','start_date','end_date','symbol']
df = df.drop([x for x in candidates if x in df.columns], axis=1)
It has the benefit of readability and (with a small tweak to the code) the ability to record exactly which columns existed or were dropped, and when.
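One possible version of that small tweak is sketched here. The helper name drop_existing and its return shape are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

def drop_existing(df, candidates):
    """Drop the candidates that exist and report what happened (hypothetical helper)."""
    present = [c for c in candidates if c in df.columns]
    missing = [c for c in candidates if c not in df.columns]
    return df.drop(columns=present), present, missing

df = pd.DataFrame({'row_num': [1, 2], 'symbol': ['a', 'b'], 'w': [3, 4]})
out, dropped, skipped = drop_existing(
    df, ['row_num', 'start_date', 'end_date', 'symbol']
)
print(dropped)   # columns that existed and were removed
print(skipped)   # columns that were not in the frame
```

The two lists can then be logged or stored alongside the result, which errors='ignore' alone does not give you.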
Some reasons this might be more desirable than the previous solutions:
Benchmark results:
Code for benchmark tests (credit to an answer in this question for how to create this sort of benchmark):
import math
from simple_benchmark import benchmark
import pandas as pd

# setting up the toy df:
def df_creator(length):
    c1 = list(range(0, 10))
    c2 = list('a,b,c,d,e'.split(','))
    c3 = list(range(0, 5))
    c4 = [True, False]
    lists = [c1, c2, c3, c4]
    df = pd.DataFrame()
    count = 0
    for x in lists:
        count += 1
        # repeat each base list to fill roughly `length` rows
        df['col' + str(count)] = x * math.floor(length / len(x))
    return df

# setting up benchmark test:
def list_comp(df, candidates=['col1', 'col2', 'col5', 'col8']):
    return df.drop([x for x in candidates if x in df.columns], axis=1)

def looper(df, candidates=['col1', 'col2', 'col5', 'col8']):
    out = df
    for col in candidates:
        if col in out.columns:
            # drop from the running result so earlier drops are kept
            out = out.drop(columns=col)
    return out

def ignore_error(df, candidates=['col1', 'col2', 'col5', 'col8']):
    return df.drop(candidates, axis=1, errors='ignore')

functions = [list_comp, looper, ignore_error]
args = {n: df_creator(n) for n in [10, 100, 1000, 10000, 100000]}
argname = 'df_length'
b = benchmark(functions, args, argname)
b.plot()
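Before comparing timings, it may be worth confirming that the three approaches return the same frame. The check below is a minimal sketch on a small hand-built frame (the toy data and the CANDIDATES name are assumptions); the looper here carries its result forward between iterations so that every matching column is dropped.

```python
import pandas as pd

CANDIDATES = ['col1', 'col2', 'col5', 'col8']

def list_comp(df, candidates=CANDIDATES):
    return df.drop([x for x in candidates if x in df.columns], axis=1)

def looper(df, candidates=CANDIDATES):
    out = df
    for col in candidates:
        if col in out.columns:
            out = out.drop(columns=col)
    return out

def ignore_error(df, candidates=CANDIDATES):
    return df.drop(candidates, axis=1, errors='ignore')

# Small hand-built frame: col1..col4 exist, col5 and col8 do not
toy = pd.DataFrame({
    'col1': [0, 1],
    'col2': ['a', 'b'],
    'col3': [0, 1],
    'col4': [True, False],
})

results = [f(toy) for f in (list_comp, looper, ignore_error)]
all_equal = all(r.equals(results[0]) for r in results)
print(results[0].columns.tolist(), all_equal)
```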
I just had to do this; here's what I did:
# Drop these columns if they exist
cols = ['Billing Address Street 1', 'Billing Address Street 2', 'Billing Company']
for col in cols:
    if col in df.columns:
        df = df.drop(columns=col)
Might not be the best way, but it served its purpose.
x = ['row_num','start_date','end_date','symbol']
To check whether each column exists before dropping it, you can do:
for i in x:
    if i in df:
        df = df.drop(i, axis=1)
df = df.fillna(0)
or
for i in x:
    if i in df.columns:
        df = df.drop(i, axis=1)
df = df.fillna(0)
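The two membership tests behave the same way because `in` on a DataFrame checks column labels, not cell values. A small sketch with a made-up frame, dropping one column per iteration:

```python
import pandas as pd

df = pd.DataFrame({'row_num': [1, 2], 'w': [3, None]})
x = ['row_num', 'start_date', 'end_date', 'symbol']

checks_agree = True
for i in x:
    # `i in df` and `i in df.columns` both test the column labels
    checks_agree = checks_agree and ((i in df) == (i in df.columns))
    if i in df:
        df = df.drop(i, axis=1)

# Fill missing values once, after all drops are done
df = df.fillna(0)
print(df.columns.tolist(), checks_agree)
```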
Oddly, no answers use the pandas DataFrame filter method:
thisFilter = df.filter(drop_list)
df.drop(thisFilter, inplace=True, axis=1)
This will create a filter from the columns in drop_list that exist in df, then drop thisFilter from the df in place on axis=1, i.e., drop the columns that match the drop_list and don't error if they are nonexistent.
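A self-contained sketch of the same idea follows. The sample frame and the contents of drop_list are assumptions; it passes the filtered frame's column index to drop explicitly rather than the frame itself.

```python
import pandas as pd

df = pd.DataFrame({'row_num': [1, 2], 'w': [3, 4]})
drop_list = ['row_num', 'start_date', 'end_date', 'symbol']

# filter(items=...) keeps only the listed columns that actually exist
existing = df.filter(items=drop_list)

# Drop exactly those columns; nothing is raised for the missing names
df.drop(columns=existing.columns, inplace=True)
print(df.columns.tolist())
```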