I have a script that removes "bad elements" from a master list of elements, then returns a csv with the updated elements and their associated values.
My question is whether there is a more efficient way to perform the same operation that the for loop does.
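import pandas as pd
# error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on newer versions use on_bad_lines='skip'
Master=pd.read_csv('some.csv', sep=',',header=0,error_bad_lines=False)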
MasterList = Master['Elem'].tolist()
MasterListStrain1 = Master['Max_Principal_Strain'].tolist()
#this file should contain elements that are slated for deletion
BadElem=pd.read_csv('delete_me_elements_column.csv', sep=',',header=None, error_bad_lines=False)
BadElemList = BadElem[0].tolist()
NewMasterList = (list(set(MasterList) - set(BadElemList)))
filename = 'NewOutput.csv'
outfile = open(filename,'w')
#pdb.set_trace()
for i,j in enumerate(NewMasterList):
    #pdb.set_trace()
    Elem_Loc = MasterList.index(j)
    line ='\n%s,%.25f'%(j,MasterListStrain1[Elem_Loc])
    outfile.write(line)
print ("\n The new output file will be named: " + filename)
outfile.close()
Stage 1
If you really do want to iterate in a for loop, then besides using the dataframe's to_csv method (which is likely to improve performance), you can do the following:
...
SetBadElem = set(BadElemList)
...
for i,Elem in enumerate(MasterList):
    # i is the row position, Elem is the element ID at that position
    if Elem not in SetBadElem:
        line ='\n%s,%.25f'%(Elem,MasterListStrain1[i])
        outfile.write(line)
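Calling MasterList.index(j) for every element is inefficient, because each call scans the list from the start, which makes the whole loop quadratic in the number of elements. Iterating once and skipping the bad elements performs much better: checking membership in a set takes constant time on average, so each check is quick.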
Stage 2 Using Pandas properly
...
SetBadElem = set(BadElemList)
...
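for row in Master.itertuples(index=False):
    # iterating over Master directly would yield column names, not rows;
    # itertuples yields one namedtuple per row
    if row.Elem not in SetBadElem:
        line ='\n%s,%.25f'%(row.Elem, row.Max_Principal_Strain)
        outfile.write(line)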
There is no need to create lists out of pandas dataframe columns. Using the whole dataframe (and indexing into it) is a much better approach.
Stage 3 Removing messy iterated formatting operations
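We can add a column ('Formatted') that will contain the formatted data. For that we create a lambda function and apply it row by row (axis=1). Note that with axis=1 each row is passed as a Series, so an integer Elem column may be upcast to float alongside the float strain column:
formatter = lambda row: '\n%s,%.25f'%(row['Elem'], row['Max_Principal_Strain'])
Master['Formatted'] = Master.apply(formatter, axis=1)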
Stage 4 Pandas-way filtering and output
We can filter the dataframe in two ways. My preference is to reuse the formatting function, returning NaN for bad elements:
import numpy as np
formatter = lambda row: '\n%s,%.25f'%(row['Elem'], row['Max_Principal_Strain']) if row['Elem'] not in SetBadElem else np.nan
Master['Formatted'] = Master.apply(formatter, axis=1)
Now we can use the built-in dropna, which by default drops all rows that have any NaN values. It returns a new dataframe, so assign the result:
Master = Master.dropna()
Master.to_csv(filename)