I have a script that reads in a large data file (3 GB). I don't need all of the data and would like to skip particular rows if a condition is met. Is there a Python function to skip a row in a data file and continue reading the file? I checked the 3.2 docs but only found a function that skips chunks of data.
EDIT
I am reading in the data like this:
    from itertools import islice

    def read_file(F):  # Function that reads data from a file
                       # and extracts specific data columns
        X = []
        Y = []  # Creates data lists
        Z = []
        N = 11912639  # number of lines to be read
        f = open(F)  # Opens file
        f.readline()  # Strips header
        nlines = islice(f, N)  # slices file to only read N lines
        for line in nlines:  # Loop strips empty lines as well as replaces tabs with space
            if line != '':
                line = line.strip()
                line = line.replace('\t', ' ')
                columns = line.split()
                x = columns[0]  # assigns variable to columns
                y = columns[1]
                z = columns[2]
                X.append(x)
                Y.append(y)  # appends data to list
                Z.append(z)
What I was thinking of doing is putting an if statement in the above code, something like:
    if x > somevalue:
        skipline
    else:
        continue
If rows in your files correspond to lines, then just use a list comprehension:
    with open(path) as input_file:
        contents = [row for row in input_file if not unwanted(row)]
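Similar constructs may be possible if you're reading the file through some lazy reader other than the default line-by-line text-file reader.
Replace [] with () if you want to read the file lazily. In that case, consume the resulting generator inside the with block, because the file is closed as soon as the block exits.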
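As a minimal sketch of the lazy variant, the filtering can be wrapped in a generator function so the file stays open for exactly as long as rows are being consumed. The names wanted_rows, path, and unwanted are hypothetical here; unwanted is assumed to be any function that takes a line and returns True for rows to skip.

```python
def wanted_rows(path, unwanted):
    """Lazily yield the lines of the file at `path` that pass the filter.

    `unwanted` is a hypothetical predicate: it receives one line (including
    its trailing newline) and returns True if the line should be skipped.
    """
    with open(path) as input_file:
        for row in input_file:
            if not unwanted(row):
                yield row
```

Only one line is held in memory at a time, so this should cope with a multi-gigabyte file as long as the caller iterates over the result (for row in wanted_rows(...)) rather than converting it to a list.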
If I understand your sample code right, what you're looking for is something like this:
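    for line in nlines:
        line = line.strip()
        if line == '':
            continue  # skip empty lines
        line = line.replace('\t', ' ')
        x, y, z = line.split()[:3]  # keep only the first three columns
        if float(x) > somevalue:  # x is a string, so convert before comparing (assumes somevalue is a number)
            continue  # skip line if x > somevalue
        X.append(x)
        Y.append(y)
        Z.append(z)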
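For reference, here is a self-contained sketch that combines the question's reader with the skip logic above. It is only one possible arrangement: the function name read_columns and the parameters max_x and n_lines are made up for illustration, and it assumes the first three columns are numeric and that the file has a single header line.

```python
from itertools import islice


def read_columns(path, max_x, n_lines=None):
    """Read the first three columns, skipping blank rows and rows with x > max_x.

    `max_x` and `n_lines` are hypothetical parameters: `max_x` is the numeric
    threshold for the first column, and `n_lines` limits how many lines are
    read after the header (None means read to the end of the file).
    """
    X, Y, Z = [], [], []
    with open(path) as f:
        next(f, None)  # skip the header line, if there is one
        for line in islice(f, n_lines):
            columns = line.split()  # splits on tabs and spaces alike
            if not columns:
                continue  # skip empty lines
            x = float(columns[0])
            if x > max_x:
                continue  # skip rows that fail the condition
            X.append(x)
            Y.append(float(columns[1]))
            Z.append(float(columns[2]))
    return X, Y, Z
```

Converting to float up front keeps the comparison numeric; if the columns should stay as strings, only the value used in the comparison needs converting.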
You could do it manually yourself:
    for line in file:
        if 'foo' not in line:
            print('profiting')