I'm using a chunk function to pre-process my data for ML because my data is fairly large.
After processing, I'm trying to add the processed data back into the original data frame as a new column 'chunk'. This gives me a memory error, so I'm trying to load one chunk at a time into the dataframe, but I still get a memory error:
MemoryError: Unable to allocate array with shape (414, 100, 32765) and data type float64
Here's my data:
Antibiotic ... Genome
0 isoniazid ... ccctgacacatcacggcgcctgaccgacgagcagaagatccagctc...
1 isoniazid ... gggggtgctggcggggccggcgccgataaccccaccggcatcggcg...
2 isoniazid ... aatcacaccccgcgcgattgctagcatcctcggacacactgcacgc...
3 isoniazid ... gttgttgttgccgagattcgcaatgcccaggttgttgttgccgaga...
4 isoniazid ... ttgaccgatgaccccggttcaggcttcaccacagtgtggaacgcgg...
Here's my current code:
import numpy as np
import pandas as pd

lookup = {
    'a': 0.25,
    'g': 0.50,
    'c': 0.75,
    't': 1.00,
    'A': 0.25,
    'G': 0.50,
    'C': 0.75,
    'T': 1.00
    # z: 0.00
}
dfpath = 'C:\\Users\\CAAVR\\Desktop\\Ison.csv'
dataframe = pd.read_csv(dfpath, chunksize=100)
chunk_list = []
def preprocess(chunk):
    processed_chunk = chunk['Genome'].apply(lambda bps: pd.Series([lookup[bp] if bp in lookup else 0.0 for bp in bps.lower()])).values
    return processed_chunk
for chunk in dataframe:
    chunk_filter = preprocess(chunk)
    chunk_list.append(chunk_filter)
    chunk_array = np.asarray(chunk_list)
for chunk in chunk_array:
    dataframe1 = dataframe.copy()
    dataframe1["Chunk"] = chunk_array
    dataframe1.to_csv(r'C:\\Users\\CAAVR\\Desktop\\chunk.csv')
If you need any more info, let me know. Thanks.
Instead of combining all the chunks in memory, which just takes you back to the problem of running out of memory, I would suggest writing each chunk out separately.
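To get a sense of why combining the chunks cannot work here, this is a rough back-of-the-envelope calculation of the array named in the error message. The shape is taken from the MemoryError above; the only assumption is the standard 8 bytes per float64 value.

```python
import math

# Shape reported in the MemoryError message from the question.
shape = (414, 100, 32765)
bytes_per_float64 = 8  # a float64 value occupies 8 bytes

# Total size of one dense float64 array with that shape.
n_bytes = math.prod(shape) * bytes_per_float64
print(f"{n_bytes:,} bytes = {n_bytes / 2**30:.1f} GiB")
```

That is roughly 10 GiB for a single array, before counting any copies made along the way.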
If you open a file in append mode (f = open('out.csv', 'a')), you can call dataframe.to_csv(f) multiple times. The first call writes the column headers; later calls should use dataframe.to_csv(f, header=False), since you've already written the column headers earlier.
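Here is a minimal sketch of that idea applied to the question's data. The function names encode and process_to_csv are made up for illustration, and it assumes the input CSV has a 'Genome' column as shown in the question. Each encoded genome is stored as the string form of a Python list in the 'Chunk' column, which may or may not be the format you want downstream.

```python
import pandas as pd

# Same mapping as in the question; input is lower-cased first,
# so the upper-case keys are not needed.
LOOKUP = {"a": 0.25, "g": 0.50, "c": 0.75, "t": 1.00}


def encode(genome):
    """Map each base to its numeric code; unknown characters become 0.0."""
    return [LOOKUP.get(bp, 0.0) for bp in genome.lower()]


def process_to_csv(src_path, out_path, chunksize=100):
    """Read src_path chunk by chunk, encode 'Genome', and append to out_path.

    Only one chunk is held in memory at a time. The file is opened in
    append mode, so delete any stale out_path from a previous run first.
    """
    reader = pd.read_csv(src_path, chunksize=chunksize)
    with open(out_path, "a", newline="") as f:
        for i, chunk in enumerate(reader):
            chunk["Chunk"] = chunk["Genome"].apply(encode)
            # Write the column headers only with the first chunk.
            chunk.to_csv(f, header=(i == 0), index=False)
```

You would call it with your own paths, for example process_to_csv(dfpath, 'chunk.csv'), where dfpath is the input path from the question.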