This is my code to generate one file per home ID, so that I can then analyze each home separately:
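import numpy as np
import pandas as pd

data = pd.read_csv("110homes.csv")

# Write the rows of each home (dataid) to its own file, named <dataid>.csv
for i in np.unique(data['dataid']):
    print(i)
    d1 = data[data['dataid'] == i]  # all rows belonging to this home
    d1.to_csv(str(i) + ".csv")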
However, I am getting the following error. The machine has 200 GB of RAM, and it still raises a MemoryError:
    data = pd.read_csv("110homes.csv")
  File "/usr/lib/python2.7/site-packages/pandas/io/parsers.py", line 474, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/site-packages/pandas/io/parsers.py", line 260, in _read
    return parser.read()
  File "/usr/lib/python2.7/site-packages/pandas/io/parsers.py", line 721, in read
    ret = self._engine.read(nrows)
  File "/usr/lib/python2.7/site-packages/pandas/io/parsers.py", line 1170, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 769, in pandas.parser.TextReader.read (pandas/parser.c:7544)
  File "pandas/parser.pyx", line 819, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8137)
  File "pandas/parser.pyx", line 1833, in pandas.parser._concatenate_chunks (pandas/parser.c:22383)
MemoryError
Data in RAM can take a lot more space than on disk. Without seeing your 110homes.csv file, it's impossible to know the details, but imagine that it consists of 10 floating-point numbers per line, like: 0.0,1.0,2.0,...
In the CSV, each value takes 3 bytes plus 1 byte for the delimiter. In Python, each takes 8 bytes (on a 64-bit machine) for the float; if it is held as a string, add 2 bytes per Unicode character (another 8 bytes or so), 8 bytes for the string length, 8 bytes for the pointer to it, plus overhead per row, etc.
Think about it like this: on a 64-bit machine, the minimum size for a pointer, a native int, or a native float is 8 bytes. You need several of those per field, and several more per row. There's nothing unusual about data taking 15 times as much space in RAM as on disk.
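A small experiment can make this arithmetic concrete. The sketch below builds a hypothetical 1,000-row CSV shaped like the example above (10 fields of 3 characters each) in memory, then compares its size with what pandas reports via DataFrame.memory_usage: once with the values parsed as float64, and once kept as Python string objects (dtype=object). It does not measure 110homes.csv itself, and exact byte counts for the object case depend on the Python and pandas versions.

```python
import io

import pandas as pd

# Hypothetical sample shaped like the example: 1,000 rows of
# "0.0,1.0,...,9.0" (10 fields of 3 characters each).
ROWS = 1000
line = ",".join(str(float(j)) for j in range(10))
csv_text = (line + "\n") * ROWS
disk_bytes = len(csv_text.encode("ascii"))  # size the data would take as a CSV file

# Parsed as numbers: pandas stores each value as an 8-byte float64.
as_floats = pd.read_csv(io.StringIO(csv_text), header=None)
float_bytes = int(as_floats.memory_usage(index=False).sum())

# Kept as Python string objects: each cell is an 8-byte pointer plus a
# separate str object with its own header, length and character data.
as_objects = pd.read_csv(io.StringIO(csv_text), header=None, dtype=object)
object_bytes = int(as_objects.memory_usage(deep=True, index=False).sum())

print(f"CSV text:        {disk_bytes:>9,} bytes")
print(f"float64 columns: {float_bytes:>9,} bytes ({float_bytes / disk_bytes:.1f}x)")
print(f"object columns:  {object_bytes:>9,} bytes ({object_bytes / disk_bytes:.1f}x)")
```

The float64 version comes out at exactly twice the CSV size here (8 bytes per value versus 4 on disk), while the object version is more than ten times larger on a typical 64-bit CPython build, because every cell carries its own string object.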
Do a simple test: take the first 10% of the lines of your file and monitor Python with top while it processes them. See how much RAM it uses. Does it use at least 20 GB? If so, the full file will not fit in 200 GB.
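The same test can be approximated from inside Python. This is a rough sketch, not a precise measurement: the helper estimate_load_size (a hypothetical name) parses about the first 10% of data rows with read_csv(nrows=...), measures the result with memory_usage(deep=True), and extrapolates linearly to the whole file. It reports only the size of the finished DataFrame; the parser's peak usage while reading (the traceback above fails inside _concatenate_chunks) can be higher, so watching top remains the more direct check. It also assumes a single header row and no newlines embedded inside quoted fields.

```python
import os

import pandas as pd


def estimate_load_size(path, fraction=0.10):
    """Parse roughly the first `fraction` of the data rows in `path` and
    linearly extrapolate the resulting DataFrame's size to the whole file.

    Returns (estimated_dataframe_bytes, file_size_bytes). Assumes one header
    line, no embedded newlines, and similar content in every row.
    """
    with open(path) as f:
        total_rows = sum(1 for _ in f) - 1  # exclude the header line
    sample_rows = max(1, int(total_rows * fraction))
    sample = pd.read_csv(path, nrows=sample_rows)
    # deep=True counts the Python objects behind object-dtype columns too
    sample_bytes = int(sample.memory_usage(deep=True).sum())
    estimate = sample_bytes * total_rows / sample_rows
    return estimate, os.path.getsize(path)
```

For example, calling est, disk = estimate_load_size("110homes.csv") and comparing est with the 200 GB available gives a first indication of whether a full read_csv can succeed.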