I want to add a new column, tidy_tweet, to an existing .csv file by applying the remove_pattern function below:
import re

def remove_pattern(input_txt, pattern):
    r = re.findall(pattern, input_txt)
    for i in r:
        input_txt = re.sub(i, '', input_txt)
    return input_txt
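For reference, the function can be exercised on a single string before applying it to a whole column (the sample tweet text here is made up for illustration):

```python
import re

def remove_pattern(input_txt, pattern):
    # Collect every substring matching the pattern, then delete each occurrence.
    r = re.findall(pattern, input_txt)
    for i in r:
        input_txt = re.sub(i, '', input_txt)
    return input_txt

print(remove_pattern("@user1 great earnings from @user2", r"@[\w]*"))
# -> " great earnings from "
```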
I wrote these lines of code:
data = pd.read_csv(filepath_or_buffer='stockerbot-export.csv', error_bad_lines=False)
data['tidy_tweet'] = np.vectorize(remove_pattern)(data['text'], r"@[\w]*")
I am getting the following error:
MemoryError Traceback (most recent call last)
<ipython-input-15-d6e7e950d5b9> in <module>()
----> 1 data['tidy_tweet'] = np.vectorize(remove_pattern)(data['text'], "@[\w]*")
~\Anaconda3\lib\site-packages\numpy\lib\function_base.py in __call__(self, *args, **kwargs)
1970 vargs.extend([kwargs[_n] for _n in names])
1971
-> 1972 return self._vectorize_call(func=func, args=vargs)
1973
1974 def _get_ufunc_and_otypes(self, func, args):
~\Anaconda3\lib\site-packages\numpy\lib\function_base.py in _vectorize_call(self, func, args)
2049
2050 if ufunc.nout == 1:
-> 2051 res = array(outputs, copy=False, subok=True, dtype=otypes[0])
2052 else:
2053 res = tuple([array(x, copy=False, subok=True, dtype=t)
MemoryError:
I can't understand the error. Need help.
The error is self-explanatory: you are running out of memory. np.vectorize is just a Python-level loop over every row, and with a large dataset it materializes intermediate arrays that don't fit in memory. There is a simpler vectorized solution; give it a try:
data['tidy_tweet'] = data['text'].str.replace(r'@[\w]*', '', regex=True)
Remove regex=True if you are using an older version of pandas (older than 0.23.0); in those versions the pattern is always treated as a regular expression and the regex parameter does not exist.
Example:
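A minimal, self-contained sketch of the fix (the toy frame stands in for stockerbot-export.csv; column names match the question):

```python
import pandas as pd

# Toy frame standing in for the real CSV; the 'text' column name comes from the question.
data = pd.DataFrame({"text": ["@user1 buy $AAPL now", "market is up @trader"]})

# Vectorized removal of @-handles inside pandas; no Python-level loop,
# so memory usage stays flat even on large frames.
data["tidy_tweet"] = data["text"].str.replace(r"@[\w]*", "", regex=True)

print(data["tidy_tweet"].tolist())
# -> [' buy $AAPL now', 'market is up ']
```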