Unable to allocate memory (Python)

Python 3 MemoryError: Unable to allocate

On Python 2 the same script works. The system has 16 GB of RAM, of which 10 GB is in use. When the script is run on Python 3.8, all 16 GB is consumed and it fails with the MemoryError above.

That is like calling a doctor and saying: "Doctor, today I can't step on my foot because of the pain, but yesterday everything was OK. How do I treat it?" 😀

Let's try to diagnose from a photograph, then. Which architecture is your Python 2.x, 32-bit or 64-bit? And the same question for Python 3.x.

I have written an answer, but it is very general (though hopefully useful). It does not address the question of causes: without knowing the specifics of your code, no diagnosis is possible; the problem could be literally anything. Python's internals may have changed (and a lot has changed there) and affected memory use somehow, or you may have introduced a subtle bug while porting the code. Anything is possible.

2 Answers

It is hard to advise anything without seeing the code, but there are some traditionally applied techniques:

  • Lower the data type: for example, use float32 or even float16 instead of float64 (cutting the memory requirement by a factor of 2 or 4, respectively). Whether you can afford this depends on the computation precision you need: for some tasks precision cannot be lowered, for others it can, and sometimes a result with reduced precision even turns out better (better generalization). See the sketch after this list.
  • Use sparse matrices (scipy.sparse) and the methods and libraries that know how to work with them. If your data is sparse by nature, this can save memory by an order of magnitude.
  • If this is about machine learning, it is often not necessary to train on all the data at once: you can use random subsets of the features and random sampling of the observations, making many such random draws and averaging the resulting models. It also happens that when you have a lot of highly homogeneous data, training on just 1/10 of randomly selected rows (via df.sample) gives almost the same quality as training on the full dataset, while running much faster, which leaves time to tune the training parameters and improve quality.
  • Some libraries can map arrays onto files, so the arrays do not have to be held in memory.
  • There are libraries that compress arrays in memory while still letting you work with them almost transparently, as with ordinary arrays.
  • If you work with Pandas, there are libraries such as Dask and Vaex that offer practically the same interface as Pandas but work with files on disk, pulling data into memory as needed and optimizing queries to the data, for example by parallelizing them.
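
A minimal sketch of several of these techniques, assuming NumPy, SciPy and Pandas are installed (the shapes, file name and sampling fraction are made up for illustration):

import numpy as np
import pandas as pd
from scipy import sparse

# 1) Downcasting: float32 needs half the memory of float64.
a64 = np.zeros((10_000, 1_000), dtype=np.float64)
a32 = a64.astype(np.float32)
print(a64.nbytes // 2**20, "MiB ->", a32.nbytes // 2**20, "MiB")  # 76 MiB -> 38 MiB

# 2) Sparse storage: a CSR matrix keeps only the non-zero entries,
#    so a mostly-zero matrix shrinks by orders of magnitude.
s = sparse.csr_matrix(a32)

# 3) Random subsampling: train on 10% of the rows via df.sample.
df = pd.DataFrame(np.random.rand(100_000, 5))  # stand-in for real data
part = df.sample(frac=0.1, random_state=0)     # 10,000 random rows

# 4) Memory mapping: the array lives in a file on disk and pages are
#    pulled into RAM lazily as they are touched.
mm = np.memmap("big.dat", dtype=np.float32, mode="w+", shape=(10_000, 1_000))

The memmap file big.dat here is created on the fly; in practice you would point it at your real on-disk data.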


MemoryError when I merge two Pandas data frames

I have searched almost all over the internet, and somehow none of the approaches seem to work in my case. I have two large csv files (each with a million+ rows and about 300-400 MB in size). They load fine into data frames using the read_csv function, without having to use the chunksize parameter. I even performed minor operations on this data, like new column generation and filtering. However, when I try to merge the two frames, I get a MemoryError. I have even tried to use SQLite to accomplish the merge, but in vain: the operation takes forever. Mine is a Windows 7 PC with 8 GB RAM, and the Python version is 2.7. Thank you. Edit: I tried chunking methods too. When I do this, I don't get a MemoryError, but the RAM usage explodes and my system crashes.

Yes. I’m using a 64-bit Python 2.7. Currently, Anaconda 4.3 (with Spyder 3) is installed on my system.

Suppose you have a data set with ten rows: five of them take the value 'A' in the joining column, and five take 'B'. If you join this dataset with itself on that column, your result has 50 rows, i.e. it is five times as large. There is a chance that there was some additional column you should have been joining on but forgot to include.
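
A tiny sketch of this row explosion, assuming pandas is installed (the column names are made up):

import pandas as pd

# Ten rows: five with key 'A', five with key 'B'.
df = pd.DataFrame({"key": ["A"] * 5 + ["B"] * 5, "val": range(10)})

# Self-join on a non-unique key: every 'A' row pairs with every 'A' row
# (5 * 5 = 25), and likewise for 'B', so 10 input rows become 50.
merged = df.merge(df, on="key")
print(len(merged))  # 50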

4 Answers

When you merge data using pandas.merge, it will use memory for df1, for df2, and for the merged frame all at once; I believe that is why you get the memory error. You should export df2 to a csv file, then merge against it chunk by chunk using the chunksize option.

There may be a better way, but you can try this. For a large data set, you can use the chunksize option of pandas.read_csv:

import pandas as pd

df1 = pd.read_csv("yourdata.csv")
df2 = pd.read_csv("yourdata2.csv")
df2_key = df2.Colname2  # (not used below)

# Create an empty bucket to hold the result and write the header row.
df_result = pd.DataFrame(columns=(df1.columns.append(df2.columns)).unique())
df_result.to_csv("df3.csv", index_label=False)

# (For a left join you would first save the rows that appear only in df1:)
# df_result = df1[df1.Colname1.isin(df2.Colname2) != True]
# df_result.to_csv("df3.csv", index_label=False, mode="a")

# Delete df2 to free memory.
del df2

def preprocess(x):
    # Merge one chunk of the second file against df1 and append it to df3.csv.
    merged = pd.merge(df1, x, left_on="Colname1", right_on="Colname2")
    merged.to_csv("df3.csv", mode="a", header=False, index=False)

# Re-read the second file in chunks; the chunk size depends on your row width.
reader = pd.read_csv("yourdata2.csv", chunksize=1000)
[preprocess(r) for r in reader]

This will save the merged data as df3.csv.
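
If df3.csv is itself too large to load in one piece, it can presumably be read back lazily with the same trick, for example pd.read_csv("df3.csv", chunksize=100000).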


Python MemoryError: cannot allocate array memory

Every method gave me the same result: a MemoryError at around 512 MB. Wondering whether there was something special about 512 MB, I created a simple test program which filled up memory until Python crashed:

str = " " * 511000000 # Start at 511 MB while 1: str = str + " " * 1000 # Add 1 KB at a time 

Doing this didn't crash until around 1 GB. Just for fun, I also tried str = " " * 2048000000 (fill 2 GB); this ran without a hitch, filled the RAM and never complained. So the issue isn't the total amount of RAM I can allocate, but seems to be how many TIMES I can allocate memory.

I googled around fruitlessly until I found this post: Python out of memory on large CSV file (numpy)

I copied the code from the answer exactly:

import numpy as np

def iter_loadtxt(filename, delimiter=',', skiprows=0, dtype=float):
    def iter_func():
        with open(filename, 'r') as infile:
            for _ in range(skiprows):
                next(infile)
            for line in infile:
                line = line.rstrip().split(delimiter)
                for item in line:
                    yield dtype(item)
        iter_loadtxt.rowlength = len(line)

    data = np.fromiter(iter_func(), dtype=dtype)
    data = data.reshape((-1, iter_loadtxt.rowlength))
    return data

Calling iter_loadtxt("data/training_nohead.csv") gave a slightly different error this time:

MemoryError: cannot allocate array memory 

As I’m running Python 2.7, this was not my issue. Any help would be appreciated.
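
For what it's worth, one hedged alternative (not from the original thread) is to preallocate the final array once and fill it from chunked reads, which avoids both np.loadtxt's large temporaries and repeated reallocation. The file path below is the one from the question; the chunk size and the float32 dtype are assumptions:

import numpy as np
import pandas as pd

filename = "data/training_nohead.csv"  # path from the question

# First pass: determine the shape without loading any data.
with open(filename) as f:
    ncols = len(f.readline().split(","))
    nrows = 1 + sum(1 for _ in f)

# Preallocate the target array once; float32 halves the footprint
# compared with the float64 default.
data = np.empty((nrows, ncols), dtype=np.float32)

# Second pass: fill the array chunk by chunk.
row = 0
for chunk in pd.read_csv(filename, header=None, dtype=np.float32, chunksize=100_000):
    data[row:row + len(chunk)] = chunk.to_numpy()
    row += len(chunk)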


[Solved] Oserror: [Errno 12] Cannot Allocate Memory


We all know that any operation on a file makes the CPU use resources from the system, be it memory or I/O devices; depending on the load, heavy or graphical operations are sometimes handed off to the GPU. Now, the system cannot always fulfill the memory or I/O requirement. In that case, the CPU may stop execution and wait for the resources to become available, or it may terminate the program. In this article, we shed some light on one such issue: OSError: [errno 12] cannot allocate memory.

OSError: [errno 12] cannot allocate memory is raised by the system when there is not enough memory to carry out the requested operations. The system may not have enough memory to store intermediate results while the program runs, or the available memory may have been exhausted by other running processes.

What is OSError: [errno 12] cannot allocate memory?

OSError: [errno 12] cannot allocate memory is the memory-unavailability error raised when we run our Python program and the system cannot fulfill its memory requirement. This lack of resources terminates the Python program with the raised error.
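
As a small illustration (an assumption for POSIX systems, not from the article): errno 12 corresponds to errno.ENOMEM, so the failure can be caught and identified explicitly:

import errno
import subprocess

try:
    # Spawning a child is a typical place where ENOMEM shows up,
    # because the OS must find memory for the new process.
    subprocess.Popen(["true"]).wait()
except OSError as e:
    if e.errno == errno.ENOMEM:  # [Errno 12] Cannot allocate memory
        print("The OS could not allocate memory for the child process")
    raise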

Why do I get the "OSError: [errno 12] cannot allocate memory" error?

There is no reason for this error other than the one mentioned above. Any operation we perform on the computer requires some memory resources and some input/output devices. The memory is used to store the intermediate files and variables created while our program executes; the CPU fetches these from main memory for further execution. When the system cannot keep those intermediates in main memory for lack of space, it raises the given error.

[Note: secondary storage cannot stand in for main memory by default, as it is much slower. Falling back to it would sharply reduce the program's throughput and can be prohibitively expensive in time.]

Solution to OSError: [errno 12] cannot allocate memory

The solution to the given error is to increase the system's memory so it can store the intermediate files there. Since adding RAM may not be practically possible for everyone, we need to check for an alternative.

It may not actually be necessary to extend our RAM, which is often not a feasible option anyway. What we can do instead is create virtual memory in the system: a portion of hard disk space allocated to act as main memory (RAM), also called swap memory, where the CPU can store temporary data between operations. We can assign, say, 64 GB of swap space so that the CPU can use it whenever there is a need. This is a feasible option and requires no extra cost. The only obstacle is that everything is slower whenever the CPU uses the swap, because fetching data from a hard disk is much slower than from main memory.

Python subprocess.Popen “OSError: [Errno 12] Cannot allocate memory”

However, you may also get this error while using subprocess.Popen. Under the hood, subprocess.Popen() calls fork, meaning it creates a child process whose address space is initially a copy of the running Python process, so the child momentarily appears to need as much memory as Python is already consuming; that can amount to hundreds of MB. If there is a memory shortage at that moment, we get the error. There are two options for solving the issue: increase the available memory, or write the code so that the script's memory use stays under control.

Alternatives such as vfork and posix_spawn exist for this, but we may not want to rewrite our subprocess.Popen calls in terms of posix_spawn/vfork. In that case, one can call subprocess.Popen at the very beginning of the script, while its memory consumption is still minimal, to obtain a shell; that shell can then be reused for parallel processing of lighter tasks such as free/ps/sleep, or whatever else runs in a loop.
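
A minimal sketch of that idea, assuming a POSIX system (the commands fed to the shell are placeholders): start one long-lived shell while the interpreter is still small, then reuse it later instead of forking new children from a process that has grown large.

import subprocess

# Started at the very top of the script, while the Python process is
# still small, so fork() only has to duplicate a tiny address space.
shell = subprocess.Popen(
    ["/bin/sh"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,  # Python 3; use universal_newlines=True on older versions
)

# ... the script allocates lots of memory here ...

# Later, reuse the already-running shell instead of forking again.
shell.stdin.write("sleep 1; echo done\n")
shell.stdin.flush()
print(shell.stdout.readline().strip())  # -> "done"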

Conclusion

So, in this article we learned about the OSError: [errno 12] cannot allocate memory error: what the possible reasons for it are, and what the possible solutions are. After following the explanation above, you should be able to resolve the error.

I hope this article has helped you. Thank you.

