- Freeze support python error
- Below is the original code that produces the error
- Here is the correct code after modification
- python error: RuntimeError
- cause of this error
- Solution
- 3 Multiprocessing Common Errors
- Common Multiprocessing Errors
- Error 1: RuntimeError Starting New Processes
- Error 2: print() Does Not Work In Child Processes
- Error 3: Adding Attributes to Classes that Extend Process
Freeze support python error
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
    freeze_support()
    ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Below is the original code that produces the error
import multiprocessing as mp
import time
from urllib.request import urlopen, urljoin
from bs4 import BeautifulSoup
import re

base_url = "https://morvanzhou.github.io/"

# crawl the web
def crawl(url):
    response = urlopen(url)
    time.sleep(0.1)
    return response.read().decode()

# parse web pages
def parse(html):
    soup = BeautifulSoup(html, "html.parser")
    urls = soup.find_all("a", {"href": re.compile("^/.+?/$")})
    title = soup.find("h1").get_text().strip()
    page_urls = set([urljoin(base_url, url["href"]) for url in urls])
    # NOTE: the attribute filter for this find() call was lost in the original
    # post; {"property": "og:url"} is assumed here
    url = soup.find("meta", {"property": "og:url"})["content"]
    return title, page_urls, url

unseen = set([base_url])
seen = set()
restricted_crawl = True

pool = mp.Pool(4)
count, t1 = 1, time.time()
while len(unseen) != 0:                 # still some urls to visit
    if restricted_crawl and len(seen) > 20:
        break
    print("Distributed Crawling.")
    crawl_jobs = [pool.apply_async(crawl, args=(url,)) for url in unseen]
    htmls = [j.get() for j in crawl_jobs]               # request connection

    print("Distributed Parsing.")
    parse_jobs = [pool.apply_async(parse, args=(html,)) for html in htmls]
    results = [j.get() for j in parse_jobs]             # parse html

    print("Analysing.")
    seen.update(unseen)                 # the unseen urls have now been crawled
    unseen.clear()                      # nothing unseen
    for title, page_urls, url in results:
        print(count, title, url)
        count += 1
        unseen.update(page_urls - seen)                 # get new urls to crawl

print("Total time: %.1f s" % (time.time() - t1))        # 16 s
Here is the correct code after modification
import multiprocessing as mp
import time
from urllib.request import urlopen, urljoin
from bs4 import BeautifulSoup
import re

base_url = "https://morvanzhou.github.io/"

# crawl the web
def crawl(url):
    response = urlopen(url)
    time.sleep(0.1)
    return response.read().decode()

# parse web pages
def parse(html):
    soup = BeautifulSoup(html, "html.parser")
    urls = soup.find_all("a", {"href": re.compile("^/.+?/$")})
    title = soup.find("h1").get_text().strip()
    page_urls = set([urljoin(base_url, url["href"]) for url in urls])
    # NOTE: the attribute filter for this find() call was lost in the original
    # post; {"property": "og:url"} is assumed here
    url = soup.find("meta", {"property": "og:url"})["content"]
    return title, page_urls, url

def main():
    unseen = set([base_url])
    seen = set()
    restricted_crawl = True

    pool = mp.Pool(4)
    count, t1 = 1, time.time()
    while len(unseen) != 0:             # still some urls to visit
        if restricted_crawl and len(seen) > 20:
            break
        print("Distributed Crawling.")
        crawl_jobs = [pool.apply_async(crawl, args=(url,)) for url in unseen]
        htmls = [j.get() for j in crawl_jobs]           # request connection

        print("Distributed Parsing.")
        parse_jobs = [pool.apply_async(parse, args=(html,)) for html in htmls]
        results = [j.get() for j in parse_jobs]         # parse html

        print("Analysing.")
        seen.update(unseen)             # the unseen urls have now been crawled
        unseen.clear()                  # nothing unseen
        for title, page_urls, url in results:
            print(count, title, url)
            count += 1
            unseen.update(page_urls - seen)             # get new urls to crawl

    print("Total time: %.1f s" % (time.time() - t1))    # 16 s

if __name__ == "__main__":
    main()
To sum up, wrap the code you want to run inside a function, and then add
if __name__ == "__main__": main()
at the bottom of the file. This line of code solves the problem.
python error: RuntimeError
This type of error reads: RuntimeError: The current Numpy installation fails to pass a sanity check due to a bug in the windows runtime.
Cause of this error
1. There is an incompatibility between the current Python and NumPy versions; for example, the combination of Python 3.9 and NumPy 1.19.4 that I use triggers this error.
2. NumPy 1.19.4 has problems with many current Python versions.
Solution
Simply downgrade the numpy version under File -> Settings -> Project: pycharmProjects -> Project Interpreter.
1. Open the Project Interpreter settings.
2. Double-click numpy to modify its version.
3. Tick the option to specify the version, and install the required lower version.
After you’re done, just run it again.
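If you are not using PyCharm, the same downgrade can be done from the command line. As a hedged example, assuming NumPy 1.19.3 (the commonly suggested fallback for this Windows runtime bug) is acceptable for your project:
pip install numpy==1.19.3
Then run the program again.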
3 Multiprocessing Common Errors
You may encounter one among a number of common errors when using the multiprocessing.Process class in Python.
These errors are typically easy to identify and often involve a quick fix.
In this tutorial, you will discover the common errors when creating child processes in Python and how to fix each in turn.
Common Multiprocessing Errors
The multiprocessing module and multiprocessing.Process class provide a flexible and powerful approach to concurrency using child processes.
When you are getting started with multiprocessing in Python, you may encounter one of many common errors.
These errors are typically made because of bugs introduced by copy-and-pasting code, or from a slight misunderstanding in how new child processes work.
We will take a closer look at some of the more common errors made when creating new child processes; they are:
- Error 1: RuntimeError Starting New Processes
- Error 2: print() Does Not Work In Child Processes
- Error 3: Adding Attributes to Classes that Extend Process
Do you have an error using the multiprocessing module?
Let me know in the comments so I can recommend a fix and add the case to this tutorial.
Error 1: RuntimeError Starting New Processes
It is common to get a RuntimeError when starting a new Process in Python.
The content of the error often looks as follows: "An attempt has been made to start a new process before the current process has finished its bootstrapping phase…", the same RuntimeError message quoted at the start of this article.
This will happen on Windows and MacOS where the default start method is ‘spawn‘. It may also happen when you configure your program to use the ‘spawn‘ start method on other platforms.
This is a common error and is easy to fix.
The fix involves checking if the code is running in the top-level environment and only then attempting to start a new process.
The idiom for this fix, as stated in the message of the RuntimeError, is to use an if-statement and check if the name of the module is equal to the string ‘__main__‘.
This is called “protecting the entry point” of the program.
Recall, that __name__ is a variable that refers to the name of the module executing the current code.
Also, recall that ‘__main__‘ is the name of the top-level environment used to execute a Python program.
Using an if-statement to check if the module is the top-level environment and only starting child processes within that block will resolve the RuntimeError.
It means that if the Python file is imported, then the code protected by the if-statement will not run. It will only run when the Python file is run directly, i.e. when it is the top-level environment.
The if-statement idiom is required, even if the entry point of the program calls a function that itself starts a child process.
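As an illustration, here is a minimal sketch of the idiom (the task() function is invented for the example):

from multiprocessing import Process

# function executed in a child process
def task():
    print('Hello from the child process', flush=True)

# protect the entry point: the child process is only started when this
# file is run directly, not when it is imported
if __name__ == '__main__':
    process = Process(target=task)
    process.start()
    process.join()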
You can learn more about this common error in the tutorial:
Error 2: print() Does Not Work In Child Processes
Printing to standard out (stdout) with the built-in print() function may not work properly from child processes.
For example, you may print output messages for the user or debug messages from a child process and they may never appear, or may only appear when the child process is terminated.
This is a very common situation and the cause is well understood and easy to work around.
The print() function is a built-in function for displaying messages on standard output or stdout.
When you call print() from a child process created using the ‘spawn‘ start method, the message will not appear.
This is because the messages are block buffered by default and the buffer is not flushed by default after every message. This is unlike the main process that is interactive and will flush messages after each line, e.g. line buffered.
Instead, the buffered messages are only flushed occasionally, such as when the child process terminates and the buffer is garbage collected.
We can flush stdout automatically with each call to print().
This can be achieved by setting the ‘flush‘ argument to True.
An alternate approach is to call the flush() function on the sys.stdout object directly.
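For example, a minimal sketch showing both approaches (the task() function is invented for the example):

import sys
from multiprocessing import Process

def task():
    # flush as part of the print() call
    print('Message from the child process', flush=True)
    # or flush stdout explicitly after printing
    print('Another message from the child process')
    sys.stdout.flush()

if __name__ == '__main__':
    process = Process(target=task)
    process.start()
    process.join()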
The problem with the print() function only occurs when using the ‘spawn‘ start method.
You can change the start method to ‘fork‘ which will cause print() to work as expected.
Note, the ‘fork‘ start method is not supported on Windows at the time of writing.
You can set the start method via the multiprocessing.set_start_method() function.
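For example, a minimal sketch assuming a platform that supports the ‘fork‘ start method (e.g. Linux):

import multiprocessing

def task():
    # with 'fork' the child inherits the parent's stdout configuration,
    # so print() behaves as expected
    print('Hello from the forked child process')

if __name__ == '__main__':
    # the 'fork' start method is not supported on Windows at the time of writing
    multiprocessing.set_start_method('fork')
    process = multiprocessing.Process(target=task)
    process.start()
    process.join()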
You can learn more about process start methods in the tutorial:
You can learn more about fixing print() from child processes in the tutorial:
Error 3: Adding Attributes to Classes that Extend Process
Python provides the ability to create and manage new processes via the multiprocessing.Process class.
We can extend this class and override the run() function in order to run code in a new child process.
You can learn more about extending the multiprocessing.Process class in the tutorial:
Extending the multiprocessing.Process class and adding attributes that are intended to be shared among multiple processes will fail with an error.
For example, if we define a new class that extends the multiprocessing.Process class that sets an attribute on the class instance from the run() method executed in a new child process, then this attribute will not be accessible by other processes, such as the parent process.
This is the case even if both parent and child processes share access to the “same” object.
This is because class instance variables are not shared among processes by default. Instead, instance variables added to the multiprocessing.Process are private to the process that added them.
Each process operates on a serialized copy of the object and any changes made to that object are local to that process only, by default.
If you set class attributes in the child process and try to access them in the parent process or another process, you will get an error.
This error occurred because the child process operates on a copy of the class instance that is different from the copy of the class instance used in the parent process.
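A minimal sketch of this failure mode might look as follows (the CustomProcess class and the data attribute are invented for the example):

from multiprocessing import Process

class CustomProcess(Process):
    def run(self):
        # this attribute is added to the child's copy of the instance only
        self.data = 'Set in the child process'

if __name__ == '__main__':
    process = CustomProcess()
    process.start()
    process.join()
    # fails with AttributeError: the parent's copy of the instance never
    # received the attribute that was set in the child
    print(process.data)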
Instance variable attributes can be shared between processes via the multiprocessing.Value and multiprocessing.Array classes.
These classes explicitly define data attributes designed to be shared between processes in a process-safe manner.
Shared variables mean that changes made in one process are always propagated and made available to other processes.
An instance of the multiprocessing.Value can be defined in the constructor of a custom class as a shared instance variable.
The constructor of the multiprocessing.Value class requires that we specify the data type and an initial value.
The data type can be specified using a ctypes type or a typecode.
Typecodes are familiar and easy to use, for example ‘i’ for a signed integer or ‘f’ for a single floating-point value.
For example, we can define a multiprocessing.Value shared memory variable that holds a signed integer and is initialized to the value zero.
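Putting this together, a hedged sketch of a shared instance variable (again, the CustomProcess class and the data attribute are invented for the example):

from multiprocessing import Process, Value

class CustomProcess(Process):
    def __init__(self):
        super().__init__()
        # shared instance variable: a signed integer ('i') initialized to zero
        self.data = Value('i', 0)

    def run(self):
        # changes made via the shared Value are visible to the parent process
        self.data.value = 42

if __name__ == '__main__':
    process = CustomProcess()
    process.start()
    process.join()
    print(process.data.value)  # prints 42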