- Python: File Buffering
- Default Buffer Size
- How To Open A File In Python
- Method Signature Open In Python
- How To Specify File Path When Opening A File In Python?
- How To Specify The Mode When Opening A File In Python?
- Buffering When Opening A File In Python
- How To Specify Encoding When Opening A File In Python?
- Newline Considerations When Opening A File In Python
- What Does closefd Do When Opening A File In Python?
Python: File Buffering
The built-in open() function takes an optional argument called buffering. This argument sets the buffering policy, or the desired buffer size, for the file:
- 1: line buffered (meaningful only in text mode)
- 0: unbuffered (allowed only in binary mode)
- any other positive value: a buffer of that size in bytes
- negative value: use the system default, which is usually line buffered for interactive tty (teletypewriter) devices and fully buffered in fixed-size chunks for other files. This is the default value of the buffering argument.
We’ll look at buffers in detail after this snippet.
>>> fh1 = open('coding.py', 'r', 1)
>>> fh1.line_buffering
True
>>> contents = fh1.buffer
>>> for line in contents:
...     print(line)
...
b'# -*- coding: utf-8 -*-\r\n'
b"variableOne = 'Ethan'\r\n"
b'print(variableOne)\r\n'
>>> fh1.close()
>>> fh1 = open('coding.py', 'r', 0)
Traceback (most recent call last):
  ...
ValueError: can't have unbuffered text I/O
>>> fh1.close()
>>> fh1 = open('coding.py', 'r', 5)
>>> fh1.line_buffering
False
>>> contents = fh1.buffer
>>> for line in contents:
...     print(line)
...
b'# -*- coding: utf-8 -*-\r\n'
b"variableOne = 'Ethan'\r\n"
b'print(variableOne)\r\n'
>>> fh1.close()
>>> fh1 = open('coding.py', 'r', -1)
>>> fh1.line_buffering
False
>>> contents = fh1.buffer
>>> for line in contents:
...     print(line)
...
b'# -*- coding: utf-8 -*-\r\n'
b"variableOne = 'Ethan'\r\n"
b'print(variableOne)\r\n'
>>> fh1.close()
A buffer stores a chunk of data from the operating system's file stream until it is consumed, at which point more data is brought into the buffer. The reason it is good practice to use buffers is that interacting with the raw stream might have high latency, i.e. considerable time is taken to fetch data from it and to write data to it. Let's take an example.
Let's say you want to read 100 characters from a file every 2 minutes over a network. Instead of trying to read from the raw file stream every 2 minutes, it is better to load a portion of the file into a buffer in memory and then consume it when the time is right. Then the next portion of the file is loaded into the buffer, and so on.
Note that a suitable buffer size depends on the rate at which the data is consumed. In the example above, 100 characters are needed every 2 minutes. A buffer smaller than 100 characters forces extra reads from the slow stream and so increases latency, a buffer of exactly 100 is enough, and a larger buffer works just as well at the cost of a little more memory.
Another reason for using a buffer is that it lets you read large files (or files of uncertain size) one chunk at a time. When dealing with files, there may be occasions when you are not sure of the size of the file you are trying to read. If, in an extremely unlikely scenario, the file is larger than the computer's memory, reading it in a single call will exhaust that memory. It is therefore safer to define the maximum size that can be read in one go. You can then read and manipulate the entire file in several buffer-sized instalments, as demonstrated below:
# The following code snippet reads a file containing 196 bytes, with a buffer of 20 bytes,
# and writes to a file, 20 bytes at a time.
# A practical example will have large-scale values of buffer and file size.
# Note: in text mode read(n) counts characters, which equal bytes for plain ASCII text.
buffersize = 20  # maximum amount of data to be read in one instance

# 'a' opens the output file in append mode and creates it if it doesn't exist.
# The with statement closes both files, even if an error occurs part-way through.
with open('fileToBeReadFrom.txt', 'r') as inputFile, \
        open('fileToBeWrittenInto.txt', 'a') as outputFile:
    buffer = inputFile.read(buffersize)  # first instalment, up to the current cursor position
    counter = 0  # a counter variable for us to see the instalments of 20 bytes
    # Write the contents of the buffer to the other file, 20 bytes at a time
    while buffer:
        counter = counter + 1
        outputFile.write(buffer)
        print(str(counter) + " ")
        buffer = inputFile.read(buffersize)  # next set of 20 bytes from the input file
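The same instalment-by-instalment idea can be sketched in binary mode, where the size passed to read() really is a byte count. This is only an illustration: the helper names read_in_chunks and demo_chunks, and the throwaway file sample.bin, are made up for this sketch and are not part of the original example.

```python
import os
import tempfile
from functools import partial


def read_in_chunks(path, chunk_size=20):
    """Yield successive chunks of at most chunk_size bytes from path."""
    with open(path, "rb") as source:
        # iter() keeps calling source.read(chunk_size) until it returns b"" (end of file).
        for chunk in iter(partial(source.read, chunk_size), b""):
            yield chunk


def demo_chunks():
    """Create a 196-byte scratch file and return the size of every chunk read."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.bin")  # hypothetical scratch file
        with open(path, "wb") as target:
            target.write(b"x" * 196)
        return [len(chunk) for chunk in read_in_chunks(path)]


print(demo_chunks())  # nine chunks of 20 bytes followed by one of 16
```

Because the generator holds only one chunk in memory at a time, the approach should behave the same way for a file of any size.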
In practice, there are two types of buffers: internal buffers and operating system buffers.
The internal buffers are created by the language runtime or library that you are using, to speed things up by avoiding a system call for every write operation. When you write to a file, you write into its internal buffer, and only when that buffer is full is the data passed on using system calls. Because of the operating system buffers, however, even this does not necessarily mean that the data has been written to the file itself. It may only mean that the data has been copied from the internal buffers into the operating system buffers.
So, after a write operation, the data may still be only in a buffer until that buffer fills up, is flushed, or the file is closed, and if your machine loses power in the meantime, the data is not in the file. To help you with this, there are 2 functions in Python: fileHandler.flush() and os.fsync(fileHandler), where os is the standard module for performing operating system tasks.
The flush() method writes data from the internal buffer to the operating system buffers without closing the file. This means that if another process is reading from the same file, it will be able to read the data you just flushed. However, this does not guarantee that the data has been written to the disk; it may or may not have been. To ensure this, call os.fsync(fileHandler), which forces the operating system to copy the data from its buffers to the disk.
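A small sketch can make the internal buffer visible by checking the size of the file on disk before and after flush(). It assumes a write much smaller than the default buffer size, and the helper name sizes_around_flush and the file name buffered.bin are invented for the illustration.

```python
import os
import tempfile


def sizes_around_flush(payload=b"hello"):
    """Return the file size seen by the OS before and after flush()."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "buffered.bin")  # hypothetical scratch file
        with open(path, "wb") as handle:
            handle.write(payload)            # stays in Python's internal buffer
            before = os.path.getsize(path)   # the OS has not received the data yet
            handle.flush()                   # internal buffer -> operating system
            after = os.path.getsize(path)    # now visible to other readers
        return before, after


print(sizes_around_flush())  # (0, 5)
```

Note that the second number only shows that the operating system has the data; it says nothing about whether the data has reached the physical disk.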
If you are uncertain whether what you are trying to write is actually being written when you think it is being written, you can use these function calls in the manner below.
>>> fh = open('fileToBeWrittenInto.txt', 'w+')
>>> fh.write('Output line # 1')
15
>>> fh.write('\n')
1
# open the file in a text editor, you will not see any data in it.
>>> fh.flush()
# re-open the file in a text editor, you will see the contents as below:
# Contents of fileToBeWrittenInto.txt
Output line # 1
# This data can now be read by any other process attempting to read it.
>>> fh.write('Output line # 2')
15
# open the file in a text editor, you will see the contents as below:
# Contents of fileToBeWrittenInto.txt
Output line # 1
>>> fh.flush()
# open the file in a text editor, you will see the contents as below:
# Contents of fileToBeWrittenInto.txt
Output line # 1
Output line # 2
>>> fh.close()
Do not be misled into thinking that write() doesn't actually 'write' data to a file. It does, but the data reaches the file only when the internal buffer fills up or is flushed, and the close() method flushes the data to the file before closing it. If you wish to write to a file without having to close it, you can use the flush() method.
# Using the fsync(fileHandler) function.
>>> fh = open('fileToBeWrittenInto2.txt', 'w+')
>>> fh.write('Output line # 1')
15
>>> fh.write('\n')
1
# open file in text-editor, it will be empty.
>>> fh.flush()
# open file in text-editor, it will have the following contents
# Contents of fileToBeWrittenInto2.txt
Output line # 1
>>> fh.write('Output line # 2')
15
# check file contents, they will be unchanged as flush() hasn't been called yet
# Contents of fileToBeWrittenInto2.txt
Output line # 1
# Now let's use the fsync() function
>>> import os
>>> help(os.fsync)
Help on built-in function fsync in module nt:

fsync(...)
    fsync(fildes)

    force write of file with filedescriptor to disk.

>>> os.fsync(fh)
# check file contents, they will be unchanged. As we know, fsync() copies data
# from operating system buffers to the file (i.e. on the disk). In this case, there
# is no pending data in the operating system buffers because flush() has not been
# called. Once flush() is called, the data will be in the operating system buffers,
# which may or may not copy the data to the file. If it is not copied, then fsync()
# will force the write to the file when it is called.
# Contents of fileToBeWrittenInto2.txt
Output line # 1
>>> fh.flush()
# check file contents
# Contents of fileToBeWrittenInto2.txt
Output line # 1
Output line # 2
>>> fh.close()
# In this interactive example, we can see that as soon as flush() is called, the data
# is written to the file itself, so we don't really feel the need for fsync() right now.
# But in a script containing hundreds of lines, it is not viable to check the contents
# of the file after each statement, so it is safe to call the fsync(fileHandler)
# function, to err on the side of caution.
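Since fsync() only helps once the data has left Python's internal buffer, the two calls are normally made in a fixed order: flush() first, then os.fsync(). A minimal sketch of that order follows; the helper name write_durably is hypothetical, and passing handle.fileno() rather than the file object itself follows the pattern recommended in the os.fsync documentation.

```python
import os
import tempfile


def write_durably(path, text):
    """Write text to path and ask the OS to push it to disk before returning."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
        handle.flush()               # step 1: internal buffer -> OS buffers
        os.fsync(handle.fileno())    # step 2: OS buffers -> disk


# Usage sketch with a throwaway file
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "fileToBeWrittenInto3.txt")  # hypothetical name
    write_durably(target, "Output line # 1\n")
    with open(target, encoding="utf-8") as check:
        print(check.read(), end="")
```

How strong the guarantee is in practice depends on the operating system and the storage hardware, so treat this as the usual recipe rather than an absolute promise.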
Default Buffer Size
You can check the default buffer size of your platform by importing the io module and checking its DEFAULT_BUFFER_SIZE attribute. The returned size is in bytes.
>>> import io
>>> io.DEFAULT_BUFFER_SIZE
8192
How To Open A File In Python
This guide covers opening a file in Python 3.9 and explains the different ways in which you can do it. It is likely to come in useful in the future, so feel free to copy the code and adapt it for your own purposes.
The most basic call, open() with only a file path, opens the file in read-only text ('rt') mode.
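As a minimal sketch of that most basic call, the snippet below opens a file with nothing but its path. The helper name read_text and the scratch file greeting.txt are invented for the illustration.

```python
import os
import tempfile


def read_text(path):
    """Open path with the default mode and return (mode, contents)."""
    # No mode is given, so open() uses 'r' (read-only) combined with 't' (text).
    with open(path) as handle:
        return handle.mode, handle.read()


with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "greeting.txt")  # hypothetical scratch file
    with open(sample, "w") as handle:
        handle.write("hello\n")
    print(read_text(sample))  # ('r', 'hello\n')
```

The with statement closes the file automatically, so no explicit close() call is needed.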
Method Signature Open In Python
You open a file in Python with the built-in open() function. There is nothing to import in Python 3.9. The function has the following signature (the last parameter, opener, is rarely needed and is not covered in this guide):
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
How To Specify File Path When Opening A File In Python?
The first parameter is the path of the file that you want to open. This can be an existing file if you are planning to read from it, or a new file that you plan to create by writing.
How To Specify The Mode When Opening A File In Python?
The second parameter is the mode in which the file is opened. For example, you can open a file in write mode by passing 'w' as the second argument.
The table below lists the modes available for opening files in Python. Modes can also be combined. For example, the default mode is 'rt'.
- 'r': open for reading (default)
- 'w': open for writing, truncating the file first
- 'x': create a new file and open it for writing
- 'a': open for writing, appending to the end of the file if it exists
- 'b': binary mode
- 't': text mode (default)
- '+': open a disk file for updating (reading and writing)
- 'U': universal newline mode (deprecated, and removed in Python 3.11)
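To see how a few of these modes behave in sequence, here is a short sketch. The function name demo_modes and the file name modes.txt are made up for the example.

```python
import os
import tempfile


def demo_modes(path):
    """Exercise 'w', 'a', 'x' and 'r+' on the same file."""
    with open(path, "w") as handle:    # 'w': create the file, or truncate it
        handle.write("first\n")
    with open(path, "a") as handle:    # 'a': keep the contents, write at the end
        handle.write("second\n")
    try:
        open(path, "x")                # 'x': refuses to open an existing file
        created = True
    except FileExistsError:
        created = False
    with open(path, "r+") as handle:   # 'r+': read and write without truncating
        content = handle.read()
    return content, created


with tempfile.TemporaryDirectory() as folder:
    print(demo_modes(os.path.join(folder, "modes.txt")))
    # ('first\nsecond\n', False)
```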
Buffering When Opening A File In Python
The third parameter is buffering. By default, the buffering behaviour depends on whether you open the file in binary or text mode: binary files are read in fixed-size chunks, and interactive text files are read line by line. You can choose the chunk size yourself with buffering > 1, or request line buffering in text mode with buffering=1. You can disable buffering by setting it to 0, but only in binary mode.
For example, if you try to open a text file with buffering=0, you will get the error ValueError: can't have unbuffered text I/O.
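One way to see what the buffering argument changes is to look at the kind of object open() returns. The sketch below assumes CPython's io module class names, and the helper name describe_buffering is invented for the illustration.

```python
import os
import tempfile


def describe_buffering(path):
    """Report what open() returns for a few buffering settings."""
    results = {}
    with open(path, "rb", buffering=0) as raw:
        results["rb, buffering=0"] = type(raw).__name__        # raw, unbuffered file
    with open(path, "rb") as buffered:
        results["rb, default"] = type(buffered).__name__       # buffered reader
    with open(path, "r", buffering=1) as text:
        results["r, buffering=1"] = text.line_buffering        # line buffering on
    try:
        open(path, "r", buffering=0)                           # not allowed for text
    except ValueError as error:
        results["r, buffering=0"] = str(error)
    return results


with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "sample.txt")  # hypothetical scratch file
    with open(sample, "w") as handle:
        handle.write("line\n")
    for setting, outcome in describe_buffering(sample).items():
        print(setting, "->", outcome)
```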
How To Specify Encoding When Opening A File In Python?
The fourth parameter is encoding, the name of the encoding used to decode or encode the file. It should only be used in text mode. You can find the list of supported encodings at https://docs.python.org/3.9/library/codecs.html#standard-encodings
An example using encoding is as follows:
file = open("filename.txt","w", encoding="UTF8")
The fifth parameter is errors, related to the encoding. Errors is an optional string that specifies how encoding errors are to be handled—this argument should not be used in binary mode. Pass ‘strict’ to raise a ValueError exception if there is an encoding error (the default of None has the same effect), or pass ‘ignore’ to ignore errors. (Note that ignoring encoding errors can lead to data loss.)
See the documentation for codecs.register or run 'help(codecs.Codec)' for a list of the permitted encoding error strings.
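The effect of the errors argument is easiest to see by reading a file with the wrong encoding on purpose. In the sketch below a UTF-8 file is read back as ASCII; the helper name read_with_policy and the file name cafe.txt are hypothetical.

```python
import os
import tempfile


def read_with_policy(path, errors):
    """Read path as ASCII, handling undecodable bytes according to errors."""
    with open(path, "r", encoding="ascii", errors=errors) as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "cafe.txt")  # hypothetical scratch file
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("caf\u00e9")              # the last letter takes two bytes in UTF-8

    print(read_with_policy(sample, "ignore"))  # undecodable bytes are dropped: caf
    try:
        read_with_policy(sample, "strict")     # same effect as the default, None
    except UnicodeDecodeError as error:        # a subclass of ValueError
        print("strict failed:", error.reason)
```

The 'ignore' result shows the data loss mentioned above: the accented letter silently disappears.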
Newline Considerations When Opening A File In Python
The sixth parameter is newline, and it also works only in text mode. The default (newline=None) is called universal newlines mode. On input, lines ending with '\n', '\r', or '\r\n' are translated into '\n' before being returned to the caller.
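The translation can be checked with a file that mixes all three line endings. This is a small sketch; the function name read_both_ways and the file name endings.txt are invented for it, and newline='' is used as the standard way to switch translation off.

```python
import os
import tempfile


def read_both_ways(path):
    """Read the same bytes with and without newline translation."""
    with open(path, "wb") as handle:
        handle.write(b"one\r\ntwo\rthree\n")   # three different line endings
    with open(path, "r", encoding="ascii", newline=None) as handle:
        translated = handle.read()             # universal newlines (the default)
    with open(path, "r", encoding="ascii", newline="") as handle:
        untouched = handle.read()              # line endings returned as stored
    return translated, untouched


with tempfile.TemporaryDirectory() as folder:
    translated, untouched = read_both_ways(os.path.join(folder, "endings.txt"))
    print(repr(translated))  # 'one\ntwo\nthree\n'
    print(repr(untouched))   # 'one\r\ntwo\rthree\n'
```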
What Does closefd Do When Opening A File In Python?
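The seventh parameter is closefd. The important point is that the first argument to open() can be either a file name (a string or bytes path) or an integer file descriptor. closefd matters only in the second case: with closefd=False, the underlying descriptor is kept open when the file object is closed. When you pass a file name, closefd has to be True, which is the default. If you try to set closefd to False for a file name, you will get the following error: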
ValueError: Cannot use closefd=False with file name
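The descriptor case can be sketched with os.open(), which returns an integer file descriptor. The helper name read_keeping_descriptor and the file name fd_demo.txt are assumptions made for this illustration only.

```python
import os
import tempfile


def read_keeping_descriptor(path):
    """Read path through a descriptor that outlives the file object."""
    fd = os.open(path, os.O_RDONLY)            # low-level integer descriptor
    try:
        with open(fd, "r", closefd=False) as handle:
            text = handle.read()
        # The file object is closed now, but fd is still usable.
        os.lseek(fd, 0, os.SEEK_SET)           # rewind to the start
        first_bytes = os.read(fd, 5)
    finally:
        os.close(fd)                           # we own the descriptor, so we close it
    return text, first_bytes


with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "fd_demo.txt")  # hypothetical scratch file
    with open(sample, "w") as handle:
        handle.write("hello world")
    print(read_keeping_descriptor(sample))  # ('hello world', b'hello')
```

With closefd=False the caller stays responsible for closing the descriptor, which is why the sketch wraps the work in try/finally.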