Python socket buffer size

Python – How to Receive Full Data with the recv() Socket function

If you are writing a socket program in Python that communicates with a remote machine or server to receive data, you will be using the recv() function to receive data on a socket.

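A single recv() call returns at most the specified number of bytes. If the incoming data is larger than that buffer size, the call returns only part of the data. On a TCP socket the rest is not discarded; it stays queued and is returned by later recv() calls. A program that calls recv() only once, however, never sees it.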

This is a common pitfall when using the recv function of the socket module.
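A minimal local sketch of this behaviour, written for Python 3 (unlike the article's Python 2 listings). It uses socket.socketpair() so that no network is needed; the payload size and buffer size are arbitrary values chosen for illustration.

```python
import socket

# socketpair() returns two already-connected stream sockets (no network needed).
sender, receiver = socket.socketpair()

payload = b"x" * 3000            # illustrative size, larger than one recv(1024)
sender.sendall(payload)
sender.close()                   # closing marks the end of the stream

first = receiver.recv(1024)      # returns at most 1024 bytes
print(len(first), "bytes from the first call")

# The remaining bytes are still queued; further calls return them.
chunks = [first]
while True:
    chunk = receiver.recv(1024)
    if not chunk:                # b"" means the peer closed the connection
        break
    chunks.append(chunk)
receiver.close()

data = b"".join(chunks)
print(len(data), "bytes in total")
```

Reading until recv() returns an empty result only works when the sender closes the connection, which is not the case for the keep-alive HTTP exchange used in this article.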

Let's take a quick example to understand this:

Code

#Socket client example in python

import socket   #for sockets
import sys      #for exit

#create an INET, STREAMing socket
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except socket.error:
    print 'Failed to create socket'
    sys.exit()

print 'Socket Created'

host = 'www.google.com'
port = 80

try:
    remote_ip = socket.gethostbyname( host )
except socket.gaierror:
    #could not resolve
    print 'Hostname could not be resolved. Exiting'
    sys.exit()

#Connect to remote server
s.connect((remote_ip , port))

print 'Socket Connected to ' + host + ' on ip ' + remote_ip

#Send some data to remote server
message = "GET / HTTP/1.1\r\n\r\n"

try :
    #Set the whole string
    s.sendall(message)
except socket.error:
    #Send failed
    print 'Send failed'
    sys.exit()

print 'Message send successfully'

#Now receive data
reply = s.recv(4096)

print reply

The output of the above code might look something like this:

$ python simple_client.py
Socket Created
Socket Connected to www.google.com on ip 209.85.175.99
Message send successfully
HTTP/1.1 302 Found
Location: http://www.google.co.in/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com
Set-Cookie: path=; expires=Mon, 01-Jan-1990 00:00:00 GMT;
$

The problem? The output is not complete. The response is cut off partway through the headers. Communication like the above takes place over the TCP/IP protocol.


In this protocol the data transfer takes place in chunks. Let's say a webpage is 500KB in size, but the maximum packet size is only 64KB.

Hence the transfer of the web page will take place in parts or chunks and not the whole thing at once.

Now this is where the problem comes in. The recv function can be made to wait until it has received all the data, but for this it must know the total size beforehand. s.recv(4096, socket.MSG_WAITALL) will wait until it gets the full 4096 bytes, or until the connection is closed or an error occurs.

Now if the actual response size is less than that size, the function will block for a long time before it returns. This is definitely not the desired behaviour we are looking for.
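When the total size really is known in advance, waiting for exactly that many bytes is reasonable. A minimal Python 3 sketch of that idea follows, using a plain loop rather than MSG_WAITALL. The helper name recv_exact is hypothetical (it is not part of the socket module), and socket.socketpair() stands in for a real connection.

```python
import socket

def recv_exact(sock, size):
    """Hypothetical helper: read exactly `size` bytes or raise if the peer closes early."""
    parts = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)      # never ask for more than is still needed
        if not chunk:                     # b"" means the connection was closed
            raise ConnectionError(
                "connection closed with %d bytes still expected" % remaining)
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)

a, b = socket.socketpair()
a.sendall(b"hello world")
print(recv_exact(b, 5))    # b'hello'
print(recv_exact(b, 6))    # b' world'
a.close()
b.close()
```

Like MSG_WAITALL, this blocks for as long as the expected bytes have not arrived, so it only helps when the size is known and correct.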

recv() in chunks — The Solution

The solution is to keep reading until no new data has arrived for a reasonable timeout. The next code example does exactly that.

#Socket client example in python

import socket   #for sockets
import sys      #for exit
import struct
import time

#create an INET, STREAMing socket
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except socket.error:
    print 'Failed to create socket'
    sys.exit()

print 'Socket Created'

host = 'www.google.com'
port = 80

try:
    remote_ip = socket.gethostbyname( host )
except socket.gaierror:
    #could not resolve
    print 'Hostname could not be resolved. Exiting'
    sys.exit()

#Connect to remote server
s.connect((remote_ip , port))

print 'Socket Connected to ' + host + ' on ip ' + remote_ip

#Send some data to remote server
message = "GET / HTTP/1.1\r\n\r\n"

try :
    #Set the whole string
    s.sendall(message)
except socket.error:
    #Send failed
    print 'Send failed'
    sys.exit()

print 'Message send successfully'

def recv_timeout(the_socket,timeout=2):
    #make socket non blocking
    the_socket.setblocking(0)

    #total data partwise in an array
    total_data=[]
    data=''

    #beginning time
    begin=time.time()
    while 1:
        #if you got some data, then break after timeout
        if total_data and time.time()-begin > timeout:
            break

        #if you got no data at all, wait a little longer, twice the timeout
        elif time.time()-begin > timeout*2:
            break

        #recv something
        try:
            data = the_socket.recv(8192)
            if data:
                total_data.append(data)
                #change the beginning time for measurement
                begin=time.time()
            else:
                #sleep for sometime to indicate a gap
                time.sleep(0.1)
        except:
            pass

    #join all parts to make final string
    return ''.join(total_data)

#get reply and print
print recv_timeout(s)

#Close the socket
s.close()

The above code will have an output similar to this

Notice the closing html tag at the end? That is the complete data, and in fact it is what the browser displays when google.com is opened. All the magic happens inside the function recv_timeout, so let's have a look at how it works.

def recv_timeout(the_socket,timeout=2):
    #make socket non blocking
    the_socket.setblocking(0)

    #total data partwise in an array
    total_data=[]
    data=''

    #beginning time
    begin=time.time()
    while 1:
        #if you got some data, then break after timeout
        if total_data and time.time()-begin > timeout:
            break

        #if you got no data at all, wait a little longer, twice the timeout
        elif time.time()-begin > timeout*2:
            break

        #recv something
        try:
            data = the_socket.recv(8192)
            if data:
                total_data.append(data)
                #change the beginning time for measurement
                begin = time.time()
            else:
                #sleep for sometime to indicate a gap
                time.sleep(0.1)
        except:
            pass

    #join all parts to make final string
    return ''.join(total_data)

1. Make the socket non-blocking. By doing this, recv calls won't wait when no data is available. They return immediately by raising an exception, which the loop ignores, and the loop continues.

2. In a loop, keep calling recv and collecting whatever arrives, until no new data has arrived for timeout seconds (or for twice that long if nothing has arrived at all).
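The two steps above can also be sketched for Python 3, where recv() returns bytes and the parts must be joined with b''.join. This is a variation on recv_timeout rather than a line-for-line port: it uses settimeout() so that recv() itself waits instead of spinning on a non-blocking socket, it does not double the wait when nothing has arrived, and it stops as soon as the peer closes the connection. The name recv_until_quiet and its parameters are hypothetical.

```python
import socket

def recv_until_quiet(sock, idle_timeout=2.0, bufsize=8192):
    """Hypothetical helper: collect data until the peer closes the connection
    or no new data arrives for `idle_timeout` seconds."""
    sock.settimeout(idle_timeout)     # each recv() waits at most this long
    chunks = []
    while True:
        try:
            chunk = sock.recv(bufsize)
        except socket.timeout:        # nothing arrived within idle_timeout
            break
        if not chunk:                 # b"" means the peer closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)           # bytes, not str, in Python 3

# Local demonstration with a connected socket pair instead of a remote server.
a, b = socket.socketpair()
a.sendall(b"part one, ")
a.sendall(b"part two")
print(recv_until_quiet(b, idle_timeout=0.2))
a.close()
b.close()
```

Note that settimeout() changes the socket's mode for later calls too, so a caller that needs blocking behaviour afterwards has to restore it.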

Now this is a very simple approach to demonstrate how the recv function ought to work in real applications. The same function can be developed further and made more complex according to the protocol it is working with, for example HTTP.
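As one illustration of making the receive logic protocol-aware, the following Python 3 sketch reads an HTTP response by first collecting the headers up to the blank line and then reading as many body bytes as the Content-Length header announces. It assumes the server sends Content-Length; chunked transfer encoding and other cases are deliberately not handled. The function name recv_http_response is hypothetical, and a local socket pair with a canned response stands in for a real server.

```python
import socket

def recv_http_response(sock, bufsize=4096):
    """Hypothetical helper: read one HTTP response that carries Content-Length.
    Chunked transfer encoding is NOT handled in this sketch."""
    buf = b""
    # 1. Read until the blank line that ends the header block.
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("connection closed before headers were complete")
        buf += chunk
    head, _, body = buf.partition(b"\r\n\r\n")

    # 2. Find Content-Length (header names are case-insensitive).
    length = 0
    for line in head.split(b"\r\n")[1:]:      # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip().decode("ascii"))

    # 3. Keep reading until the announced number of body bytes has arrived.
    while len(body) < length:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("connection closed before body was complete")
        body += chunk
    return head, body[:length]

# Local demonstration: the "server" side sends a canned response in two pieces.
server, client = socket.socketpair()
server.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 11\r\n\r\nhello ")
server.sendall(b"world")
head, body = recv_http_response(client)
print(body)    # b'hello world'
server.close()
client.close()
```

Because the end of the response is determined from the protocol itself, this approach returns as soon as the data is complete instead of waiting for a timeout.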

If you have any feedback or questions let us know in the comments below.

A Tech Enthusiast, Blogger, Linux Fan and a Software Developer. Writes about Computer hardware, Linux and Open Source software and coding in Python, Php and Javascript. He can be reached at [email protected] .

19 Comments

  1. Anibal May 23, 2022 at 9:09 pm
    In Python 3.8 this line doesn't work: return ''.join(total_data)
    It throws:
    return ''.join(total_data)
    TypeError: sequence item 0: expected str instance, bytes found
    My data is a bunch of characters (not binary data), but I have control characters in the stream. And if I replace it with: return ''.join(str(total_data)) I'm basically changing the control character \x0b to a literal string with a backslash, followed by x, followed by 0 and then b. That's not what I want. Not sure how to address this.
  1. Silver Moon Post author September 28, 2020 at 10:10 pm yes you are correct.
    the code is not fully compatible with python 3.
    it needs to be edited.
