Python Socket Receive Large Amount of Data
When I try to receive larger amounts of data, it gets cut off and I have to press Enter to get the rest. At first I was able to increase the amount a little, but it still won't receive all of it. As you can see, I have increased the buffer size in conn.recv(), but it still doesn't get all of the data; it cuts off at a certain point, and I have to press Enter at my raw_input prompt to receive the rest. Is there any way I can get all of the data at once? Here's the code.

    import socket

    port = 7777
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('0.0.0.0', port))
    sock.listen(1)
    print ("Listening on port: "+str(port))
    while 1:
        conn, sock_addr = sock.accept()
        print "accepted connection from", sock_addr
        while 1:
            command = raw_input('shell> ')
            conn.send(command)
            data = conn.recv(8000)
            if not data:
                break
            print data,
        conn.close()
TCP/IP is a stream-based protocol, not a message-based protocol. There's no guarantee that every send() call by one peer results in a single recv() call by the other peer that receives exactly the data sent. The receiver might get the data piecemeal, split across multiple recv() calls, or several send() calls might be merged into one recv(), because of segmentation and buffering in the network stack.
You need to define your own message-based protocol on top of TCP in order to differentiate message boundaries. Then, to read a message, you continue to call recv() until you’ve read an entire message or an error occurs.
One simple way of sending a message is to prefix each message with its length. Then to read a message, you first read the length, then you read that many bytes. Here’s how you might do that:
    import struct

    def send_msg(sock, msg):
        # Prefix each message with a 4-byte length (network byte order); msg must be bytes
        msg = struct.pack('>I', len(msg)) + msg
        sock.sendall(msg)

    def recv_msg(sock):
        # Read message length and unpack it into an integer
        raw_msglen = recvall(sock, 4)
        if not raw_msglen:
            return None
        msglen = struct.unpack('>I', raw_msglen)[0]
        # Read the message data
        return recvall(sock, msglen)

    def recvall(sock, n):
        # Helper function to recv n bytes or return None if EOF is hit
        data = bytearray()
        while len(data) < n:
            packet = sock.recv(n - len(data))
            if not packet:
                return None
            data.extend(packet)
        return data
Then you can use the send_msg and recv_msg functions to send and receive whole messages, and they won't have any problems with packets being split or coalesced on the network level.
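One way to check the framing locally is to push a message much larger than any single recv() chunk through these helpers. This sketch assumes Python 3.5+, where socket.socketpair() returns a connected pair on every platform. The helpers are repeated so the snippet runs on its own, and the 1 MiB payload size is an arbitrary choice for illustration.

```python
import socket
import struct
import threading

def send_msg(sock, msg):
    # Prefix each message with a 4-byte big-endian length
    sock.sendall(struct.pack('>I', len(msg)) + msg)

def recvall(sock, n):
    # Keep calling recv() until exactly n bytes arrive, or return None on EOF
    data = bytearray()
    while len(data) < n:
        packet = sock.recv(n - len(data))
        if not packet:
            return None
        data.extend(packet)
    return data

def recv_msg(sock):
    raw_msglen = recvall(sock, 4)
    if not raw_msglen:
        return None
    msglen = struct.unpack('>I', raw_msglen)[0]
    return recvall(sock, msglen)

a, b = socket.socketpair()
payload = b'x' * (1024 * 1024)  # 1 MiB, far larger than one recv() chunk

def produce():
    # Runs in a thread so sendall() cannot deadlock against a full socket buffer
    send_msg(a, payload)
    send_msg(a, b'done')

sender = threading.Thread(target=produce)
sender.start()
first = recv_msg(b)   # the whole 1 MiB message, however many recv() calls it took
second = recv_msg(b)  # the next message starts exactly where the first one ended
sender.join()
a.close()
b.close()
print(len(first), bytes(second))  # prints: 1048576 b'done'
```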
I am not sure I understand this completely. I understand what's supposed to be happening, but I can't seem to get it working. I am getting Exception: Socket EOF trying to recv 4 bytes. I am using the following: pastebin.com/raw.php?i=AvdN5RyW
@user2585107: Try the updated version, which uses a return None instead of raising an exception when the stream ends.
Shouldn't the packet be .decode()d before adding it to data, or can recv() return both bytes and strings?
The line data += packet can make receiving VERY slow for large messages. It's much better to use data = bytearray() and then data.extend(packet).
You can use it as: data = recvall(sock)
    def recvall(sock):
        BUFF_SIZE = 4096 # 4 KiB
        data = b''
        while True:
            part = sock.recv(BUFF_SIZE)
            data += part
            if len(part) < BUFF_SIZE:
                # either 0 or end of data
                break
        return data
This works for detecting "End of File", but not for keeping a connection open and detecting the end of a message. "End of File" is only reached if the peer closes its end of the socket, or at least half-closes it.
Without the length check, if the string received is shorter than 4096 chars, the loop would call sock.recv() again to check for more data. That call hangs, because no more data is coming in. If the length of part is less than RECV_BUFFER, the code can safely break out of the loop.
@JadedTuna, doesn't seem to be fixed. The line "part = sock.recv(BUFF_SIZE)" seems to be a blocking call, thus execution hangs at this line once the full message has been received.
This seems to wrongly assume that one send on one end of a TCP socket corresponds to one receive of the same number of bytes on the other end (see e.g. here or here). Even when a client sends exactly 4 KB with one send, the server might get only the first, say, 1 KB from the first recv, which would make the while loop break.
The accepted answer is fine, but it will be really slow with big files. Strings are immutable, so every use of the + operator creates a new object. Appending the chunks to a list and joining them once at the end is more efficient:

    fragments = []
    while True:
        chunk = s.recv(10000)
        if not chunk:
            break
        fragments.append(chunk)
    print(b"".join(fragments))
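The failure described in this comment is easy to reproduce locally. In this sketch (assuming Python 3.5+ for socket.socketpair()), the sender writes two 100-byte pieces with a pause between them. The first recv(4096) typically returns only the 100 bytes that have arrived so far, so a loop that stops on len(part) < BUFF_SIZE quits early even though the peer has not finished. The helper name broken_recvall and the timing are illustrative only.

```python
import socket
import threading
import time

BUFF_SIZE = 4096

def broken_recvall(sock):
    # The flawed pattern: treats any short read as "end of data"
    data = b''
    while True:
        part = sock.recv(BUFF_SIZE)
        data += part
        if len(part) < BUFF_SIZE:
            break
    return data

a, b = socket.socketpair()

def sender():
    a.sendall(b'A' * 100)
    time.sleep(0.5)          # the second piece arrives later
    a.sendall(b'B' * 100)
    a.close()                # EOF for the reader

t = threading.Thread(target=sender)
t.start()
truncated = broken_recvall(b)   # returns after the first piece only

rest = b''
while True:                     # the correct loop: only b'' means EOF
    chunk = b.recv(BUFF_SIZE)
    if not chunk:
        break
    rest += chunk
t.join()
b.close()
print(len(truncated), len(rest))
```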
Most of the answers describe some sort of recvall() method. If your bottleneck when receiving data is building the byte array in a loop, I benchmarked three approaches to accumulating the received data in the recvall() method:
List join method:

    fragments = []
    while True:
        chunk = sock.recv(max_msg_size)
        if not chunk:
            break
        fragments.append(chunk)
    arr = b''.join(fragments)
Pre-allocated bytearray method:
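    arr = bytearray(msg_len)
    pos = 0
    while pos < msg_len:
        # recv() may return fewer bytes than requested, so advance by what actually arrived
        chunk = sock.recv(min(max_msg_size, msg_len - pos))
        if not chunk:
            raise ConnectionError('connection closed before the full message arrived')
        arr[pos:pos + len(chunk)] = chunk
        pos += len(chunk)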
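When the message length is known up front, a related variant (not one of the approaches benchmarked above) lets the kernel write directly into the pre-allocated buffer through socket.recv_into() and a memoryview. This avoids creating an intermediate bytes object for every chunk. It is a sketch only: recv_exact_into is a hypothetical helper name, and msg_len is assumed to come from your own framing, such as a length prefix.

```python
import socket

def recv_exact_into(sock, msg_len, chunk_size=65536):
    # Pre-allocate the whole message and fill it in place
    buf = bytearray(msg_len)
    view = memoryview(buf)
    pos = 0
    while pos < msg_len:
        # recv_into() writes into the slice and returns how many bytes it wrote
        n = sock.recv_into(view[pos:], min(chunk_size, msg_len - pos))
        if n == 0:
            raise ConnectionError('socket closed after %d of %d bytes' % (pos, msg_len))
        pos += n
    return buf

# Small local demo; 4000 bytes fits in the socket buffer, so one thread is enough
a, b = socket.socketpair()
a.sendall(b'abcd' * 1000)
data = recv_exact_into(b, 4000, chunk_size=1024)
a.close()
b.close()
```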
You may need to call conn.recv() multiple times to receive all the data. Calling it a single time is not guaranteed to bring in all the data that was sent, due to the fact that TCP streams don't maintain frame boundaries (i.e. they only work as a stream of raw bytes, not a structured stream of messages).
See this answer for another description of the issue.
Note that this means you need some way of knowing when you have received all of the data. If the sender will always send exactly 8000 bytes, you could count the number of bytes you have received so far and subtract that from 8000 to know how many are left to receive; if the data is variable-sized, there are various other methods that can be used, such as having the sender send a number-of-bytes header before sending the message, or if it's ASCII text that is being sent you could look for a newline or NUL character.
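With the newline approach, the receiver has to keep any bytes that arrive after a delimiter, because one recv() can contain the end of one message and the start of the next. Here is a minimal sketch, assuming ASCII or UTF-8 text messages with no embedded newlines. LineReader is a hypothetical class name. For exactly this case, the standard library's sock.makefile('rb').readline() does the same buffering for you.

```python
import socket

class LineReader:
    """Split a TCP byte stream into newline-terminated messages."""

    def __init__(self, sock):
        self.sock = sock
        self.buf = bytearray()

    def read_line(self):
        # Return one message without its trailing b'\n', or None on EOF
        while True:
            idx = self.buf.find(b'\n')
            if idx >= 0:
                line = bytes(self.buf[:idx])
                del self.buf[:idx + 1]   # keep whatever follows the delimiter
                return line
            chunk = self.sock.recv(4096)
            if not chunk:
                return None   # peer closed; a partial line left in self.buf is incomplete
            self.buf.extend(chunk)

a, b = socket.socketpair()
a.sendall(b'first\nsec')    # one message plus half of the next
a.sendall(b'ond\nthird\n')
a.close()

reader = LineReader(b)
lines = []
while True:
    line = reader.read_line()
    if line is None:
        break
    lines.append(line)
b.close()
print(lines)  # prints: [b'first', b'second', b'third']
```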
Disclaimer: There are very rare cases in which you really need to do this. If possible, use an existing application-layer protocol or define your own, e.g. precede each message with a fixed-length integer indicating the length of the data that follows, or terminate each message with a '\n' character. (Adam Rosenfield's answer does a really good job of explaining that.)
With that said, there is a way to read all of the data available on a socket. However, it is a bad idea to rely on this kind of communication, because it introduces the risk of losing data. Use this solution with extreme caution, and only after reading the explanation below.
    def recvall(sock):
        BUFF_SIZE = 4096
        data = bytearray()
        while True:
            packet = sock.recv(BUFF_SIZE)
            if not packet:  # Important!!
                break
            data.extend(packet)
        return data
Now, the if not packet: line is absolutely critical! Many answers here suggested a condition like if len(packet) < BUFF_SIZE:, which is broken and will most likely make you close the connection prematurely and lose data. It wrongly assumes that one send on one end of a TCP socket corresponds to one receive of the same number of bytes on the other end. It does not. There is a very good chance that sock.recv(BUFF_SIZE) will return a chunk smaller than BUFF_SIZE even though more data is still waiting to be received. There is a good explanation of the issue here and here.
Even with the above solution, you still risk data loss if the other end of the connection writes data more slowly than you read it. You may simply consume all the data on your end and exit while more is still on the way. There are ways around this that require concurrent programming, but that's a topic of its own.
Socket. How to receive all data with socket.recv()?
I have a problem receiving data from the server on the client. I have the following client-side function that attempts to receive the data. The data the server sends with socket.sendall(data) is larger than buff_size, so I need a loop to read all of it.

    def receiveAll (sock):
        data = ""
        buff_size = 4096
        while True:
            part = sock.recv (buff_size)
            data += part
            if part

The problem is that after the first iteration (reading the first 4096 bytes), the program blocks in part = sock.recv(buff_size) on the second iteration, waiting for more data. How can I make recv() continue reading the rest of the data? Thank you.
Your interpretation is wrong. Your code reads all the data it gets from the server. It just doesn't know that it should stop listening for incoming data, because it has no way of knowing that the server has sent everything it had.
First of all note that these lines
Also, if your server sends content whose size is an exact multiple of buff_size, the if part condition will never be satisfied, and the code will hang on .recv() forever.
Side note: don't use semicolons (;). This is Python.
There are several solutions to your problem, but none of them can be used correctly without modifying the server side.
As a client, you have to know when to stop reading. The only way to know that is if the server does something special that the client can recognize. This is called a communication protocol: you have to give meaning to the data you send and receive.
For example, if you use HTTP, the server sends the header Content-Length: 12345 before the body, so as a client you know you only need to read 12345 bytes. Your buffer doesn't have to be that big, but with that information you know how many times to loop before you have read it all.
Some binary protocols, for example, send the size of the content in the first 2 or 4 bytes. This is also easy to interpret on the client side.
An easier solution is to simply make the server close the connection after it sends all the data. Then you only need to add the check if not part: break to your code.
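Here is a sketch of that approach, assuming the server only sends one response per connection. The server can call shutdown(socket.SHUT_WR) after sendall(); this sends EOF while keeping the socket open for reading. The client then reads until recv() returns b''. The function names are illustrative, and socket.socketpair() stands in for a real client/server connection.

```python
import socket
import threading

def serve_once(conn, payload):
    conn.sendall(payload)
    conn.shutdown(socket.SHUT_WR)   # signal "no more data" to the reader
    conn.close()

def receive_all(sock, buff_size=4096):
    # Read until the peer's EOF; a short chunk does NOT mean the end
    parts = []
    while True:
        part = sock.recv(buff_size)
        if not part:
            break
        parts.append(part)
    return b''.join(parts)

server_side, client_side = socket.socketpair()
# An exact multiple of buff_size: the case that made the original loop hang
payload = b'z' * (3 * 4096)
t = threading.Thread(target=serve_once, args=(server_side, payload))
t.start()
data = receive_all(client_side)
t.join()
client_side.close()
print(len(data))  # prints: 12288
```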
Retrieve all the response in socket at one time
I am writing a client-server program based on Python sockets. The client sends a command to the server, and the server responds. But now some clients can broadcast a message to other clients, so a client can receive more than one response at the same time.
A single data = s.recv(1024) call retrieves only one response from the server. But if I use a while loop like this:

    while True:
        data = s.recv(1024)
        if not data:
            break

data = s.recv(1024) blocks the program when there is no data left. I don't want to block the program; I want to retrieve all the responses available on the connection at once. Can anyone suggest a solution? Thank you.
Unless you're using UDP, you need one socket per client. s.recv will always block, so you need a separate thread.
You can use the select module to wait until the socket is readable or until a timeout has elapsed; you can then perform other processing. For example:
    import select

    while True:
        # If data can be received without blocking (timeout=0), read it now
        ready = select.select([s], [], [], 0)
        if s in ready[0]:
            data = s.recv(1024)
            # Process data
        else:
            # No data is available, perform other tasks
            pass
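To collect everything that is already queued without blocking, which is what the question asks for, you can repeat the select() check until the socket stops being readable. This is a sketch assuming Python 3; drain_available is a hypothetical name. It returns whatever bytes have arrived so far, which may end in the middle of a message, so you still need framing to split the result into responses.

```python
import select
import socket

def drain_available(sock, bufsize=4096):
    """Return (data, closed): every byte readable right now, and whether EOF was seen."""
    chunks = []
    while True:
        readable, _, _ = select.select([sock], [], [], 0)   # timeout 0: just poll
        if not readable:
            return b''.join(chunks), False
        chunk = sock.recv(bufsize)   # will not block: select() reported it readable
        if not chunk:
            return b''.join(chunks), True
        chunks.append(chunk)

a, b = socket.socketpair()
before = drain_available(b)   # nothing queued yet
a.sendall(b'msg1')
a.sendall(b'msg2')
middle = drain_available(b)   # both sends, returned together
a.close()
after = drain_available(b)    # peer closed
b.close()
```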
You could make the socket (s) non-blocking. That way, recv() returns whatever responses have already arrived, and when nothing is left it raises an exception instead of waiting. Of course, with a non-blocking socket you will have to retry periodically.
You could make the socket (s) non-blocking using the setblocking() method:

    s.setblocking(False)
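With a non-blocking socket, recv() does not wait when nothing is queued. In Python 3 it raises BlockingIOError instead; in Python 2 it raises socket.error with EWOULDBLOCK/EAGAIN. Draining the socket therefore means reading until that exception appears. This is a Python 3 sketch; read_pending is a hypothetical name.

```python
import socket

def read_pending(sock, bufsize=4096):
    """Read everything queued on a non-blocking socket; returns (data, closed)."""
    chunks = []
    while True:
        try:
            chunk = sock.recv(bufsize)
        except BlockingIOError:
            return b''.join(chunks), False   # nothing more right now
        if not chunk:
            return b''.join(chunks), True    # peer closed the connection
        chunks.append(chunk)

a, b = socket.socketpair()
b.setblocking(False)
empty = read_pending(b)       # nothing sent yet
a.sendall(b'one')
a.sendall(b'two')
data, closed = read_pending(b)
a.close()
b.close()
```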
The other option is to use another thread to handle the receive part. This way, your main thread can continue doing its main task and act upon the message only if it receives one.
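Here is a sketch of the thread-based option, assuming Python 3. A background thread blocks in recv() and pushes each chunk onto a queue.Queue, and the main thread collects chunks with get_nowait() whenever it likes. The function names are hypothetical, and None is used here as an EOF marker. The chunks are raw TCP data, so you still need message framing on top.

```python
import queue
import socket
import threading

def reader_thread(sock, inbox):
    # Blocks in recv() so the main thread never has to
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            inbox.put(None)      # EOF marker
            return
        inbox.put(chunk)

def take_all(inbox):
    """Non-blocking: return every item queued so far."""
    items = []
    while True:
        try:
            items.append(inbox.get_nowait())
        except queue.Empty:
            return items

a, b = socket.socketpair()
inbox = queue.Queue()
t = threading.Thread(target=reader_thread, args=(b, inbox), daemon=True)
t.start()
a.sendall(b'hello ')
a.sendall(b'world')
a.close()
t.join()                 # demo only: wait until the EOF marker has been queued
received = take_all(inbox)
b.close()
```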
You can use socket.setblocking or socket.settimeout:

    import socket
    import sys

    HOST = 'www.google.com'
    PORT = 80
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))
    s.setblocking(0)  # recv() now raises instead of waiting for data
    s.sendall(b'Hello, world')
    try:
        data = s.recv(1024)
    except socket.error:  # e.g. no data has arrived yet
        print('Oh noes! %s' % sys.exc_info()[0])
    s.close()
socket.recv takes an optional second parameter, a set of flags. If you're on a Linux system, you can run man recv for a list of the flags you can supply and their corresponding errors.
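One flag relevant to this question is MSG_WAITALL. It asks the kernel to block until the full requested amount has arrived, and Python exposes it as socket.MSG_WAITALL on platforms that support it. Even with this flag, recv() can still return fewer bytes, for example at EOF or when interrupted, so this sketch keeps the length-checking loop and falls back to no flags where the constant is missing. recv_exactly is a hypothetical helper name.

```python
import socket
import threading
import time

# Fall back to no flags where MSG_WAITALL is not defined
WAITALL = getattr(socket, 'MSG_WAITALL', 0)

def recv_exactly(sock, n):
    data = bytearray()
    while len(data) < n:
        # The loop still guards against short reads the flag does not prevent
        chunk = sock.recv(n - len(data), WAITALL)
        if not chunk:
            raise ConnectionError('EOF after %d of %d bytes' % (len(data), n))
        data.extend(chunk)
    return bytes(data)

a, b = socket.socketpair()

def slow_sender():
    for piece in (b'abc', b'def', b'ghi'):
        a.sendall(piece)
        time.sleep(0.05)

t = threading.Thread(target=slow_sender)
t.start()
msg = recv_exactly(b, 9)
t.join()
a.close()
b.close()
```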
Lastly, in general, you can't really know that the other side is done with sending you data (unless you're controlling both sides), even if you're both following a protocol. I believe the right way to go about it is to use timeouts, and quit after sending a reset (how you do this will depend upon what protocol you're using).
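Here is a minimal sketch of the timeout approach, assuming it is acceptable to treat a quiet period as "done for now". After settimeout(), recv() raises socket.timeout (an alias of TimeoutError since Python 3.10) when nothing arrives within the limit. The idle limit and the name recv_until_idle are arbitrary choices for illustration. As noted above, a slow peer can still be mistaken for a finished one.

```python
import socket

def recv_until_idle(sock, idle_timeout=0.5, bufsize=4096):
    """Collect data until nothing arrives for idle_timeout seconds or the peer closes."""
    old_timeout = sock.gettimeout()
    sock.settimeout(idle_timeout)
    chunks = []
    try:
        while True:
            try:
                chunk = sock.recv(bufsize)
            except socket.timeout:
                break            # quiet for idle_timeout seconds
            if not chunk:
                break            # EOF
            chunks.append(chunk)
    finally:
        sock.settimeout(old_timeout)   # restore the caller's setting
    return b''.join(chunks)

a, b = socket.socketpair()
a.sendall(b'partial reply')
data = recv_until_idle(b, idle_timeout=0.2)   # returns after 0.2 s of silence
a.close()
b.close()
```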