Named pipe linux python
A pipe is a simple, unidirectional FIFO communication channel. Typically a parent process creates the pipe and then spawns one or more child processes that receive the data sent through it. Because a pipe carries data in one direction only, two pipes are usually needed for two-way communication.
Named pipes extend traditional pipes. A traditional pipe is anonymous and exists only while the program is running. A named pipe is persistent and must be deleted when it is no longer needed.
Named pipes live in the file system and are created with mkfifo(). Once one is created, two separate processes can access it, one reading and the other writing.
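As a small illustration of the persistence described above, the following sketch creates a FIFO, confirms through stat that the file system entry really is a FIFO, and then removes it. The temporary directory and the variable names are assumptions chosen for the example.

```python
import os
import stat
import tempfile

# Hypothetical location: a fresh temporary directory avoids name clashes.
tmp_dir = tempfile.mkdtemp()
fifo_path = os.path.join(tmp_dir, "demo.fifo")

os.mkfifo(fifo_path)  # creates the file system entry; no data is stored in it

# The entry stays in the file system until it is explicitly removed.
is_fifo = stat.S_ISFIFO(os.stat(fifo_path).st_mode)
print("is a FIFO:", is_fifo)

os.remove(fifo_path)  # named pipes have to be cleaned up by hand
os.rmdir(tmp_dir)
exists_after = os.path.exists(fifo_path)
print("still exists:", exists_after)
```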
Named pipes support blocking reads and writes: if a process opens the pipe for reading, it blocks until another process opens it for writing. The O_NONBLOCK option enables non-blocking mode instead.
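The effect of O_NONBLOCK on opening a FIFO can be seen in a short sketch. It assumes the Linux behaviour documented in fifo(7): a non-blocking read-only open returns immediately, while a non-blocking write-only open fails with ENXIO if no process has the pipe open for reading. The temporary path is made up for the example.

```python
import errno
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
fifo_path = os.path.join(tmp_dir, "nonblock.fifo")  # hypothetical path
os.mkfifo(fifo_path)

# Write-only + O_NONBLOCK with no reader: the open fails instead of blocking.
try:
    fd = os.open(fifo_path, os.O_WRONLY | os.O_NONBLOCK)
    os.close(fd)
    write_open_errno = None
except OSError as exc:
    write_open_errno = exc.errno

# Read-only + O_NONBLOCK: the open returns at once, even with no writer.
rfd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
data = os.read(rfd, 1024)  # no writer has the pipe open, so this is end-of-file
os.close(rfd)

os.remove(fifo_path)
os.rmdir(tmp_dir)

print("write-only open got ENXIO:", write_open_errno == errno.ENXIO)
print("read returned:", data)
```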
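A named pipe should be opened in read-only or write-only mode, because it communicates in one direction. POSIX leaves opening a FIFO in read + write mode undefined; Linux allows it, which the example below relies on, but that does not make the pipe bidirectional. If you want to achieve two-way communication, you must use two named pipes.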
Here’s an example of how Python uses named pipes to implement interprocess communication.
Server side
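import os
import time

read_path = "/tmp/pipe.in"
write_path = "/tmp/pipe.out"

# Start from a clean state: remove any FIFOs left over from a previous run.
if os.path.exists(read_path):
    os.remove(read_path)
if os.path.exists(write_path):
    os.remove(write_path)

os.mkfifo(write_path)
os.mkfifo(read_path)

# Blocks until the client opens /tmp/pipe.in for writing.
rf = os.open(read_path, os.O_RDONLY)
# On Linux, O_RDWR lets this open succeed without waiting for a reader.
wf = os.open(write_path, os.O_SYNC | os.O_CREAT | os.O_RDWR)

while True:
    s = os.read(rf, 1024)
    if len(s) == 0:
        # An empty read means no writer currently has the pipe open; retry.
        time.sleep(1)
        continue
    print("received msg: %s" % s.decode())
    if b"exit" in s:
        break
    os.write(wf, s)  # echo the message back to the client

os.close(rf)
os.close(wf)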
Client side
import os
import time

# The client's paths mirror the server's: it writes where the server reads.
write_path = "/tmp/pipe.in"
read_path = "/tmp/pipe.out"

wf = os.open(write_path, os.O_SYNC | os.O_CREAT | os.O_RDWR)
rf = None

for i in range(1, 11):
    msg = "msg " + str(i)
    os.write(wf, msg.encode())  # os.write() needs bytes in Python 3
    print("sent msg: %s" % msg)
    if rf is None:
        rf = os.open(read_path, os.O_RDONLY)
    s = os.read(rf, 1024)
    if len(s) == 0:
        break
    print("received msg: %s" % s.decode())
    time.sleep(1)

os.write(wf, b"exit")  # tell the server to stop
os.close(rf)
os.close(wf)
Named Pipes to Turn CLI Programs Into Python Functions
A named pipe is like a file that doesn’t store anything. It has a path (the name I guess). It can be opened, read from, and written to, but the content is temporary and stored in memory.
Since a named pipe is also a pipe, it acts a bit differently from a normal file. When you open it, you can only open it as read-only or write-only, not read-write. The idea is you’d have one process with it open for write (the producer) and one process with it open for read (the consumer). This matches a pipe, since pipes have a read-end and a write-end. At least in Unix.
A call to open for read or write will block until something else opens it for the other. Calls to read or write will also block if the named pipe is empty or full respectively.
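A minimal sketch of that rendezvous, assuming a thread stands in for the second process: the producer blocks in open() until the consumer opens the other end, and the consumer's read() returns once the producer closes its end. The function name, the file name and the payload are all made up for the example.

```python
import os
import tempfile
import threading

def producer(path, payload):
    # open() blocks here until the consumer opens the FIFO for reading.
    with open(path, "wb") as wf:
        wf.write(payload)
    # Leaving the with block closes the write end, which signals end-of-file.

with tempfile.TemporaryDirectory() as tmp_dir:
    fifo_path = os.path.join(tmp_dir, "demo.fifo")  # hypothetical name
    os.mkfifo(fifo_path)

    t = threading.Thread(target=producer, args=(fifo_path, b"hello through a fifo"))
    t.start()

    # This open() blocks until the producer has opened the write end.
    with open(fifo_path, "rb") as rf:
        received = rf.read()  # returns once the producer closes its end

    t.join()

print(received)
```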
Named pipes can be useful if you need to create CLI pipelines with programs that consume or produce multiple inputs and outputs. You have to be careful not to create deadlocks due to the blocking behaviour though. You also can’t use this with a program that needs to seek in a file or that reads and writes the same file.
I’m using them to run file based CLI programs from Python without writing to the disk.
What I Was Trying To Do
I’m using two command line programs, rgbasm and rgblink. They’re part of the RGBDS GameBoy assembler toolchain. If I were running them from the command line I would use the following two commands:
rgbasm A.asm -o A.o
rgblink A.o B.o -o A.gb -m A.map -n A.sym
I want my Python script to be able to provide A.asm and get back the contents of A.gb, A.map, and A.sym.
So I can just have my script write A.asm, use subprocess.run() to run the two commands then read in the three files!
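For comparison, here is a rough sketch of that straightforward disk-based version. The function name build_via_disk and its run parameter are hypothetical (run exists only so the commands can be swapped out when the tools aren't installed), the flags are copied from the commands above, and B.o is left out to keep the sketch short.

```python
import subprocess
import tempfile
from pathlib import Path

def build_via_disk(asm_source, run=subprocess.run):
    """Assemble and link through temporary files; returns (gb, map, sym) bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "A.asm").write_text(asm_source)
        # Same flags as the two commands above, minus B.o.
        run(["rgbasm", "A.asm", "-o", "A.o"], cwd=tmp, check=True)
        run(["rgblink", "A.o", "-o", "A.gb", "-m", "A.map", "-n", "A.sym"],
            cwd=tmp, check=True)
        # Read the three outputs back before the temp directory is deleted.
        return tuple((tmp / name).read_bytes()
                     for name in ("A.gb", "A.map", "A.sym"))
```

Calling build_via_disk(source_text) needs rgbasm and rgblink on the PATH; everything it writes lives in a temporary directory that is removed when the function returns.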
Unnecessary Constraints!
I didn’t want to write anything to disk.
Why?
My justification was that I want to run this in a server and having it writing to and reading from the disk felt wrong. That’s a bad reason though. This is for a personal toy project. Making scalable, production quality software is not the goal! I’m not saying I achieved scalable, production quality software though, just that it’s not the goal.
The actual reason was that I thought it would be possible to do and I wanted to figure out how. That’s a good reason! At least for a personal toy project. Even if it is a bit of a detour.
What Did I Know Going In
- Linux stuff
- pipes exist
- you can use them to route stdin/out/err from one program to another program
- files exist
- everything is a file: directories, devices, programs, you, the internet, the computer, your home, the love you feel for others, this list entry
- normal file stuff
- how to use the subprocess module to run programs
- I can yell PIPE to communicate with subprocesses over stdin/out/err
- some thread stuff
Named Pipes / FIFOs
I quickly came across named pipes and how to create them in Python with os.mkfifo or from the command line with mkfifo (I don’t know about Windows though). After being briefly distracted by going overboard with context managers to create and delete a temp directory and a bunch of FIFOs, I ran into my first gotcha. My program froze when I tried to open a FIFO.
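A context manager along those lines might look like the sketch below. It is not the code from the original project; the name temp_fifos and the dictionary it yields are assumptions made for the example.

```python
import contextlib
import os
import stat
import tempfile

@contextlib.contextmanager
def temp_fifos(*names):
    """Create FIFOs in a fresh temp directory and yield their paths by name."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        paths = {}
        for name in names:
            path = os.path.join(tmp_dir, name)
            os.mkfifo(path)
            paths[name] = path
        # The directory and every FIFO in it are deleted when the block exits.
        yield paths

with temp_fifos("fifo_in", "fifo_middle", "fifo_out") as fifos:
    kinds = [stat.S_ISFIFO(os.stat(p).st_mode) for p in fifos.values()]
    saved_paths = list(fifos.values())

print(kinds)
```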
The problem was that I was only opening one end of the FIFO/pipe and that will block until the other end is opened.
This was lucky! Confusing, but lucky! You see, I didn’t know what I was doing and I very well could have accidentally written my code in a way that missed this problem. Then I never would have learned that opening a FIFO/named pipe will block unless the other end is also open. I also wouldn’t have realized that I knew less about what I was doing than usual and that I’d need to pay more attention.
You can use os.open() to open without blocking and that’s what I did at first. Something like this (rough memory probably not exactly what I did):
import os
import subprocess

def opener(path, flags):
    # Add O_NONBLOCK so that opening the FIFO doesn't wait for the other end.
    return os.open(path, flags | os.O_NONBLOCK)

with open('fifo_in', 'w', opener=opener) as wf, \
     open('fifo_out', 'rb', opener=opener) as rf:
    wf.write(input_data)
    p = subprocess.Popen(['rgbasm', 'fifo_in', '-o', 'fifo_middle'])
    subprocess.run(['rgblink', 'fifo_middle', '-o', 'fifo_out'])
    p.wait()
    output = rf.read()
I was surprised when it worked, and I’m pretty sure it can deadlock in a few ways. Here’s what’s happening:
- There are three named pipes:
- fifo_in
- fifo_middle
- fifo_out
- rgbasm itself will be blocked when it tries to open fifo_middle because (presumably) it’s trying to open one end of a named pipe with a normal open without the O_NONBLOCK flag. So rgbasm won’t finish until something opens fifo_middle for reading.
- rgblink will open fifo_middle for reading and fifo_out for writing. Neither of these blocks because fifo_middle is already opened for writing by rgbasm and fifo_out is already open for reading by the python code.
- This unblocks rgbasm and it can start producing output and everything moves along.
So What’s the Problem?
For one thing, it only works because the input isn’t very big! Pipes/named-pipes/FIFOs have a maximum capacity. If you try to write to one when it’s full, it will block, or fail if O_NONBLOCK is set. Reading from one that’s empty will have similar results. There’s more info in the pipe(7) man page, in the I/O on pipes and FIFOs, Pipe capacity, and PIPE_BUF sections. So if the input data were bigger, the Python script would crash on wf.write(input_data).
Similarly, if the output were too big then rgblink would block when it tries to write to a full pipe. The Python script is still waiting in subprocess.run() for rgblink to exit at that point, so it never reaches rf.read() to drain the pipe, and everything will just be stuck!
Maybe Don’t Use O_NONBLOCK?
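To get a feel for the capacity limit, this sketch fills a pipe until the kernel refuses more data. It uses an anonymous pipe from os.pipe() as a stand-in, since pipe(7) describes the same buffering for pipes and FIFOs, and it sets the write end to non-blocking so that a full pipe raises instead of hanging. On Linux, pipe(7) gives the default capacity as 16 pages; the number printed on other systems may differ.

```python
import os

read_fd, write_fd = os.pipe()
os.set_blocking(write_fd, False)  # make a full pipe raise instead of hanging

chunk = b"x" * 4096
capacity = 0
try:
    while True:
        capacity += os.write(write_fd, chunk)
except BlockingIOError:
    # The pipe is full: a blocking writer would be stuck at this point.
    pass

print("bytes written before the pipe filled up:", capacity)
os.close(read_fd)
os.close(write_fd)
```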
Ok, I could change it to something like:
import subprocess

p_rgbasm = subprocess.Popen(['rgbasm', 'fifo_in', '-o', 'fifo_middle'])
p_rgblink = subprocess.Popen(['rgblink', 'fifo_middle', '-o', 'fifo_out'])

with open('fifo_in', 'w') as wf, \
     open('fifo_out', 'rb') as rf:
    wf.write(input_data)
    p_rgblink.wait()
    output = rf.read()
Now we’re not using O_NONBLOCK so at least reads and writes won’t fail in the Python code. I even think this might work for this case! Here’s what I think is happening:
- rgbasm and rgblink are started and running in the background. They’re both definitely blocked though since they need to open fifo_in and fifo_out respectively, and their other ends aren’t open yet.
- I open the other ends of fifo_in and fifo_out so rgbasm and rgblink should be unblocked.
- They’ll both still get blocked when they try to read from empty pipes (rgbasm from fifo_in and rgblink from fifo_middle).
I could probably fix this by getting rid of p_rgblink.wait() but I don’t actually know if rf.read() will always return the entire output. I think it will as long as rgblink doesn’t do something strange like close the file and reopen it to write some more. The reason I think this is because read() will read until it hits an end-of-file. I think that only happens when all the write-ends of the pipe are closed.
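The end-of-file behaviour can be checked with a small sketch in which a thread plays the part of a program that pauses in the middle of its output. The names slow_writer and fifo_out are stand-ins, and the 0.2 second pause is arbitrary. A single read() spans the pause and only returns once the write end is closed.

```python
import os
import tempfile
import threading
import time

def slow_writer(path):
    with open(path, "wb") as wf:
        wf.write(b"first half, ")
        wf.flush()
        time.sleep(0.2)  # pause mid-output while the write end stays open
        wf.write(b"second half")
    # The write end closes here; only now does the reader see end-of-file.

with tempfile.TemporaryDirectory() as tmp_dir:
    fifo_path = os.path.join(tmp_dir, "fifo_out")
    os.mkfifo(fifo_path)
    writer = threading.Thread(target=slow_writer, args=(fifo_path,))
    writer.start()
    with open(fifo_path, "rb") as rf:
        output = rf.read()  # keeps reading across the pause until EOF
    writer.join()

print(output)
```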
Anyway we have other problems.
Other Problems
The Python script could have blocked on open('fifo_in', 'w') because we have no guarantee about how rgbasm and rgblink open files. For example if:
rgbasm opens them in the order fifo_middle, then fifo_in
and rgblink opens its files in the order fifo_out, then fifo_middle
So now we’re in a deadlock!
- The Python script is blocked waiting for rgbasm to open fifo_in.
- rgbasm won’t open fifo_in since it’s blocked waiting for rgblink to open fifo_middle.
- rgblink won’t open fifo_middle since it’s blocked waiting for the Python script to open fifo_out.
- The Python script won’t open fifo_out because of point 1!
Grumble
Maybe there’s something in asyncio that could help me?
Ok, no, that’s more complicated than I thought it would be and I don’t see a magic wand for this anyway.
Fine I’ll Use Threads
I already have deadlocks. I can’t make it much worse right?
import threading
import subprocess
import io

def doRGBStuff(input_data):
    def do_write(file_name, data):
        # Blocks in open() until rgbasm opens fifo_in for reading.
        with open(file_name, 'w') as wf:
            wf.write(data)

    def do_read(file_name, out_stream):
        # Blocks in open() until rgblink opens fifo_out for writing.
        with open(file_name, 'rb') as rf:
            out_stream.write(rf.read())

    write_thread = threading.Thread(target=do_write, args=('fifo_in', input_data))
    write_thread.start()

    p_rgbasm = subprocess.Popen(['rgbasm', 'fifo_in', '-o', 'fifo_middle'])
    p_rgblink = subprocess.Popen(['rgblink', 'fifo_middle', '-o', 'fifo_out'])

    out_stream = io.BytesIO()
    read_thread = threading.Thread(target=do_read, args=('fifo_out', out_stream))
    read_thread.start()
    read_thread.join()

    return out_stream.getvalue()
Now we have write_thread to write data into fifo_in and read_thread to read from fifo_out and put it into an io.BytesIO stream. Everything that interacts with the fifos is either a thread or subprocess so when they block it won’t block the main Python script. The script blocks at the end with read_thread.join(). This should wait until rgblink finishes writing its output. It still won’t work properly if rgblink closes and reopens the output file in the middle.
I can’t think of anything that would let this deadlock, but maybe I’m missing something.
To complete the original task of getting three outputs I’ll just need two more fifos and reading threads.
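One way the extra reading threads could be arranged is sketched below, with one reader thread per FIFO. This is an untested-against-RGBDS illustration: read_all_fifos, fake_linker and the fifo_gb, fifo_map and fifo_sym names are all hypothetical, and fake_linker is only a stand-in that writes to the FIFOs the way a real program would write its output files.

```python
import os
import tempfile
import threading

def read_all_fifos(paths):
    """Start one reader thread per FIFO; return (threads, results dict)."""
    results = {}

    def reader(path):
        with open(path, "rb") as rf:   # blocks until a writer opens this FIFO
            results[path] = rf.read()  # runs until that writer closes it

    threads = [threading.Thread(target=reader, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    return threads, results

# Stand-in for rgblink: writes the FIFOs in the reverse of the listed order.
def fake_linker(paths):
    for path in reversed(paths):
        with open(path, "wb") as wf:
            wf.write(os.path.basename(path).encode())

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = [os.path.join(tmp_dir, n) for n in ("fifo_gb", "fifo_map", "fifo_sym")]
    for p in paths:
        os.mkfifo(p)
    threads, results = read_all_fifos(paths)
    fake_linker(paths)
    for t in threads:
        t.join()
    outputs = [results[p] for p in paths]

print(outputs)
```

Because every FIFO has its own reader already waiting in open(), the order in which the writing program opens its output files shouldn't matter here.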
Also I haven’t tested any of the specific code in this post! It’s based on some other code that DOES work though!