How to read a big file in PHP without hitting the memory limit
I’m trying to read a file line by line. The problem is that the file is too big (over 500,000 lines) and I hit the memory limit. I wonder how I can read the file without hitting the memory limit. I’m thinking about a multi-threaded solution (split the file into smaller groups of 100,000 lines each and read the groups in multiple threads), but I don’t know how to do that in detail. Please help me (sorry for my bad English). Here is my code:
$fn = fopen("myfile.txt", "r");
while (!feof($fn)) {
    $result = fgets($fn);
    echo $result;
}
fclose($fn);
Since you read line by line and don’t store the lines anywhere, the code as you have posted it should use only as much memory as the longest line in your file. Where do you hit the memory limit?
"When reading a line is finished, I store the data of that line into DB" You should show us that code, then.
4 Answers
You could use a generator to handle the memory usage. This is just an example written by a user on the documentation page:
function getLines($file) {
    $f = fopen($file, 'r');
    try {
        while ($line = fgets($f)) {
            yield $line;
        }
    } finally {
        fclose($f);
    }
}

foreach (getLines("file.txt") as $n => $line) {
    // insert the line into db or do whatever you want with it.
}
A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate. Instead, you can write a generator function, which is the same as a normal function, except that instead of returning once, a generator can yield as many times as it needs to in order to provide the values to be iterated over.
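As a quick sanity check (this snippet is an illustration, not part of the original answer; it assumes the getLines() function above), you can print the peak memory after the loop to confirm that it stays near the size of the longest line instead of growing with the file:

// Illustrative check, assuming getLines() from the answer above:
// peak memory should stay roughly constant, however many lines the file has.
foreach (getLines("file.txt") as $n => $line) {
    // insert the line into the db or do whatever you want with it.
}
echo "Peak memory: " . round(memory_get_peak_usage(true) / 1048576, 2) . " MB\n";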
Awesome answer. I didn’t realize that yield is a thing now in PHP, since 5.5 even. Curious how I’ve never seen it used until now. I really should read the update notes.
@Tschallacka No worries, I hadn’t seen it used until a few weeks ago either. Once I saw it and read about it, I was simply amazed. It could be used very often to overcome excessive memory usage.
Right now I’m thinking of all the cases where I wrote convoluted code to get around the memory limit, and this thing was there all the time. I’ve been wishing PHP had a yield mechanism, but I never checked whether it did. I could kick myself for not checking it out. I wish I could upvote this twice.
I’ve known about it for a while, but I don’t see how it helps in this case, as the only thing in memory is a single line, which is the case even with this generator.
Is it not necessary to trim the lines? Or how does it recognize where a line starts and ends? Does fgets trim the lines for you?
PHP cleans memory best when a scope is cleared in my experience. A loop doesn’t count as a scope, but a function does.
So handing your file pointer to a function, doing your database work inside that function, and then exiting the function back to the loop, where you can call gc_collect_cycles(), should help with managing your memory and force PHP to clean up after itself.
I also recommend not echoing the output but logging to a file instead. You can then use tail -f filename to follow that log output (Windows Subsystem for Linux, Git for Windows bash, or on Linux).
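For instance, a minimal sketch of that idea (the log path is just a placeholder):

// Minimal sketch: append to a log file instead of echoing; the path is a placeholder.
function logLine($text) {
    file_put_contents('/tmp/import.log', $text . PHP_EOL, FILE_APPEND);
}
// Then follow the log from a shell with: tail -f /tmp/import.log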
I use a method similar to the one below to handle large files with millions of entries, and it helps with staying under the memory limit.
function dostuff($fn) {
    $result = fgets($fn);
    // store database, do transforms, whatever
    echo $result;
}

$fn = fopen("myfile.txt", "r");
while (!feof($fn)) {
    dostuff($fn);
    flush();             // only need this if you do the echo thing.
    gc_collect_cycles();
}
fclose($fn);
Read a big file in PHP (more than 500 MB)
This error occurs after several minutes, so I don’t think the max_input_time setting is the problem (it is set to 60).
I have looked at the RAM usage in Task Manager, and it is completely full. Why, if I’m reading the file line by line?
3 Answers
What web server software do you use, Apache or nginx? You should set the maximum accepted file upload to something higher than 500 MB. Furthermore, the max upload size in php.ini should be bigger than 500 MB too, and I think PHP must be allowed to spawn processes larger than 500 MB (check this in your PHP config).
I use Apache, but I don’t understand. I do not want to upload the file (the file is already on the server), I just want to read it line by line to find some matches.
Oh sorry, then I misunderstood you. What operating system are you using? What is the RAM size of your server, and the max allowed RAM usage of a single Apache process?
I’m using Windows 7 and XAMPP, with 4 GB of RAM. Where can I see the max allowed RAM usage of a single Apache process?
Set the memory limit with ini_set("memory_limit", "600M"); you also need to raise the timeout limit.
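A minimal sketch of both settings (the values are only examples, tune them to your server):

ini_set("memory_limit", "600M"); // allow this script to use up to ~600 MB
set_time_limit(0);               // remove the execution time limit for this run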
Generally, long-running processes should not be executed while the user waits for them to complete. I’d recommend using a background-job tool that can handle this type of work and can be queried about the status of the job (running/finished/error).
My first guess is that something in the middle breaks the connection because of a timeout. Whether it’s a timeout in the web server (which PHP cannot know about) or some firewall doesn’t really matter: PHP gets a signal to close the connection and the script stops running. You could circumvent this behaviour by using ignore_user_abort(true); this, along with set_time_limit(0), should do the trick.
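A minimal sketch of that combination at the top of the long-running script:

ignore_user_abort(true); // keep running even if the client or a proxy drops the connection
set_time_limit(0);       // remove PHP's execution time limit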
The caveat is that whatever caused the connection abort will still do it, though the script would still finish its job. One very annoying side effect is that this script could end up being executed multiple times in parallel, with neither of them ever appearing to complete.
Again, I recommend using some background task to do this, plus an interface for the end user (browser) to check the status of that task. You could also implement a basic version yourself with cron jobs and a database or text files that hold the status.
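A very rough sketch of that do-it-yourself variant (the file names and status fields are assumptions, not a prescribed layout): a worker script, started by cron, processes the file and periodically writes its progress to a small status file that a browser-facing page can read.

// worker.php - started by cron; the status file path and fields are assumptions
$statusFile = __DIR__ . '/job-status.json';
file_put_contents($statusFile, json_encode(['state' => 'running', 'lines' => 0]));

$fn = fopen('myfile.txt', 'r');
$count = 0;
while (($line = fgets($fn)) !== false) {
    // ... look for matches, write to the database, etc. ...
    if (++$count % 10000 === 0) {
        file_put_contents($statusFile, json_encode(['state' => 'running', 'lines' => $count]));
    }
}
fclose($fn);
file_put_contents($statusFile, json_encode(['state' => 'finished', 'lines' => $count]));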
Reading very large files in PHP
fopen is failing when I try to read in a very moderately sized file in PHP. A 6 MB file makes it choke, though smaller files around 100 KB are just fine. I’ve read that it is sometimes necessary to recompile PHP with the -D_FILE_OFFSET_BITS=64 flag in order to read files over 20 GB or something ridiculous, but shouldn’t a 6 MB file be no problem? Eventually we’ll want to read in files that are around 100 MB, and it would be nice to be able to open them and then read through them line by line with fgets, as I’m able to do with smaller files. What are your tricks/solutions for reading and doing operations on very large files in PHP?

Update: Here’s an example of a simple code block that fails on my 6 MB file; PHP doesn’t seem to throw an error, it just returns false. Maybe I’m doing something extremely dumb?
$rawfile = "mediumfile.csv";

if ($file = fopen($rawfile, "r")) {
    fclose($file);
} else {
    // fopen() returned false for the big file
}
Another update: Thanks all for your help, it did turn out to be something incredibly dumb — a permissions issue. My small file inexplicably had read permissions when the larger file didn’t. Doh!
Are you just trying to pass the file through, i.e. download it? Or are you actually parsing the data in the files for some purpose? Thanks.
It should not fail without generating a warning/error. Please turn on all errors with error_reporting(E_ALL) and make sure display_errors is set to On so they show in your browser, or check your web server’s error log.
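In other words, something like this at the top of the script (or the equivalent php.ini settings):

error_reporting(E_ALL);          // report every error, warning and notice
ini_set('display_errors', '1');  // show them in the browser while debugging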
8 Answers
Are you sure that it’s fopen that’s failing and not your script’s timeout setting? The default is usually around 30 seconds or so, and if your file is taking longer than that to read in, it may be tripping that up.
Another thing to consider may be the memory limit on your script — reading the file into an array may trip over this, so check your error log for memory warnings.
If neither of the above are your problem, you might look into using fgets to read the file in line-by-line, processing as you go.
$handle = fopen("/tmp/uploadfile.txt", "r") or die("Couldn't get handle");
if ($handle) {
    while (!feof($handle)) {
        $buffer = fgets($handle, 4096);
        // Process buffer here..
    }
    fclose($handle);
}
PHP doesn’t seem to throw an error, it just returns false.
Is the path to $rawfile correct relative to where the script is running? Perhaps try setting an absolute path here for the filename.
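For example (sticking with the question’s file name), an absolute path plus an explicit readability check makes this kind of silent failure visible; this is just a sketch, not the asker’s actual fix:

$rawfile = __DIR__ . '/mediumfile.csv';  // absolute path relative to this script

if (!is_readable($rawfile)) {            // catches missing files and permission problems
    die("Cannot read $rawfile - check that it exists and is readable");
}
$file = fopen($rawfile, 'r');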
This is the only feasible solution for opening really big files. I am processing a 1.5 GB file with this solution without any problem. All other solutions, like file_get_contents() or file(), read the whole file into memory. This approach processes it line by line.
@Phoenix 4096 means: read at most 4096 - 1 bytes if no line break is encountered first. Check the manual.
For me, stream_get_line() is faster than fgets(); check out this comparison: gist.github.com/joseluisq/6ee3876dc64561ffa14b
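A minimal sketch of the stream_get_line() variant (the file name and maximum line length are assumptions):

$handle = fopen('bigfile.txt', 'r') or die("Couldn't get handle");
// stream_get_line() reads up to 1 MB or until "\n", and does not return the delimiter
while (($line = stream_get_line($handle, 1024 * 1024, "\n")) !== false) {
    // process $line here
}
fclose($handle);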
I did 2 tests, with a 1.3 GB file and a 9.5 GB file.

1.3 GB file

Using fopen():
This process used 15555 ms for its computations.
It spent 169 ms in system calls.

This process used 6983 ms for its computations.
It spent 4469 ms in system calls.

9.5 GB file

Using fopen():
This process used 113559 ms for its computations.
It spent 2532 ms in system calls.

This process used 8221 ms for its computations.
It spent 7998 ms in system calls.
• The fgets() function is fine until the text file passes 20 MB; beyond that, parsing speed is greatly reduced.
• The file_get_contents() function gives good results up to 40 MB and acceptable results up to 100 MB, but file_get_contents() loads the entire file into memory, so it isn’t scalable.
• The file() function is disastrous with large text files, because it creates an array containing each line of text; this array is stored in memory, and the memory used is even larger.
In fact, I could only manage to parse a 200 MB file with memory_limit set to 2 GB, which was inappropriate for the 1+ GB files I intended to parse.
When you have to parse files larger than 1 GB, the parsing time exceeds 15 seconds, and you want to avoid loading the entire file into memory, you have to find another way.
My solution was to parse the data in arbitrarily small chunks. The code is:
$filesize = get_file_size($file);
$fp = @fopen($file, "r");
$chunk_size = (1 << 24); // 16MB arbitrary
$position = 0;

// if handle $fp to file was created, go ahead
if ($fp) {
    while (!feof($fp)) {
        // move pointer to $position in file
        fseek($fp, $position);

        // take a slice of $chunk_size bytes
        $chunk = fread($fp, $chunk_size);

        // searching the end of last full text line (or get remaining chunk)
        if (!($last_lf_pos = strrpos($chunk, "\n")))
            $last_lf_pos = mb_strlen($chunk);

        // $buffer will contain full lines of text
        // starting from $position to $last_lf_pos
        $buffer = mb_substr($chunk, 0, $last_lf_pos);

        ////////////////////////////////////////////////////
        //// . DO SOMETHING WITH THIS BUFFER HERE . ////
        ////////////////////////////////////////////////////

        // Move $position
        $position += $last_lf_pos;

        // if remaining is less than $chunk_size, make $chunk_size equal remaining
        if (($position + $chunk_size) > $filesize)
            $chunk_size = $filesize - $position;

        $buffer = NULL;
    }
    fclose($fp);
}
The memory used is only $chunk_size, and the speed is slightly lower than the one obtained with file_get_contents(). I think the PHP Group should use my approach in order to optimize its parsing functions.
*) Find the get_file_size() function here.