I have UploadedFileInterface files from the request and I need to validate if the client sent us the correct media type.
Normally one would take $_FILES['file']['tmp_name'] and use fileinfo. With PSR-7, however, I have streams, so there is no file path to hand to fileinfo. To get one, I need to store the file in a temporary/intermediate location using moveTo. But then I cannot use the UploadedFileInterface instance anymore, so I need to solve that…
Furthermore, PHP cleans unprocessed uploaded files automatically on script end, even when the script fails. This is yet another significant issue to deal with once using the moveTo call, and the proper way is to register a shutdown handler to check if the file had been unlinked properly … which seems hacky.
This makes a mess out of a previously simple raw fileinfo check. Am I missing something?
The simplest approach would be to take the client filename from the uploaded file and use its extension…
$filename = $uploadedFile->getClientFilename();
… but, for security reasons, this value should not be trusted, because the client could also send malicious filenames.
Instead of moving the file to a temporary location, you can read the stream directly into memory and then use finfo_buffer to determine the media type.
The issue with this approach is that very large files can’t be read into memory. So depending on the filesize and system memory limits, you may still need to store it in a temporary location.
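A minimal sketch of the in-memory variant with a size cap, so that large uploads can fall back to another strategy. The function name detectMediaTypeInMemory and the limit are assumptions for illustration only. The $stream argument is expected to be a PSR-7 StreamInterface, but it is typed as object here so the snippet runs without Composer.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: detect the media type from the full stream contents,
 * but only when the reported size is known and within $maxBytes.
 *
 * @param object   $stream   Expected to offer rewind() and getContents(),
 *                           as Psr\Http\Message\StreamInterface does.
 * @param int|null $size     Value of $uploadedFile->getSize(); may be null.
 * @param int      $maxBytes Assumed upper bound for in-memory inspection.
 */
function detectMediaTypeInMemory(object $stream, ?int $size, int $maxBytes): ?string
{
    // Unknown or oversized uploads are not read into memory at all.
    if ($size === null || $size > $maxBytes) {
        return null;
    }

    $stream->rewind();
    $contents = $stream->getContents();

    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->buffer($contents);

    // finfo::buffer() returns false on failure; normalise that to null.
    return $type === false ? null : $type;
}
```

In a request handler this would be called as detectMediaTypeInMemory($uploadedFile->getStream(), $uploadedFile->getSize(), 2 * 1024 * 1024). A null result means "too large or undetectable, use a different check".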
So if you must move the file, you may try this:
// Build a random file name inside the system temp directory
$tempDirectory = sys_get_temp_dir();
$basename = sprintf('%s.tmp', bin2hex(random_bytes(8)));
$tempFile = sprintf('%s/%s', $tempDirectory, $basename);
// Move the upload to the temporary file (moveTo can only be called once)
$uploadedFile->moveTo($tempFile);
// Detect the media type from the file contents
$finfo = new finfo(FILEINFO_MIME_TYPE);
$mimeType = $finfo->file($tempFile);
// then copy the $tempFile into your real file-storage location
// ...
Furthermore, PHP cleans unprocessed uploaded files automatically on script end, even when the script fails.
I think this is actually a good thing. Why should this be an issue? If you move a file after the processing, everything should be fine. You don’t need a shutdown handler for that.
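For the case where the file has already been moved to a temporary path, a try/finally block is one way to get cleanup without a shutdown handler. This is only a sketch: the function name storeIfAllowed and its parameters are made up for illustration, and it assumes $tempFile is the path that moveTo was given.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: validate a temporary file by content and either
 * move it to its final location or remove it again.
 *
 * @param string[] $allowedTypes Media types the application accepts.
 */
function storeIfAllowed(string $tempFile, string $finalPath, array $allowedTypes): string
{
    try {
        $finfo = new finfo(FILEINFO_MIME_TYPE);
        $type = $finfo->file($tempFile);

        if ($type === false || !in_array($type, $allowedTypes, true)) {
            throw new InvalidArgumentException('Media type not allowed');
        }

        if (!rename($tempFile, $finalPath)) {
            throw new RuntimeException('Could not store the file');
        }

        return $type;
    } finally {
        // Runs on success and on exceptions alike. After a successful
        // rename the temp file is already gone, so nothing is deleted.
        if (is_file($tempFile)) {
            unlink($tempFile);
        }
    }
}
```

Note that finally does not run after a fatal error, so this covers validation failures and ordinary exceptions but not a crashed script.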
Well, I posted the question because I had explored the options you summed up.
no, one does not want to load the contents of file uploads into a memory buffer
and yes, one can use sys_get_temp_dir and copy to the system temp but still needs to handle the deletion of the file afterwards
Why would automatic deletion be a good thing?
Well, first, I need this just to validate user input (the file media type). That means I validate 54 files, storing them on the disk, then on the 55th one the file mime type is wrong. I simply want to throw an exception or just return a validation issue back to the client. Why would I want to deal with all those now unnecessary files that got uploaded on the server’s disk?
One can easily tamper with the content-type headers to send executables disguised as images or documents. The same goes for file names, which cannot be trusted at all; even a simpleton can forge a false file extension. That is why one needs to validate the media type, and I see no straightforward way to do it.
Obviously, when everything is safe and sound, then one does copy the uploaded files from the temp to wherever it is fit.
But for validation purposes, PSR-7 gets in the way, or I’m missing something.
I think PSR-7 is still not the issue. The solution can be adapted to the PSR-7 “stream”. It just needs a custom approach to deal with streams for validation etc.
I cannot provide fully working code, only some ideas about how I would try it.
My idea would be to read only the first bytes (e.g. 512 bytes or less) of the stream to find the “magic bytes” for the mime type detection. This works because most binary formats put their identifier at the very beginning of the file, so you can detect the mime type without storing the file on disk. After the validation, you can move the file into the file storage.
$stream = $uploadedFile->getStream();
$stream->rewind(); // Rewind to the beginning of the stream
$contents = $stream->read(512); // Read the first 512 bytes
Then validate the mime type with the first header bytes:
$finfo = new finfo(FILEINFO_MIME_TYPE);
$mimeType = $finfo->buffer($contents);
// validate ...
If valid, move the file:
$uploadedFile->moveTo($targetPath);
Note that this approach should work with most file types (images, etc.), except files that are based on the ZIP format, such as all MS Office / LibreOffice formats like DOCX, XLSX, etc.
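One way to handle that exception explicitly, sketched under assumptions: check the first four bytes for the ZIP local file header signature (PK\x03\x04) and send such uploads to a slower full-file check, while everything else stays on the 512-byte path. The function names and the two path labels are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true if the buffer starts with the ZIP local file
 * header signature, which ZIP-based office documents share.
 */
function looksLikeZipContainer(string $head): bool
{
    return strncmp($head, "PK\x03\x04", 4) === 0;
}

/**
 * Hypothetical helper: decide which validation path an upload should take,
 * based on the first bytes read from its stream.
 */
function chooseValidationPath(string $head): string
{
    // ZIP containers need the whole file to tell DOCX from XLSX or a plain
    // archive; other formats can usually be judged from the header alone.
    return looksLikeZipContainer($head) ? 'full-file' : 'header-only';
}
```

Here $head would be the result of $stream->read(512) from the snippet above. Only the uploads that come back as 'full-file' would pay for the extra I/O.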
I still need to be able to deal with the exceptions, as ZIP and office files comprise half the payload being uploaded, the other half being media files of various kind and size.
I need a robust solution. Seems to me the extra I/O operation is needed here, but feels redundant at the least.
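One option that may be worth testing, offered as an assumption rather than something PSR-7 guarantees: some PSR-7 implementations open the uploaded file's stream directly on PHP's temporary upload file. In that case $uploadedFile->getStream()->getMetadata('uri') returns that path, and fileinfo can read the file in place, without moveTo and without loading it into memory. The helper name below is hypothetical, and it deliberately returns null when the URI is not a local file (for example php://temp).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: detect the media type from the path behind a stream,
 * if that path is a regular local file.
 *
 * @param string|null $uri Value of getMetadata('uri'); may be missing.
 */
function detectMediaTypeFromUri(?string $uri): ?string
{
    // Wrappers such as php://temp or php://memory are not files on disk.
    if ($uri === null || !is_file($uri)) {
        return null;
    }

    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->file($uri);

    return $type === false ? null : $type;
}
```

Since the file stays in PHP's upload temp location, a rejected upload would still be cleaned up by PHP at script end. Because fileinfo sees the whole file here, it may also do better on ZIP-based formats, though that depends on the installed magic database. A null result means the implementation does not expose a usable path, and one of the other approaches is still needed.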