
Inspecting Filesystem Using SPL


This article was published over 2 years ago. Some information may be outdated.

The Standard PHP Library (SPL) is a collection of interfaces and classes that are meant to solve common problems.

SPL was introduced with PHP 5.0.0, which was released in 2004.

SPL does not require any additional libraries; it comes by default when you install PHP.

This post covers the iterators used to deal with the filesystem.

File Handling

SPL provides three classes for file handling:

  • SplFileInfo: file information, such as size, pathname, real path, etc.
  • SplFileObject: an object-oriented interface for a file.
  • SplTempFileObject: an object-oriented interface for a temporary file.

SplFileObject provides an object-oriented interface for a file, so instead of calling the fopen(), fgets(), and feof() functions, you call methods on an SplFileObject instance:

// Writing to a file
$file = new SplFileObject('myfile', 'w+');
$file->fwrite('Hello World');

Here is another example.

Create a new text file with the following contents:

Hello World
I love PHP
PHP is amazing
SPL is great


PHP is the most used server-side programming language.

Save it as php.txt.

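SplFileObject extends SplFileInfo and implements the Iterator interface, so the same object can report file information and loop over the lines.

Get the file size:

$file = new SplFileObject('php.txt');
echo $file->getSize();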

Read the file line by line:

$file = new SplFileObject('php.txt');
foreach ($file as $line) {
    echo $line;
}
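The loop above also yields the blank lines in php.txt, and every $line keeps its trailing newline. Here is a minimal sketch of how the SplFileObject flags can tidy that up. The scratch file location under sys_get_temp_dir() is an assumption made only so the snippet runs on its own:

```php
<?php
// Recreate the sample file in the system temp directory (hypothetical location).
$path = sys_get_temp_dir() . '/spl_demo_php.txt';
file_put_contents(
    $path,
    "Hello World\nI love PHP\nPHP is amazing\nSPL is great\n\n\n"
    . "PHP is the most used server-side programming language.\n"
);

$file = new SplFileObject($path);
// DROP_NEW_LINE strips the line endings; READ_AHEAD and SKIP_EMPTY
// together skip the lines that are empty after stripping.
$file->setFlags(
    SplFileObject::DROP_NEW_LINE
    | SplFileObject::READ_AHEAD
    | SplFileObject::SKIP_EMPTY
);

$lines = [];
foreach ($file as $line) {
    $lines[] = $line;
}

echo count($lines) . ' non-empty lines' . PHP_EOL;

// Release the file handle before deleting the scratch file.
$file = null;
unlink($path);
```

SKIP_EMPTY only behaves as expected together with READ_AHEAD, which is why the three flags are usually combined.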

To get information about a given file, use SplFileInfo:

$info = new SplFileInfo('php.txt');
if ($info->isFile()) {
    echo $info->getRealPath();
}

SplFileInfo provides many useful methods, such as getMTime(), getPath(), and getSize().

Refer to PHP's documentation for the full list of SplFileInfo methods.
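As a quick illustration, here is a small sketch that calls a few of those methods. The file name and its location in the system temp directory are made up for the demo:

```php
<?php
// Hypothetical scratch file so the snippet is self-contained.
$path = sys_get_temp_dir() . '/spl_demo_report.txt';
file_put_contents($path, 'Hello World');

$info = new SplFileInfo($path);

$name      = $info->getFilename();       // file name with extension
$base      = $info->getBasename('.txt'); // file name with the suffix removed
$extension = $info->getExtension();      // extension without the dot
$size      = $info->getSize();           // size in bytes
$modified  = $info->getMTime();          // last modification time (Unix timestamp)

echo $name . PHP_EOL;
echo $base . PHP_EOL;
echo $extension . PHP_EOL;
echo $size . ' bytes' . PHP_EOL;
echo date('Y-m-d H:i', $modified) . PHP_EOL;

unlink($path);
```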

Use the SplTempFileObject to create a memory-based temporary file:

$temp = new SplTempFileObject();
$temp->fwrite('Hello World');

var_dump($temp->getPathName()); // php://temp

The temporary file is stored in memory rather than on disk, although that is not always the case, as discussed below.

The file is saved into memory because memory is much faster than the file system.

Consider the scenario where you are parsing a CSV file and sending it back to the end user for download:

$file = new SplTempFileObject();
// csv processing here
$file->rewind();
header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="mycsvfile.csv"');
$file->fpassthru();

Read more about parsing CSV files in PHP.
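The "csv processing here" step could look roughly like the sketch below. The rows are invented sample data, and the result is collected into a string instead of being sent with header() and fpassthru() so that the snippet can run from the command line:

```php
<?php
// Hypothetical rows; in a real script they would come from a database or an upload.
$rows = [
    ['id', 'name'],
    [1, 'Ahmad'],
    [2, 'Sara, Jr.'],
];

$file = new SplTempFileObject();
foreach ($rows as $row) {
    // Separator, enclosure and escape character are passed explicitly
    // so the call does not rely on version-specific defaults.
    $file->fputcsv($row, ',', '"', '\\');
}

// Move back to the start and read the generated CSV line by line.
$file->rewind();
$csv = '';
foreach ($file as $line) {
    $csv .= $line;
}

echo $csv;
```

Fields that contain the separator or a space, such as the last name here, are wrapped in the enclosure character automatically.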

If the file grows beyond the maximum memory size (2 MB by default), it is moved to a disk file in the system's temp directory. You can change that threshold by passing the maximum memory size, in bytes, to the constructor:

$temp = new SplTempFileObject(10); // 10 bytes is the max memory size
$temp->fwrite("This is the first line\n");
$temp->fwrite("And this is the second.\n");

Even if the file is saved on disk, there is no way to get its real path on the filesystem. getPathname() always returns a php://temp stream URL.
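A short sketch to check this yourself, reusing the 10-byte limit from the example above. The exact form of the stream URL is left for the script to print rather than assumed here:

```php
<?php
$temp = new SplTempFileObject(10); // spill to disk once more than 10 bytes are written
$temp->fwrite("This is the first line\n");
$temp->fwrite("And this is the second.\n");

// The pathname is a php://temp stream URL, not a path on disk.
$path = $temp->getPathname();
echo $path . PHP_EOL;

// The contents are read back the same way wherever they are stored.
$temp->rewind();
$first = $temp->fgets();
echo $first;
```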

You can use the tmpfile() function to create a disk-based temp file and then retrieve its path using stream_get_meta_data:

$file = tmpfile();
$path = stream_get_meta_data($file)['uri']; // eg: /tmp/phpFx0513a

DirectoryIterator

As its name implies, DirectoryIterator traverses the given directory:

$files = new DirectoryIterator('/Users/ahmad');
echo '<ul>';
foreach ($files as $file) {
    echo '<li>'.$file.'</li>';
}
echo '</ul>';

By default, DirectoryIterator includes the . and .. entries when listing the files. Use the isDot() method to skip them while traversing:

$dir = new DirectoryIterator('/Users/ahmad');
/** @var DirectoryIterator $item */
foreach ($dir as $key => $item) {
    if ($item->isDot()) { continue; }
    echo '<li>'.$item.'</li>';
}

You can view the available methods for the DirectoryIterator by calling get_class_methods() or inspecting PHP's documentation:

print_r(get_class_methods(DirectoryIterator::class));

Sometimes you need to collect the files into a new array, for example to filter or sort them later:

$files = [];
foreach ($dir as $key => $item) {
    $files[] = $item;
}
echo $files[0]->getFilename();

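DirectoryIterator returns itself as the current item on every iteration, so the array ends up holding the same object several times. Once the loop finishes, that object points past the last entry, and the last line prints an empty string. To fix it, clone the $item: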

$files = [];
foreach ($dir as $key => $item) {
    $files[] = clone $item;
}
echo $files[0]->getFilename();
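Another option, if you only need the file metadata afterwards, is getFileInfo(), which returns a standalone SplFileInfo for the current entry. This is a sketch rather than the one right way to do it, and the scratch directory and file names are made up for the demo:

```php
<?php
// Build a throwaway directory with two files (hypothetical names).
$dirPath = sys_get_temp_dir() . '/spl_demo_' . uniqid();
mkdir($dirPath);
file_put_contents($dirPath . '/b.txt', 'bb');
file_put_contents($dirPath . '/a.txt', 'a');

$files = [];
foreach (new DirectoryIterator($dirPath) as $item) {
    if ($item->isDot()) {
        continue;
    }
    // getFileInfo() creates an independent SplFileInfo, so it survives the loop.
    $files[] = $item->getFileInfo();
}

// Directory order is not guaranteed, so sort by filename before printing.
usort($files, function (SplFileInfo $a, SplFileInfo $b) {
    return strcmp($a->getFilename(), $b->getFilename());
});

foreach ($files as $file) {
    echo $file->getFilename() . PHP_EOL;
}

// Clean up the scratch directory.
array_map('unlink', glob($dirPath . '/*'));
rmdir($dirPath);
```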

FilesystemIterator

The FilesystemIterator is an enhanced version of the DirectoryIterator.

FilesystemIterator extends the DirectoryIterator and adds a few more features:

  • flags: configurable options.
  • Returns SplFileInfo as a file object instead of the DirectoryIterator.
  • Uses the file's full pathname as the key by default.

Here are a few examples:

// Skipping the dots by using the SKIP_DOTS flag
$files = new FilesystemIterator('/Users/ahmad', FilesystemIterator::SKIP_DOTS);
foreach ($files as $key => $item) {
    var_dump($key); // $key is the full pathname
}

To use the filename instead of the full path as the key:

// Use | to add more flags
$files = new FilesystemIterator($dirName, FilesystemIterator::SKIP_DOTS | FilesystemIterator::KEY_AS_FILENAME);

If you dump the $item you can see that it is of type SplFileInfo and not DirectoryIterator:

foreach ($files as $item) {
    var_dump($item); // SplFileInfo
}
// Listing the full SplFileInfo methods
print_r(get_class_methods(SplFileInfo::class));

Unlike DirectoryIterator, you do not need to clone the $item while storing it into an array:

$dir = new FilesystemIterator('/Users/ahmad', FilesystemIterator::SKIP_DOTS);
$files = [];
foreach ($dir as $key => $file) {
    $files[] = $file;
}
var_dump($files[0]->getSize());
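Because each item is an independent SplFileInfo, iterator_to_array() can also build the array in one call. The sketch below sorts a scratch directory by file size. The directory and file names are hypothetical:

```php
<?php
// Throwaway directory with two files of different sizes (hypothetical names).
$dirPath = sys_get_temp_dir() . '/spl_demo_' . uniqid();
mkdir($dirPath);
file_put_contents($dirPath . '/small.txt', 'x');
file_put_contents($dirPath . '/large.txt', str_repeat('x', 100));

$iterator = new FilesystemIterator(
    $dirPath,
    FilesystemIterator::SKIP_DOTS | FilesystemIterator::KEY_AS_FILENAME
);

// Keys are filenames, values are independent SplFileInfo objects.
$files = iterator_to_array($iterator);

// Sort from the largest to the smallest file, keeping the keys.
uasort($files, function (SplFileInfo $a, SplFileInfo $b) {
    return $b->getSize() <=> $a->getSize();
});

foreach ($files as $name => $file) {
    echo $name . ': ' . $file->getSize() . ' bytes' . PHP_EOL;
}

// Clean up the scratch directory.
array_map('unlink', glob($dirPath . '/*'));
rmdir($dirPath);
```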

Prefer FilesystemIterator over DirectoryIterator. It is the better choice in most scenarios.

RecursiveDirectoryIterator

Use the RecursiveDirectoryIterator to get all the files and directories recursively.

RecursiveDirectoryIterator extends FilesystemIterator and implements the RecursiveIterator interface:

$files = new RecursiveDirectoryIterator('/Users/ahmad');
foreach ($files as $item) {
    echo $item.'<br>';
}

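On its own, this only lists the top level of the directory, much like FilesystemIterator. One difference is that RecursiveDirectoryIterator does not skip the . and .. entries by default, so pass FilesystemIterator::SKIP_DOTS as the second argument if you do not want them.

To traverse the subdirectories as well, wrap the RecursiveDirectoryIterator in a RecursiveIteratorIterator, which walks all the children recursively. In its default mode (LEAVES_ONLY) it yields the files but not the directories themselves; pass RecursiveIteratorIterator::SELF_FIRST as its second argument to include them: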

$files = new RecursiveDirectoryIterator('/Users/ahmad');
$files = new RecursiveIteratorIterator($files);
foreach ($files as $file) {
    echo $file.PHP_EOL;
}

Use the setMaxDepth() method to limit the traversal depth. The default value is -1, which means no limit; 0 restricts the traversal to the top-level directory.
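Here is a small sketch of setMaxDepth() combined with getDepth(), run against a throwaway directory tree. The directory layout and file names are assumptions chosen for the demo:

```php
<?php
// Throwaway tree: top.txt, a/mid.txt, a/b/deep.txt (hypothetical names).
$root = sys_get_temp_dir() . '/spl_demo_' . uniqid();
mkdir($root . '/a/b', 0777, true);
file_put_contents($root . '/top.txt', '');
file_put_contents($root . '/a/mid.txt', '');
file_put_contents($root . '/a/b/deep.txt', '');

$directory = new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS);
// SELF_FIRST also yields the directories themselves, before their children.
$iterator = new RecursiveIteratorIterator($directory, RecursiveIteratorIterator::SELF_FIRST);
// Depth 0 is the root's own entries; 1 adds one level of subdirectories.
$iterator->setMaxDepth(1);

$found = [];
foreach ($iterator as $item) {
    $found[] = $iterator->getDepth() . ' ' . $item->getFilename();
}
sort($found);
echo implode(PHP_EOL, $found) . PHP_EOL;

// Clean up: CHILD_FIRST visits children before their parent directory.
$cleanup = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::CHILD_FIRST
);
foreach ($cleanup as $item) {
    $item->isDir() ? rmdir($item->getPathname()) : unlink($item->getPathname());
}
rmdir($root);
```

The directory b is still listed at depth 1, but the iterator does not descend into it, so deep.txt never appears.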

LimitIterator

The LimitIterator allows iteration over a limited subset of items in an Iterator:

$files = new RecursiveDirectoryIterator('/Users/ahmad');
$files = new RecursiveIteratorIterator($files);

// Get the first 10 results
$files = new LimitIterator($files, 0, 10);
foreach ($files as $file) {
    echo $file.PHP_EOL;
}
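The offset argument also makes LimitIterator a simple way to page through a directory. The sketch below assumes a page size of two and a scratch directory with five files; both are invented for the demo. Directory order is not guaranteed, so which files land on a page may differ between systems:

```php
<?php
// Throwaway directory with five empty files (hypothetical names).
$dirPath = sys_get_temp_dir() . '/spl_demo_' . uniqid();
mkdir($dirPath);
foreach (range(1, 5) as $n) {
    file_put_contents($dirPath . '/file' . $n . '.txt', '');
}

$perPage = 2; // hypothetical page size
$page    = 2; // 1-based page number

$files = new FilesystemIterator($dirPath, FilesystemIterator::SKIP_DOTS);
// Skip the entries of the previous pages, then take one page worth of items.
$pageItems = new LimitIterator($files, ($page - 1) * $perPage, $perPage);

$names = [];
foreach ($pageItems as $file) {
    $names[] = $file->getFilename();
}
echo implode(', ', $names) . PHP_EOL;

// Clean up the scratch directory.
array_map('unlink', glob($dirPath . '/*'));
rmdir($dirPath);
```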

GlobIterator

GlobIterator iterates over the files that match a glob pattern, much like the glob() function.

The GlobIterator extends the FilesystemIterator, which means it returns an iterator of SplFileInfo when returning its results.

Here is how to get all the pdf files that reside in a particular directory:

$files = new GlobIterator('/Users/ahmad/Library/*.pdf');
/** @var SplFileInfo $file */
foreach ($files as $file) {
    echo $file->getFilename();
}

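GlobIterator also accepts a relative pattern, which is resolved against the current working directory, so an absolute path is the safer choice.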
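GlobIterator also implements Countable, so you can count the matches without looping. Here is a minimal sketch against a scratch directory; the file names are hypothetical:

```php
<?php
// Throwaway directory with two PDFs and one text file (hypothetical names).
$dirPath = sys_get_temp_dir() . '/spl_demo_' . uniqid();
mkdir($dirPath);
file_put_contents($dirPath . '/a.pdf', '');
file_put_contents($dirPath . '/b.pdf', '');
file_put_contents($dirPath . '/notes.txt', '');

$pdfs = new GlobIterator($dirPath . '/*.pdf', FilesystemIterator::KEY_AS_FILENAME);

// count() reports the number of matches for the pattern.
$total = count($pdfs);
echo $total . ' PDF files' . PHP_EOL;

// The keys are filenames because of KEY_AS_FILENAME.
$names = array_keys(iterator_to_array($pdfs));
sort($names);
echo implode(', ', $names) . PHP_EOL;

// Clean up the scratch directory.
array_map('unlink', glob($dirPath . '/*'));
rmdir($dirPath);
```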

RegexIterator

RegexIterator filters the items of another iterator with a regular expression, which makes it handy for matching file paths:

// Get all "pdf" and "epub" files
$files = new FilesystemIterator('/Users/ahmad/Library');
$files = new RegexIterator($files, '/\.(?:pdf|epub)$/i');
foreach ($files as $file) {
    echo $file.PHP_EOL;
}

You can apply the RegexIterator to search for the given pattern recursively:

$files = new RecursiveDirectoryIterator('/Users/ahmad/Library');
$files = new RecursiveIteratorIterator($files);
$files = new RegexIterator($files, '/\.(?:pdf|epub)$/i');

foreach ($files as $item) {
    echo $item.PHP_EOL;
}
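By default the pattern is matched against the current item, which for SplFileInfo means the full pathname. If you want to match the filename only, one approach is to combine KEY_AS_FILENAME with the RegexIterator::USE_KEY flag. This is a sketch with made-up file names:

```php
<?php
// Throwaway directory (hypothetical file names).
$dirPath = sys_get_temp_dir() . '/spl_demo_' . uniqid();
mkdir($dirPath);
file_put_contents($dirPath . '/report-2023.pdf', '');
file_put_contents($dirPath . '/report-2024.epub', '');
file_put_contents($dirPath . '/summary.pdf', '');

$files = new FilesystemIterator(
    $dirPath,
    FilesystemIterator::SKIP_DOTS | FilesystemIterator::KEY_AS_FILENAME
);
// USE_KEY runs the pattern against the key (the filename), not the full path,
// so the ^ anchor refers to the start of the filename.
$reports = new RegexIterator(
    $files,
    '/^report-\d{4}\.(?:pdf|epub)$/i',
    RegexIterator::MATCH,
    RegexIterator::USE_KEY
);

$names = array_keys(iterator_to_array($reports));
sort($names);
echo implode(PHP_EOL, $names) . PHP_EOL;

// Clean up the scratch directory.
array_map('unlink', glob($dirPath . '/*'));
rmdir($dirPath);
```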

Summary

  • SPL file classes -- SplFileInfo, SplFileObject, and SplTempFileObject provide object-oriented file handling without relying on procedural functions.
  • DirectoryIterator vs FilesystemIterator -- FilesystemIterator is the superior choice because it returns SplFileInfo objects, supports flags, and does not require cloning items when storing them in arrays.
  • Recursive traversal -- RecursiveDirectoryIterator must be wrapped in RecursiveIteratorIterator to actually traverse nested directories.
  • Filtering iterators -- LimitIterator, GlobIterator, and RegexIterator provide powerful ways to filter filesystem results without writing manual loops.
  • SPL over third-party packages -- Use SPL before reaching for Composer libraries, unless the library provides functionality that is genuinely difficult to implement yourself, as Symfony Finder does.