Quick tips for doing IO with Ruby all fun and games

Since I always have to go back to the docs to check on most of this stuff, it might just be better to keep it all indexed here so I can open this blog post instead of hunting for it all again.

And, guess what, this could even be helpful for other people as well, right?

Temporary directories

You might have heard about Tempfile, but did you know you can create temporary directories in Ruby and that it’s built into the standard library?

It’s a bit hidden since it isn’t part of the main Dir documentation, but it’s there: you can create temporary directories and let the standard library delete them and their contents for you by using Dir.mktmpdir:

require 'tmpdir'

Dir.mktmpdir("my-prefix") do |dir|
  File.open(File.join(dir, "text.txt"), 'w') { |f| f.write("this is a test") }
end

Also, always set at least a prefix for your temp folders so you can spot them if they aren’t deleted. If your app crashes and doesn’t remove them for some reason, at least you’ll know which code failed to execute.

The tmpdir library also includes the Dir.tmpdir method, which gives you the path to your operating system’s temp directory.
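To see the cleanup happen, here is a minimal sketch that remembers the generated path and checks it once the block is done. The variable names are just for illustration.

```ruby
require 'tmpdir'

# Path of the operating system's temp directory.
system_tmp = Dir.tmpdir

# Keep the generated path around so we can inspect it after the block.
created_path = nil

Dir.mktmpdir("my-prefix") do |dir|
  created_path = dir
  File.write(File.join(dir, "text.txt"), "this is a test")
end

# The directory and everything inside it are removed when the block exits.
removed = !Dir.exist?(created_path)

# The generated directory name starts with the prefix we asked for.
has_prefix = File.basename(created_path).start_with?("my-prefix")
```

If you call Dir.mktmpdir without a block it just returns the path, and deleting the directory is then up to you.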

Handling files that fit in memory? Use IO directly

If all you want is to write some text to a file, don’t use File, use IO directly:

IO.write("/path/to/file.txt", "This is my cool text I need to write! Yay!")

This opens the file at /path/to/file.txt, writes the text to it and closes the file. Can’t get much simpler than this.

And just as simple is reading a file:

contents = IO.read("/path/to/file.txt")

This reads the whole file and returns its contents as a String.
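Here is a small round trip you can run as is. It's a sketch that writes inside a throwaway temp directory (my choice, so it doesn't touch real files) and also shows the mode option for appending instead of overwriting.

```ruby
require 'tmpdir'

text = "This is my cool text I need to write! Yay!\n"
bytes_written = nil
contents = nil
after_append = nil

# A throwaway directory keeps the example away from real files.
Dir.mktmpdir("io-example") do |dir|
  path = File.join(dir, "file.txt")

  # IO.write returns the number of bytes it wrote.
  bytes_written = IO.write(path, text)
  contents = IO.read(path)

  # By default IO.write truncates the file; mode: "a" appends instead.
  IO.write(path, "One more line\n", mode: "a")
  after_append = IO.read(path)
end
```

Since File inherits from IO, File.write and File.read do the same thing if you prefer how they read.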

Tempfiles and StringIO

Just like we have temp directories, we also have two classes that can be used as temporary file objects, Tempfile and StringIO. Deciding to use one or the other is rather simple. If your data fits in memory and you don’t care about paths, just about being able to read and write to an IO-like object, StringIO is for you. If you need path-like behavior or your data doesn’t fit in memory, Tempfile should be the option.

Since they both are IO-like objects you can read and write to them and send them wherever the code expects to receive an IO object. The advantage is that both objects will be cleaned up by the environment once they are garbage collected (but you better play safe and unlink tempfiles to avoid having too many file handles open).

In general, prefer StringIO and when you really have to use Tempfile assume it’s just like any other file and make sure you close and unlink the file as soon as you’re done with it.
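A quick sketch of both in action. The write_report method is a made-up helper standing in for any code that expects an IO-like object.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: works with anything that responds to #write.
def write_report(io)
  io.write("line one\n")
  io.write("line two\n")
end

# In memory: no path and nothing to clean up on disk.
buffer = StringIO.new
write_report(buffer)
in_memory = buffer.string

# On disk: it has a real path, so close and unlink it when done.
tempfile = Tempfile.new("report")
begin
  write_report(tempfile)
  tempfile.flush # make sure the data is on disk before reading by path
  on_disk = File.read(tempfile.path)
ensure
  tempfile.close
  tempfile.unlink
end
```

The same helper works with both objects, which is what makes StringIO so handy in tests.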

Use File.join instead of manual string concatenation

While the File.join documentation declares the method as simply appending File::SEPARATOR for every item given, the actual implementation does much more than just that, and a simple Array#join call won’t give you the same result.

Whenever you need to build an actual path, remember to always use File.join:

path = File.join("Users", "mauricio", "projects", "ruby")
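One difference that's easy to see is what happens when the pieces already carry separators. A small comparison, with made-up path segments:

```ruby
# Segments with stray separators, as often happens with user or config input.
parts = ["Users/", "/mauricio", "projects"]

# Array#join blindly glues the separator between every item.
naive = parts.join(File::SEPARATOR)

# File.join collapses the duplicated separators at each boundary.
joined = File.join(*parts)
```

Here naive ends up as "Users///mauricio/projects" while joined is "Users/mauricio/projects".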

Accessing files relative to the current Ruby file

A common problem we see in Ruby code, especially when you’re building gems or writing tests, is having to load a file that’s somewhere in your project. You obviously can’t hardcode a full path for it, since you want the code to work outside of your own machine, so you need a relative path. A very simple way to do this is to use the __FILE__ special variable.

Let’s look at an example file system structure:

- root
  - lib
    - my_gem
      - operation.rb
  - config
    - items.yml

So, if you’re at operation.rb, you can access items.yml with:

my_gem_directory = File.dirname(__FILE__)
File.join(my_gem_directory, "..", "..", "config", "items.yml")

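This is basically saying: start at the directory that holds the current file (lib/my_gem), go up two levels to get to root, then go into config and pick items.yml.

So you can use the __FILE__ variable as the starting point for relative paths to files you know are available on your current file system.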

If you’re using Ruby 2.x you can also replace File.dirname(__FILE__) with just __dir__, as pointed out by brianauton.
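File.join leaves the ".." pieces in the resulting string. If you'd rather have a clean absolute path, a possible variation (my suggestion, not the only way to do it) is File.expand_path, which takes the relative path first and the starting directory second:

```ruby
# Directory of the current file; on Ruby 2.x this could be __dir__ instead.
my_gem_directory = File.dirname(__FILE__)

# File.join keeps the ".." segments as they are.
joined = File.join(my_gem_directory, "..", "..", "config", "items.yml")

# File.expand_path resolves them and returns an absolute path.
items_path = File.expand_path("../../config/items.yml", my_gem_directory)
```

Both point at the same file, the second one is just easier to read in logs and error messages.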

Avoid File.open without a block

One of the main advantages of using a language with closures is how simple it is to pass code around to be executed by someone else, and a very common use case for this is resource management. While in some languages you have to write a huge amount of boilerplate to safely write to a file and not leak the file handle, in Ruby all you have to do is:

File.open("some-file.txt", "w") do |f|
  f.write("this is some text\n")
  f.write("and some more text")
end

The code above will open the file for writing, execute the block with the actual File object assigned to the f variable and, once the block is finished, flush and close the file handle, so I don’t have to care about any of this.

Whenever doing file operations, always use the block style for open and avoid doing stuff like:

file = File.open("some-file.ext", "w")
file.write("hey, this is bad!")
file.write("where's the exception handling code?")

While this code might look correct, it never closes the file, and without exception handling there would be no guarantee of that even if it tried. The file handle leaks and the process running this code could eventually crash with the OS complaining it had too many files open.

We could include the exception handling code and make sure it behaves just like the File.open that takes a block, but why should we? We already have a correct and simpler solution available. Don’t reinvent the wheel, just use File.open with blocks and let the Ruby standard library do its job.
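If you want to convince yourself the block form really cleans up, here is a small sketch that raises in the middle of the block and then checks the handle. The temp directory and variable names are only there for the demonstration.

```ruby
require 'tmpdir'

handle = nil

Dir.mktmpdir("open-example") do |dir|
  begin
    File.open(File.join(dir, "some-file.txt"), "w") do |f|
      handle = f # keep a reference so we can inspect it later
      f.write("this is some text\n")
      raise "something went wrong"
    end
  rescue RuntimeError
    # The error still reaches us, but the file was closed on the way out.
  end
end

closed_after_error = handle.closed?
```

Even with the exception, closed_after_error is true.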

Prefer Pathname for file path and metadata operations

Pathname works as a nicer interface to Ruby’s path operations and you’re better off getting used to it whenever you need to do stuff with file names, as in:

Creating it:

require 'pathname'
path = Pathname.new("README.markdown")  

Getting the file extension:

path.extname
 => ".markdown"

Expanding the path:

full_path = path.expand_path
 => #<Pathname:/Users/mauricio/projects/ruby/mauricio.github.com/README.markdown>

Getting the directory the file is in:

full_path.dirname
 => #<Pathname:/Users/mauricio/projects/ruby/mauricio.github.com>

And what’s really important here is that most of these operations will return a Pathname object instead of String so you can easily chain a sequence of calls all operating on file/directory metadata and they will all function as expected.
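As an illustration of that chaining, here is a short sketch. The docs folder and guide.markdown file are invented for the example, and none of these calls touch the file system.

```ruby
require 'pathname'

readme = Pathname.new("/Users/mauricio/projects/ruby/README.markdown")

# dirname returns a Pathname, so we can keep building on top of it.
docs_dir = readme.dirname.join("docs")

# Pathname#+ appends a path segment and returns another Pathname.
guide = docs_dir + "guide.markdown"

guide_name = guide.basename.to_s
guide_ext = guide.extname

# Express the guide's location relative to the project directory.
relative = guide.relative_path_from(readme.dirname).to_s
```

Call to_s at the very end, when you need to hand the path to code that expects a plain String.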

FileUtils probably already has what you’re looking for

If you’re trying to do something that’s not available in Pathname, File and Dir, what you’re looking for is probably defined in FileUtils.

Many of the operations where you’d usually have to manually walk a tree of files and directories (like chowning a directory and its children) are already defined as single method calls in FileUtils. Just go there, find the method and call it instead of writing your own code to recurse over the tree.
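A few of those recursive helpers in a runnable sketch. The directory names are made up and everything happens inside a temp directory. I left out FileUtils.chown_R here since it usually needs extra privileges to run.

```ruby
require 'fileutils'
require 'tmpdir'

copied = nil
source_gone = nil

Dir.mktmpdir("fileutils-example") do |root|
  source = File.join(root, "source")
  deep_dir = File.join(source, "nested", "deeper")

  # Create the whole chain of directories in one call.
  FileUtils.mkdir_p(deep_dir)
  File.write(File.join(deep_dir, "data.txt"), "some data")

  # Copy the directory and everything below it.
  FileUtils.cp_r(source, File.join(root, "copy"))
  copied = File.read(File.join(root, "copy", "nested", "deeper", "data.txt"))

  # Remove the directory and everything below it.
  FileUtils.rm_rf(source)
  source_gone = !Dir.exist?(source)
end
```

Be careful with rm_rf, it won't ask before deleting whatever path you give it.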