Dependently-typed, fine-grained effect types for safe parallelism
Pure functions are parallelisable. That is, two pure functions may be run in parallel or in sequence, and yield the same result (provided that there is no data dependency between the two).
Impure functions do not have this property. As they contain side effects, evaluation order matters. When attempting to parallelise, parallel sequences of effects may have interrelations which result in different semantics when run in parallel or in sequence.
Yet, we can intuit that some effects are safe to parallelise. Take for example, the following haskell functions:
write1 :: IO ()
write1 = writeFile “/tmp/file1″ “a”
write2 :: IO ()
write2 = writeFile “/tmp/file2″ “b”
both :: IO ()
both = write1 >> write2
“both” writes the two files in sequence… however, writing them in parallel would give the same output. Contrast to the following:
write1 :: IO ()
write1 = writeFile “/tmp/file1″ “a”
write2 :: IO ()
write2 = writeFile “/tmp/file1″ “b”
both :: IO ()
both = write1 >> write2
As ‘write1′ and ‘write2′ are both writing to the same file, order of execution matters, and it is not safe to parallelise.
So, one set of functions is safe to parallelise, but the other is not. If the programmer wants to take advantage of parallelism, it is up to them to guarantee the safety of it. This is risky. It would be better if there was a static guarantee, checked by the compiler. Let’s encode this information in the type system.
All of the above functions are of type IO (). This is a very broad type – any effect can be performed within IO. We need more specific types, for specific effects.
Firstly, we can identify that the writeFile function performs file IO. Let’s consider an effect type “FileIO”. So, writeFile’s type would become “FilePath -> String -> FileIO ()”. This doesn’t solve the problem above, however, imagine if we had a similar effect type “NetworkIO”. A function of type “NetworkIO ()” could safely run in parallel with a function of type “FileIO ()”, since network and file IO will not impact each other.
To directly solve the problem above, we need to go more fine-grained. We know that (writeFile f s) will only write to file “s”. Since s is a value, we need dependent types. The type of writeFile now depends on the argument to parameter s.
If we modify Haskell syntax to include dependent types, we could denote this as:
writeFile :: (a :: FilePath) -> String -> FileIO a ()
FileIO is now a type constructor with one value parameter and one type parameter. The value parameter denotes what file the effect is allowed to access.
With this in mind, the two functions above could be written as:
write1 :: FileIO “/tmp/file1” ()
write1 = writeFile “/tmp/file1″ “a”
write2 :: FileIO “/tmp/file2” ()
write2 = writeFile “/tmp/file2″ “b”
The paths are duplicated between the functions’ type signatures and declaration. A new syntax could allow refactoring of this. e.g.
write1 :: a = “/tmp/file1” => FileIO a ()
write1 = writeFile a “a”
write2 :: a = “/tmp/file2” => FileIO a ()
write2 = writeFile a “b”
We now have enough information in the types to know that the two operations are independent, and can be safely parallelised.
So, what would the type of “both” be? “both” is a function that takes no arguments, and writes to files “/tmp/file1” and “/tmp/file2”. Below are a few possible approaches, and the resulting type signatures.
- Create a type FileIOs that has a value parameter of type [FilePath].
write1 :: FileIO [“/tmp/file1”] ()
write2 :: FileIO [“/tmp/file2”] ()
both :: FileIOs [“/tmp/file1”, “/tmp/file2”] () - Just use a FileIO type that takes a list of files.
write1 :: FileIO [“/tmp/file1”] ()
write1 :: FileIO [“/tmp/file2”] ()
both :: FileIO [“/tmp/file1”, “/tmp/file2”] () - Introduce a scheme of conversions between types. Possible options include:
- An implicit conversion. “FileIO child” would automatically widen to “FileIO ancestor”.
- An explicit conversion to a Maybe type:
widen :: (y :: FilePath) -> FileIO x -> Maybe (FileIO y) - An explicit conversion to a common type. e.g.
extend :: (z = commonAncestor x y) => FileIO x -> FileIO y -> FileIO z - Automatic widening within the effect tracking system.
- Any function with parameters of type FileIO x and FileIO y has type FileIO “/”
There are obviously some challenges in implementing these schemes:
- Some of the information we want to put in the type system may not be available until runtime – e.g. what is the common parent of two files x and y?
- What happens in the case of symbolic links and mounted drives?
These would require further research.
Conclusion
The underlying concept is that of gaining leverage by preserving information. The implementation of “writeFile” has information that denotes that it can only write to one file, however, we throw this information away when we widen to IO. By preserving it and encoding it in the type system, we acquire useful static information about our programs. This can give effectful code some of the priviledges that only pure code currently enjoys, such as parallelism.
I suspect that it would also improve our ability to verify effectful code – a problem I described in a previous post. If I know that a function can only perform file IO in a certain folder, I can test by checking for evidence of filesystem changes in that folder. I know that it didn’t write anywhere else, and I know that it didn’t print to the screen or open network sockets.