This last week I’ve been busy with a lot of bug fixes, stability enhancements and finishing half implemented features. One of the big problems that I had was that the function which, for all intended purposes is the core of Visual Haskell, the module discovery code was very very slow. On average it ran in 1.2secs and a stdDev of 0.1sec.
This might not seem all that bad at first glance, but consider this, every 500ms after your last key press, a call is made to this function to validate your changes. If the file had errors this would show up right away and you would be informed of type errors etc immediately, but when type checking actually succeeds, getting all the information we desire from a module takes a 1.2seconds on average as stated above. To the user this would total a 1.7second delay for any changes (at least for the average Haskell file).
This is absolutely too slow, as features get added, it’s increasingly important to get this part right. My first thought was to optimize my use of Lists. I made changes so that cons is used as much as possible instead of the concatenation operator ++ , since cons runs in constant time.
This shaved off about 0.1seconds on average. which Is not bad, but still not what I was hoping to gain. So the second thought was that the lists themselves was slow. I assumed this to be a reasonable explanation since a lot of list operations took place.
My second change was to add strictness annotation to the datatypes (they’re being send over FFI anyway, so they’ll always be evaluated at one point, might as well do it sooner rather than later) and changed from Lists to a Vector from the "vector” package on Hackage.
Unfortunately the change to Vectors has the opposite effect of what I wanted, (yes I did enable optimizations). Execution time now became on average 1.8secs and StdDev 0.3secs. Clearly not the right direction.
So I removed the Vectors and decided to run some profiling, the results were interesting and certainly not hat I was expecting:
Sun Jul 11 20:13 2010 Time and Allocation Profiling Report (Final)
Main.exe +RTS -p -RTS
total time = 1.22 secs (61 ticks @ 20 ms)
total alloc = 262,102,440 bytes (excludes profiling overheads)COST CENTRE MODULE %time %alloc
Typecheck-Rename HscMain 41.0 42.5
CAF:loadWithLogger_r8fG VsxParser 37.7 15.2
SimplTopBinds SimplCore 9.8 14.7
Parser HscMain 3.3 7.6
Simplify SimplCore 1.6 0.1
OccAnal SimplCore 1.6 3.6
bin_tycldecls BinIface 1.6 4.9
occAnalBind.scc OccurAnal 1.6 0.4
getModInfo VsxParser 1.6 3.2
CoreTidy HscMain 0.0 2.0
CorePrep HscMain 0.0 2.3
CAF Packages 0.0 1.5
According to these results only 1.6% was spend actually processing the file by my function, the rest were spend by the ghc api mostly in loading and type checking. To see how well this presents itself as we scale, I ran the getModInfo function 47 times. The results were:
Sun Jul 11 21:17 2010 Time and Allocation Profiling Report (Final)
Main.exe +RTS -p -RTS
total time = 44.22 secs (2211 ticks @ 20 ms)
total alloc = 12,456,963,764 bytes (excludes profiling overheads)
But getModInfo now takes up 14.3% of the execution time, but the rest of time is still largely spend in the ghc api.
On every call GHC has to load the module and any interfaces for any imports that module uses. This is the slowest part, which is also I/O bound. So what if we can prevent this.
Session information is stored in a type called HscEnv , this can be read and set using the methods setSession and getSession respectively. By restoring this session every time we can prevent GHC from reloading things it doesn’t need to. Since we already know that in between calls only the head modules could have changed we can safely do this.
The next question is then, how do we preserve this value in between calls. Since the method always terminates and we want to be able to keep track of multiple HscEnv (one for every document currently open in the IDE)
This is where StablePtr comes in (thanks to lispy from #Haskell for the suggestion), it offers us an opaque reference to a Haskell value and since it has a Storable instance I can pass it to and from C#. The ideal solution.
There was only one problem, the api for StablePtr provided no mechanism to update, but there was one to dereference it. The solution is to store a mutable value inside the StablePtr then, in this case an IORef.
Take a look at this example:
module Main where
import Foreign.StablePtr
import Data.IOReftype Context a = StablePtr (IORef a)
type Env = Context [Int]initEnv :: IO Env
initEnv = newStablePtr =<< newIORef []freeEnv :: Env -> IO ()
freeEnv = freeStablePtradd :: Env -> Int -> IO ()
add env val
= do env_value <- deRefStablePtr env
modifyIORef env_value ((val:) $!)
sum’ :: Env -> IO Int
sum’ env
= do env_value <- deRefStablePtr env
current <- readIORef env_value
return $ sum current
main :: IO Int
main
= do env <- initEnv
add env 1
add env 1
add env 1
add env 1
val <- sum’ env
freeEnv env
return val
This shows how we can pass a simple state (Env) around during different invocations. The idea is that, the calls to InitEnv, add, sum’ and freeEnv are done over FFI, which is why it’s important to have the value Storable and stable (as in the GC won’t touch it).
Using this same approach we can write a helper class for the GHC api:
module GhcSession where
import HscTypes
import Data.IORef
import Foreign.StablePtrimport GHC
import GHC.Paths ( libdir )type Context = StablePtr (IORef HscEnv)
createContext :: IO Context
createContext =
runGhc (Just libdir) $
do env <- getSession
liftIO $ newStablePtr =<< newIORef env
freeContext :: Context -> IO ()
freeContext = freeStablePtrupdateContext :: Context -> HscEnv -> IO ()
updateContext env val
= do env_value <- deRefStablePtr env
writeIORef env_value $! val
— modifyIORef env_value ((const val) $!)
readContext :: Context -> IO HscEnv
readContext = (readIORef =<<) . deRefStablePtr
pushSession :: GhcMonad m => Context -> m ()
pushSession = (setSession =<<) . liftIO . readContextpullSession :: GhcMonad m => Context -> m ()
pullSession = (getSession >>=) . (liftIO .) . updateContext
The only thing left to do now is this call push- and pullSession in the getModInfo function to set and update the state. Having done so it was time to run benchmarking again. The results were impressive, a speedup of over 500%.
Mon Jul 12 01:57 2010 Time and Allocation Profiling Report (Final)
Main.exe +RTS -p -RTS
total time = 8.54 secs (427 ticks @ 20 ms)
total alloc = 2,128,219,560 bytes (excludes profiling overheads)
This means that any one invocation should run in ~0.18s which should be pretty much unnoticeable to an end user. The added bonus is also less memory usage, since the biggest structure is pretty much reused. Because of the timestamps, even if other modules changed in between two calls, only the changed ones will get reloaded.