Configurable Candy


Candy is a feature where you replace a selection of text with something else (usually also text), however this is done in view only and so not in the actual file. This is useful to replace things like “->” with an actual Unicode arrow while still allowing other text editors that can’t handle Unicode to display the file correctly.

Leksah implements this and allows you to configure it via a so called “Candy” file, So I “borrowed” their approach and extended it to suit my needs.

The general syntax of a Visual Haskell candy file (.vshc) is

-- "<token>" <unicode> <modifier> <enabled> <FIT|NONE>
-- the token has to be quoted
-- the supported modifiers are
-- CODE    - Apply only to regions of code
-- COMMENT - Apply only inside comments
-- STRING  - Apply only in string literals
-- ALL     - Apply to all

the modifiers are self explanatory but the FIT or NONE modifiers take some explaining.

When using the FIT modifier, the Candy engine won’t try to keep the same width as the text it’s replacing. This means that you get a layout change. The actual file might have “alpha” but the view will show only “a”.

With some things, especially with keywords we don’t want this, this is where the NONE modifier comes in. When this is used the engine will always match the width of the text it’s replacing by making the Unicode text larger and adding horizontal whitespace. This means that “alpha” would be rendered as “  a  “ and so preserving the layout.

A shot of this in action can  is:

image

For reference, the full default candy file that will be shipping with VSH2010 is:

— Candy file

— Format

— "<token>" <unicode> <modifier> <enabled> <FIT|NONE>

— the token has to be quoted

— the supported modifiers are

— CODE    – Apply only to regions of code

— COMMENT – Apply only inside comments

— STRING  – Apply only in string literals

— ALL     – Apply to all

— Note that the replacement block will always take up the exact same

— space as the tokens it’s replacing. e.g. "alpha" will be replaced by "  a  "

"->"         0x2192    CODE       True       NONE     –RIGHTWARDS ARROW

"<-"         0x2190    CODE       True       NONE     –LEFTWARDS ARROW

"=>"        0x21d2    CODE       True       NONE     –RIGHTWARDS DOUBLE ARROW

">="        0x2265    CODE       False      NONE     –GREATER-THAN OR EQUAL TO

"<="        0x2264    CODE       False      NONE     –LESS-THAN OR EQUAL TO

"/="          0x2260    CODE       False      NONE     –NOT EQUAL TO

"&&"        0x2227    CODE       False      NONE     –LOGICAL AND

"||"           0x2228    CODE       False      NONE     –LOGICAL OR

"++"        0x2295    CODE       False      NONE     –CIRCLED PLUS

"::"           0x2237    CODE       False      NONE     –PROPORTION

".."           0x2025    CODE       False      NONE     –TWO DOT LEADER

"^"            0x2191    COMMENT    False      NONE     –UPWARDS ARROW

"=="        0x2261    CODE       False      NONE     –IDENTICAL TO

" . "          0x2218    CODE       True       NONE     –RING OPERATOR

"\\"           0x03bb    CODE       True       NONE     –GREEK SMALL LETTER LAMBDA

"=<<"       0x291e    CODE       False      NONE     —

">>="       0x21a0    CODE       False      NONE     —

"$"           0x25ca    CODE       False      NONE     —

">>"        0x226b    CODE       False      NONE     — MUCH GREATER THEN

"forall"    0x2200    CODE       False      NONE     –FOR ALL

"exist"     0x2203    CODE       False      NONE     –THERE EXISTS

"not"       0x00ac    CODE       False      NONE     –NOT SIGN

"alpha"         0x03b1    ALL        True       FIT      –ALPHA

"beta"           0x03b2    ALL         True       FIT      –BETA

"gamma"     0x03b3    ALL        True       FIT      –GAMMA

"delta"          0x03b4    ALL        True       FIT      –DELTA

"epsilon"     0x03b5    ALL        True       FIT      –EPSILON

"zeta"           0x03b6    ALL        True       FIT      –ZETA

"eta"             0x03b7    ALL        True       FIT      –ETA

"theta"          0x03b8    ALL        True       FIT      –THETA

— Because you can configure options inside the editor itself, don’t comment out

— lines since they won’t be parsed, just change the enable flag

Advertisements

QuickInfo


Visual studio has this ability to show information about symbols when you hover over them, this feature is called “QuickInfo”

This essentially means that you can hover over a symbol like “fmap” and it would tell you, fmap :: forall a b (f :: * -> *). (Functor f) => (a -> b) -> f a  -> f b and that it’s defined in GHC.Base

in ghci this would be equivalent to typing :i fmap which would result in the following output

class Functor f where
  fmap :: (a -> b) -> f a -> f b
  …
        — Defined in GHC.Base

Whenever the user hovers over a symbol in visual studio, the IDE will call a method

public void AugmentQuickInfoSession(IQuickInfoSession session, IList<object> qiContent, out ITrackingSpan applicableToSpan)

 

I use the information given to me to construct two things

  • The word the user is hovering on
  • The exact location within the source file of that word

This information is used to find the correct Name value in the Haskell Renamed AST. The problem is we can’t construct name values, so we have to look them up. This is provided with the help of a typeclass

class Finder a where
    findName     :: MonadPlus m => a -> FastString -> Maybe SrcSpan -> m Name

The monad used determines how many results you receive. Use a Maybe monad and you’ll get just 1. use a List monad and you’ll get more than one, but only if you don’t specify a specific source span to look for (wildcard match on name alone).

However we should never enter the PostTcType types inside the renamed AST. These are invalid at this stage. Unfortunately SYB’s listify does not provide a way to tell it not to enter a specific type.

So we create a modified version of those SYB calls:

data Guard where
  Guard :: Typeable a => Maybe a -> Guard
 
type HList = [Guard]

— | Summarise all nodes in top-down, left-to-right order
everythingBut :: (r -> r -> r) -> HList -> GenericQ r -> GenericQ r
everythingBut k q f x
  = foldl k (f x) fsp
    where fsp = case isPost x q of
                  True  -> []
                  False -> gmapQ (everythingBut k q f) x

isPost :: Typeable a => a -> HList -> Bool
isPost a = or . map check
where check :: Guard -> Bool
       check x = case x of
                   Guard y -> isJust $ (cast a) `asTypeOf` y

— | Get a list of all entities that meet a predicate
listifyBut :: Typeable r => (r -> Bool) -> HList -> GenericQ [r]
listifyBut p q
  = everythingBut (++) q ([] `mkQ` (\x -> if p x then [x] else []))

Now listify takes a HList of types not to inspect. HList is a Heterogeneous list, so it’ll allow things of different types inside it. Finding the Name is now as simple as:

instance Finder (HsGroup Name) where
    findName grp a b = findName (listifyBut (isName a b) [Guard (undefined :: Maybe PostTcType)] grp) a b
 

once we have the names, we can just call getInfo. Nothing else is needed because remember that all API calls have a Context as argument, for instance the full type of the tooltip function is:

— @@ Export
— | Gather information about the identifier you requested
–   .
–   Context: The session for this call, Serves as a cache
–   .
–   String : The name of the identifier to lookup
–   .
–   SrcSpan: The location of the identifier in the sourcefile
–   .
–   Bool   : Whether to treat this call as a strict one. If it’s strict
–            Then the name AND span must match. If it’s not, Any match will do
–   .    
getTooltip :: Context -> String -> SrcSpan -> Bool -> IO (Maybe String)

This produces the following result

image

There’s a problem however, if you hover over a variable name that’s defined in the body of the function it produces a runtime panic:

image

If you think about it, this kind of makes sense, GHCi also won’t produce anything on local variables. In fact you can’t even refer to them. But we would at the least we would like to prevent this crash, and in the best case scenario we would like *some* information on the symbol.

After poking around some I noticed that the type of the identifiers that produce the errors are “Internal Name” values. the function nameModule then fails on these types. The plan now is, whenever we find a Internal Name, we look into the TypecheckedModule to find the Id associated with the Name value we retrieved earlier. with SYB this is again easy. However there’s a catch (thanks to nominolo for pointing this out): we should not enter any PostTcKind nor NameSet because these are blank after type checking.

findID :: Data a => a -> Name -> [Id]
findID a n = listifyBut ((n==) . getName) [Guard (undefined :: Maybe NameSet)
                                                                       ,Guard (undefined :: Maybe PostTcKind)] a

and that’s all. The end result is that this now works on local variables as well. Hovering over for instance the variable file generates

image

The important thing to note here is the Context , it’ll contain a cache of information. So looking up any of this stuff will be instantaneous. You just hover and directly get back information.

A last cool but *I’m not so sure how useful* function is that if you select something, then hover over it, it will type check only that expression.

image

so if you have an expression "fmap foo” somewhere but don’t remember what type foo or fmap is, just select them and hover over the selection. (although this is somewhat limited, all identifiers have to be top-level. It can’t return anything for local variables. sorry Sad smile )

And that’s it for this post, I’ll continue the work on Cabal now, or continue this track and fully finish intellisense.

No video for this in action, since I have a cold Confused smile

A new video


This is just an intermediate update, showing the near final UI. Cabal support is what I’m working on now and that is about 20% finished. I hope to get that done in a week or so barring any more difficulties.

If I do get cabal support I can put out a beta (without intellisense) just so I can get some feedback.

the video link is http://screencast.com/t/ZTBhMmUxNz

Pardon the background noise, I’m currently in a lot of wind.

screenshot

Added a screenshot for good measure, Click on the image for fullscreen.

Ghost typing


This is the preliminary version of the Ghost typing addition to visual Haskell, the idea is that whenever an explicit type signature is not given, the  IDE will display the type inferred by GHC.

You can then click on the signature to insert it, or use the smart action associated with the name of the function.

Up next is the feature that when you have given a signature that doesn’t type check, the IDE will remove that signature and retry, if it succeeds the IDE will display a suggested signature.

Below is a GIF of how the first part works.

ghostyping

Oh and collapsible regions has been finished as well Smile 

if the function has a type signature it will collapse at the end of that declaration, if not it’ll collapse at the end of the function name.

There is a restriction to this however since GHC allows you to declare your signatures anywhere in the file. In order for the signature to be considered part of the function by collapsible regions it has to be end on the line before the binding.

Which means it can span multiple lines, the end just has to be before the binding, that way it also supports haddock documented type signatures.

Collapsible regions.–Video Update #2


So here’s the second video update, it shows some of the current progress up till now, while it might not look significantly different but there’s a big difference under the hood. A lot has been rewritten and optimized in anticipations of new features coming soon, like intellisense and ghost typing (coming in the next video, due in a few days). Click the link below to see the video

Collapsible Regions

That’s it for today, and now… back to my thesis project.

Another screen capture


Just a quick snapshot of what I’ve been working on, Still doing work on typchecking and code discovery which is a bit slow atm, and having to finish this is taking longer than I expected.

errors

But then it’s off to cabal support after this.

File vs In-Memory buffer type checking


I’ve recently been looking into how to make the type checking part faster. The type checker part consists of two parts just like most of the code in Visual Haskell 2010, A Haskell side and a C# side.

This post is about optimizations on the C# side.

the setup currently is as follows:

Every 500ms after the user has finished typing, a call is made to typecheck() which does a few  things,

  • Finds the current module information (cached)
  • Generates a temporary file which has the same content of the In-Memory buffer
  • Makes a call to the Haskell world
  • Interprets the results

The part I thought I could optimize is that I didn’t want to have to write the buffer out to a file only to have GHC read it back in. Under normal circumstances the files will probably be in disk cache and not even written out to begin with, but we would still have to make a couple of kernel calls (about 6).

It’s not like the new approach wouldn’t have drawbacks, for one thing you’d have the serialization drawback, you suddenly have to serialize large strings from and to the Haskell world, but above that the GHC Api doesn’t even support type checking of In-Memory buffers.

So the first part is to edit the GHC Api and add this functionality.

Modifying GHC

I won’t explain this in detail, but I’ll just outline the “gist” of it.

Whenever you want to tell GHC want to inspect, It does so by inspecting all the “Targets” you’ve specified. An example of this would be

target <- guessTarget file Nothing
addTarget target

which means, we first ask it to guess what the string is (a module, or filename) and it’ll construct the appropriate Target for us. In order to support In-Memory buffers I added a new TargetId

| TargetVirtualFile StringBuffer FilePath (Maybe Phase)
  — ^ This entry loads the data from the provided buffer but acts like
  –   it was loaded from a file. The path is then used to inform GHC where
  –   to look for things like Interface files.
  –   This Target is useful when you want to prevent a client to write it’s
  –   in-memory buffer out to a file just to have GHC read it back in.

This takes the string buffer, and the original filename, the filename is only used to discover properties about the buffer (the most important of which is what type of input it is, hspp, hsc, hs or lhs)  and along with this a new function was create the facilitate the create of a virtual target.

— | Create a new virtual Target that will be used instead of reading the module from disk.
–   However object generation will be set to False. Since the File on disk would could
–   and is most likely not the same as the content passed to the function
virtualTarget :: GhcMonad m => StringBuffer -> FilePath -> m Target
virtualTarget content file
   = return (Target (TargetVirtualFile content file Nothing) False Nothing)

so for it’s all pretty straight forward, the rest I won’t explain since they have no bearing on how to use this new code, but what I’ve done was basically trace the workings of the load2 function, and where appropriate added some new code to instead of reading a file into a buffer, to just use the buffer passed on to virtualTarget.

The problem is, in runPhase a system process is called to deLit a Literate Haskell file which is a bit of a problem since it means that I would have to have the file on disk anyway. I decided not to worry about that for now but instead to focus on implementing the change for normal Haskell files first so I can run some benchmarking code.

Testing

Now for the actual testing

I created a new DLL file which contains the compiled code from two Haskell functions

getModInfo :: Bool -> String -> String -> String -> IO (ApiResults ModuleInfo)

and

getModStringInfo :: Bool -> String -> String -> String -> String -> IO (ApiResults ModuleInfo)

They both do the same amount of work, only the latter passes along as a target a VirtualTarget instead of a normal File target. But first it’s time to see if the changes actually did something.

This first test just asks for all the information it can get from a file named “VsxParser.hs”

Phyx@PHYX-PC /c/Users/Phyx/Documents/VisualHaskell2010/Parser
$ ghcii.sh VsxParser
GHCi, version 6.13.20100521: http://www.haskell.org/ghc/
Ok, modules loaded: VsxParser.
Prelude VsxParser> getModInfo True "VsxParser.hs" "VsxParser" ""
Success: ModuleInfo {functions = (70,[FunType {_Tspan = VsxParser.hs:256:1-8, _Tname = "showPpr’", _type = Just "Bool -> a -> String", _inline = (0,[])},FunType {_Tspan = VsxParser.hs:252:1-7, _Tname = "mkSized", _type = Just "[a] -> (Int,[a])", _inline = (0,[])},FunType {_Tspan = VsxParser.hs:244:1-17, _Tname = "configureDynFlags", _type = Just "DynFlags -> DynFlags", _inline = (0,[])},FunType {_Tspan = VsxParser.hs:237:1-19, _Tname = "createLocatedErrors", _type = Just "ErrMsg -> [ErrorMessage]", _inline = (0,[])},FunType {_Tspan = VsxParser.hs:230:1-13, _Tname = "processErrors", _type = Just "SourceError -> IO (ApiResults a)", _inline = (0,[])},FunType {_Tspan = VsxParser.hs:69:1-4, _Tname = "main", _type =
(content goes on and on and on)

the content will go on for many more pages so I won’t post that

now, We call getModStringInfo asking for information about the same file, but this time we pass along a much smaller and completely different buffer than the one of the file on disk. namely a module with just 1 function “foo = 5”

Prelude VsxParser> getModStringInfo True "module VsxParser where\nfoo = 5\n" "VsxParser.hs" "VsxParser" ""
Success: ModuleInfo {functions = (1,[FunType {_Tspan = VsxParser.hs:2:1-3, _Tname = "foo", _type = Just "Integer", _inline = (0,[])}]), imports = (1,[Import {_Ispan = Implicit import declaration, _Iname = "Prelude", _pkg = Nothing, _source= False, _qual = False, _as = Nothing, _hiding = (0,[])}]), types = (0,[]), warnings = (0,[])}

As can be seen it used the content of the buffer and not the file to perform the analysis. So far so good. The new virtualTarget seems to be working just fine. Now comes the part you’ve all been waiting for, the numbers!

Benchmarking

First the setup, The two methods above as already mentioned were compiled to a static DLL. The tests are written in C# and the full code for it is as follows (you can skip this if it doesn’t interest you). Also my laptop harddrive is just 5400rpm so please keep this in mind 🙂

 static void Main(string[] args)
        {
            Console.WriteLine("Running Benchmarks....");

            int length = 500;
            List<Double> stringB = new List<double>();
            List<Double> fileB = new List<double>();
            string content = File.ReadAllText("VsxParser.hs");

            Console.Write("Press [enter] to start In-Memory buffer test");
            Console.ReadLine();

            Console.WriteLine("Running String tests...");
            unsafe
            {
                for (int i = 0; i < length; i++)
                {
                    DateTime start = DateTime.Now;
                    WinDll.Parser.getModInfoByString(true, content, "VsxParser.hs", "VsxParser", "");
                    stringB.Add(DateTime.Now.Subtract(start).TotalMilliseconds);
                }
            }

            Console.Write("Press [enter] to start File based test");
            Console.ReadLine();

            Console.WriteLine("Running File tests...");
            unsafe
            {
                for (int i = 0; i < length; i++)
                {
                    DateTime start = DateTime.Now;
                    string path = Path.ChangeExtension(System.IO.Path.GetTempFileName(), Path.GetExtension("VsxParser.hs"));
                    File.WriteAllText(path, content);
                    WinDll.Parser.getModuleInfoByPath(true, "VsxParser.hs", "VsxParser", "");
                    File.Delete(path);
                    fileB.Add(DateTime.Now.Subtract(start).TotalMilliseconds);
                }
            }
            Console.WriteLine("Writing results...");

            StreamWriter fs1 = File.CreateText("stringB.csv");
            for (int i = 0; i < length; i++)
            {
                fs1.Write(stringB[i].ToString());
                if (i < length - 1)
                    fs1.Write(", ");
            }
            fs1.Write(fs1.NewLine);
            fs1.Close();

            fs1 = File.CreateText("fileB.csv");
            for (int i = 0; i < length; i++)
            {
                fs1.Write(fileB[i].ToString());
                if (i < length - 1)
                    fs1.Write(", ");
            }
            fs1.Write(fs1.NewLine);
            fs1.Close();

            Console.WriteLine("Done.");
        } 

 

Aside from this, I’ve also used the Windows performance monitor to monitor haddrive activity. Both methods are ran 500times and all the times are measured in the milliseconds range.

Results

first off, the intuition was right, Windows observed a 100% disk cache hit for the duration of the test. So the files were always in cache. So the expected difference shouldn’t be that big (or should it?)

  String buffer File Buffer
Overview Results_Normal_s1 Results_Normal_s2
Performance clustering Results_Normal_b1 Results_Normal_b2
Geometric mean 636.594594942071087ms 666.617770612978724ms
Variance 1436.0ms 1336.56ms
Mean Deviation 26.6618ms 27.1434ms

The two charts reveal that they both performed in about the same ranges on average with no real noticeable difference. You have to remember that these numbers are in milliseconds (ms). The difference in Geometric means is almost negligible 30.02317567090764ms

Results_Normal_merge

Viewing both charts together we see a significant overlap which means on average the one doesn’t perform all that better from the other on normal disk usage.

But what happens when we have abnormal disk usage? What if the drive was so busy that the disk cache starts missing. Would the File based approach still perform “good enough” ?

To simulate high disk usage I used a Microsoft tool called SQLIO, it’ official use is to find disk performance capacity, but it does so by stressing the drive, so it works fine for us. (get it at http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=9a8b005b-84e4-4f24-8d65-cb53442d9e19)

This is ran from a batch file in two modes read and write for about 36sec per mode. which should cover the length of 1 test. The batch file used is

sqlio -kW -s10 -frandom -o8 -b8 -LS -Fparam.txt
sqlio -kW -s36 -frandom -o8 -b64 -LS -Fparam.txt
sqlio -kW -s36 -frandom -o8 -b128 -LS -Fparam.txt
sqlio -kW -s36 -frandom -o8 -b256 -LS -Fparam.txt

sqlio -kW -s36 -fsequential -o8 -b8 -LS -Fparam.txt
sqlio -kW -s36 -fsequential -o8 -b64 -LS -Fparam.txt
sqlio -kW -s36 -fsequential -o8 -b128 -LS -Fparam.txt
sqlio -kW -s36 -fsequential -o8 -b256 -LS -Fparam.txt

sqlio -kR -s36 -frandom -o8 -b8 -LS -Fparam.txt
sqlio -kR -s36 -frandom -o8 -b64 -LS -Fparam.txt
sqlio -kR -s36 -frandom -o8 -b128 -LS -Fparam.txt
sqlio -kR -s36 -frandom -o8 -b256 -LS -Fparam.txt

sqlio -kR -s36 -fsequential -o8 -b8 -LS -Fparam.txt
sqlio -kR -s36 -fsequential -o8 -b64 -LS -Fparam.txt
sqlio -kR -s36 -fsequential -o8 -b128 -LS -Fparam.txt
sqlio -kR -s36 -fsequential -o8 -b256 -LS -Fparam.txt

Running SQLIO and the benchmarks again the first thing that catches my eyes is that

disk cache performance has dropped from 100% to

95.4854

Looking at the dataset I see a lot of completely missed caches. Also remember that the SQLIO is also reading data,

so the disk cache activity also represents it’s reads and not just those of the benchmark as before. So let’s analyze the numbers like before

  String buffer File Buffer
Overview Results_Normal_s1x Results_Normal_s2x
Performance clustering Results_Normal_b1x Results_Normal_b2x
Geometric mean 883.339857882210200ms 904.505721026752581ms
Variance 2.26107*10^6ms 3.81543*10^6ms
Mean Deviation 449.206ms 611.77ms

These results are quite surprising since the difference in mean now is just 21.16586314454238ms. So under heavy load they start to converge to the same performance level.

Results_Normal_mergex

So while both have some significant outliers,  both perform on average in the same category. It’s sad and hard to admit, but Unless I made a mistake when implementing the VirtualFile (which very well might be true) having a String vs File buffer doesn’t really make a big difference mostly due to the OS’s management of I/O and the disk caching going on.

In fact under heavy loads Windows started to use more and more “Fast Reads” and “Fast Writes” These require much less kernel and user mode calls than a full I/O call, so even that advantage was negated somewhat under heavy load, while the overhead of marshalling the string hasn’t changed, which might explain the smaller difference in Geometric mean in the second benchmark.

So the conclusion of the story is is, for now, there’s nothing to gain from the calls on the C# side, but maybe there is in the processing of the result but that’s saved for another time :). Up next, optimizing the Haskell Code to gather more complete module information and do it faster!.