While implementing syntax coloring for Haskell files, I noticed that lines with more closing than opening braces (either ‘)’ or ‘}’) were not being colored.
After doing some digging around I found out the following:
?parseLine("{")->tag
cStatelessParseResultSOk
?parseLine("}")->tag
cStatelessParseResultSFailed
and
?parseLine("{-")->tag
cStatelessParseResultSOk
?parseLine("-}")->tag
cStatelessParseResultSFailed
So apparently they were throwing lexical errors, but why?
After contacting Mr. Simon Marlow, I was told that this comes from GHC’s handling of the layout rule. To quote:
“You’re probably encountering the lexer’s handling of the Haskell “layout” rule. When the lexer sees a ‘}’ token, it pops the current layout stack, and if the layout stack is empty then this is a lexical error.”
This left me with 3 choices:
- Use a custom lexer, much like the original Visual Haskell did.
- Replace every {, }, ( and ) with a single whitespace character. The braces themselves would not be colored, but the rest of the input would be, and all token positions would be preserved.
- Left-pad the input with enough opening braces for the lexer to succeed, then adjust the token ranges afterwards.
Option 1 was the least maintainable, since I would have to update the lexer every time the one in GHC changes, so I didn’t want to do this.
Option 2 was a possibility, and one I had tried out before, but I noticed that having the braces colored really did help.
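For comparison, Option 2 can be sketched in a few lines. This is only an illustration, not the code I had at the time: the class and method names (`BraceBlanking`, `BlankBraces`) are made up for this example. Because every brace is swapped for exactly one space, the string keeps its length and all other tokens stay at their original columns.

```csharp
using System.Text;

public static class BraceBlanking
{
    // Hypothetical helper: replace each brace or parenthesis with one space,
    // so the string length and all other character positions are unchanged.
    public static string BlankBraces(string line)
    {
        var sb = new StringBuilder(line.Length);
        foreach (char c in line)
        {
            bool isBrace = c == '{' || c == '}' || c == '(' || c == ')';
            sb.Append(isBrace ? ' ' : c);
        }
        return sb.ToString();
    }
}
```

Note that this naive version also blanks the braces of the comment delimiters {- and -}, so a real implementation would need to treat those separately.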
Option 3 was then chosen by process of elimination. It turned out to not be that much work at all.
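private int prepareLine(ref string str)
{
    // Count the closing parentheses and braces on the line.
    int round = 0, brace = 0;
    for (int i = 0; i < str.Length; i++)
    {
        switch (str[i])
        {
            case '}':
                // Skip the '}' of a "-}" comment terminator.
                if (i == 0 || str[i - 1] != '-')
                    brace++;
                break;
            case ')':
                round++;
                break;
            default:
                break;
        }
    }

    // Left-pad with one opening character per closing character found.
    if (round > 0)
        str = str.PadLeft(str.Length + round, '(');
    if (brace > 0)
        str = str.PadLeft(str.Length + brace, '{');

    // Number of padding characters that were added.
    return round + brace;
}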
is the full implementation. Now I know what you’re thinking: by doing this I’ll create more opening than closing braces, so a balanced line like (Int) becomes the unbalanced ((Int). However, this is not a problem, since for coloring purposes braces carry no semantics. I don’t care what they mean (as in, when interpreted); all I care about is what they are (as in, the token type).
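To make the effect concrete, here is a small stand-alone sketch that restates the same counting and padding logic as a static method and runs it on a few inputs. The names `LinePadding`, `PrepareLine` and `Demo` exist only for this illustration.

```csharp
using System;

public static class LinePadding
{
    // Same rule as prepareLine: count ')' and '}' (except the '}' of "-}")
    // and left-pad the line with one opening character per closing one.
    public static int PrepareLine(ref string str)
    {
        int round = 0, brace = 0;
        for (int i = 0; i < str.Length; i++)
        {
            if (str[i] == ')')
                round++;
            else if (str[i] == '}' && (i == 0 || str[i - 1] != '-'))
                brace++;
        }
        if (round > 0) str = str.PadLeft(str.Length + round, '(');
        if (brace > 0) str = str.PadLeft(str.Length + brace, '{');
        return round + brace;
    }

    // Illustrative driver only.
    public static void Demo()
    {
        string a = "(Int)";
        int na = PrepareLine(ref a);
        Console.WriteLine(a + " " + na);   // prints "((Int) 1"

        string b = "foo) }";
        int nb = PrepareLine(ref b);
        Console.WriteLine(b + " " + nb);   // prints "{(foo) } 2"

        string c = "-}";
        int nc = PrepareLine(ref c);
        Console.WriteLine(c + " " + nc);   // prints "-} 0"
    }
}
```

The parentheses are padded first and the curly braces second, so the curly braces always end up leftmost in the padded line.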
With that in place, the only other code needed is to skip the first n tokens returned from the lexer, where n is the result of calling the prepareLine function.
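A minimal sketch of that last step, including the range adjustment mentioned under Option 3, could look like the following. The `Token` class here is a hypothetical stand-in for whatever the lexer binding actually returns, and `DropPadding` is an invented name. The sketch assumes that each padded character lexes as exactly one single-character token, which would not hold if, for example, a padded { landed directly before a - and formed a {- comment opener.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical token shape; the real lexer result will look different.
public sealed class Token
{
    public int Start;     // zero-based column in the string that was lexed
    public int Length;    // number of characters covered by the token
    public string Kind;   // token type used to pick a color
}

public static class TokenAdjust
{
    // Drop the n padding tokens and shift the remaining tokens left by n
    // columns so that their ranges refer to the original, unpadded line.
    // Assumes every padding character produced one token of width 1.
    public static List<Token> DropPadding(IEnumerable<Token> tokens, int n)
    {
        return tokens
            .Skip(n)
            .Select(t => new Token
            {
                Start = t.Start - n,
                Length = t.Length,
                Kind = t.Kind
            })
            .ToList();
    }
}
```

For the padded line ((Int) with n = 1, the tokens at columns 1, 2 and 5 end up at columns 0, 1 and 4, which are their positions in the original (Int).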
And that’s all. Now we have perfect line coloring everywhere 🙂