ANNOUNCE: Hs2lib-0.5.5


What is it?
========

A preprocessor and library which allow you to create dynamic libs
from arbitrary annotated Haskell programs with one click. It also
allows you to use the generated lib in C, C++ and C# just by including
the generated header files.

At a minimum it can be considered the inverse of c2hs.

Where to get it?
============

You can get it from Hackage
(http://hackage.haskell.org/package/Hs2lib) or by using cabal
(cabal install Hs2lib).

Documentation, Mailing List, Source, etc
=======================

Go to https://mistuke.wordpress.com/category/hs2lib/ for information.
Or for a tutorial http://www.scribd.com/doc/63918055/Hs2lib-Tutorial

What’s New?
=========

– Currently Supported:

* Autogenerates free functions for any StablePtr used.
* Added support for memory leak tracing with the --debug flag
  https://mistuke.wordpress.com/2011/08/09/tracing-lexer-memory-leaks/
* Fixed a bug in the library that caused an error in marshalling
* No longer frees strings, in order to prevent heap corruption on
  calls from C#
* Fixed an issue with lists and type synonyms
* Fixed an alignment issue with stdcall
* Renamed the hs2c# pragma to hs2cs
* Fixed an error in parsing pragmas
* Fixed a major marshalling bug having to do with lists of pointer types
* Started on an implementation of an automatic test generator to test
  exported functions.
* Re-arranged the namespaces generated in C#. Functions and types are
  now put in different namespaces.

– Not Currently supported:

* Infix constructors
* Control over the autogenerated free functions for StablePtr
* Exporting of type/functions with the same name in different modules
* Exporting polymorphic functions
 
Details
=====

More detailed documentation will follow in the next few weeks.

Why is the first version 0.4.8?
-------------------------------

This project has been in development for a very long time: nearly two
years, on and off.
It was developed mainly to facilitate the creation of Visual Haskell
2010. This is just the first public release of this tool.


Tracing Lexer memory leaks


One of the problems I’ve been struggling with for a while now is the presence of pesky memory leaks in Visual Haskell. Hs2lib has one convention: it doesn’t free any memory, so you’re responsible for freeing all memory.

As far as I knew, I was freeing any and all pointers that I had. It should not be leaking, but yet it was. So I decided to get to the root of this problem. I wrote a simple application that uses the Lexer classes of Visual Haskell and would emulate a user scrolling by feeding it lines of a Haskell file one at a time.

Using the Debug Diagnostics Tools I was able to track the application and make a full memory dump every few seconds in order to track the progression of the leak. The results were rather surprising:

WARNING – DebugDiag was not able to locate debug symbols for HsLexer.dll, so the reported function name(s) may not be accurate.

HsLexer.dll is responsible for 11.95 MBytes worth of outstanding allocations. The following are the top 2 memory consuming functions:

HsLexer!HsEnd+1343ac4: 8.20 MBytes worth of outstanding allocations.

HsLexer!HsEnd+150ae89: 2.00 MBytes worth of outstanding allocations.

This was detected in LexerLeakTest.exe__PID__4660__Date__07_04_2011__Time_03_20_30PM__18__Leak Dump – Private Bytes.dmp

So according to this tool, my little program was leaking quite extensively and, not surprisingly, it was all coming from inside my Haskell program. Unfortunately, GHC/GCC cannot produce the proper symbols (.pdb files) for any of the Microsoft debugging tools to understand. So while we know conclusively that the program is leaking, we don’t know where.

Hs2lib-Debug

This is where the new version of Hs2lib comes into play. The idea is to track any and all allocations and de-allocations made during the run of the program, in essence a simple profiler.

For this reason we now have Debug versions of the modules. Through the magic of CPPHS, Cabal and custom preprocessors we get the usual “release” modules and “debug” modules which write the allocations to a file.

I’ll skip over the implementation of that, but the idea is to override all allocation functions with a custom one. The structure which is being written out to disk (the current version uses the rather slow show/read instances generated by GHC. I will be replacing these in the future) is:

data MemAlloc = MemAlloc { memFun   :: Caller
                         , memStack :: Stack
                         , memStart :: Address
                         , memStop  :: Maybe Address
                         , memSize  :: Maybe MemSize
                         , memTime  :: String
                         }
              | MemFree { memStack :: Stack
                        , memStart :: Address
                        , memSize  :: Maybe MemSize
                        , memTime  :: String
                        }
              deriving (Show, Read)

The allocations get written out to a file called “Memory.dump”.
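To illustrate how derived Show/Read instances can serve as a dump format, here is a minimal sketch. It is not the library's code: MemEvent, encodeEvents and decodeEvents are hypothetical names, the record is a cut-down stand-in for MemAlloc, and the one-record-per-line layout is an assumption.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Hypothetical, simplified stand-ins for the library's real types.
type Address = Int
type MemSize = Int

data Stack = Empty | Then String Stack
  deriving (Show, Read, Eq)

data MemEvent
  = Alloc { evStack :: Stack, evStart :: Address, evSize :: Maybe MemSize }
  | Free  { evStack :: Stack, evStart :: Address }
  deriving (Show, Read, Eq)

-- Serialise one event per line using the derived Show instance
-- (assumption: the real dump layout may differ).
encodeEvents :: [MemEvent] -> String
encodeEvents = unlines . map show

-- Parse a dump back with the derived Read instance,
-- skipping any line that does not parse.
decodeEvents :: String -> [MemEvent]
decodeEvents = mapMaybe readMaybe . lines

sampleEvents :: [MemEvent]
sampleEvents =
  [ Alloc (Then "HsLexer.hs:85(fromNative)" Empty) 43309024 (Just 2)
  , Free  (Then "HsLexer.hs:85(fromNative)" Empty) 43309024
  ]
```

Under these assumptions, appending to the dump would be something like appendFile "Memory.dump" (encodeEvents [event]). The derived instances are convenient but slow, which matches the remark above about replacing them.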

A piece of one such file is:

MemAlloc { memFun = Record
         , memStack = Then "WinDll\\Lib\\/NativeMapping_Base.cpphs:242(peekCWString)"
                     (Then "HsLexer.hs:85(fromNative)"
                     (Then "HsLexer.hs:83(lexSourceStringWithExtA)"
                      Empty))
         , memStart = 43309024
         , memStop = Just 43309026
         , memSize = Just 2
         , memTime = "1312779866"
         }
MemFree { memStack = Then "WinDll\\Lib\\/NativeMapping_Base.cpphs:246(freeCWString)"
                    (Then "HsLexer.hs:85(fromNative)"
                    (Then "HsLexer.hs:83(lexSourceStringWithExtA)"
                     Empty))
        , memStart = 43309024
        , memSize = Just 2
        , memTime = "1312779866"
        }

From this you can see that not only does it record allocations, but I’ve also implemented a simple artificial stack that shows us where the allocation originated. The current implementation is rather simplistic. I will look into expanding this later, but for now it suits my needs.
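The artificial stack can be sketched roughly as follows. The constructors Empty and Then are read off the dump above, and the names newStack and pushStack appear in the generated code, but their bodies here are an assumption, and frames is a hypothetical helper added for illustration.

```haskell
-- Shape read off the dump: the innermost frame comes first.
data Stack = Empty | Then String Stack
  deriving (Show, Read, Eq)

-- Start a stack with the label of the outermost call (assumed behaviour).
newStack :: String -> Stack
newStack label = Then label Empty

-- Push a more deeply nested call on top of an existing stack (assumed behaviour).
pushStack :: Stack -> String -> Stack
pushStack st label = Then label st

-- Hypothetical helper: list the frames, innermost first.
frames :: Stack -> [String]
frames Empty            = []
frames (Then lbl rest)  = lbl : frames rest

example :: Stack
example =
  pushStack
    (pushStack (newStack "HsLexer.hs:83(lexSourceStringWithExtA)")
               "HsLexer.hs:85(fromNative)")
    "NativeMapping_Base.cpphs:242(peekCWString)"
```

Because the stack is an ordinary value passed down explicitly, each wrapper only has to push its own label before calling the next function.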

For those wondering, this tracker is not enabled by default. To enable it just pass the “--debug” flag to the call to Hs2lib. Compiling with this flag instructs the library to use the Debug version of the libraries, and changes the code generators so that they also add extra information to the generated code.

Since de-allocations also have to be tracked, using this flag also exposes a free function which should be used to free pointers. If the CallingConvention used is stdcall then a function “freeS” is exported; if ccall, then “freeC” is exported. The reason for this is that these functions are statically defined. They aren’t generated by the tool but are instead part of the tool’s library.

Performing analysis

Once we have a .dump file, the next step is to analyze this information. This is where the new tool Hs2lib-debug comes in. This tool replays the allocations in a pseudo heap. If all goes well, the heap should be empty at the end. If it has any entries, it means we have a leak.
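The replay idea can be sketched in a few lines. This is not how Hs2lib-debug is implemented (it uses list-like structures, as noted later); it is a simplified illustration using Data.Map from the containers package, with a hypothetical Event type that keeps only an address and an origin label.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

type Address = Int

-- Hypothetical, cut-down event type: an allocation remembers where it came from.
data Event
  = Alloc Address String
  | Free Address
  deriving (Show, Eq)

-- Replay the events in order against a pseudo heap keyed by start address.
-- Whatever is left in the map at the end was never freed.
replay :: [Event] -> Map.Map Address String
replay = foldl' step Map.empty
  where
    step heap (Alloc addr origin) = Map.insert addr origin heap
    step heap (Free addr)         = Map.delete addr heap

-- Count the outstanding allocations per origin.
leaksByOrigin :: [Event] -> Map.Map String Int
leaksByOrigin evs =
  Map.fromListWith (+) [ (origin, 1) | origin <- Map.elems (replay evs) ]
```

For example, replaying an allocation followed by a free of the same address leaves the pseudo heap empty, while an allocation without a matching free shows up under its origin label.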

Invoking it is quite easy: just pass it the folder that contains the dump file as an argument:

Hs2lib-debug.exe -v .\MemDumps

and that’s all.

Running this on the dump file created from the lexer program returned:

*** Program starting up…

*** Reading file ‘.\MemDumps\Memory.dump’…

*** Found 31890 record(s).

*** Analyzing [********************] 100.00%

*** Found 1135 outstanding allocation(s).

1135 unfreed references found originating from HsLexer.hs:85(fromNative)\;\;\WinDll\Lib\/NativeMapping_Base.cpphs:242(peekCWString)\;

*** Cleaning up….

Done.

The stack output is a bit messy. It reads as follows: entries in the path are separated by a ; instead of the usual \ character.

The function the profiler pointed out is:

lexSourceStringWithExtA :: StablePtr (IORef [ExtensionFlag]) -> CWString -> IO ((StatelessParseResultPtr (Located Token)))
lexSourceStringWithExtA a1 a2 =
  do let st = newStack (__FILE__ ++ ":" ++ (show __LINE__) ++ "(lexSourceStringWithExtA)")
     a1p <- fromNative (pushStack st (__FILE__ ++ ":" ++ (show __LINE__) ++ "(fromNative)")) a1
     a2p <- fromNative (pushStack st (__FILE__ ++ ":" ++ (show __LINE__) ++ "(fromNative)")) a2
     res <- lexSourceStringWithExt a1p a2p
     toNative st res

And the exact line is

a2p <- fromNative (pushStack st (__FILE__ ++ ":" ++ (show __LINE__) ++ "(fromNative)")) a2

This is interesting for a couple of reasons. The profiler is saying that the pointer associated with the CWString (which is defined as a Ptr CWchar) is never freed. But why not?

The answer lies in the C# marshaller and the types being used. We are currently marshalling C# strings using the String datatype. Strings in C# are immutable, so once the marshaller creates a wchar_t* from the string, it never worries about it again. They are strictly an in parameter.

There are two ways to solve this:

  • C# does have mutable strings, using the StringBuilder type. Using StringBuilder has the benefit that it is already implemented as a char pointer. The marshaller simply passes the pointer to the Haskell function. After the function returns and the GC determines that the StringBuilder is no longer in use, it should free the memory (at least in theory).
  • Make the library just free any CWString it dereferences.

For now I’ve chosen the second approach, for no reason other than that it requires the least amount of change in existing code. In the future I’ll adopt the approach that Hs2lib will free any arguments being passed to it. I don’t know if this is the convention usually used. If someone has something against this approach I would love to hear it.

Update: There’s somewhat of a big gotcha that I’ve recently discovered. We have to remember that the String type in .NET is immutable. So when the marshaller sends it to our Haskell code, the CWString we get there is a copy of the original. We have to free this. When GC is performed in C# it won’t affect the CWString, which is a copy.

The problem, however, is that when we free it in the Haskell code we can’t use freeCWString. The pointer was not allocated by the C runtime’s (msvcrt.dll) allocator. There are three ways (that I know of) to solve this.

  • use char* in your C# code instead of String when calling a Haskell function. You then have the pointer to free when your call returns, or you can initialize the pointer using fixed.
  • import CoTaskMemFree in Haskell and free the pointer in Haskell
  • use StringBuilder instead of String. I’m not entirely sure about this one, but the idea is that since StringBuilder is implemented as a native pointer, the Marshaller just passes this pointer to your Haskell code (which can also update it btw). When GC is performed after the call returns, the StringBuilder should be freed.

I’ve once again updated the library to not free any pointers. This prevents a nasty heap corruption. To avoid memory leaks in C# you are free to choose between solutions 1 and 3 presented here.

Results

With this new code in place, I once again ran the profiled lexer to generate a dump. This time, however, when I analyze the result I get:

*** Program starting up…

*** Reading file ‘.\MemDumps\Memory.dump’…

*** Found 33027 record(s).

*** Analyzing [********************] 100.00%

*** Found 0 outstanding allocation(s).

Congratulations, No memory leak(s) detected.

*** Cleaning up….

And so the memory leaks are fixed. It’s worth noting that the analyzer can be quite slow. It uses a flat linked-list-like structure and lists to do the analysis. In future versions I will be replacing these with a tree-like structure and arrays respectively.

Hs2lib: Automating compilation of dynamic libs


As most of you know, Visual Haskell makes use of Haskell programs compiled to a dynamic library (dll). This task can be completely automated, and that is what Hs2lib provides. Hs2lib (formerly WinDll) was made to facilitate making changes and updates to Visual Haskell’s support files. It was designed purely for that purpose and thus has some limitations and design decisions that reflect this. I’ve been working mostly on getting this to a stable state over the last few weeks/months. The version used in the original Visual Haskell relied on a hack for some things, like lists. This one has a stable interface.

Get it from Hackage with

cabal install hs2lib

Using it is quite simple. I’ll illustrate with an example:

module Arith where

-- @@ Export
summerize :: [Int] -> [Int] -> Int
summerize x y = sum (x ++ y)

-- @@ Export
single :: Int -> [Int]
single x = [1..x]

Hs2lib does not export all functions automatically. You have to mark them with a special annotation “--@@ Export [=optional_name]”.

If no name is specified, the function name is used. Note that only functions with an explicit type signature can be exported. This tool works by static analysis. To compile, a simple invocation of hs2lib is sufficient:

PS C:\Examples> hs2lib Arith
Linking main.exe …
Done.

After this there should be 2 header files and one dll file named Hs2lib.dll in the folder. This name can be changed with the –n flag. Your Haddock documentation is carried over along with the original function type as comments for the new generated functions in the headers. Marshaling data is automatically generated for you for any data structures found during the dependency analysis phase. This can only be done if the source for the data type is available.

An example of how to call this function via C is

#include <stdio.h>
#include <stdlib.h>
#include "Hs2lib_FFI.h"

int main(int argc, char** argv) {

    HsStart();

    int* foo = (int*)malloc(sizeof(int)*3);
    foo[0] = 2;
    foo[1] = 5;
    foo[2] = 10;

    printf("Sum result: %d\n", summerize(3, foo, 3, foo));

    free(foo);

    int count;
    int* bar = single(8, &count);

    printf("Single result: (%d)\n", count);

    int i;
    for(i = 0; i < count; i++)
        printf("\t%d\n", bar[i]);

    free(bar);

    HsEnd();

    return (EXIT_SUCCESS);
}

Since Visual Haskell is written in C#, this tool can also output C# code. This is done by using the –c# flag. I’m working on a PDF that goes into much greater detail about this tool and its options, but I need to reword some parts of it as I’ve been told it’s too “handholdy” (simplistic) in places. So I’ll hold off on publishing that.

But here are the highlights of this tool.

Currently Supported:

  • Generates marshalling information for arbitrary datatypes
  • Types with kind other than * have to be fully applied (e.g. Maybe String). A specialized marshalling instance is then generated for each of these instances.
  • Generates FFI exports for any function you mark, with an FFI-compatible type
  • Properly supports marshalling of lists. [Int], where the list is an argument, becomes CInt -> Ptr Int, taking an explicit size for the list; where it is a return value, it becomes Ptr CInt -> Ptr Int, expecting a pointer to which to write the size of the list. This introduces a semantic difference between [Char] and String. The former is treated as a list of characters with no explicit terminator, whereas the latter is treated as a null-terminated wide string.
  • Supports Ptr, FunPtr and StablePtr, which introduces the possibility of having “caches”.
  • Supports callbacks via higher-order functions (everything is autogenerated from the function type)
  • Data types with multiple constructors become a union, whereas data types with a single constructor and newtypes are inlined and treated equally.
  • Re-exports existing exports found in the source file
  • Honors existing Storable instances found for types it needs
  • Avoids unsafePerformIO as much as possible; it is used in only one specific instance.
  • Generates initialization functions for you
  • Hides unnecessary exports using a DEF file
  • Allows you to override my default type conversions (e.g. String -> CWString)
  • Provides helper includes for C, C++ and C# (placed wherever cabal places extra includes, e.g. %AppData%\cabal\Hs2lib-0.4.8 on Windows)
  • And more
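To make the list convention concrete, here is a hand-written sketch of what wrappers for the earlier summerize and single functions might look like. This is not the code Hs2lib generates: the names summerizeNative, singleNative and demo are hypothetical, the argument order (size first, then pointer) is only read off the C example above, and the foreign export declarations are left out so the snippet can be loaded on its own.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (alloca, free)
import Foreign.Marshal.Array (newArray, peekArray, withArrayLen)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

summerize :: [Int] -> [Int] -> Int
summerize x y = sum (x ++ y)

single :: Int -> [Int]
single x = [1..x]

-- List arguments: each list arrives as an explicit length plus a pointer.
summerizeNative :: CInt -> Ptr CInt -> CInt -> Ptr CInt -> IO CInt
summerizeNative n xs m ys = do
  xs' <- peekArray (fromIntegral n) xs
  ys' <- peekArray (fromIntegral m) ys
  return (fromIntegral (summerize (map fromIntegral xs') (map fromIntegral ys')))

-- List result: the length is written through an out-pointer and the
-- elements are returned in a malloc'd array that the caller must free.
singleNative :: CInt -> Ptr CInt -> IO (Ptr CInt)
singleNative x sizeOut = do
  let result = map fromIntegral (single (fromIntegral x)) :: [CInt]
  poke sizeOut (fromIntegral (length result))
  newArray result

-- Mirrors the C example, but calls the wrappers from Haskell.
demo :: IO (CInt, [CInt])
demo =
  withArrayLen [2, 5, 10] $ \len foo -> do
    total <- summerizeNative (fromIntegral len) foo (fromIntegral len) foo
    alloca $ \countPtr -> do
      bar   <- singleNative 8 countPtr
      count <- peek countPtr
      items <- peekArray (fromIntegral count) bar
      free bar
      return (total, items)
```

newArray allocates with malloc, which is why a plain free on the caller's side works in this sketch.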

Limitations:

  • Does not support automatic expansion of lists in applied types other than IO (e.g. Maybe [Int])
  • Does not support infix constructors (arbitrary codegen limit, I’ve never needed it. Will add in future versions)
  • The code generator emits a few too many parentheses in the temporary Haskell file it generates. Will fix in future versions.
  • Does not support polymorphic functions (this is an FFI limit: the size of a polymorphic value cannot be known.)
  • Cannot export functions or types with the same name. The types are imported unqualified.

I will work on the PDF documentation in the coming 2 weeks. That should be a complete overview of the abilities of this tool and why I made certain decisions.

To verify that the install (and the package) is complete, there is a set of tests included in the tarball. Just unpack it, load Tests.TestRunner in ghci and run the function “runLocalTests”. The cabal test interface I was using before got deprecated. I’ll have to look up how the new one works; until then, sorry about that.

I will publish a more detailed tutorial later during the day.

QuickInfo


Visual Studio has the ability to show information about symbols when you hover over them. This feature is called “QuickInfo”.

This essentially means that you can hover over a symbol like “fmap” and it would tell you, fmap :: forall a b (f :: * -> *). (Functor f) => (a -> b) -> f a  -> f b and that it’s defined in GHC.Base

In ghci this would be equivalent to typing :i fmap, which would result in the following output:

class Functor f where
  fmap :: (a -> b) -> f a -> f b
  …
        -- Defined in GHC.Base

Whenever the user hovers over a symbol in Visual Studio, the IDE will call a method:

public void AugmentQuickInfoSession(IQuickInfoSession session, IList<object> qiContent, out ITrackingSpan applicableToSpan)

 

I use the information given to me to construct two things

  • The word the user is hovering on
  • The exact location within the source file of that word

This information is used to find the correct Name value in the Haskell Renamed AST. The problem is we can’t construct name values, so we have to look them up. This is provided with the help of a typeclass

class Finder a where
    findName     :: MonadPlus m => a -> FastString -> Maybe SrcSpan -> m Name

The monad used determines how many results you receive. Use the Maybe monad and you’ll get just one. Use the list monad and you’ll get more than one, but only if you don’t specify a specific source span to look for (wildcard match on name alone).
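The effect of the MonadPlus result type can be shown with a toy lookup. This is only an illustration of the technique: Table and lookupAll are hypothetical, and plain strings and line numbers stand in for FastString, SrcSpan and Name.

```haskell
import Control.Monad (MonadPlus, msum)

-- Hypothetical stand-in for the renamed AST: (identifier, line) pairs.
type Table = [(String, Int)]

-- Return every match in whatever MonadPlus the caller picks.
-- Passing Nothing for the location means "match on name alone".
lookupAll :: MonadPlus m => Table -> String -> Maybe Int -> m Int
lookupAll table name mbLine =
  msum [ return line
       | (n, line) <- table
       , n == name
       , maybe True (== line) mbLine ]

table :: Table
table = [("fmap", 10), ("map", 12), ("fmap", 42)]

firstHit :: Maybe Int
firstHit = lookupAll table "fmap" Nothing   -- Maybe keeps only the first match

allHits :: [Int]
allHits = lookupAll table "fmap" Nothing    -- the list monad keeps them all
```

Here firstHit is Just 10 and allHits is [10, 42]; the same function body serves both callers because mplus for Maybe keeps the first success while mplus for lists concatenates.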

However we should never enter the PostTcType types inside the renamed AST. These are invalid at this stage. Unfortunately SYB’s listify does not provide a way to tell it not to enter a specific type.

So we create a modified version of those SYB calls:

{-# LANGUAGE GADTs, RankNTypes #-}
import Data.Generics (GenericQ, Typeable, cast, gmapQ, mkQ)
import Data.Maybe (isJust)

data Guard where
  Guard :: Typeable a => Maybe a -> Guard

type HList = [Guard]

-- | Summarise all nodes in top-down, left-to-right order,
--   without descending into nodes whose type is guarded
everythingBut :: (r -> r -> r) -> HList -> GenericQ r -> GenericQ r
everythingBut k q f x
  = foldl k (f x) fsp
    where fsp = case isPost x q of
                  True  -> []
                  False -> gmapQ (everythingBut k q f) x

-- | True if the value has the same type as one of the guards
isPost :: Typeable a => a -> HList -> Bool
isPost a = or . map check
  where check :: Guard -> Bool
        check x = case x of
                    Guard y -> isJust $ (cast a) `asTypeOf` y

-- | Get a list of all entities that meet a predicate
listifyBut :: Typeable r => (r -> Bool) -> HList -> GenericQ [r]
listifyBut p q
  = everythingBut (++) q ([] `mkQ` (\x -> if p x then [x] else []))

Now listifyBut takes an HList of types not to inspect. HList is a heterogeneous list, so it’ll allow things of different types inside it. Finding the Name is now as simple as:

instance Finder (HsGroup Name) where
    findName grp a b = findName (listifyBut (isName a b) [Guard (undefined :: Maybe PostTcType)] grp) a b
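To see the guarded traversal at work outside of GHC’s AST, here is a self-contained toy version. It is a sketch under a few assumptions: Expr, Hidden, Query, isGuarded and sample are hypothetical names, mkQ is re-implemented locally so that only base is needed, and the guard is built with Nothing instead of undefined (only its type matters).

```haskell
{-# LANGUAGE DeriveDataTypeable, GADTs, RankNTypes, ScopedTypeVariables #-}
import Data.Data (Data, Typeable, cast, gmapQ)
import Data.Maybe (isJust)

-- A generic query, same shape as SYB's GenericQ.
type Query r = forall a. Data a => a -> r

data Guard where
  Guard :: Typeable a => Maybe a -> Guard

-- True when the node has the same type as one of the guards.
isGuarded :: Typeable a => a -> [Guard] -> Bool
isGuarded a = any check
  where
    check :: Guard -> Bool
    check (Guard y) = isJust (cast a `asTypeOf` y)

-- Local re-implementation of SYB's mkQ: apply q only at its own type.
mkQ :: (Typeable a, Typeable b) => r -> (b -> r) -> a -> r
mkQ def q a = maybe def q (cast a)

-- Top-down traversal that does not descend into guarded nodes.
everythingBut :: forall r. (r -> r -> r) -> [Guard] -> Query r -> Query r
everythingBut k guards f = go
  where
    go :: Query r
    go x
      | isGuarded x guards = f x
      | otherwise          = foldl k (f x) (gmapQ go x)

listifyBut :: Typeable r => (r -> Bool) -> [Guard] -> Query [r]
listifyBut p guards = everythingBut (++) guards (mkQ [] (\x -> [x | p x]))

-- Toy tree: Hidden plays the role of a field we must not enter.
newtype Hidden = Hidden [Int] deriving (Data)
data Expr = Lit Int | Add Expr Expr | Tagged Hidden Expr deriving (Data)

sample :: Expr
sample = Add (Lit 1) (Tagged (Hidden [10, 20]) (Lit 2))
```

With no guards, listifyBut (const True) [] sample :: [Int] collects [1,10,20,2]; with [Guard (Nothing :: Maybe Hidden)] the Hidden subtree is skipped and the result is [1,2].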
 

Once we have the names, we can just call getInfo. Nothing else is needed because, remember, all API calls take a Context as an argument. For instance, the full type of the tooltip function is:

-- @@ Export
-- | Gather information about the identifier you requested
--   .
--   Context: The session for this call, Serves as a cache
--   .
--   String : The name of the identifier to lookup
--   .
--   SrcSpan: The location of the identifier in the sourcefile
--   .
--   Bool   : Whether to treat this call as a strict one. If it’s strict
--            Then the name AND span must match. If it’s not, Any match will do
--   .
getTooltip :: Context -> String -> SrcSpan -> Bool -> IO (Maybe String)

This produces the following result

[image: QuickInfo tooltip]

There’s a problem however, if you hover over a variable name that’s defined in the body of the function it produces a runtime panic:

[image: runtime panic]

If you think about it, this kind of makes sense: GHCi also won’t produce anything on local variables. In fact you can’t even refer to them. But at the very least we would like to prevent this crash, and in the best case scenario we would like *some* information on the symbol.

After poking around some I noticed that the identifiers that produce the errors are “Internal Name” values. The function nameModule fails on these. The plan now is: whenever we find an Internal Name, we look into the TypecheckedModule to find the Id associated with the Name value we retrieved earlier. With SYB this is again easy. However there’s a catch (thanks to nominolo for pointing this out): we should not enter any PostTcKind or NameSet, because these are blank after type checking.

findID :: Data a => a -> Name -> [Id]
findID a n = listifyBut ((n==) . getName) [ Guard (undefined :: Maybe NameSet)
                                          , Guard (undefined :: Maybe PostTcKind) ] a

And that’s all. The end result is that this now works on local variables as well. Hovering over, for instance, the variable file generates:

[image: tooltip for the local variable file]

The important thing to note here is the Context: it’ll contain a cache of information. So looking up any of this stuff will be instantaneous. You just hover and directly get back information.

A last cool but *I’m not so sure how useful* function is that if you select something, then hover over it, it will type check only that expression.

[image: type of the selected expression]

So if you have an expression “fmap foo” somewhere but don’t remember what type foo or fmap is, just select them and hover over the selection. (This is somewhat limited, though: all identifiers have to be top-level. It can’t return anything for local variables, sorry.)

And that’s it for this post, I’ll continue the work on Cabal now, or continue this track and fully finish intellisense.

No video of this in action, since I have a cold.

Ghost typing


This is the preliminary version of the Ghost typing addition to Visual Haskell. The idea is that whenever an explicit type signature is not given, the IDE will display the type inferred by GHC.

You can then click on the signature to insert it, or use the smart action associated with the name of the function.

Up next is the feature that, when you have given a signature that doesn’t type check, the IDE will remove that signature and retry. If it succeeds, the IDE will display a suggested signature.

Below is a GIF of how the first part works.

[GIF: ghost typing]

Oh, and collapsible regions have been finished as well.

If the function has a type signature, it will collapse at the end of that declaration; if not, it’ll collapse at the end of the function name.

There is a restriction to this, however, since GHC allows you to declare your signatures anywhere in the file. In order for the signature to be considered part of the function by collapsible regions, it has to end on the line before the binding.

This means it can span multiple lines; the end just has to be on the line before the binding. That way it also supports Haddock-documented type signatures.