ANNOUNCE: Hs2lib-0.5.5


What is it?
===========

A preprocessor and library which allow you to create dynamic libs
from arbitrary annotated Haskell programs with one click. It also
allows you to use the generated lib in C, C++ and C# just by including
the generated header files.

At a minimum it can be considered the inverse of c2hs.

Where to get it?
================

You can get it from Hackage
(http://hackage.haskell.org/package/Hs2lib) or by using cabal
(cabal install Hs2lib).

Documentation, Mailing List, Source, etc
========================================

Go to https://mistuke.wordpress.com/category/hs2lib/ for information.
For a tutorial, see http://www.scribd.com/doc/63918055/Hs2lib-Tutorial

What’s New?
===========

– Currently Supported:

* Autogenerates free functions for any StablePtr used.
* Added support for memory leak tracing with the --debug flag
  https://mistuke.wordpress.com/2011/08/09/tracing-lexer-memory-leaks/
* Fixed a bug in the library that caused an error in marshalling
* No longer frees strings, in order to prevent heap corruption on
  calls from C#
* Fixed an issue with lists and type synonyms
* Fixed an alignment issue with stdcall
* Renamed the hs2c# pragma to hs2cs
* Fixed an error in parsing pragmas
* Fixed a major marshalling bug having to do with lists of pointer types
* Started on an implementation of an automatic test generator to test
  exported functions.
* Re-arranged the namespaces generated in C#. Functions and types are
  now put in different namespaces.

– Not Currently supported:

* Infix constructors
* Control over the autogenerated free functions for StablePtr
* Exporting of type/functions with the same name in different modules
* Exporting polymorphic functions
 
Details
=======

More detailed documentation will follow in the next few weeks.

Why is the first version 0.4.8?
-------------------------------

This project has been in development for a very long time: nearly two
years, on and off.
It was developed mainly to facilitate the creation of Visual Haskell
2010. This is just the first public release of the tool.


A bug


It appears that both versions of the tool have a bug in the default conversions defined in NativeMappings.hs

It was introduced in a major restructuring that took place before 0.4.8 and slipped through my regression tests.

I’ll upload a fixed version as soon as possible.

Sorry for any inconvenience.

Update: Version 0.5.1 fixes the bug mentioned.

Some more details: originally all my conversion functions were pure. The problem is that some activities, like creating a pointer, require IO, which is why everything is currently done in IO. When I converted the library I missed some calls, and those calls now fail with an error saying that the conversion function is undefined.
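As a rough illustration of why such conversions end up in IO, here is a minimal sketch using only the standard Foreign.C.String functions. The names stringToNative and stringFromNative are hypothetical and are not the ones defined in NativeMappings.hs.

```haskell
import Foreign.C.String (CWString, newCWString, peekCWString)
import Foreign.Marshal.Alloc (free)

-- Hypothetical names, for illustration only.
-- Allocating the wide-character buffer is a side effect, so the
-- conversion cannot be a pure function.
stringToNative :: String -> IO CWString
stringToNative = newCWString

-- Reading the buffer back is also an IO action: the memory behind
-- the pointer can be changed or freed by someone else at any time.
stringFromNative :: CWString -> IO String
stringFromNative = peekCWString

main :: IO ()
main = do
  ptr  <- stringToNative "hello"
  back <- stringFromNative ptr
  free ptr  -- newCWString allocates with malloc, so free is the right call here
  putStrLn back
```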

Tracing Lexer memory leaks


One of the problems I’ve been struggling with for a while now is the presence of pesky memory leaks in Visual Haskell. Hs2lib has one convention: it doesn’t free any memory, so you’re responsible for freeing all of it.

As far as I knew, I was freeing any and all pointers that I had. It should not be leaking, but yet it was. So I decided to get to the root of this problem. I wrote a simple application that uses the Lexer classes of Visual Haskell and would emulate a user scrolling by feeding it lines of a Haskell file one at a time.

Using the Debug Diagnostics Tools I was able to track the application and make a full memory dump every few seconds in order to track the progression of the leak. The results were rather surprising:

WARNING – DebugDiag was not able to locate debug symbols for HsLexer.dll, so the reported function name(s) may not be accurate.

HsLexer.dll is responsible for 11.95 MBytes worth of outstanding allocations. The following are the top 2 memory consuming functions:

HsLexer!HsEnd+1343ac4: 8.20 MBytes worth of outstanding allocations.

HsLexer!HsEnd+150ae89: 2.00 MBytes worth of outstanding allocations.

This was detected in LexerLeakTest.exe__PID__4660__Date__07_04_2011__Time_03_20_30PM__18__Leak Dump – Private Bytes.dmp

So according to this tool, my little program was leaking quite extensively and, not surprisingly, it was all coming from inside my Haskell program. Unfortunately, GHC/GCC cannot produce the proper symbols (.pdb files) for any of the Microsoft debugging tools to understand. So while we know conclusively that the program is leaking, we don’t know where.

Hs2lib-Debug

This is where the new version of Hs2lib comes into play. The idea is to track any and all allocations and de-allocations made during the run of the program, in essence a simple profiler.

For this reason we now have Debug versions of the modules. Through the magic of CPPHS, Cabal and custom preprocessors we get the usual “release” modules and “debug” modules which write the allocations to a file.

I’ll skip over the implementation, but the idea is to override all allocation functions with custom ones. The structure written out to disk is shown below. (The current version uses the rather slow Show/Read instances generated by GHC; I will be replacing these in the future.)

data MemAlloc = MemAlloc { memFun   :: Caller
                         , memStack :: Stack
                         , memStart :: Address
                         , memStop  :: Maybe Address
                         , memSize  :: Maybe MemSize
                         , memTime  :: String
                         }
              | MemFree { memStack :: Stack
                        , memStart :: Address
                        , memSize  :: Maybe MemSize
                        , memTime  :: String
                        }
              deriving (Show, Read)

The allocations get written out to a file called “Memory.dump”.

A piece of one such file is:

MemAlloc { memFun = Record
         , memStack = Then "WinDll\\Lib\\/NativeMapping_Base.cpphs:242(peekCWString)"
                     (Then "HsLexer.hs:85(fromNative)"
                     (Then "HsLexer.hs:83(lexSourceStringWithExtA)"
                      Empty))
         , memStart = 43309024
         , memStop = Just 43309026
         , memSize = Just 2
         , memTime = "1312779866"
         }
MemFree { memStack = Then "WinDll\\Lib\\/NativeMapping_Base.cpphs:246(freeCWString)"
                    (Then "HsLexer.hs:85(fromNative)"
                    (Then "HsLexer.hs:83(lexSourceStringWithExtA)"
                     Empty))
        , memStart = 43309024
        , memSize = Just 2
        , memTime = "1312779866"
        }

From this you can see that it not only records allocations; I’ve also implemented a simple artificial stack that shows where the allocation originated. The current implementation is rather simplistic. I will look into expanding this later, but for now it suits my needs.
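The artificial stack can be pictured with a small sketch. The Stack shape (Then/Empty) is read off the dump above, and the argument order of newStack and pushStack follows the generated code; the function bodies and the frames helper are guesses for illustration, not the library’s implementation.

```haskell
-- Shape inferred from the dump: a frame label on top of the rest of the stack.
data Stack = Then String Stack | Empty
  deriving (Show, Read, Eq)

-- Start a stack with the frame of the exported entry point.
newStack :: String -> Stack
newStack label = Then label Empty

-- Push a callee frame; the newest frame is always the outermost one.
pushStack :: Stack -> String -> Stack
pushStack st label = Then label st

-- Hypothetical helper: flatten the stack, most recent frame first.
frames :: Stack -> [String]
frames Empty         = []
frames (Then l rest) = l : frames rest

main :: IO ()
main = do
  let st  = newStack "HsLexer.hs:83(lexSourceStringWithExtA)"
      st' = pushStack st "HsLexer.hs:85(fromNative)"
  mapM_ putStrLn (frames st')
```

In the real generated code the labels are built from __FILE__ and __LINE__ by the preprocessor rather than written by hand.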

For those wondering, this tracker is not enabled by default. To enable it just pass the “--debug” flag to the call to Hs2lib. Compiling with this flag instructs the tool to use the Debug version of the libraries, and changes the code generators so that they also add extra information to the generated code.

Since de-allocations also have to be tracked, using this flag also exposes a free function which should be used to free pointers. If the calling convention used is stdcall then a function “freeS” is exported; if ccall, then “freeC” is exported. The reason is that these functions are statically defined: they aren’t generated by the tool but are part of the library that ships with it.

Performing analysis

Once we have a .dump file, the next step is to analyze this information. This is where the new tool Hs2lib-debug comes in. This tool replays the allocations in a pseudo heap. If all goes well, at the end the heap should be empty. If it has any entries it means we have a leak.
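The replay idea can be sketched in a few lines. This is not the Hs2lib-debug source: the type aliases, the single Caller constructor, and the use of Data.Map for the pseudo heap are all assumptions, and the sample log is made up.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Assumed definitions; only the record layout comes from the post.
type Address = Int
type MemSize = Int

data Caller = Record deriving (Show, Read, Eq)  -- other constructors unknown
data Stack  = Then String Stack | Empty deriving (Show, Read, Eq)

data MemAlloc
  = MemAlloc { memFun :: Caller, memStack :: Stack, memStart :: Address
             , memStop :: Maybe Address, memSize :: Maybe MemSize, memTime :: String }
  | MemFree  { memStack :: Stack, memStart :: Address
             , memSize :: Maybe MemSize, memTime :: String }
  deriving (Show, Read, Eq)

-- Parse consecutive records with the derived Read instance.
-- Parsing stops silently at the first thing that is not a record.
parseDump :: String -> [MemAlloc]
parseDump s = case reads s of
  [(r, rest)] -> r : parseDump rest
  _           -> []

-- Replay the log: an allocation adds an entry, a free removes it.
-- Whatever is left in the map at the end was never freed.
replay :: [MemAlloc] -> Map.Map Address MemAlloc
replay = foldl' step Map.empty
  where
    step heap r@MemAlloc{} = Map.insert (memStart r) r heap
    step heap r@MemFree{}  = Map.delete (memStart r) heap

-- A made-up log: address 100 is freed again, address 200 is not.
sample :: String
sample = unlines (map show
  [ MemAlloc Record (Then "a.hs:1(f)" Empty) 100 (Just 102) (Just 2) "0"
  , MemFree (Then "a.hs:2(g)" Empty) 100 (Just 2) "1"
  , MemAlloc Record (Then "a.hs:3(h)" Empty) 200 Nothing Nothing "2"
  ])

main :: IO ()
main = do
  let heap = replay (parseDump sample)
  putStrLn ("Outstanding allocations: " ++ show (Map.size heap))
  mapM_ (print . memStack) (Map.elems heap)
```

A map keyed by start address keeps each lookup logarithmic, which is one way to avoid the slowness of a flat list; the real tool may well do this differently.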

Invoking it is quite easy, just pass it as an argument the folder which contains the dump file:

Hs2lib-debug.exe -v .\MemDumps

and that’s all.

Running this on the dump file created from the lexer program returned:

*** Program starting up…

*** Reading file ‘.\MemDumps\Memory.dump’…

*** Found 31890 record(s).

*** Analyzing [********************] 100.00%

*** Found 1135 outstanding allocation(s).

1135 unfreed references found originating from HsLexer.hs:85(fromNative)\;\;\WinDll\Lib\/NativeMapping_Base.cpphs:242(peekCWString)\;

*** Cleaning up….

Done.

The messy stack output reads as follows: entries in the path are separated by a ; instead of the usual \ character.

The function the profiler pointed out is:

lexSourceStringWithExtA :: StablePtr (IORef [ExtensionFlag]) -> CWString -> IO ((StatelessParseResultPtr (Located Token)))
lexSourceStringWithExtA a1 a2 =
  do let st = newStack (__FILE__ ++ ":" ++ (show __LINE__) ++ "(lexSourceStringWithExtA)")
     a1p <- fromNative (pushStack st (__FILE__ ++ ":" ++ (show __LINE__) ++ "(fromNative)")) a1
     a2p <- fromNative (pushStack st (__FILE__ ++ ":" ++ (show __LINE__) ++ "(fromNative)")) a2
     res <- lexSourceStringWithExt a1p a2p
     toNative st res

And the exact line is

a2p <- fromNative (pushStack st (__FILE__ ++ ":" ++ (show __LINE__) ++ "(fromNative)")) a2

This is interesting for a couple of reasons. The profiler is saying that the pointer associated with the CWString (which is defined as a Ptr CWchar) is never freed. But why not?

The answer lies in the C# Marshaller and types being used. We are currently Marshalling C# strings using the String datatype. Strings in C# are immutable, so once the marshaller creates a wchar_t* from the string, it never worries about it again. They are strictly an in parameter.

There are two ways to solve this:

  • C# does have mutable strings, using the StringBuilder type. Using StringBuilder has the benefit that it is already implemented as a char pointer. The marshaller simply passes the pointer to the Haskell function. After the function returns and the GC determines that the StringBuilder is no longer in use, it should free the memory (at least in theory).
  • Make the library just free any CWString it dereferences.

For now I’ve chosen the second approach, for no reason other than that it requires the least change to existing code. In the future I’ll adopt the approach that Hs2lib will free any arguments being passed to it. I don’t know if this is the convention usually used. If someone has something against this approach I would love to hear it.

Update: There’s somewhat of a big gotcha that I’ve recently discovered. We have to remember that the String type in .NET is immutable. So when the marshaller sends it to our Haskell code, the CWString we get there is a copy of the original. We have to free this. When GC is performed in C# it won’t affect the CWString, which is a copy.

The problem, however, is that when we free it in the Haskell code we can’t use freeCWString, because the pointer was not allocated with the C runtime’s (msvcrt.dll) allocator. There are three ways (that I know of) to solve this:

  • use char* in your C# code instead of String when calling a Haskell function. You then have the pointer to free when your call returns, or you can initialize the pointer using fixed.
  • import CoTaskMemFree in Haskell and free the pointer in Haskell
  • use StringBuilder instead of String. I’m not entirely sure about this one, but the idea is that since StringBuilder is implemented as a native pointer, the Marshaller just passes this pointer to your Haskell code (which can also update it btw). When GC is performed after the call returns, the StringBuilder should be freed.

I’ve once again updated the library to not free any pointers. This prevents a nasty heap corruption. To avoid memory leaks in C# you are free to choose between solutions 1 and 3 presented here.

Results

With this new code in place, I once again ran the profiled lexer to generate a dump. This time, when I analyze the result, I get:

*** Program starting up…

*** Reading file ‘.\MemDumps\Memory.dump’…

*** Found 33027 record(s).

*** Analyzing [********************] 100.00%

*** Found 0 outstanding allocation(s).

Congratulations, No memory leak(s) detected.

*** Cleaning up….

And so the memory leaks are fixed. It’s worth noting that the analyzer can be quite slow. It uses a flat linked-list-like structure and lists to do the analysis. In future versions I will replace these with a tree-like structure and arrays respectively.

Hs2lib: Automating compilation of dynamic libs


As most of you know, Visual Haskell makes use of Haskell programs compiled to a dynamic library (dll). This task can be completely automated, and that is what Hs2lib provides. Hs2lib (formerly WinDll) was made to facilitate making changes and updates to Visual Haskell’s support files. It was designed purely for that purpose and thus has some limitations and design decisions that reflect this. I’ve been working mostly on getting this to a stable state the last few weeks/months. The one used in the original Visual Haskell used a hack for some things like lists. This one has a stable interface.

Get it from Hackage with

cabal install Hs2lib

Using it is quite simple. I’ll illustrate with an example:

module Arith where

-- @@ Export
summerize :: [Int] -> [Int] -> Int
summerize x y = sum (x ++ y)

-- @@ Export
single :: Int -> [Int]
single x = [1..x]

Hs2lib does not export all functions automatically. You have to mark them with a special annotation “--@@ Export [=optional_name]”.

If no name is specified, the function name is used. Note that only functions with an explicit type signature can be exported. This tool works by static analysis. To compile, a simple invocation of hs2lib is sufficient:

PS C:\Examples> hs2lib Arith
Linking main.exe …
Done.

After this there should be 2 header files and one dll file named Hs2lib.dll in the folder. This name can be changed with the –n flag. Your Haddock documentation is carried over along with the original function type as comments for the new generated functions in the headers. Marshaling data is automatically generated for you for any data structures found during the dependency analysis phase. This can only be done if the source for the data type is available.
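To make the annotation convention concrete, here is a small sketch of how export pragmas could be picked out of a source file. It is not how Hs2lib itself is implemented; it assumes the pragma sits on the line directly above the type signature, and the rename count_up is invented for the example.

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe, mapMaybe)

-- (exported name, Haskell function name)
type Export = (String, String)

trim :: String -> String
trim = f . f where f = reverse . dropWhile isSpace

-- Recognise "-- @@ Export" and "--@@ Export=name".
-- Nothing means the line is not a pragma; Just carries the optional rename.
parsePragma :: String -> Maybe (Maybe String)
parsePragma line = do
  afterDashes <- stripPrefix "--" (trim line)
  afterMarker <- stripPrefix "@@" (dropWhile isSpace afterDashes)
  rest        <- stripPrefix "Export" (dropWhile isSpace afterMarker)
  case trim rest of
    ""       -> Just Nothing
    '=' : nm | not (null (trim nm)) -> Just (Just (trim nm))
    _        -> Nothing

-- The name in front of "::" on a signature line, if there is one.
signatureName :: String -> Maybe String
signatureName line = case words line of
  (name : "::" : _) -> Just name
  _                 -> Nothing

-- Pair every pragma line with the signature on the line below it.
exports :: [String] -> [Export]
exports ls = mapMaybe pair (zip ls (drop 1 ls))
  where
    pair (pragmaLine, sigLine) = do
      rename <- parsePragma pragmaLine
      name   <- signatureName sigLine
      Just (fromMaybe name rename, name)

arith :: [String]
arith =
  [ "module Arith where"
  , ""
  , "-- @@ Export"
  , "summerize :: [Int] -> [Int] -> Int"
  , "summerize x y = sum (x ++ y)"
  , ""
  , "-- @@ Export=count_up"
  , "single :: Int -> [Int]"
  , "single x = [1..x]"
  ]

main :: IO ()
main = mapM_ print (exports arith)
```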

An example of how to call these functions from C is:

#include <stdio.h>
#include <stdlib.h>
#include "Hs2lib_FFI.h"

int main(int argc, char** argv) {

    HsStart();

    int* foo = (int*)malloc(sizeof(int)*3);
    foo[0] = 2;
    foo[1] = 5;
    foo[2] = 10;

    printf("Sum result: %d\n", summerize(3, foo, 3, foo));

    free(foo);

    int count;
    int* bar = single(8, &count);

    printf("Single result: (%d)\n", count);

    int i;
    for(i = 0; i < count; i++)
        printf("\t%d\n", bar[i]);

    free(bar);

    HsEnd();

    return (EXIT_SUCCESS);
}

Since Visual Haskell is written in C#, this tool can also output C# code; this is done by using the –c# flag. I’m working on a PDF that goes into much greater detail about this tool and its options, but I need to reword some parts of it as I’ve been told it’s too “handholdy” (simplistic) in places. So I’ll hold off on publishing that.

But here are the highlights of this tool.

Currently Supported:

  • Generates marshalling information for arbitrary datatypes
  • Types with kind other than * have to be fully applied (e.g. Maybe String). A specialized marshaling instance is then generated for each of these instances.
  • Generates FFI exports for any function you mark that has an FFI-compatible type
  • Properly supports marshaling of lists. An argument of type [Int] becomes CInt -> Ptr Int, which gives the list an explicit size. A return value of type [Int] becomes Ptr CInt -> Ptr Int, where the extra pointer is where the size of the list gets written. This introduces a semantic difference between [Char] and String. The former is treated as a list of characters with no explicit terminator, whereas the latter is treated as a null-terminated wide string.
  • Supports Ptr, FunPtr and StablePtr, which introduces the possibility of having “caches”.
  • Supports callbacks via higher-order functions (everything is autogenerated from the function type)
  • Data types with multiple constructors become a union, whereas data types with a single constructor or a newtype are inlined and treated equally.
  • Re-exports existing exports found in the source file
  • Honors existing Storable instances found for types it needs
  • Avoids unsafePerformIO as much as possible, except in one specific instance.
  • Generates initialization functions for you
  • Hides unnecessary exports using a DEF file
  • Allows you to override my default type conversions (e.g. String -> CWString)
  • Provides helper includes for C, C++ and C# (placed wherever cabal places extra includes: %AppData%\cabal\Hs2lib-0.4.8 on Windows)
  • And more
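The list convention described above can be sketched by hand for the two Arith functions. This is only an approximation of what the generated wrappers might look like: the names summerizeNative and singleNative are invented, CInt is used for every element, and the foreign export declarations are left out so that the sketch runs on its own.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (alloca, free)
import Foreign.Marshal.Array (newArray, peekArray, withArrayLen)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

-- The pure functions from the Arith example, specialised to CInt.
summerize :: [CInt] -> [CInt] -> CInt
summerize x y = sum (x ++ y)

single :: CInt -> [CInt]
single x = [1 .. x]

-- Each list argument arrives as a (length, pointer) pair.
summerizeNative :: CInt -> Ptr CInt -> CInt -> Ptr CInt -> IO CInt
summerizeNative n xs m ys = do
  x <- peekArray (fromIntegral n) xs
  y <- peekArray (fromIntegral m) ys
  return (summerize x y)

-- A list result is returned as a pointer, and its length is written
-- through an extra out-parameter supplied by the caller.
singleNative :: CInt -> Ptr CInt -> IO (Ptr CInt)
singleNative x sizeOut = do
  let result = single x
  poke sizeOut (fromIntegral (length result))
  newArray result  -- allocated with malloc; the caller has to free it

main :: IO ()
main = do
  -- Mirrors summerize(3, foo, 3, foo) from the C example.
  total <- withArrayLen [2, 5, 10] $ \n p ->
    summerizeNative (fromIntegral n) p (fromIntegral n) p
  print total
  -- Mirrors single(8, &count) followed by free(bar).
  values <- alloca $ \sizeOut -> do
    arr <- singleNative 8 sizeOut
    n   <- peek sizeOut
    vs  <- peekArray (fromIntegral n) arr
    free arr
    return vs
  print values
```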

Limitations:

  • Does not support automatic expansion of lists in applied types other than IO (e.g. Maybe [Int])
  • Does not support infix constructors (an arbitrary codegen limit; I’ve never needed it. Will add in future versions)
  • The code generator generates a few too many parentheses in the temporary Haskell file. Will fix in future versions.
  • Does not support polymorphic functions (this is an FFI limit: the size of a polymorphic value cannot be known)
  • Cannot export functions or types with the same name. The types are imported unqualified.

I will work on the PDF documentation in the coming 2 weeks. That should be a complete overview of the abilities of this tool and why I made certain decisions.

To verify that the install (and the package) is complete, there is a set of tests included in the tar. Just unpack it, point ghci at Tests.TestRunner and run the function “runLocalTests”. The cabal test interface I was using before got deprecated. I’ll have to look up how the new one works; until then, sorry about that.

I will publish a more detailed tutorial later during the day.

The project is NOT dead


Hi all, I just wanted to let everyone know that even though I have been silent for quite a while and rather busy, I have been working on it. I have the modifications to Cabal done, and am almost done with those to Cabal-Install. There are various small bug fixes and I’m working on improving the speed of it all. I’ll post another video soon.

I’ve also upgraded the internal GHC from the 6.13 internal snapshot I’ve been using to the released GHC 7. This turns out to be quite an annoying upgrade: the RTS seems to segfault randomly, and code that used to compile on 6.x no longer compiles on 7, though as far as I can tell it should.

In any case, this means all of the marshaling code has to be regenerated, which means I have to change the code generated by my tool. I’m also rewriting the way the DLLs are loaded. They’re currently loaded in the same/default address space as the IDE. The problem is that when GHC panics, for whatever reason, the RTS is forced to quit and takes everything with it. I haven’t found any way to stop this, so I’m working on some kind of process isolation. This is unfortunately a design error on my part: I always figured that a GHC panic was just an exception and did not terminate the RTS.

I’ve promised a beta a lot, so I won’t do it this time. I initially planned on releasing the source along with a first release, since I don’t think it’s wise for multiple people to work on the core interactions. But if I don’t make that by September, I’ll just publish what I have so that others can help if they want to.

Visual Studio 2010 SP1 is coming out “soon” as well. This should fix a few of the problems I’ve been having (yay). And in 2012 comes the next major release of Visual Studio (don’t worry, it’ll be done before that). Barring any major API changes, it’ll be easy to support it and any following versions of Visual Studio.

But to reiterate, the project is not dead, far from it. I just have a lot of work to do, some problems to solve.

MSBuild Part 1


I took some time this week from working on my Thesis to work on MSBuild again, I know it’s slow going, but unfortunately this project is not my highest priority task atm (I thank you all for your patience). The result is the following

The new compile and project pipeline for a .cabal file now looks as follows:

[.cabal file] <-> [IDE] <-> [Cabal2MSBuild] <-> [MSBuild]

The execution looks like

  1. You double click on a .cabal file which launches visual studio
  2. The IDE then invokes Cabal2MSBuild on the .cabal file to get a .hsproj file
  3. This .hsproj file is used strictly internally. Its only purpose is to tell the IDE which files to include, the references, the author etc. Any changes made to these fields will also be written directly to the .cabal file via Cabal.
  4. When you want to build a project, an MSBuild task is run. This task, even though it is defined in the .hsproj file, uses no information from that file. It calls cabal-install on the original .cabal file.
  5. Errors and warnings are captured and translated into something the IDE can understand.

This is a much simpler and more stable approach than the first one, which was to deeply and directly integrate .cabal files inside the IDE. That required an immense amount of work, and I never could get it to work 100%.

Having finished my MSBuild book (which was really the only *complete and coherent* source of information on MSBuild) I was able to create the Cabal2MSBuild tool. This tool creates a .hsproj file from a .cabal file. The result contains all the information that was present in the .cabal file, plus a full listing of which files to include in the project, the references etc. The last line of this file is an important one:

<Import Project="Haskell.Cabal.targets"/>

This line imports a set of predefined Cabal-specific tasks for MSBuild. These tasks, as they’re called, wrap calls to the cabal-install tool. The exposed functionalities are

Build, Run, Configure, Deploy, Clean and Update. (As a side note, this Target file can be used by any msbuild compatible build system. So this means using Team Foundation Server as a source control server and having continuous builds going on should be possible).

These Tasks are all exposed by a custom Task file “Cabal.MSBuild.Tasks.dll”

(The end of this post will contain the full project and Task files)

To show that this all works, here’s an example output:

Phyx>cabal2msbuild Cabal2MSBuild.cabal Test.proj
Done.

Phyx>msbuild /target:Configure /property:CabalProjectFile="Cabal2MSBuild.cabal"
Test.proj
Microsoft (R) Build Engine Version 4.0.30319.1
[Microsoft .NET Framework, Version 4.0.30319.1]
Copyright (C) Microsoft Corporation 2007. All rights reserved.

Build started 10/12/2010 12:32:50.
Project "C:\Users\Phyx\Documents\Haskell\Cabal2MSBuild\Test.proj" on node 1 (Configure target(s)).
Configure:
  VSH2010 Configuring project file...
  VSH2010 Using Cabal 'C:\Users\Phyx\AppData\Roaming\cabal\bin\Cabal.exe'
  Running Cabal...
TSKCONFIGURE : warning : The package list for 'hackage.haskell.org' is 29 days old. [C:\Users\Phyx\Documents\Haskell\Cabal2MSBuild\Test.proj]
  Run 'cabal update' to get the latest list of available packages.
TSKCONFIGURE : warning : Cabal2MSBuild.cabal: Unknown fields: other-modules (line 17) [C:\Users\Phyx\Documents\Haskell\Cabal2MSBuild\Test.proj]
  Fields allowed in this section:
  name, version, cabal-version, build-type, license, license-file,
  copyright, maintainer, build-depends, stability, homepage,
  package-url, bug-reports, synopsis, description, category, author,
  tested-with, data-files, data-dir, extra-source-files,
  extra-tmp-files
TSKCONFIGURE : warning : Cabal2MSBuild.cabal: Unknown fields: other-modules (line 17) [C:\Users\Phyx\Documents\Haskell\Cabal2MSBuild\Test.proj]
  Fields allowed in this section:
  name, version, cabal-version, build-type, license, license-file,
  copyright, maintainer, build-depends, stability, homepage,
  package-url, bug-reports, synopsis, description, category, author,
  tested-with, data-files, data-dir, extra-source-files,
  extra-tmp-files
TSKCONFIGURE : warning : Cabal2MSBuild.cabal: Unknown fields: other-modules (line 17) [C:\Users\Phyx\Documents\Haskell\Cabal2MSBuild\Test.proj]
  Fields allowed in this section:
  name, version, cabal-version, build-type, license, license-file,
  copyright, maintainer, build-depends, stability, homepage,
  package-url, bug-reports, synopsis, description, category, author,
  tested-with, data-files, data-dir, extra-source-files,
  extra-tmp-files
TSKCONFIGURE : warning : This package indirectly depends on multiple versions of the same [C:\Users\Phyx\Documents\Haskell\Cabal2MSBuild\Test.proj]
  package. This is highly likely to cause a compile failure.
  package ghc-6.12.1 requires Cabal-1.8.0.2
  package bin-package-db-0.0.0.0 requires Cabal-1.8.0.2
  package Cabal2MSBuild-0.2.2 requires Cabal-1.9.2
  VSH2010 Configuring done.
Done Building Project "C:\Users\Phyx\Documents\Haskell\Cabal2MSBuild\Test.proj"
 (Configure target(s)).


Build succeeded.

"C:\Users\Phyx\Documents\Haskell\Cabal2MSBuild\Test.proj" (Configure target) (1) ->
(Configure target) ->
  TSKCONFIGURE : warning : The package list for 'hackage.haskell.org' is 29 days old. [C:\Users\Phyx\Documents\Haskell\Cabal2MSBuild\Test.proj]
  TSKCONFIGURE : warning : Cabal2MSBuild.cabal: Unknown fields: other-modules (line 17) [C:\Users\Phyx\Documents\Haskell\Cabal2MSBuild\Test.proj]
  TSKCONFIGURE : warning : Cabal2MSBuild.cabal: Unknown fields: other-modules (line 17) [C:\Users\Phyx\Documents\Haskell\Cabal2MSBuild\Test.proj]
  TSKCONFIGURE : warning : Cabal2MSBuild.cabal: Unknown fields: other-modules (line 17) [C:\Users\Phyx\Documents\Haskell\Cabal2MSBuild\Test.proj]
  TSKCONFIGURE : warning : This package indirectly depends on multiple versions of the same [C:\Users\Phyx\Documents\Haskell\Cabal2MSBuild\Test.proj]

    5 Warning(s)
    0 Error(s)

Time Elapsed 00:00:28.65

This exposes a few problems. While it works, the feedback is inaccurate. The problem is that cabal-install has no set format for warnings and errors. They’re all freeform, so MSBuild only recognizes the first line, and even that only because of the “warning: ” prefix. Ideally we want the entire warning, since that’s what we’re going to report back to the user.

This same problem exists for GHC as well, however GHC is a bit more structured, mostly errors are pretty printed and have some kind of structure to them, which you can parse by looking at the indentation of text.

As I’m trying to modify as little as possible (it’ll be easier to maintain in the future and takes the burden of maintenance off me), I’m writing a set of parsers to augment MSBuild’s built-in parsers so that they support messages generated by cabal-install and GHC.
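The indentation idea can be sketched as follows. This is only an illustration of grouping a headline with its indented continuation lines, not the parser used for the build tasks; the Message type and function names are invented, and the sample lines are taken from the log above.

```haskell
import Data.Char (isSpace)

-- One diagnostic: its first line plus any indented continuation lines.
data Message = Message { headline :: String, details :: [String] }
  deriving (Show, Eq)

-- A continuation line starts with whitespace and is not blank.
isIndented :: String -> Bool
isIndented (c : cs) = isSpace c && not (all isSpace cs)
isIndented []       = False

-- A non-indented line starts a new message; the indented lines that
-- follow it directly are attached to it. Blank lines and continuation
-- lines without a headline are dropped.
groupMessages :: [String] -> [Message]
groupMessages [] = []
groupMessages (l : ls)
  | all isSpace l || isIndented l = groupMessages ls
  | otherwise = Message l (map (dropWhile isSpace) cont) : groupMessages rest
  where
    (cont, rest) = span isIndented ls

sampleLog :: [String]
sampleLog =
  [ "TSKCONFIGURE : warning : The package list for 'hackage.haskell.org' is 29 days old."
  , "  Run 'cabal update' to get the latest list of available packages."
  , "TSKCONFIGURE : warning : This package indirectly depends on multiple versions of the same"
  , "  package. This is highly likely to cause a compile failure."
  , "  package ghc-6.12.1 requires Cabal-1.8.0.2"
  ]

main :: IO ()
main = mapM_ report (groupMessages sampleLog)
  where
    report m = do
      putStrLn (headline m)
      mapM_ (putStrLn . ("    | " ++)) (details m)
```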

The current .hsproj file is a valid project file, and though I haven’t integrated it into the IDE yet, when I do it should just work, since full support for MSBuild project files is already there (again, less to maintain, and no reflection hacks this time).

Eventually the Cabal2MSBuild tool will be available on hackage and the build tasks on codeplex since hackage doesn’t allow binary dependencies.

And now, as promised, the full files. Note, however, that both of these files are works in progress.

Appendix A: Target file

<?xml version="1.0" encoding="UTF-8"?>

<!--
***********************************************************************************************
Haskell.Cabal.targets

WARNING:  DO NOT MODIFY this file unless you are knowledgeable about MSBuild and have
          created a backup copy.  Incorrect changes to this file will make it
          impossible to load or build your projects from the command-line or the IDE.

This file defines the steps in the standard build process for Haskell Cabal projects.

Copyright (C) Tamar Christina. All rights reserved.
***********************************************************************************************
-->

<Project DefaultTargets="Build" InitialTargets="Configure" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

 
    <UsingTask AssemblyFile="Cabal.MSBuild.Tasks.dll" TaskName="TskRun"       />

    <UsingTask AssemblyFile="Cabal.MSBuild.Tasks.dll" TaskName="TskBuild"     />

    <UsingTask AssemblyFile="Cabal.MSBuild.Tasks.dll" TaskName="TskConfigure" />

    <UsingTask AssemblyFile="Cabal.MSBuild.Tasks.dll" TaskName="TskDeploy"    />

    <UsingTask AssemblyFile="Cabal.MSBuild.Tasks.dll" TaskName="TskClean"     />

    <UsingTask AssemblyFile="Cabal.MSBuild.Tasks.dll" TaskName="TskUpdate"    />

    <Target Name="Clean">

        <TskClean CabalFile="$(CabalProjectFile)" />

    </Target>

   
    <Target Name="Run">

        <TskRun CabalFile="$(CabalProjectFile)" />

    </Target>

   
    <Target Name="Deploy">

        <TskDeploy CabalFile="$(CabalProjectFile)" Target="$(CabalDeployedFile)" />

    </Target>

   
    <Target Name="Configure">

        <TskConfigure CabalFile="$(CabalProjectFile)" User="$(CabalUserConfigure)" />

    </Target>

   
    <Target Name="Build">

        <TskBuild CabalFile="$(CabalProjectFile)" />

    </Target>   
   
    <Target Name="Update">

        <TskUpdate/>

    </Target>

   
    <Target Name="Make">

        <TskConfigure CabalFile="$(CabalProjectFile)" User="$(CabalUserConfigure)" />

        <TskBuild CabalFile="$(CabalProjectFile)" />       
    </Target>

    <Target Name="MakeRun">

        <TskConfigure CabalFile="$(CabalProjectFile)" User="$(CabalUserConfigure)" />

        <TskBuild CabalFile="$(CabalProjectFile)" />

        <TskRun CabalFile="$(CabalProjectFile)" />

    </Target>

</Project>

Appendix B: hsproj file

<?xml version="1.0" encoding="UTF-8"?>

<Project DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

<PropertyGroup>

<Configuration Condition=" '$(Configuration)' == '' ">Build</Configuration>

<SchemaVersion>2.0</SchemaVersion>

<ProjectGuid>{99999999-9999-9999-9999-999999999999}</ProjectGuid>

<RootNamespace>Hs2lib</RootNamespace>

<AssemblyName>Hs2lib</AssemblyName>

<EnableUnmanagedDebugging>false</EnableUnmanagedDebugging>

<License>BSD3</License>

<LicenseFile></LicenseFile>

<Maintainer>Tamar Christina &lt;…@zhox.com&gt;</Maintainer>

<Author>Tamar Christina &lt;…@zhox.com&gt;</Author>

<Stability>experimental</Stability>

<Homepage>http://www.zhox.com/projects/haskell/hs2lib</Homepage>

<PkgUrl></PkgUrl>

<BugReports></BugReports>

<Synopsis>A Library and Preprocessor that makes it easier to create shared libs (note: only tested on windows) from Haskell programs.</Synopsis>

<Description>The supplied PreProcessor can be run over any existing source and would generate FFI code for every function marked to be exported by the special notation documented inside the package. It then proceeds to compile this generated code into a windows DLL.

The Library contains some helper code that’s commonly needed to convert between types, and contains the code for the typeclasses the PreProcessor uses in the generated code to keep things clean.

It will always generated the required C types for use when calling the dll, but it will also generate the C# unsafe code if requested.

Read http://www.zhox.com/projects/haskell/hs2lib.pdf

Current Restrictions:

– Does not automatically resolve missing datatype declarations using hackage. Future releases will search library code for the types you need to resolve this but currently you’ll get a missing instance error.

– You cannot export functions which have the same name (even if they’re in different modules because 1 big hsc file is generated at the moment, no conflict resolutions)

– You cannot export datatypes with the same name, same restriction as above.

– Does not support automatic instance generation for infix constructors yet

</Description>

<Category>Development</Category>

<DataDir></DataDir>

</PropertyGroup>

<PropertyGroup Condition=" '$(Configuration)' == 'GHCi' ">

<DebugSymbols>true</DebugSymbols>

<OutputPath>bin\Debug</OutputPath>

<OutputType>HI</OutputType>

</PropertyGroup>

<PropertyGroup Condition=" '$(Configuration)' == 'Build' ">

<DebugSymbols>false</DebugSymbols>

<OutputPath>bin\Build</OutputPath>

<OutputType>EXE</OutputType>

</PropertyGroup>

<ItemGroup>

<Content Include="$(DataDir)\Templates\main.template-unix.c"/>

<Content Include="$(DataDir)\Templates\main.template-win.c"/>

<Content Include="$(DataDir)\Includes\Tuples.h"/>

<Content Include="$(DataDir)\Includes\Instances.h"/>

</ItemGroup>

<ItemGroup>

<Compile Include="$(DataDir)\WinDll\*.hs"/>

<Compile Include="$(DataDir)\*.hs"/>

<Compile Include="$(DataDir)\Includes\*.h"/>

</ItemGroup>

<ItemGroup>

<Reference Include="base"/>

<Reference Include="syb"/>

</ItemGroup>

<ItemGroup>

<!--

<None Include="*.Cabal">

    <Type>Library</Type>

    <Exposed>True</Exposed>

</None>

-->

<Compile Include="WinDll\Lib\Converter.*" />

<Compile Include="WinDll\Lib\NativeMapping.*" />

<Compile Include="WinDll\Lib\Tuples.*" />

<Compile Include="WinDll\Structs\Types.*" />

<Compile Include="WinDll\Lib\InstancesTypes.*" />

</ItemGroup>

<!--

<ItemGroup Type="Executable" ExeName="" Name="Hs2lib" ModulePath="Hs2lib.hs"/>

-->

<Import Project="Haskell.Cabal.targets"/>

</Project>

Status update


This is a small update to just let people know what I’ve been up to. This will be a non-technical post.

There are 3 components which I want to get done before I put any code publicly online.

  • Cabal support (being able to build and run Haskell projects from the IDE)
  • Documentation support (class browser, quickinfo docs and F1 help integration along with jump to definition)
  • Intellisense support

Of the 3, I want to get Cabal support working first, then release something and get feedback while I work on the other two. I’m working on all 3 concurrently (mostly depending on which part of Visual Studio I want to mess with that day). So how far along are they?

  • Cabal: I had a first version which hooked into the Cabal library and, using quite a few reflection hacks and code changes to the MPF templates, managed to load .cabal project files. I took this approach first because I wanted to talk directly to Cabal and not go through any intermediate layers. The reflection hacks were needed because the MPF templates are hardcoded to MSBuild, which uses an XML file format. Not using MPF would have meant writing all that code myself, which would have taken ages. Unfortunately this only worked sometimes; other times I would get an exception from deep within Visual Studio. I also had no idea how I would get building to work.

    Ultimately I decided to scrap the entire approach altogether. It wasn’t worth the hassle and would be hard to maintain. I’ve now settled on the idea of converting .cabal files to an internal MSBuild script (and back to .cabal when saving), while adding new templates for a Haskell target type which just invokes cabal-install. This is a much simpler approach which takes some of the burden of maintenance off me and puts it onto other tools, but which unfortunately requires me to learn about MSBuild. Currently I’m creating the conversion tool, Cabal2MSBuild, which is about 20% done.

  • Documentation support: Documentation for the current module is obtained from the AST inside GHC (hopefully; I haven’t checked the data I get back yet). Documentation for external modules (e.g. package modules) is obtained from two places (hopefully). For quick info the IntelliSense cache will be used; for the class browser (e.g. browsing of the current project and its dependencies) the .hpi files will be used. For F1 help, Haddock-generated documentation will be used.

    However, in order for the Haddock documentation to be integrated into Visual Studio it has to be in the correct format. As it turns out, documentation has been greatly simplified in Visual Studio 2010. The documentation format basically comes down to a zip file renamed to something else, which contains a simple index XML file and just XHTML content files. Great news there, since Haddock already generates XHTML. However, I still need to modify the generated files to include a few meta tags, which will be used by the document installer to create indices of the HTML files, and for F1. I’ve started the modifications to Haddock and they’re about 60% done.

  • Intellisense: I already have the .hpi files, which are simplified indices of packages. I would like to provide IntelliSense for both project files and standalone Haskell files. There are a few ways to approach this. From what I’ve seen in the past, Visual Studio builds an IntelliSense cache file from the project dependencies on the fly on first launch/use of the project. If you have a large package database this could be handy since it limits the search space, but features such as auto-adding imports/dependencies will become harder, as I would have to do 2 checks (the local cache and, if nothing is found, a global cache). Standalone files would also force me to use only the bigger (slower) global cache.

    However, the speed of that global cache hasn’t been measured yet (because the cache hasn’t been made yet). So for now, until I have some hard numbers, I’ve settled on just always using the globally constructed index. This is also about 20-30% done. The majority of the work left is reading some documentation and papers.
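To give a feel for the .cabal-to-MSBuild conversion mentioned in the Cabal bullet, here is a minimal sketch in Haskell. It is not the Cabal2MSBuild tool itself: the function names (`parseField`, `elementName`, `toPropertyGroup`) are made up for illustration, it only handles flat `name: value` lines (no sections, no multi-line fields), and the rule of turning `bug-reports` into `BugReports` is an assumption loosely modelled on the element names in the project file shown earlier.

```haskell
import Data.Char (isSpace, toUpper)

-- | Strip leading and trailing whitespace.
trim :: String -> String
trim = dropEnd . dropWhile isSpace
  where dropEnd = reverse . dropWhile isSpace . reverse

-- | Split a simple "name: value" line into its two parts.
-- Returns Nothing for lines without a colon or with an empty name.
parseField :: String -> Maybe (String, String)
parseField line =
  case break (== ':') line of
    (name, ':' : value)
      | not (null (trim name)) -> Just (trim name, trim value)
    _ -> Nothing

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s =
  case break (== sep) s of
    (chunk, [])       -> [chunk]
    (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Turn a field name such as "bug-reports" into "BugReports".
-- (Hypothetical naming rule, chosen for this example only.)
elementName :: String -> String
elementName = concatMap capitalise . splitOn '-'
  where
    capitalise []       = []
    capitalise (c : cs) = toUpper c : cs

-- | Escape the characters that are special in XML text.
escapeXml :: String -> String
escapeXml = concatMap escape
  where
    escape '<' = "&lt;"
    escape '>' = "&gt;"
    escape '&' = "&amp;"
    escape c   = [c]

-- | Render one field as an MSBuild property element.
fieldToElement :: (String, String) -> String
fieldToElement (name, value) =
  "<" ++ tag ++ ">" ++ escapeXml value ++ "</" ++ tag ++ ">"
  where tag = elementName name

-- | Wrap all parseable lines in a single PropertyGroup.
toPropertyGroup :: [String] -> [String]
toPropertyGroup ls =
  ["<PropertyGroup>"]
    ++ [ "  " ++ fieldToElement f | Just f <- map parseField ls ]
    ++ ["</PropertyGroup>"]

main :: IO ()
main = mapM_ putStrLn (toPropertyGroup sample)
  where
    sample =
      [ "name: Hs2lib"
      , "license: BSD3"
      , "stability: experimental"
      , "bug-reports:"
      ]
```

Going the other way (MSBuild back to .cabal on save) would need the inverse of `elementName`, which is one reason a real tool would probably keep an explicit table of field names instead of a naming rule.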

Hopefully this tells you what I’ve been up to the past few weeks.

Why didn’t you use Scion?


I seem to get this question quite a lot, so I think it might be a good idea to address it globally instead of individually like I’ve been doing up till now.

For those who don’t know Scion, it is a service that provides common functionality that editors for Haskell might need. It’s a wrapper around the GHC API and provides a much simpler interface than the GHC API itself. It keeps state for things like caching and works as an out-of-process server (it runs as a separate process). Its mode of communication with the outside world is serialization.

Now, the original versions of Visual Haskell were written using H/Direct and the FFI to communicate with Visual Studio via a COM bridge. When I first thought of this project I was planning on “just” updating the original code to the new COM interfaces of Visual Studio (2008) 2010 while also updating GHC from the 6.6/6.8 used then to 6.10/6.12. However, there were a couple of problems with this. First off, it relied on internal GHC structures which have changed a lot over time, and some have even been removed completely (the changelog mentions “this seems to not be used by anything”).

The second major problem was that Visual Studio itself evolved rather rapidly and the new interfaces were quite different (if you want to use the new features). So I decided to rewrite it all: start from scratch with the aim of relying less on the compiler internals and more on the interfaces it provides. I used the original paper on Visual Haskell to get started on this. In their approach they used marshalling to read/write information directly from and to Haskell land, so I took the same route. At this point I did not know about Scion at all, and even if I had, I would have followed the marshalling route of the original authors, since I can pass the result of a call directly to Visual Studio by making sure the datatypes Visual Studio wants are generated from the Haskell code.

So therein lies the first problem: Scion uses XML serialization (afaik) and I wanted marshalling. The second problem was that the datatypes I have were tailored specifically to Visual Studio. Eventually I decided to drop the COM interfaces and moved to the managed interface using C#. Because C# can handle unsafe code I didn’t have to change anything (even though I eventually did).

Then there is the fact that whenever I add new features (some of which overlap with Scion), I generate the specific datatypes that I need, and I don’t know if they would be useful to the rest of the world. That’s my reason for not “expanding” Scion to fit my needs. It needs to stay generic after all.

Now here people say that EclipseFP uses it just fine, but their site states:

EclipseFP uses Scion, the Haskell IDE library. See http://code.google.com/p/scion-lib/ for more information.
You cannot use the version from Hackage (0.1.0.2) since commands have been added for eclipsefp. From version 1.111, eclipsefp includes a modified source distribution of scion, that is built and used by eclipsefp if the use built-in server preferences option is checked. Since it is by default, eclipse might be a bit slow to start the first time scion is built. Otherwise, you’ll need to build Scion from source (git clone git://github.com/JPMoresmau/scion.git, runhaskell Setup.hs configure/build/install)

This means that EclipseFP also uses a modified version of Scion (which presumably wasn’t merged back for the same reasons I’m hesitant to change it).

Also, when I took a look at Scion later on, I noticed that it really had one contributor (nominolo). Now nominolo has been pretty helpful in helping me out, but he seems like a pretty busy guy. So at the time I figured Scion wasn’t an active project (at least not very active), and I was not in the mood to have to contribute to another project (I already have enough as is). The last commit was somewhere in mid 2009, and it seems to have been just minor tweaks and bug fixes in 2009.

And finally, the last argument is that I wanted to learn about GHC as well. Having to write all this was a great way to learn how it does things, and that alone I think is a big enough reason for why I have not used Scion. It was also a lot of fun running into problems and having to find fixes for them. This post was not meant to be arrogant or condescending in any way. Who knows, I might backpedal one day and switch to Scion, but at this point that would be very difficult. In order to foster the community effort behind it, I might one day commit the parts of Visual Haskell that make sense back into Scion.

Intellisense Part 1–Haskell Package Interface


As most of you who have been following this blog know, I have IntelliSense and Cabal support left. I decided to focus on IntelliSense first (even though Cabal support is easier). So this is the first in a series of posts on how I’ve decided to implement IntelliSense.

[Sidenote: University has started again, so I’m afraid I’ll only have time to work on this project on the weekends, at least when I’m coherent enough to :)]

IntelliSense, for those of you who don’t know, is Microsoft’s implementation of code completion; a small overview can be found [here]. The gist of it is that when the user starts typing in a relevant place, the IDE will try to help the user along by showing identifiers and/or types currently in scope. To that end Visual Haskell will support two types of scopes:

  • Function scopes: e.g. whenever you’re inside a function, you’ll get a list of every binding (both local and global), lambda variable and module in scope. Should you type a module name and a . you’ll get the other module names you can choose, or the functions you can use qualified from that module, if any.
  • Type scopes: e.g. whenever you’re working inside a type signature, the list will limit itself to types that are currently in scope (along with modules again, of course).

This is how I plan to implement code completion. If anyone has any requests or suggestions, please let me know now, since I can still change it for the initial release.
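To make the two scopes concrete, here is a small Haskell sketch of the filtering step. It only illustrates the idea and is not Visual Haskell’s code: the types `Scope`, `Kind` and `Candidate` and the function `complete` are invented for this example, and plain case-sensitive prefix matching is an assumption.

```haskell
import Data.List (isPrefixOf, sort)

-- | The two completion contexts described above.
data Scope = FunctionScope | TypeScope
  deriving (Eq, Show)

-- | What kind of thing a completion candidate is.
data Kind = Binding | LambdaVar | TypeName | ModuleName
  deriving (Eq, Show)

data Candidate = Candidate
  { candName :: String
  , candKind :: Kind
  } deriving (Eq, Show)

-- | Which kinds are offered in which scope. Modules show up in both.
allowedIn :: Scope -> Kind -> Bool
allowedIn _             ModuleName = True
allowedIn FunctionScope Binding    = True
allowedIn FunctionScope LambdaVar  = True
allowedIn TypeScope     TypeName   = True
allowedIn _             _          = False

-- | Names valid in the given scope that start with what was typed so far.
complete :: Scope -> String -> [Candidate] -> [String]
complete scope typed cands =
  sort [ candName c
       | c <- cands
       , allowedIn scope (candKind c)
       , typed `isPrefixOf` candName c
       ]

main :: IO ()
main = do
  let inScope =
        [ Candidate "map"      Binding
        , Candidate "mapM_"    Binding
        , Candidate "x"        LambdaVar
        , Candidate "Maybe"    TypeName
        , Candidate "Data.Map" ModuleName
        ]
  print (complete FunctionScope "ma" inScope)  -- ["map","mapM_"]
  print (complete TypeScope "" inScope)        -- ["Data.Map","Maybe"]
```

Completion after a module name and a dot would need a second lookup keyed on the module, which this sketch leaves out.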

In order to implement IntelliSense I need to index all the packages currently installed for GHC, and also keep this index updated as time goes by and you install new Cabal packages. Visual Haskell will ship with custom versions of cabal-install and ghc-pkg (and eventually a custom Haddock as well, in order to generate Visual Studio help files), so keeping the index up to date should not be a problem.

I have still not decided how to store this information, but I’m leaning towards a structure with a spatial index; more specifically, I’m leaning towards using a BANG file. I believe using this file will allow me to do the different kinds of lookups I need while having a memory-mapped file.

But the first step is to get the information on your packages from ghc-pkg and ghc. This is then stored in a .hpi file (Haskell package interface), which is just a very simplified version of the .hi files GHC uses. These files contain functions + documentation, class declarations, instances and types. The reason for these files is twofold:

  • For the class browser we want to be able to browse packages (in a simplified manner), so these files will contain all we need for now, along with the location of the actual .hi file if we need it for more complex stuff later.
  • From these files I will generate the large IntelliSense database. This will not contain any information on classes etc., so we need a way to quickly get to these (especially for things like code snippets).
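As a rough picture of what such a file holds and how the IntelliSense database could be derived from it, consider the Haskell sketch below. The real .hpi layout is not described in this post, so every type and field name here (`PackageInterface`, `FunctionEntry`, `IndexEntry`, `toIndex`) is hypothetical, signatures are kept as plain strings for simplicity, and the file path in `main` is a made-up example.

```haskell
-- | One exported function together with its documentation, if any.
data FunctionEntry = FunctionEntry
  { fnName :: String
  , fnType :: String
  , fnDoc  :: Maybe String
  } deriving (Eq, Show)

-- | A simplified package interface: functions with docs, classes,
-- instances and types, plus the location of the real .hi file.
data PackageInterface = PackageInterface
  { piPackage   :: String
  , piHiFile    :: FilePath
  , piFunctions :: [FunctionEntry]
  , piClasses   :: [String]
  , piInstances :: [String]
  , piTypes     :: [String]
  } deriving (Eq, Show)

-- | One row of the flat IntelliSense index.
data IndexEntry = IndexEntry
  { ieName    :: String
  , iePackage :: String
  , ieIsType  :: Bool
  } deriving (Eq, Show)

-- | Flatten an interface into index rows. Classes and instances are
-- dropped on purpose, so the .hpi file stays the place to look them up.
toIndex :: PackageInterface -> [IndexEntry]
toIndex pkg =
  [ IndexEntry (fnName f) (piPackage pkg) False | f <- piFunctions pkg ]
    ++ [ IndexEntry t (piPackage pkg) True | t <- piTypes pkg ]

main :: IO ()
main = mapM_ print (toIndex example)
  where
    example = PackageInterface
      { piPackage   = "containers-0.3.0.0"
      , piHiFile    = "Data/Map.hi"  -- made-up path, for illustration only
      , piFunctions =
          [ FunctionEntry "lookup" "Ord k => k -> Map k a -> Maybe a" Nothing ]
      , piClasses   = []
      , piInstances = ["Functor (Map k)"]
      , piTypes     = ["Map"]
      }
```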

In any case, the first step is now completed: I can successfully generate .hpi files with all the content described above. It does this for my configuration, which contains:

C:/ghc/ghc-6.12.1\lib\package.conf.d:
    Cabal-1.8.0.2
    Win32-2.2.0.1
    array-0.3.0.0
    base-3.0.3.2
    base-4.2.0.0
    bin-package-db-0.0.0.0
    bytestring-0.9.1.5
    containers-0.3.0.0
    directory-1.0.1.0
    (dph-base-0.4.0)
    (dph-par-0.4.0)
    (dph-prim-interface-0.4.0)
    (dph-prim-par-0.4.0)
    (dph-prim-seq-0.4.0)
    (dph-seq-0.4.0)
    extensible-exceptions-0.1.1.1
    ffi-1.0
    filepath-1.1.0.3
    ghc-6.12.1
    (ghc-binary-0.5.0.2)
    ghc-prim-0.2.0.0
    haskell98-1.0.1.1
    hpc-0.5.0.4
    integer-gmp-0.2.0.0
    old-locale-1.0.0.2
    old-time-1.0.0.3
    pretty-1.0.1.1
    process-1.0.1.2
    random-1.0.0.2
    rts-1.0
    syb-0.1.0.2
    template-haskell-2.4.0.0
    time-1.1.4
    utf8-string-0.3.4

C:\Users\Phyx\AppData\Roaming\ghc\i386-mingw32-6.12.1\package.conf.d:
    Cabal-1.9.2
    HTTP-4000.0.9
    Hs2lib-0.2.2
    MonadCatchIO-mtl-0.3.0.1
    QuickCheck-2.1.0.3
    ansi-terminal-0.5.3
    binary-0.5.0.2
    colorize-haskell-1.0.1
    cpphs-1.11
    deepseq-1.1.0.0
    fgl-5.4.2.2
    ghc-mtl-1.0.1.0
    ghc-paths-0.1.0.6
    ghc-syb-0.2.0.0
    haddock-2.7.2
    haskell-lexer-1.0
    haskell-src-1.0.1.3
    haskell-src-exts-1.8.2
    haskell-src-exts-1.9.0
    hint-0.3.2.3
    mtl-1.1.0.2
    network-2.2.1.7
    parallel-2.2.0.1
    parsec-2.1.0.1
    primitive-0.3
    tar-0.3.1.0
    uuagc-0.9.10
    uuagc-0.9.14
    uuagc-0.9.23
    uuagc-0.9.26
    uulib-0.9.10
    uulib-0.9.12
    vector-0.6.0.2
    zlib-0.5.2.0

Generating these takes about 39.16 seconds and about 500 MB of RAM while maxing out one core, so users will most likely not notice this first step at all. A snapshot of what the inside of a .hpi file looks like:

[image: contents of a .hpi file]

Configurable Candy


Candy is a feature where you replace a selection of text with something else (usually also text). However, this is done in the view only and not in the actual file. This is useful for replacing things like “->” with an actual Unicode arrow while still allowing other text editors that can’t handle Unicode to display the file correctly.

Leksah implements this and allows you to configure it via a so-called “Candy” file, so I “borrowed” their approach and extended it to suit my needs.

The general syntax of a Visual Haskell candy file (.vshc) is

-- "<token>" <unicode> <modifier> <enabled> <FIT|NONE>
-- the token has to be quoted
-- the supported modifiers are
-- CODE    - Apply only to regions of code
-- COMMENT - Apply only inside comments
-- STRING  - Apply only in string literals
-- ALL     - Apply to all

The modifiers are self-explanatory, but the FIT and NONE options take some explaining.

When using the FIT option, the Candy engine won’t try to keep the same width as the text it’s replacing. This means that you get a layout change. The actual file might have “alpha” but the view will show only “a”.

With some things, especially with keywords, we don’t want this. This is where the NONE option comes in. When it is used, the engine will always match the width of the text it’s replacing by making the Unicode text larger and adding horizontal whitespace. This means that “alpha” would be rendered as “  a  ”, preserving the layout.
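The format is simple enough that one line can be parsed in a few lines of Haskell. The sketch below is an illustration only, not the Candy engine: the names `Rule`, `parseRule` and `render` are invented, reading the quoted token as a Haskell string literal is an assumption, and for NONE it only pads with spaces around the glyph (centred, as in the “  a  ” example) and does not attempt the glyph enlargement mentioned above.

```haskell
import Control.Monad (guard)
import Data.Char (chr)
import Data.Maybe (listToMaybe)
import Text.Read (readMaybe)

data Modifier = CODE | COMMENT | STRING | ALL
  deriving (Eq, Show, Read)

data Width = FIT | NONE
  deriving (Eq, Show, Read)

data Rule = Rule
  { ruleToken    :: String
  , ruleCode     :: Int      -- Unicode code point of the replacement
  , ruleModifier :: Modifier
  , ruleEnabled  :: Bool
  , ruleWidth    :: Width
  } deriving (Eq, Show)

-- | Parse one line of a candy file. Comment lines and malformed lines
-- yield Nothing; anything after the fifth field is ignored.
parseRule :: String -> Maybe Rule
parseRule line = do
  -- The token is quoted, so read it as a string literal (assumption).
  (token, rest) <- listToMaybe (reads line)
  (codeStr : modStr : enStr : widthStr : _) <- Just (words rest)
  code <- readMaybe codeStr            -- accepts hex such as 0x2192
  guard (code >= 0 && code <= 0x10FFFF)
  modifier <- readMaybe modStr
  enabled  <- readMaybe enStr
  width    <- readMaybe widthStr
  return (Rule token code modifier enabled width)

-- | Text shown in the view in place of the token.
-- FIT shows just the glyph; NONE pads it to the token's width.
render :: Rule -> String
render r =
  case ruleWidth r of
    FIT  -> [glyph]
    NONE -> replicate left ' ' ++ [glyph] ++ replicate right ' '
  where
    glyph = chr (ruleCode r)
    pad   = length (ruleToken r) - 1
    left  = pad `div` 2
    right = pad - left

main :: IO ()
main = do
  let arrow = parseRule "\"->\"    0x2192  CODE  True  NONE  --RIGHTWARDS ARROW"
      alpha = parseRule "\"alpha\" 0x03b1  ALL   True  FIT   --ALPHA"
  print (fmap (length . render) arrow)  -- Just 2
  print (fmap (length . render) alpha)  -- Just 1
  print (parseRule "-- Candy file")     -- Nothing
```

Lengths are printed here rather than the glyphs themselves so the output does not depend on the console encoding.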

A screenshot of this in action:

[image: candy replacements shown in the editor]

For reference, the full default candy file that will be shipping with VSH2010 is:

-- Candy file
-- Format
-- "<token>" <unicode> <modifier> <enabled> <FIT|NONE>
-- the token has to be quoted
-- the supported modifiers are
-- CODE    - Apply only to regions of code
-- COMMENT - Apply only inside comments
-- STRING  - Apply only in string literals
-- ALL     - Apply to all
-- Note that the replacement block will always take up the exact same
-- space as the tokens it’s replacing. e.g. "alpha" will be replaced by "  a  "
"->"         0x2192    CODE       True       NONE     --RIGHTWARDS ARROW
"<-"         0x2190    CODE       True       NONE     --LEFTWARDS ARROW
"=>"         0x21d2    CODE       True       NONE     --RIGHTWARDS DOUBLE ARROW
">="         0x2265    CODE       False      NONE     --GREATER-THAN OR EQUAL TO
"<="         0x2264    CODE       False      NONE     --LESS-THAN OR EQUAL TO
"/="         0x2260    CODE       False      NONE     --NOT EQUAL TO
"&&"         0x2227    CODE       False      NONE     --LOGICAL AND
"||"         0x2228    CODE       False      NONE     --LOGICAL OR
"++"         0x2295    CODE       False      NONE     --CIRCLED PLUS
"::"         0x2237    CODE       False      NONE     --PROPORTION
".."         0x2025    CODE       False      NONE     --TWO DOT LEADER
"^"          0x2191    COMMENT    False      NONE     --UPWARDS ARROW
"=="         0x2261    CODE       False      NONE     --IDENTICAL TO
" . "        0x2218    CODE       True       NONE     --RING OPERATOR
"\\"         0x03bb    CODE       True       NONE     --GREEK SMALL LETTER LAMBDA
"=<<"        0x291e    CODE       False      NONE     --
">>="        0x21a0    CODE       False      NONE     --
"$"          0x25ca    CODE       False      NONE     --
">>"         0x226b    CODE       False      NONE     --MUCH GREATER-THAN
"forall"     0x2200    CODE       False      NONE     --FOR ALL
"exist"      0x2203    CODE       False      NONE     --THERE EXISTS
"not"        0x00ac    CODE       False      NONE     --NOT SIGN
"alpha"      0x03b1    ALL        True       FIT      --ALPHA
"beta"       0x03b2    ALL        True       FIT      --BETA
"gamma"      0x03b3    ALL        True       FIT      --GAMMA
"delta"      0x03b4    ALL        True       FIT      --DELTA
"epsilon"    0x03b5    ALL        True       FIT      --EPSILON
"zeta"       0x03b6    ALL        True       FIT      --ZETA
"eta"        0x03b7    ALL        True       FIT      --ETA
"theta"      0x03b8    ALL        True       FIT      --THETA
-- Because you can configure options inside the editor itself, don’t comment out
-- lines since they won’t be parsed, just change the enable flag