Wednesday, May 30, 2007

Using Scala Extractors with the XSD Schema Infoset Model

In this post I'm going to use Scala extractor objects with the Eclipse XML Schema Infoset Model to identify common ways of defining XML Schemas.

I hope to show that complex patterns on common Java objects can be identified using Scala extractors.

The Eclipse Schema Infoset Model is a complex EMF model that represents the W3C XML Schema. The articles Analyzing XML schemas with the Schema Infoset Model and Analyze Schemas with the XML Schema Infoset Model provide a nice explanation of how to work with this model.

For this post I wanted to create Scala patterns that identify common design patterns in XML Schemas. There are four common patterns for XML Schemas: Russian Doll, Salami Slice, Venetian Blind and Garden of Eden.

The article Introducing Design Patterns in XML Schemas provides a nice explanation of each of these patterns. It also describes a nice feature of the NetBeans Enterprise Pack that allows the user to convert a schema from one design pattern to another. More information on these design patterns can be found in the article Global vs Local on the xFront site.

The W3C XML Schema model is huge, but for this post I'm going to consider only a small subset.

The first step is to define the extractor objects that will be used to access certain properties of the XML Schema Infoset model.


package langexplr.scalaextractorexperiments;

import org.eclipse.emf.ecore.resource._
import org.eclipse.emf.ecore.resource.impl._
import org.eclipse.xsd._
import org.eclipse.xsd.impl._
import org.eclipse.xsd.util._
import org.eclipse.emf.common.util.URI

object XSDSchemaParts {
  // Target namespace, type definitions and element declarations of a schema.
  def unapply(schema : XSDSchema) =
    Some ((schema.getTargetNamespace(),
           List.fromIterator(
             new JavaIteratorWrapper[XSDTypeDefinition](
               schema.getTypeDefinitions().iterator())),
           List.fromIterator(
             new JavaIteratorWrapper[XSDElementDeclaration](
               schema.getElementDeclarations().iterator()))))
}

object XSDElementParts {
  // Name and type definition of an element declaration.
  def unapply(elementDeclaration : XSDElementDeclaration) =
    Some((elementDeclaration.getName(),elementDeclaration.getTypeDefinition()))
}

object XSDComplexType {
  // Matches only complex types; yields the name and the content.
  // The patterns below match a null name for anonymous types.
  def unapply(typeDefinition : XSDTypeDefinition) =
    if (typeDefinition.isInstanceOf[XSDComplexTypeDefinition]) {
      val complexType = typeDefinition.asInstanceOf[XSDComplexTypeDefinition];
      Some((complexType.getName(),complexType.getContent()))
    } else {
      None
    }
}

object XSDSimpleType {
  // Matches only simple types.
  def unapply(typeDefinition : XSDTypeDefinition) = {
    if (typeDefinition.isInstanceOf[XSDSimpleTypeDefinition]) {
      Some(typeDefinition.asInstanceOf[XSDSimpleTypeDefinition])
    } else {
      None
    }
  }
}

object XSDParticleContent {
  // Content of a particle (an element declaration, a model group, ...).
  def unapply(p : XSDParticle) = Some(p.getContent())
}

object XSDSimpleSequenceModelGroup {
  // Matches complex type content that is a particle holding a "sequence"
  // model group; yields the particles of that sequence.
  def unapply(complexTypeContent : XSDComplexTypeContent) = {
    complexTypeContent match {
      case XSDParticleContent(mg : XSDModelGroup)
        if (mg.getCompositor().getName == "sequence") =>
          Some(
            List.fromIterator(
              new JavaIteratorWrapper[XSDParticle](
                mg.getContents.iterator())))
      case _ => None
    }
  }
}



The XSDSchemaParts, XSDElementParts, XSDComplexType, XSDSimpleType, and XSDParticleContent extractor objects provide access to some properties of a model object. For example, XSDSchemaParts returns a tuple with the target namespace, the type definitions and the element declarations.

The XSDSimpleSequenceModelGroup extractor also provides an easy way to identify a common pattern: the use of an XSD sequence as the main content of a complex type.
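
The shape of these extractors does not depend on the XSD model. As a minimal, self-contained sketch (assuming Scala 2.13 or later, where scala.jdk.CollectionConverters supplies asScala; Catalog and CatalogParts are hypothetical stand-ins and not part of the Eclipse API), an extractor over a plain Java-style class can expose a java.util.List as a Scala List so that list patterns apply:

```scala
import java.util.{List => JList}
import scala.jdk.CollectionConverters._

object ExtractorSketch {
  // Hypothetical stand-in for a Java model class such as XSDSchema.
  class Catalog(val namespace: String, val entries: JList[String])

  // Extractor: exposes the namespace and the entries as a Scala List.
  object CatalogParts {
    def unapply(c: Catalog): Option[(String, List[String])] =
      Some((c.namespace, c.entries.asScala.toList))
  }

  // List patterns can now be nested inside the extractor pattern.
  def describe(c: Catalog): String = c match {
    case CatalogParts(ns, List())     => s"$ns: empty"
    case CatalogParts(ns, List(only)) => s"$ns: single entry $only"
    case CatalogParts(ns, entries)    => s"$ns: ${entries.length} entries"
    case _                            => "unknown"
  }
}
```

Converting to an immutable List inside unapply keeps the patterns simple, at the cost of copying the collection on every match.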

A class will be created for each design pattern. The following trait is the base for all of them:


trait XsdDesignPattern {
def name : String
def identify(schema:XSDSchema) : boolean
}



Now we can define each pattern:

Russian Doll

This design pattern says that the structure of the XML Schema is similar to the document structure. Only one public element is defined and all other elements are defined inside of it.

For example:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://langexplr.blogspot.com/DocsRussianDoll"
xmlns:p="http://langexplr.blogspot.com/DocsRussianDoll"
xmlns="http://langexplr.blogspot.com/DocsRussianDoll"
elementFormDefault="qualified">
<xs:element name="page">
<xs:complexType>
<xs:sequence>
<xs:element name="header">
<xs:complexType>
<xs:sequence>
<xs:element name="content" type="xs:string" />
</xs:sequence>
<xs:attribute name="margin"
type="xs:integer" />
</xs:complexType>

</xs:element>
<xs:element name="body">
<xs:complexType>
<xs:sequence>
<xs:element name="paragraph"
type="xs:string" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="footer">
<xs:complexType>
<xs:sequence>
<xs:element name="content" type="xs:string" />
</xs:sequence>
<xs:attribute name="margin"
type="xs:integer" />
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>


The Scala code to identify this design pattern looks like this:


class RussianDoll extends XsdDesignPattern {
def name = "Russian Doll"
def identify(schema : XSDSchema) =
schema match {
case XSDSchemaParts(
namespace,
List(),
List(XSDElementParts(
name,
XSDComplexType(
null,
XSDSimpleSequenceModelGroup(elements))))) => {
true
}
case _ => false
}
}




Salami Slice

This design pattern says that all elements must be declared at the top level with the type declaration inside of them.

For example:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://langexplr.blogspot.com/DocsSalamiSlice"
xmlns:tns="http://langexplr.blogspot.com/DocsSalamiSlice"
xmlns="http://langexplr.blogspot.com/DocsSalamiSlice"
elementFormDefault="qualified">

<xs:element name="content" type="xs:string" />
<xs:element name="paragraph" type="xs:string" />

<xs:element name="header">
<xs:complexType>
<xs:sequence>
<xs:element ref="tns:content" />
</xs:sequence>
<xs:attribute name="margin" type="xs:integer" />
</xs:complexType>
</xs:element>

<xs:element name="footer">
<xs:complexType>
<xs:sequence>
<xs:element ref="tns:content" />
</xs:sequence>
<xs:attribute name="margin" type="xs:integer" />
</xs:complexType>
</xs:element>

<xs:element name="body">
<xs:complexType>
<xs:sequence>
<xs:element ref="tns:paragraph" minOccurs="0"
maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>

<xs:element name="page">
<xs:complexType>
<xs:sequence>
<xs:element ref="tns:header" />
<xs:element ref="tns:body" />
<xs:element ref="tns:footer" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>



The Scala code to identify this design pattern looks like this:


class SalamiSlice extends XsdDesignPattern {
def name = "Salami Slice"
def identify(schema : XSDSchema) =
schema match {
case XSDSchemaParts(
namespace,
List(),
elements) => {
elementsWithReferences(elements)
}
case _ => false
}
// Utility methods

def forAllInnerElements(l : List[XSDElementDeclaration],
pred : XSDElementDeclaration => boolean) =
l.forall{
case XSDElementParts(
_,
XSDComplexType(null,XSDSimpleSequenceModelGroup(particles))) =>
particles.forall({
case XSDParticleContent(e:XSDElementDeclaration) => pred(e)
case _ => false })
case XSDElementParts(_,XSDComplexType(null,null)) => true
case XSDElementParts(_,XSDSimpleType(_)) => true
case _ => false
}

def elementsWithReferences(x : List[XSDElementDeclaration]) =
forAllInnerElements(
x,
(e:XSDElementDeclaration) => e.isElementDeclarationReference)

}




Venetian Blind

This design pattern says that there is one global element and all other elements use types declared at the top level.

For example:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://langexplr.blogspot.com/DocsVenetianBlind"
xmlns:tns="http://langexplr.blogspot.com/DocsVenetianBlind"
xmlns="http://langexplr.blogspot.com/DocsVenetianBlind"
elementFormDefault="qualified">

<xs:complexType name="sectionType">
<xs:sequence>
<xs:element name="content" type="xs:string" />
</xs:sequence>
<xs:attribute name="margin" type="xs:integer" />
</xs:complexType>

<xs:complexType name="bodyType">
<xs:sequence>
<xs:element name="paragraph" type="xs:string" minOccurs="0"
maxOccurs="unbounded" />
</xs:sequence>

</xs:complexType>

<xs:element name="page">
<xs:complexType>
<xs:sequence>
<xs:element name="header" type="tns:sectionType" />
<xs:element name="body" type="tns:bodyType" />
<xs:element name="footer" type="tns:sectionType" />
</xs:sequence>
</xs:complexType>
</xs:element>

</xs:schema>




The Scala code for this pattern looks like this:


class VenetianBlind extends XsdDesignPattern {
def name = "Venetian Blind"
def identify(schema : XSDSchema) =
schema match {
case XSDSchemaParts(
namespace,
types,
List(XSDElementParts(
_,
XSDComplexType(
_,
XSDSimpleSequenceModelGroup(elements))))) =>
elements.forall((e:XSDParticle) =>
elementWithTypeReferences(e,types))
case _ => false
}
def elementWithTypeReferences(e : XSDParticle, types : List[XSDTypeDefinition]) =
e match {
case XSDParticleContent(e:XSDElementDeclaration) =>
e.getTypeDefinition.getContainer.isInstanceOf[XSDSchema] &&
!(types.find ((t:XSDTypeDefinition) => t == e.getTypeDefinition)).isEmpty
case _ => false
}

}



Garden of Eden

This design pattern says that all the elements and types must be declared globally.


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://langexplr.blogspot.com/DocsGardenOfEden"
xmlns:tns="http://langexplr.blogspot.com/DocsGardenOfEden"
xmlns="http://langexplr.blogspot.com/DocsGardenOfEden"
elementFormDefault="qualified">

<xs:complexType name="sectionType">
<xs:sequence>
<xs:element ref="tns:content" />
</xs:sequence>
<xs:attribute name="margin" type="xs:integer" />
</xs:complexType>

<xs:complexType name="bodyType">
<xs:sequence>
<xs:element ref="tns:paragraph" minOccurs="0"
maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>

<xs:element name="content" type="xs:string" />

<xs:element name="paragraph" type="xs:string"/>

<xs:element name="header" type="tns:sectionType" />

<xs:element name="body" type="tns:bodyType" />

<xs:element name="footer" type="tns:sectionType" />

<xs:complexType name="pageType">
<xs:sequence>
<xs:element ref="tns:header" />
<xs:element ref="tns:body" />
<xs:element ref="tns:footer" />
</xs:sequence>
</xs:complexType>

<xs:element name="page" type="tns:pageType" />
</xs:schema>



The Scala code for this pattern looks like this:


class GardenOfEden extends XsdDesignPattern {
def name = "Garden Of Eden"
def identify(schema : XSDSchema) =
schema match {
case XSDSchemaParts(
namespace,
types,
elements) =>
elements.forall((e : XSDElementDeclaration) =>
elementWithTypeReferences(e,types))
case _ => false
}

def elementWithTypeReferences(e : XSDElementDeclaration, types : List[XSDTypeDefinition]) =
e.getTypeDefinition.getContainer.isInstanceOf[XSDSchema] &&
((types.find ((t:XSDTypeDefinition) => t == e.getTypeDefinition)) match {
case Some(XSDComplexType(_,XSDSimpleSequenceModelGroup(particles))) =>
particles.forall({
case XSDParticleContent(e:XSDElementDeclaration) =>
e.isElementDeclarationReference
case _ => false })
case Some(XSDComplexType(_,null)) => true
case Some(XSDSimpleType(_)) => true
case None =>
e.getTypeDefinition.getTargetNamespace == "http://www.w3.org/2001/XMLSchema"
case _ => false
})

}





Finally, we need an object that tests a schema against all the patterns:


object XsdDesignPatterns {
def patterns:List[XsdDesignPattern] = List(new RussianDoll,
new SalamiSlice,
new VenetianBlind,
new GardenOfEden)
def identify(schema : XSDSchema) =
patterns.filter((p:XsdDesignPattern) => p identify schema).map((p:XsdDesignPattern) => p.name)
}
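
To make the four rules easier to compare side by side, here is a self-contained sketch over a hypothetical, heavily simplified schema model (assuming Scala 2.13 or later). None of these case classes belong to the Eclipse XSD API; they only mirror the distinctions the patterns above rely on: global versus local declarations, and element references versus inline definitions.

```scala
object SchemaShapes {
  // Hypothetical, simplified stand-in for the XSD model.
  sealed trait TypeUse
  case object BuiltIn extends TypeUse                             // e.g. xs:string
  final case class Named(name: String) extends TypeUse            // global type, by name
  final case class Inline(children: List[Child]) extends TypeUse  // anonymous complexType

  sealed trait Child
  final case class LocalElement(name: String, tpe: TypeUse) extends Child
  final case class ElementRef(name: String) extends Child

  final case class GlobalElement(name: String, tpe: TypeUse)
  final case class Schema(types: Map[String, List[Child]],
                          elements: List[GlobalElement])

  private def onlyRefs(children: List[Child]): Boolean =
    children.forall { case ElementRef(_) => true; case _ => false }

  // Returns the names of all design patterns the schema satisfies.
  def identify(s: Schema): List[String] = {
    // One global element with an inline type, no global types.
    val russianDoll = s match {
      case Schema(ts, List(GlobalElement(_, Inline(_)))) if ts.isEmpty => true
      case _ => false
    }
    // Only global elements; inline types contain element references only.
    val salamiSlice = s.types.isEmpty && s.elements.forall {
      case GlobalElement(_, Inline(children)) => onlyRefs(children)
      case GlobalElement(_, BuiltIn)          => true
      case _                                  => false
    }
    // One global element whose local children use global types.
    val venetianBlind = s match {
      case Schema(ts, List(GlobalElement(_, Inline(children)))) =>
        children.forall {
          case LocalElement(_, Named(n)) => ts.contains(n)
          case _                         => false
        }
      case _ => false
    }
    // Everything global: named types that contain element references only.
    val gardenOfEden = s.types.values.forall(onlyRefs) && s.elements.forall {
      case GlobalElement(_, Named(n)) => s.types.contains(n)
      case GlobalElement(_, BuiltIn)  => true
      case _                          => false
    }
    List("Russian Doll" -> russianDoll, "Salami Slice" -> salamiSlice,
         "Venetian Blind" -> venetianBlind, "Garden Of Eden" -> gardenOfEden)
      .collect { case (name, true) => name }
  }
}
```

Like the filter in XsdDesignPatterns.identify, this returns a list because a degenerate schema (for instance, one global element with an empty sequence) can satisfy more than one rule.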




The code for this experiment can be found here.

Friday, May 25, 2007

Pattern matching on Java Objects with Scala Extractors

Scala extractors provide a nice way to use pattern matching on common Java objects. In this post I'm going to show a simple use of extractors with Java reflection objects.

More detailed information about Scala extractors can be found on the main Scala website, in section 8.1.7 of the Scala Language Specification and in the Matching Object With Patterns paper.

An extractor object can be used to extract simple values from Java objects. For example, the following code defines an extractor object that returns the name of a java.lang.Class.


object ClassName {
def unapply(c : java.lang.Class) =
Some(c.getName())
}


This extractor can be used as follows:



val c = "hola".getClass()

c match {
case ClassName(name) =>
System.out.println("The class name is: "+name)
}



Extractors can also be combined with other Scala pattern matching elements. For example, given the following extractor definitions:


object ClassName {
def unapply(c : java.lang.Class) =
Some(c.getName())
}

object ClassMethods {
def unapply(c : java.lang.Class) =
Some(c.getMethods())
}


object MethodParts {
def unapply(c : java.lang.reflect.Method) =
Some((c.getName(),c.getReturnType(),c.getParameterTypes()))
}


We can write:


def methodsThatReturnString(c : java.lang.Class) =
c match {
case ClassMethods(mts) =>
List.fromArray(
for {val m <- mts
m match {
case MethodParts(_,ClassName("java.lang.String"),_) => true
case _ => false
}} yield m)
}
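
The listings above target the Scala release available when this post was written. As a hedged sketch of how the same idea could look in current Scala (assuming 2.13 or later, where java.lang.Class takes a type parameter and for-comprehension filters require an explicit if), the filter can be written with collect and a nested extractor pattern. The enclosing ReflectionExtractors object is only there to make the example self-contained:

```scala
import java.lang.reflect.Method

object ReflectionExtractors {
  object ClassName {
    def unapply(c: Class[_]): Option[String] = Some(c.getName)
  }

  object ClassMethods {
    def unapply(c: Class[_]): Option[List[Method]] = Some(c.getMethods.toList)
  }

  object MethodParts {
    def unapply(m: Method): Option[(String, Class[_], List[Class[_]])] =
      Some((m.getName, m.getReturnType, m.getParameterTypes.toList))
  }

  // Keeps the public methods whose return type is java.lang.String.
  def methodsThatReturnString(c: Class[_]): List[Method] = c match {
    case ClassMethods(methods) =>
      methods.collect {
        case m @ MethodParts(_, ClassName("java.lang.String"), _) => m
      }
    case _ => Nil
  }
}
```

Using collect keeps the test and the selection in a single pattern, so no separate boolean match is needed.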


Another example of this is the following:

Given these definitions:


object MemberClasses {
def unapply(c : java.lang.Class) =
Some(c.getClasses())
}


We can write:


Class.forName("java.util.Map") match {
case c@MemberClasses(Array(ClassName(inner))) =>
System.out.println("The class or interface "+
c.getName()+
" has one inner class/interface called "+
inner)
case _ => System.out.println("No inner classes")
}



With these simple/artificial examples we only scratched the surface of what can be done with pattern matching on existing Java libraries. As with F# active patterns in .NET, Scala extractors seem to be a useful tool for dealing with complex object models.

Saturday, May 19, 2007

Using F# active patterns with LINQ expression trees

F# active patterns provide a nice way to handle complex object models.

In this post, I'm going to write a couple of active patterns to access parts of LINQ expression trees. As a little experiment I'm going to change the implementation of the Linq To Google Desktop expression tree handling from C# to F#.


The following definitions of active patterns provide access to sections of expression trees.


let (|BinaryExpression|_|) (x:Expression) =
if (x :? BinaryExpression)
then let be = (x :?> BinaryExpression)
in Some (be.Left,be.Right)
else None

let (|AndAlso|_|) (x:Expression) =
if (x.NodeType = ExpressionType.AndAlso)
then match x with
| BinaryExpression(l,r) -> Some (l,r)
| _ -> None
else None

let (|Equal|_|) (x:Expression) =
if (x.NodeType = ExpressionType.Equal)
then match x with
| BinaryExpression(l,r) -> Some (l,r)
| _ -> None
else None

let (|IsExpression|_|) (x:Expression) =
if (x :? TypeBinaryExpression)
then let be = (x :?> TypeBinaryExpression)
in Some (be.Expression,be.TypeOperand)
else None

let (|MethodCall|_|) (x:Expression) =
if (x.NodeType = ExpressionType.Call &&
(x :? MethodCallExpression))
then
let mc = x :?> MethodCallExpression
in Some (mc.Object,
mc.Method,
IEnumerable.to_list(mc.Arguments))
else None

let (|Cast|_|) (x:Expression) =
if (x.NodeType = ExpressionType.Convert
&& (x :? UnaryExpression))
then let ue = (x :?> UnaryExpression)
in Some (ue.Type,ue.Operand)
else None

let (|Lambda|_|) (x:Expression) =
if ( (x :? UnaryExpression) &&
((x :?> UnaryExpression).Operand :? LambdaExpression))
then let ue = (x :?> UnaryExpression)
in Some (ue.Operand :?> LambdaExpression).Body
else None

let (|MemberAccess|_|) (x:Expression) =
if (x.NodeType = ExpressionType.MemberAccess
&& (x :? MemberExpression))
then let ue = (x :?> MemberExpression)
in Some (ue.Expression,ue.Member)
else None

let (|MethodName|) (m:MethodInfo) = m.Name
let (|TypeName|) (t:Type) = t.Name

let (|PropertyWithName|_|) (m:MemberInfo) =
if (m :? PropertyInfo)
then Some (m.Name)
else None

let (|ExpressionType|) (x:Expression) = x.Type


As you can see, most of the active patterns are Partial Recognizers (see here), for example (|Equal|_|). This is because of all the dynamic type tests that we need to do.


The C# code for expression tree handling presented in the Linq to Google Desktop post is located in the GDesktop.CollectQueryInfo method. By using these new active pattern definitions, the code could be rewritten in F# as follows:



let rec CollectQueryInfoFS (e:Expression, qi:GDQueryInfo) =
match e with
| Lambda(body) -> CollectQueryInfoFS (body,qi)
| AndAlso(l,r) -> CollectQueryInfoFS(l,qi);
CollectQueryInfoFS(r,qi)
| MethodCall(_,MethodName("Contains"),[argument]) ->
qi.AddTerm(GetStringFromArgument(argument))
| Equal(MemberAccess(ExpressionType(TypeName("GDFileResult")),
PropertyWithName("FileType")),
value) ->
qi.FileType <- GetStringFromArgument(value)
| Equal(MemberAccess(ExpressionType(TypeName("GDEmailResult")),
PropertyWithName(pName)),
value) ->
CollectEmailProperty(pName,GetStringFromArgument(value),qi)
| IsExpression(_,TypeName("GDFileResult")) ->
qi.ElementType <- new Nullable<GDElementType>( GDElementType.File )
| IsExpression(_,TypeName("GDEmailResult")) ->
qi.ElementType <- new Nullable<GDElementType>( GDElementType.Email )
| _ -> raise (new NotSupportedException(e.ToString()))


By using the active pattern definitions, the code is much smaller and easier to read and modify.

Right now only the expression tree handling part was translated to F#. The point where we call F# looks like this:


private GDQueryInfo ProcessWhereMethodCall(MethodCallExpression mcExpression)
{
Expression theObjectArgument = mcExpression.Arguments[0];
Expression whereExpressionTree = mcExpression.Arguments[1];

GDesktop provider =
(GDesktop)((ConstantExpression)theObjectArgument).Value;

GDQueryInfo qi = new GDQueryInfo();
//CollectQueryInfo(whereExpressionTree, qi);
Langexplr.FsharpTests.Langexplr.FsharpTests.CollectQueryInfoFS(
whereExpressionTree, qi);
return qi;

}


In future posts I'm going to try to rewrite the entire implementation.

Wednesday, May 16, 2007

Pattern matching on .NET objects with F# active patterns

One of the most interesting things about F# active patterns is that they allow the creation of pattern matching expressions on existing .NET objects. In this post I'm going to show a couple of examples of using F# pattern matching with common .NET objects.

The "Combining Total and Ad Hoc Extensible Pattern Matching in a Lightweight Language Extension" paper describes this feature and gives lots of useful and representative examples (a draft of this paper was discussed here).

Active patterns are officially available starting with version 1.9.1.8, but version 1.9.1.9 was used for these examples.

One of the simplest ways to start defining active patterns on .NET objects is to create one that returns the value of a property. For example, the following active pattern extracts the FullName property from System.IO.FileInfo.


let (|FileInfoFullName|) (f:FileInfo) = f.FullName


Now we can use FileInfoFullName this way:


let f = (new FileInfo("ActivePatternsTests.exe"))
in
match f with
| FileInfoFullName n ->
Console.WriteLine("File name: {0}",n)


Active patterns can also be used in conjunction with built-in F# patterns. For example, given the following active pattern:


let (|FileInfoNameSections|) (f:FileInfo) =
(f.Name,f.Extension,f.FullName)


We can write:


let foo (f:FileInfo) =
match f with
| FileInfoNameSections(_,".txt",fn) -> ("Text file: "+fn)
| _ -> "No text file"


Active patterns can also be combined. For example, given these active pattern definitions:


let (|FileSecurity|) (f:FileInfo) = f.GetAccessControl()

let (|NtAccountOwner|) (f:FileSecurity) =
let o = f.GetOwner((typeof() :
ReifiedType< NTAccount >).result) :?> NTAccount
in o.Value


We can write:


let goo (f:FileInfo) =
match f with
| FileSecurity(NtAccountOwner("MyMachine\\auser")) -> "Yes"
| _ -> "No"


Only one kind of active pattern was used in this post; the paper describes several others with more interesting characteristics.

More interesting experiments can be done with other .NET APIs. One nice example is LINQ expression trees.

Saturday, May 12, 2007

Adding support for projections to Linq to Google Desktop

In this post I'm going to show how support for projections was added to the Linq To Google Desktop experiment.

In order to support queries like:


var e10 = from t in gd
where t is GDFileResult &&
t.Contains("sql")
select ((GDFileResult)t).Location;

foreach (string s in e10) {
Console.WriteLine("The file name is: " + s);
}




We need to add support for handling the Select method. The query described above is processed by the compiler as:


gd.Where( ... ).Select( ... );


Given that the Where method processing returns an instance of GDQuery, the Select method call is processed in the GDQuery.CreateQuery method. The implementation of this method is the following:


public IQueryable<T> CreateQuery<T>(System.Linq.Expressions.Expression expression)
{
if (IsSelectMethodCall(expression))
{
MethodCallExpression methodCallExpr = (MethodCallExpression)expression;

UnaryExpression unaryQuoteExpr =
(UnaryExpression)methodCallExpr.Arguments[1];

LambdaExpression lambdaExpr =
(LambdaExpression)unaryQuoteExpr.Operand;

GDQuery query =
(GDQuery)((ConstantExpression)methodCallExpr.Arguments[0]).Value;

return query.AsEnumerable<GDResult>().Select(
(Func<GDResult,T>)lambdaExpr.Compile()).AsQueryable<T>();
}
else
{
throw new NotSupportedException("Not supported method call");
}

}


This method delegates the execution to the Select extension method for IEnumerable: public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector);. The only other thing we need to do is compile the lambda expression that represents the projection.

It is done this way because elements in the projection section cannot be used to make the Google Desktop query more specific. This is not the case in Linq To SQL, where the projection section affects the way the query is processed by the DBMS.

Now we can write queries like:


var e11 = from t in gd
where t is GDFileResult &&
t.Contains("sql")
select new { FileName = ((GDFileResult)t).Location,
Info = new FileInfo(((GDFileResult)t).Location)};

foreach (var fResult in e11)
{
Console.WriteLine(fResult.Info.Name);
}

Friday, May 11, 2007

LINQ to Google Desktop

Given the number of recent blog posts and articles talking about creating LINQ providers, I decided to go ahead and try to create a provider for Google Desktop.

The work in this post is based on the following posts/articles:



The samples were created using the March 2007 CTP of Orcas.

The first thing to do is to figure out what a Linq to Google Desktop query looks like.

As discussed in a previous post, the result of a Google Desktop query could yield different kinds of results. For example a query for the "linq" term could return references to files, emails, calendar items, etc.

In order to deal with this, the first thing to do was to create a class hierarchy for the kinds of elements that could result from a Google Desktop query. This class hierarchy will have GDResult as the base class (which will be almost the same as an Indexable in the Google Desktop schemas).

Also we need to create a wrapper around the Google Desktop COM API, to capture the results and instantiate the appropriate class given the selected result.

To keep this experiment simple, only GDFileResult and GDEmailResult will be implemented and supported by the provider.


public class GDResult
{
public string Schema {...}
public string Content {... }

public bool Contains(string term)
{
...
}
}

public class GDFileResult : GDResult
{
public string FileType { ... }
public string Location { ... }
}

public class GDEmailResult : GDResult
{
public string From { ... }
public string To { ... }
public string Cc { ... }
public string Subject { ... }
}


Given this, we can imagine what the Linq queries will look like. For example, if we want to get all the PDF files that contain the "statement" and "expression" terms we could write:



GDesktop gd = new GDesktop();
var results = from t in gd
              where t.Contains("statement") &&
                    t.Contains("expression") &&
                    t is GDFileResult &&
                    ((GDFileResult)t).FileType == "pdf"
              select t;




This Linq query will be translated to a Google Desktop query:

" statement expression filetype:pdf "

The t is GDFileResult condition must also tell the Google Desktop API that only file references should be retrieved.
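
The translation from collected terms to a query string can be sketched as follows. This is only an illustration in Scala (assuming 2.13 or later), not the provider's C# code: the real GDQueryInfo class is not shown in this post, so QueryInfo and its members here are hypothetical names.

```scala
object QueryStringSketch {
  // Hypothetical accumulator; field and method names are illustrative only.
  final case class QueryInfo(terms: List[String] = Nil,
                             fileType: Option[String] = None) {
    def addTerm(term: String): QueryInfo = copy(terms = terms :+ term)
    def withFileType(ft: String): QueryInfo = copy(fileType = Some(ft))

    // Search terms first, then the optional filetype: modifier.
    def createQueryString: String =
      (terms ++ fileType.map(ft => s"filetype:$ft").toList).mkString(" ")
  }
}
```

With the terms "statement" and "expression" and the file type "pdf", createQueryString yields the query shown above without the surrounding spaces.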

The next step is to create the GDesktop class. This class represents our connection with Google Desktop and is similar to a database connection object in Linq to Sql. GDesktop must implement the System.Linq.IQueryable<T> interface, which inherits from IEnumerable<T>. Since at this time there's no official documentation on how to implement this interface correctly, only the required elements were added.

This implementation of the Google Desktop provider looks like this:


public class GDesktop :IQueryable<GDResult>
{

public IQueryable<T> CreateQuery<T>(Expression expression)
{
GDQuery q = new GDQuery();
if (expression.NodeType == ExpressionType.Call)
{
MethodCallExpression mcExpression =
(MethodCallExpression)expression;
switch (mcExpression.Method.Name)
{
case "Where":
GDQueryInfo qi = ProcessWhereMethodCall(mcExpression);
q.QueryInfo = qi;
break;

default:
throw new NotImplementedException(
"Could not handle method: "+
mcExpression.Method.Name);

}
}
else
{
throw new NotSupportedException(expression.ToString());
}
return (IQueryable<T>)q;

}


public Expression Expression
{
get {
return Expression.Constant(this);
}
}


#region Expression tree processing
....
#endregion


#region Not implemented methods

public TResult Execute<TResult>(Expression expression)
{
throw new Exception("The method or operation is not implemented.");
}


public IEnumerator<GDResult> GetEnumerator()
{
throw new Exception("The method or operation is not implemented.");
}

IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
throw new Exception("The method or operation is not implemented.");
}

public IQueryable CreateQuery(Expression expression)
{
throw new Exception("The method or operation is not implemented.");
}

public Type ElementType
{
get { throw new Exception("The method or operation is not implemented."); }
}

public object Execute(Expression expression)
{
throw new Exception("The method or operation is not implemented.");
}

#endregion

}



The CreateQuery method is very important and is called with the expression tree representing the contents of the first section of the query. For example, the following query:


var e = from t in gd
where t.Contains("linq")
select t;


is interpreted by the compiler as:


var e = gd.Where(t => t.Contains("linq"));


When the compiler determines that gd is an IQueryable it generates calls to the Expression property and the CreateQuery method.

The CreateQuery method of the GDesktop class is going to return an instance of a GDQuery object, which represents the specific query that is being created. A GDQueryInfo object is also used to store the query terms and modifiers collected from the expression trees.

The only supported operation that can be applied to GDesktop is Where. This is very important: no other operation, like Select, Take or Skip, can be applied directly because there's no way (that I know of) to get all the elements of the Google Desktop index!

The ProcessWhereMethodCall method walks the where expression tree to collect the required query terms and the query modifiers. The only supported elements in the where section are:


  1. Calls to the Contains method meaning that the document contains a term or phrase

  2. 'is' expressions to filter the type of element that must be returned

  3. Equality expression of "query time" properties for elements such as 'Filetype' or 'From' or 'To'



Processing for each of these elements is implemented this way:


#region Expression tree processing

private GDQueryInfo ProcessWhereMethodCall(MethodCallExpression mcExpression)
{
Expression theObjectArgument = mcExpression.Arguments[0];
Expression whereExpressionTree = mcExpression.Arguments[1];

GDesktop provider =
(GDesktop)((ConstantExpression)theObjectArgument).Value;

GDQueryInfo qi = new GDQueryInfo();
CollectQueryInfo(whereExpressionTree, qi);
return qi;

}

private void CollectQueryInfo(Expression whereExpressionTree, GDQueryInfo qi)
{
if (whereExpressionTree is UnaryExpression &&
((UnaryExpression)whereExpressionTree).Operand
is LambdaExpression)
{
UnaryExpression ue = (UnaryExpression)whereExpressionTree;
CollectQueryInfo(((LambdaExpression)ue.Operand).Body, qi);
}

if (whereExpressionTree.NodeType == ExpressionType.AndAlso)
{
BinaryExpression be = (BinaryExpression)whereExpressionTree;
CollectQueryInfo(be.Left,qi);
CollectQueryInfo(be.Right,qi);
}

if (whereExpressionTree.NodeType == ExpressionType.Call)
{
ProcessMethodCall(whereExpressionTree, qi);
}

if (whereExpressionTree.NodeType == ExpressionType.Equal)
{
ProcessEqual(whereExpressionTree, qi);
}

if (whereExpressionTree.NodeType == ExpressionType.TypeIs)
{
ProcessTypeIs(whereExpressionTree, qi);
}

}

private void ProcessTypeIs(Expression whereExpressionTree, GDQueryInfo qi)
{
if (whereExpressionTree is TypeBinaryExpression)
{
TypeBinaryExpression be = (TypeBinaryExpression)whereExpressionTree;

if (be.Expression is ParameterExpression &&
((ParameterExpression)be.Expression).Type == typeof(GDResult) &&
be.TypeOperand.IsSubclassOf(typeof(GDResult)))
{
switch (be.TypeOperand.Name)
{
case "GDFileResult":
qi.ElementType = GDElementType.File;
break;
case "GDEmailResult":
qi.ElementType = GDElementType.Email;
break;
case "GDResult":
qi.ElementType = null;
break;
default:
throw new NotSupportedException("Element type not supported");
}
}
}
else
{
throw new NotSupportedException("TypeIs expression not supported");
}
}

private void ProcessEqual(Expression whereExpressionTree, GDQueryInfo qi)
{

if (whereExpressionTree is BinaryExpression)
{
BinaryExpression be = (BinaryExpression)whereExpressionTree;
Expression leftExpression = be.Left;
if (IsAPropertyAccessToSpecificResultItem(leftExpression, typeof(GDFileResult)))
{
string argumentValue = (string)GetObjectFromArgument(be.Right);

switch (((MemberExpression)leftExpression).Member.Name)
{
case "FileType":
qi.FileType = argumentValue;
break;
}

}
else
{
if (IsAPropertyAccessToSpecificResultItem(
leftExpression,
typeof(GDEmailResult)))
{

string argumentValue = (string)GetObjectFromArgument(be.Right);

switch (((MemberExpression)leftExpression).Member.Name)
{
case "From":
qi.From = argumentValue;
break;
case "To":
qi.To = argumentValue;
break;
case "Cc":
qi.Cc = argumentValue;
break;
case "Subject":
qi.Subject = argumentValue;
break;
}
}
else
{
throw new NotSupportedException("Property not supported");
}
}
}
else
{
throw new NotSupportedException("Member access not supported");
}
}


private void ProcessMethodCall(Expression whereExpressionTree, GDQueryInfo qi)
{
MethodCallExpression mCall =
(MethodCallExpression)whereExpressionTree;
if (mCall.Method.Name == "Contains")
{
object o = GetObjectFromArgument(mCall.Arguments[0]);
qi.AddTerm((string)o);
}
else
{
throw new NotSupportedException("Method call not supported "+mCall.ToString());
}

}


private object GetObjectFromArgument(Expression e)
{
return LambdaExpression.Lambda(e).Compile().DynamicInvoke();

}

private static bool IsAPropertyAccessToSpecificResultItem(Expression leftExpression, Type elementType)
{
return leftExpression.NodeType == ExpressionType.MemberAccess &&
((MemberExpression)leftExpression).Expression.NodeType
== ExpressionType.Convert &&
((UnaryExpression)((MemberExpression)leftExpression).Expression).Type
== elementType &&
((UnaryExpression)((MemberExpression)leftExpression).Expression).Operand.Type
== typeof(GDResult) &&
((MemberExpression)leftExpression).Member is PropertyInfo;
}

#endregion



Certainly this code is not pretty. In the future I'll try to improve it.
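
To make the check in IsAPropertyAccessToSpecificResultItem easier to follow, here is a small self-contained sketch of the tree shape it looks for: a property access whose target is a Convert node. BaseResult, FileResult and CastPropertyShape are hypothetical stand-ins for this illustration, not part of the provider.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical stand-ins for GDResult and GDFileResult.
public class BaseResult { }
public class FileResult : BaseResult
{
    public string FileType { get; set; }
}

public static class CastPropertyShape
{
    // Returns the property name when e looks like ((derivedType)x).Property
    // with x statically typed as BaseResult; returns null otherwise.
    public static string PropertyOfCast(Expression e, Type derivedType)
    {
        var member = e as MemberExpression;
        if (member == null || !(member.Member is PropertyInfo)) return null;

        // The cast shows up as a UnaryExpression with NodeType Convert.
        var cast = member.Expression as UnaryExpression;
        if (cast == null || cast.NodeType != ExpressionType.Convert) return null;

        if (cast.Type != derivedType || cast.Operand.Type != typeof(BaseResult))
            return null;
        return member.Member.Name;
    }

    // Builds the tree for a filter like the one in the post and inspects
    // the left-hand side of the equality.
    public static string Demo()
    {
        Expression<Func<BaseResult, bool>> filter =
            t => ((FileResult)t).FileType == "pdf";
        var equality = (BinaryExpression)filter.Body;
        return PropertyOfCast(equality.Left, typeof(FileResult));
    }
}
```

Calling CastPropertyShape.Demo() walks the same three levels (member access, convert, operand) that the provider code checks with nested casts.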

We want to allow the user to pass variables, function calls, property references, literals, etc. as query arguments. For example:


var e2 = from t in gdProvider
         where t.Contains(args[0]) &&
               t.Contains(GetString(1)) &&
               t is GDFileResult &&
               ((GDFileResult)t).FileType == "pdf"
         select t;



Because of this we need to get the value from the expression tree representing the query argument. This is done in the GetObjectFromArgument method.


private object GetObjectFromArgument(Expression e)
{
return LambdaExpression.Lambda(e).Compile().DynamicInvoke();

}


Note that the expression tree representing the query argument is compiled as the body of a lambda expression with no parameters. We then invoke the generated delegate to get the value. This is a good example of how much control you have over the whole query, even its arguments.
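
As a rough, self-contained sketch of the same technique outside the provider: the class ArgumentEvaluation and its GetString helper below are hypothetical names used only for illustration. It assumes the argument subtree does not reference the query's range variable (t in the example above), since a lambda with no parameters cannot bind it.

```csharp
using System;
using System.Linq.Expressions;

public static class ArgumentEvaluation
{
    // Same idea as GetObjectFromArgument: wrap the subtree in a
    // parameterless lambda, compile it, and invoke the delegate.
    public static object Evaluate(Expression e)
    {
        return Expression.Lambda(e).Compile().DynamicInvoke();
    }

    // Hypothetical helper standing in for the GetString call in the query.
    public static string GetString(int i)
    {
        return "report" + i;
    }

    public static object[] Demo(string[] args)
    {
        // The C# compiler builds these trees. The bodies hold an array
        // element access over a captured variable and a method call.
        Expression<Func<string>> fromVariable = () => args[0];
        Expression<Func<string>> fromCall = () => GetString(1);

        return new object[]
        {
            Evaluate(fromVariable.Body),
            Evaluate(fromCall.Body)
        };
    }
}
```

Compiling a tiny lambda per argument is not cheap, but it avoids writing a separate case for every kind of expression the user might pass.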

Now that we have processed the where section of the query and collected the elements needed to build the query string, we need a place to put the call to Google Desktop. Looking at the definition of the GDesktop.CreateQuery method above, it is important to note that GDQuery also has to implement IQueryable. For now our implementation is very basic. The call to Google Desktop goes in the GetEnumerator method of the GDQuery class.


public class GDQuery : IQueryable
{

public IEnumerator GetEnumerator()
{
GDesktopWrapper gd = new GDesktopWrapper();
string qs = qInfo.CreateQueryString();
return gd.Query(qs, qInfo.ElementType).GetEnumerator();
}

public Expression Expression
{
get {
return Expression.Constant(this);
}
}

private GDQueryInfo qInfo;
public GDQueryInfo QueryInfo
{
get {
return this.qInfo;
}
set
{
this.qInfo = value;
}
}



#region Not implemented methods
...
#endregion
}
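
Putting the call in GetEnumerator means nothing runs until the results are enumerated. The following minimal sketch shows that behavior in isolation. DeferredQuery and its Executions counter are hypothetical and only stand in for GDQuery; the delegate stands in for the call to Google Desktop.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for GDQuery: it does no work until enumerated.
public class DeferredQuery : IEnumerable<string>
{
    private readonly Func<IEnumerable<string>> runQuery;

    // Counts how many times the underlying query was actually run.
    public int Executions { get; private set; }

    public DeferredQuery(Func<IEnumerable<string>> runQuery)
    {
        this.runQuery = runQuery;
    }

    public IEnumerator<string> GetEnumerator()
    {
        // The external call happens here, once per enumeration.
        Executions++;
        return runQuery().GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

Creating a DeferredQuery leaves Executions at zero. Each foreach over it runs the delegate again, which is the same trade-off the GDQuery implementation above makes.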


For features like projections we need to put more work into this implementation.

Code for this experiment can be found here.

In future posts I'll continue working on this implementation in order to add new features, for example projections, joins, and other LINQ features.

Wednesday, May 2, 2007

Using Google Desktop from .NET

In this post I'm going to show how to use the Google Desktop COM API from .NET.

Google Desktop provides a couple of ways to query its index: using the local HTTP server or using the COM Query API. In this post the COM API will be used from C#.

The first step to start working with this API in .NET is to import the type library:


C:\temp\tst>tlbimp <google-desktop-path>\GoogleDesktopAPI2.dll


This will generate the file GoogleDesktopAPILib.dll.


The next thing you need to do is register the application that will be using the COM API. Here's the code to register the application (taken from the JScript examples):


using GoogleDesktopAPILib;

...

static int Register() {
int cookie;
object[] description = new object[]{
"Title",
" tests",
"Description",
"Simple tests",
"Icon",
"My Icon@1"
};

string myGuid = "{5323E036-345C-4323-548D-32AA55603215}";
GoogleDesktopRegistrar registar
= new GoogleDesktopRegistrar();


registar.StartComponentRegistration(myGuid,description);
object regObjObj =
registar.GetRegistrationInterface(
"GoogleDesktop.QueryRegistration");

IGoogleDesktopRegisterQueryPlugin q =
regObjObj as IGoogleDesktopRegisterQueryPlugin;

if (q == null) {
throw new Exception("Registration problem");
}

cookie = q.RegisterPlugin(myGuid,true);

registar.FinishComponentRegistration();

return cookie;
}




Registration needs to be executed just once. The returned cookie value could be stored somewhere and reused when the application is executed. For example the JScript demos in the GD SDK use the registry to store the cookie. In this demo I'm going to use a text file to store the cookie.


static int GetRegistrationCookie() {
    int cookie;
    FileInfo cookieFile = new FileInfo(CookieFileName);
    if (cookieFile.Exists)
    {
        // Reuse the cookie saved by a previous run.
        using (StreamReader reader = new StreamReader(cookieFile.FullName))
        {
            string line = reader.ReadLine();
            cookie = int.Parse(line.Trim());
        }
    }
    else
    {
        // First run: register and save the cookie for later runs.
        cookie = Register();
        using (StreamWriter writer = new StreamWriter(CookieFileName))
        {
            writer.WriteLine(cookie.ToString());
        }
    }
    return cookie;
}


Now that we have the registration cookie we can call the Query API. This API is accessed through the GoogleDesktopQueryAPI class. This class has a Query method which receives the registration cookie, a string with the query, and some options. Here's the source for the Main method of a program which receives a query from the command line and prints the location of the matching resources:


public static void Main(string[] args)
{
if (args.Length == 1 && args[0] == "-unregister") {
UnRegister();
} else {
int cookie = GetRegistrationCookie();
string queryString = string.Join(" ",args);

GoogleDesktopQueryAPI queryAPI =
new GoogleDesktopQueryAPI();

IGoogleDesktopQueryResultSet rs =
queryAPI.Query(cookie,queryString,null,null);

IGoogleDesktopQueryResultItem2 i;
while ((i = (IGoogleDesktopQueryResultItem2)rs.Next()) != null) {
Console.WriteLine(i.GetProperty("uri"));
}
}
}


There are many other properties that can be extracted from a result item. Which ones are available depends on its schema, for example whether the result is a mail message.

Code for this post can be found here.