I wanted to create something that loads an XSD schema into an object structure that can be queried to determine the elements that will be generated. I didn't look for an existing library that does this because building it myself would be a much better exercise. However, creating something that supports the full XSD specification like this or this is a HUGE task, so I chose to support only a small subset of it.
For XML parsing and generation, I'm using REXML, which is a very nice library for XML manipulation.
The basic strategy for loading the XSD schema is to create a collection of classes, each handling one part of the supported schema features. For example, SchemaElement was created for supporting element declarations and SchemaComplexType was created for supporting the complexType declarations. Since an XSD schema is itself an ordinary XML document, loading each element is done with a load_from method. For example, the load_from method for SchemaElement looks like this:
class SchemaElement
  ...
  def load_from(elementDefinition,prefixes)
    @name = elementDefinition.attributes["name"]
    # "type" and "substitutionGroup" are kept as unresolved references for now
    if (elementDefinition.attributes["type"]) then
      @element_type = Reference.new(elementDefinition.attributes["type"],prefixes)
    end
    if (elementDefinition.attributes["substitutionGroup"]) then
      @substitution_group = Reference.new(elementDefinition.attributes["substitutionGroup"],prefixes)
    end
    # Walk the child nodes, skipping text (whitespace) nodes
    elementDefinition.find_all {|e| !e.is_a?(REXML::Text)}.each{|e|
      case e.name
      when "complexType"
        # An inline complexType becomes the element's type directly
        ct = SchemaComplexType.new
        ct.load_from(e,prefixes)
        @element_type = ct
      else
        print "Warning: ignoring #{e}\n"
      end
    }
  end
  ...
end
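The Reference class itself is not shown here. As a rough illustration only, the sketch below shows how a qualified name such as "xs:string" could be split and mapped to a namespace. The class name SketchReference, the assumption that prefixes is a Hash from prefix to namespace URI, and the use of "" for the default namespace are all hypothetical and may not match the real code.

```ruby
# Hypothetical sketch: the real Reference class is not shown in the article.
class SketchReference
  attr_reader :namespace, :name

  # qname    - value of a "type" or "substitutionGroup" attribute, e.g. "xs:string"
  # prefixes - ASSUMED to be a Hash mapping prefix => namespace URI,
  #            with "" standing for the default namespace
  def initialize(qname, prefixes)
    prefix, local = qname.include?(":") ? qname.split(":", 2) : ["", qname]
    @namespace = prefixes[prefix]
    @name = local
  end
end

prefixes = {
  "xs" => "http://www.w3.org/2001/XMLSchema",
  ""   => "http://www.w3.org/1999/xhtml"
}
typed = SketchReference.new("xs:string", prefixes)
local = SketchReference.new("Inline", prefixes)

puts "#{typed.namespace} #{typed.name}"
puts "#{local.namespace} #{local.name}"
```

Storing the namespace and local name separately is what lets a later pass look the target up by the pair (namespace, name).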
As the load_from method shows, there are relationships between schema elements. For example, the type of an element could be a type defined elsewhere in this schema or in an imported schema. Once the schema is loaded, a second pass takes these references and replaces them with a real reference to the target object. For SchemaElement, the solve_references method looks like this:
class SchemaElement
  ...
  def solve_references(collection)
    if @substitution_group.is_a?(XSDInfo::Reference) then
      @substitution_group = collection.get_type(
                               @substitution_group.namespace,
                               @substitution_group.name)
    end
    if @element_type.is_a?(XSDInfo::Reference) then
      # Replace the placeholder with the real type object, if it is known
      if(r = collection.get_type(@element_type.namespace,@element_type.name)) then
        @element_type = r
      else
        print "Not found #{@element_type.namespace}.#{@element_type.name}\n"
      end
    else
      # The @solving flag prevents infinite recursion on recursive types
      if !@solving then
        @solving = true
        @element_type.solve_references(collection) unless @element_type == nil
        @solving = false
      end
    end
  end
  ...
end
Here collection points to a SchemaCollection object that holds all the loaded schemas. Having all this, we can load an XSD schema and start querying its parts. For example, we can get the list of attributes that apply to the b tag in the XHTML schema:
$ irb -r xsd/xsd.rb
irb(main):001:0> sc = XSDInfo::SchemaCollection.new
=> #<XSDInfo::SchemaCollection:0xb7b71170>
irb(main):002:0> sc.add_schema XSDInfo::SchemaInformation.new("../xhtml1-strict.xsd")
irb(main):003:0> sc.namespaces.each {|ns| sc[ns].solve_references sc}
=> ["http://www.w3.org/1999/xhtml"]
irb(main):004:0> sc["http://www.w3.org/1999/xhtml"].elements["b"].all_attributes.collect {|x| x.name}
=> ["onkeydown", "onkeypress", "onmouseover", "onkeyup", "onmousemove", "onmouseup", "ondblclick", "onmouseout", "onmousedown", "onclick", "title", "class", "id", "style", "dir", nil, "lang"]
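The load-then-resolve approach described above can also be tried in isolation. The sketch below uses hypothetical stand-ins (Ref, TinyCollection, TinyElement) instead of the real XSDInfo classes, and only models the element type, so it should be read as an illustration of the idea and not as the actual implementation.

```ruby
# Hypothetical stand-ins for the article's Reference and SchemaCollection.
Ref = Struct.new(:namespace, :name)

class TinyCollection
  def initialize
    @types = {}
  end

  def add_type(namespace, name, type)
    @types[[namespace, name]] = type
  end

  # Returns the registered object, or nil when nothing matches.
  def get_type(namespace, name)
    @types[[namespace, name]]
  end
end

class TinyElement
  attr_reader :name
  attr_accessor :element_type

  def initialize(name, element_type = nil)
    @name = name
    @element_type = element_type
  end

  def solve_references(collection)
    if @element_type.is_a?(Ref)
      # Swap the placeholder for the real object when it is known.
      resolved = collection.get_type(@element_type.namespace, @element_type.name)
      @element_type = resolved if resolved
    elsif @element_type && !@solving
      # Guard against infinite recursion on cyclic type graphs.
      @solving = true
      @element_type.solve_references(collection)
      @solving = false
    end
  end
end

ns = "urn:example"
collection = TinyCollection.new
inner = TinyElement.new("inner", Ref.new(ns, "outer")) # refers back to outer
outer = TinyElement.new("outer", inner)                # holds inner directly
collection.add_type(ns, "outer", outer)

outer.solve_references(collection) # first pass replaces the placeholder
outer.solve_references(collection) # second pass walks the cycle and still terminates
puts inner.element_type.name
```

After the first pass the two objects point at each other, which is exactly the situation where the solving flag is needed.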
Now, for generating the XML sample, we can create a generate_sample_content method for each part of the schema. For example, the generate_sample_content method for SchemaComplexType looks like this:
## Sample Generation
def generate_sample_content(e,context)
  # Randomly keep roughly 30% of the named attributes
  atts = all_attributes.select {|x| x.name != nil && rand > 0.7}
  ltrs = ("a".."z").to_a
  atts.each {|att|
    # Random lowercase text of 1 to 10 characters
    sample_length = 1 + (10*rand).to_i
    sample_text = (1..sample_length).collect { ltrs[(ltrs.length*rand).to_i] }.join
    e.attributes[att.name] = sample_text
  }
  # Let every content part (sequences, choices, ...) add its own children
  self.all_content_parts.each {|p| p.generate_sample_content(e,context)}
end
Attribute values should be valid according to their simple type. However, this is not supported right now, so every attribute just gets random lowercase text.
Another example is the generate_sample_content method for the SchemaChoice class:
def generate_sample_content(e,context)
  if (@minOccurs == 1 && @maxOccurs == 1) then
    element_to_gen = @elements[(rand*@elements.length).to_i]
    element_to_gen.generate_sample_content(e,context)
  elsif (@minOccurs == 0 && @maxOccurs == 1) then
    element_to_gen = @elements[(rand*@elements.length).to_i]
    element_to_gen.generate_sample_content(e,context) unless rand < 0.5
  elsif (@maxOccurs == "unbounded") then
    (1..(rand * 4).to_i).each {|i|
      element_to_gen = @elements[(rand*@elements.length).to_i]
      element_to_gen.generate_sample_content(e,context) unless rand < 0.5
    }
  end
end
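The occurrence handling in the method above can be reduced to one question: how many times should one of the alternatives be picked? The helper below is a hypothetical sketch (sample_repetitions is not part of the original code) that answers only that question, with a simplified version of the same three cases. It does not reproduce the extra 50% skip that the unbounded branch applies on each iteration.

```ruby
# Hypothetical helper, not part of the article's code.
# Returns how many times a choice would be sampled.
def sample_repetitions(min_occurs, max_occurs, rng = Random.new)
  if min_occurs == 1 && max_occurs == 1
    1                      # required exactly once
  elsif min_occurs == 0 && max_occurs == 1
    rng.rand < 0.5 ? 0 : 1 # optional: include it about half the time
  elsif max_occurs == "unbounded"
    rng.rand(4)            # small random count from 0 to 3
  else
    0                      # other combinations are not handled, as in the method above
  end
end

puts sample_repetitions(1, 1)
puts sample_repetitions(0, "unbounded")
```

Passing the random number generator in as a parameter makes the helper easy to test with a fixed seed.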
Now with all this infrastructure we can generate some sample XML files:
def generate_sample_html_element name
  sc = XSDInfo::SchemaCollection.new
  sc.add_schema XSDInfo::SchemaInformation.new("../xhtml1-strict.xsd")
  sc.namespaces.each {|ns| sc[ns].solve_references sc}
  doc = REXML::Document.new
  doc.elements << sc[sc.namespaces[0]].elements[name].a_sample
  # The block form closes the file automatically
  File.open("output.xml","w") {|f| doc.write(f,3,false,false)}
  return sc
end
We call:
irb(main):006:0> generate_sample_html_element "b"
This generates:
<b class="zlxzzyunen" onkeydown="uaqz" onkeypress="kqyqmqn" onmouseover="sevcgov" onkeyup="ezglfa" lang="ckn" ondblclick="gfaskd" onmousedown="jwed" onclick="m">
<script/>
<del ondblclick="xeepat"/>
<del cite="ymtye" title="wldaeawdi" onmouseover="fnk" id="sd" onmouseup="bfqxp" onkeyup="esyfhq">
<a tabindex="lcofhfti" href="ffuuebwn" title="jxhl" onkeydown="fsdwqt" rev="btbsuhl" onmouseup="zerecv" onkeyup="agwsyz" shape="htswqoew" onmousedown="ny" onclick="hq">
<object codetype="xbzmtvzd" onkeydown="ibsuthweoa" archive="ivav" onkeypress="sbhvtgvds" onmousemove="ll" onmousedown="kgbpgzj" onmouseout="nrpdnipw" classid="qwqzkzd" onclick="cybmhyab" usemap="aubjg"/>
</a>
</del>
</b>
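The REXML calls involved in producing output like this can be tried without any schema at all. The following is a minimal sketch that builds a small tree by hand and writes it with indentation. The element and attribute values are made up for illustration, and the output goes to a String instead of a file.

```ruby
require "rexml/document"

doc = REXML::Document.new

# Build <b class="sample"><del cite="example"/></b> by hand
b = REXML::Element.new("b")
b.attributes["class"] = "sample"
del = b.add_element("del")
del.attributes["cite"] = "example"
doc.elements << b

# Document#write accepts any object that responds to <<
out = String.new
doc.write(out, 3)
puts out
```

In the generator, the hand-built element is replaced by whatever a_sample returns for the requested schema element.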
Generation is always different because we're using the rand function for many parts of the process. Code for this experiment can be found here.
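If a repeatable sample is ever needed, for example to compare two runs, seeding the generator with Kernel#srand before generating should be enough, since the code relies on the global rand. The sketch below shows this with a hypothetical random_word helper that mimics the attribute text generation. It is not taken from the experiment's code.

```ruby
# Hypothetical helper that mimics the random attribute text used above.
def random_word(max_length = 10)
  letters = ("a".."z").to_a
  length = 1 + (max_length * rand).to_i
  (1..length).collect { letters[(letters.length * rand).to_i] }.join
end

srand(1234) # fix the seed of the global generator
first_run = [random_word, random_word]

srand(1234) # same seed again, so the same sequence follows
second_run = [random_word, random_word]

puts first_run.inspect
puts second_run.inspect
```

Both arrays come out identical because rand produces the same sequence after the same seed.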