I wanted to create something that loads an XSD schema into an object structure that can be queried to determine which elements will be generated. I didn't look for an existing library that does this, because building it myself would be a much better exercise. However, creating something that supports the full XSD specification like this or this is a HUGE task, so I chose to support only a small subset of it.
For XML parsing and generation, I'm using REXML, which is a very nice library for XML manipulation.
The basic strategy for loading the XSD schema is to create a collection of classes, each of which handles one of the supported schema features. For example, SchemaElement supports element declarations and SchemaComplexType supports complexType declarations. Since an XSD schema is itself an ordinary XML document, each class loads its part of the document through a load_from method. For SchemaElement, the load_from method looks like this:
class SchemaElement
   ...
   def load_from(elementDefinition,prefixes)
       
      @name = elementDefinition.attributes["name"]
      if (elementDefinition.attributes["type"]) then
          @element_type = Reference.new(elementDefinition.attributes["type"],prefixes)
      end          
      if (elementDefinition.attributes["substitutionGroup"]) then
          @substitution_group = Reference.new(elementDefinition.attributes["substitutionGroup"],prefixes)
      end          
      elementDefinition.find_all {|e| !e.is_a?(REXML::Text)}.each{|e|
           case e.name
               when "complexType"
                   ct = SchemaComplexType.new
                   ct.load_from(e,prefixes)
                   @element_type = ct
               else
                   print "Warning: ignoring #{e}\n"
           end
      }
   end
...
end
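The Reference class used by load_from is not shown in this post. As a rough idea of what it does, here is a minimal sketch that splits a qualified name such as "xs:string" into a namespace and a local name. It assumes that prefixes is a Hash from prefix to namespace URI and that the "" key holds the default namespace; both are assumptions made for illustration, not necessarily how the real class works.

```ruby
module XSDInfo
  # Hypothetical sketch of Reference; the real class may differ.
  class Reference
    attr_reader :namespace, :name

    # qname    - attribute value such as "xs:string" or "Inline"
    # prefixes - Hash of prefix => namespace URI; the "" key is assumed
    #            to hold the default namespace
    def initialize(qname, prefixes)
      prefix, local = qname.include?(":") ? qname.split(":", 2) : ["", qname]
      @namespace = prefixes[prefix]
      @name = local
    end
  end
end

prefixes = {
  "xs" => "http://www.w3.org/2001/XMLSchema",
  ""   => "http://www.w3.org/1999/xhtml"
}

ref = XSDInfo::Reference.new("xs:string", prefixes)
puts "#{ref.namespace} #{ref.name}"
```

Keeping only the namespace and the name at load time means the referenced type does not need to exist yet when the element is read.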
As shown in the load_from method, there are relationships between schema elements. For example, the type of an element could be a type defined elsewhere in this schema or in an imported schema. Once the schema is loaded, a second pass takes these references and replaces each of them with a real reference to the object. For SchemaElement, the solve_references method looks like this:
class SchemaElement
  ...
  def solve_references(collection)
     if @substitution_group.is_a?(XSDInfo::Reference) then
        @substitution_group = collection.get_type(
                                   @substitution_group.namespace,
                                   @substitution_group.name)
     end
     if @element_type.is_a?(XSDInfo::Reference) then
        if(r = collection.get_type(@element_type.namespace,@element_type.name)) then
           @element_type = r
        else
          print "Not found #{@element_type.namespace}.#{@element_type.name}\n"
        end
     else         
        if !@solving then
           @solving = true
           @element_type.solve_references(collection) unless @element_type == nil
           @solving = false
        end
     end
  end
  ...
end
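To see the two-pass idea in isolation, here is a small self-contained sketch. ToyReference, ToyCollection and ToyElement are hypothetical stand-ins for the real classes, and the lookup is assumed to be keyed by namespace and name, which is the only thing the get_type calls above tell us.

```ruby
# Hypothetical stand-ins, only meant to illustrate the resolution pass.
ToyReference = Struct.new(:namespace, :name)

class ToyCollection
  def initialize
    @types = {}
  end

  def add_type(namespace, name, type)
    @types[[namespace, name]] = type
  end

  # Returns the registered object, or nil when nothing matches.
  def get_type(namespace, name)
    @types[[namespace, name]]
  end
end

class ToyElement
  attr_reader :element_type

  def initialize(element_type)
    @element_type = element_type
  end

  # Swap the placeholder for the real object once everything is loaded.
  def solve_references(collection)
    return unless @element_type.is_a?(ToyReference)
    resolved = collection.get_type(@element_type.namespace, @element_type.name)
    @element_type = resolved if resolved
  end
end

ns = "http://www.w3.org/1999/xhtml"
collection = ToyCollection.new
collection.add_type(ns, "Inline", :inline_complex_type)

element = ToyElement.new(ToyReference.new(ns, "Inline"))
element.solve_references(collection)
p element.element_type
```

An unresolved reference is simply left in place here, while the real method prints a "Not found" message.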
Here, collection points to a SchemaCollection object that holds all the loaded schemas. With all this in place, we can load an XSD schema and start querying its parts. For example, we can get the list of attributes that apply to the b tag in the XHTML schema:
$ irb -r xsd/xsd.rb
irb(main):001:0> sc = XSDInfo::SchemaCollection.new
=> #<XSDInfo::SchemaCollection:0xb7b71170>
irb(main):002:0> sc.add_schema XSDInfo::SchemaInformation.new("../xhtml1-strict.xsd")
irb(main):003:0> sc.namespaces.each {|ns| sc[ns].solve_references sc}
=> ["http://www.w3.org/1999/xhtml"]
irb(main):004:0> sc["http://www.w3.org/1999/xhtml"].elements["b"].all_attributes.collect {|x| x.name}
=> ["onkeydown", "onkeypress", "onmouseover", "onkeyup", "onmousemove", "onmouseup", "ondblclick", "onmouseout", "onmousedown", "onclick", "title", "class", "id", "style", "dir", nil, "lang"]
Now, to generate the XML sample, we can create a generate_sample_content method for each part of the schema. For example, the generate_sample_content method for SchemaComplexType looks like this:
## Sample Generation
def generate_sample_content(e,context)
  atts = all_attributes.select {|x| x.name != nil && rand > 0.7}
  atts.each {|att|
    sample_length = 1 + (10*rand).to_i
    # Build a random lowercase string; this form also works on Ruby 1.9+,
    # where "a"[0] returns a String instead of a character code
    ltrs = ("a".."z").to_a
    sample_text = (1..sample_length).collect { ltrs[(ltrs.length*rand).to_i] }.join
    e.attributes[att.name] = sample_text
  } 
  self.all_content_parts.each {|p| p.generate_sample_content(e,context)}
end
Attribute values should be valid according to their simple types. However, this is not supported right now.
Another example is the generate_sample_content method for the SchemaChoice class:
def generate_sample_content(e,context)
   if (@minOccurs == 1 && @maxOccurs == 1) then
     element_to_gen = @elements[(rand*@elements.length).to_i]
     element_to_gen.generate_sample_content(e,context)
   elsif (@minOccurs == 0 && @maxOccurs == 1) then
     element_to_gen = @elements[(rand*@elements.length).to_i]
     element_to_gen.generate_sample_content(e,context) unless rand < 0.5    
   elsif (@maxOccurs == "unbounded") then
      (1..(rand * 4).to_i).each {|i|
           element_to_gen = @elements[(rand*@elements.length).to_i]
           element_to_gen.generate_sample_content(e,context) unless rand < 0.5             
      }
   end          
end
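The three branches above only cover some combinations of minOccurs and maxOccurs. One possible way to generalize them, shown here only as a sketch, is to first pick a repetition count and then pick one alternative per repetition. The helper names sample_occurrences and sample_choice and the cap of 3 for "unbounded" are my own assumptions; they are not part of the code for this experiment.

```ruby
# Hypothetical helpers, not part of the original code.

# Picks how many times a particle is repeated. "unbounded" is capped at
# unbounded_cap repetitions so that samples stay small.
def sample_occurrences(min_occurs, max_occurs, rng = Random.new, unbounded_cap = 3)
  upper = max_occurs == "unbounded" ? [min_occurs, unbounded_cap].max : max_occurs
  rng.rand(min_occurs..upper)
end

# Picks one alternative per repetition of the choice.
def sample_choice(alternatives, min_occurs, max_occurs, rng = Random.new)
  count = sample_occurrences(min_occurs, max_occurs, rng)
  Array.new(count) { alternatives.sample(random: rng) }
end

rng = Random.new(42) # a fixed seed makes the sample repeatable
p sample_choice(%w[a em strong], 0, "unbounded", rng)
```

Passing the random number generator explicitly also makes it possible to reproduce a given sample by reusing the same seed.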
Now, with all this infrastructure, we can generate some sample XML files:
def generate_sample_html_element name
    sc = XSDInfo::SchemaCollection.new
    sc.add_schema XSDInfo::SchemaInformation.new("../xhtml1-strict.xsd")
    sc.namespaces.each {|ns| sc[ns].solve_references sc}
    doc = REXML::Document.new
    f = File.new("output.xml","w")
    doc.elements << sc[sc.namespaces[0]].elements[name].a_sample
    doc.write(f,3,false,false)
    f.close
    return sc
end
We call:
irb(main):006:0> generate_sample_html_element "b"
This generates:
<b class="zlxzzyunen" onkeydown="uaqz" onkeypress="kqyqmqn" onmouseover="sevcgov" onkeyup="ezglfa" lang="ckn" ondblclick="gfaskd" onmousedown="jwed" onclick="m">
  <script/>
  <del ondblclick="xeepat"/>
  <del cite="ymtye" title="wldaeawdi" onmouseover="fnk" id="sd" onmouseup="bfqxp" onkeyup="esyfhq">
    <a tabindex="lcofhfti" href="ffuuebwn" title="jxhl" onkeydown="fsdwqt" rev="btbsuhl" onmouseup="zerecv" onkeyup="agwsyz" shape="htswqoew" onmousedown="ny" onclick="hq">
      <object codetype="xbzmtvzd" onkeydown="ibsuthweoa" archive="ivav" onkeypress="sbhvtgvds" onmousemove="ll" onmousedown="kgbpgzj" onmouseout="nrpdnipw" classid="qwqzkzd" onclick="cybmhyab" usemap="aubjg"/>
    </a>
  </del>
</b>
The output is always different because we use the rand function in many parts of the process. Code for this experiment can be found here.