Douglas Thrift ([info]douglaswth) wrote,
@ 2004-06-04 16:36:00
Current mood: giddy

operator + overloading == HOT
Last night, after trying to figure out why Xalan-C++ 1.8 was not behaving when used in my SiteMapper program on Windows, [info]saurik showed me his syntax for XPath that is in his Menes C++ Library Thingy: *document/"page"/"section"/"list". Yes, that's the division operator used with XML nodes and strings, hot! So rather than fighting any longer with the evil and verbose Xalan/Xerces XPath, I decided to change my program to use his syntax. This resulted in a change from code like this:

    XalanSourceTreeInit init;
    XalanSourceTreeDOMSupport support;
    XalanSourceTreeParserLiaison liaison;
    XPathEvaluator evaluator;

    support.setParserLiaison(&liaison);

    XalanDOMString file(siteMap.c_str());
    LocalFileInputSource source(file.c_str());

    XalanDocument* document = liaison.parseXMLStream(source);

    if (document == 0) return;

    XalanNode* list = evaluator.selectSingleNode(support, document,
        XalanDOMString("/page/section/list").c_str());

    if (list == 0) return;

    comment << evaluator.evaluate(support, document,
        XalanDOMString("comment()").c_str())->str();

to clean and simple code like this:

    ext::Handle<xml::Document> document(xml::Parse(siteMap));
    ext::Handle<xml::Node> list(*document/"page"/"section"/"list");

    comment = *document/"comment()";

which does basically the same thing (comment was changed from an ostringstream to just a string).
We then went on to get it working right and solve the fun link errors on FreeBSD.
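
For anyone wondering how a division operator can walk a document at all, here is a minimal sketch of the general technique. It is not the Menes implementation: Node, Selection, and both operator/ overloads are hypothetical names made up for illustration, and it only handles plain child names (nothing like "comment()").

```cpp
#include <string>
#include <vector>

// Hypothetical toy element type for illustration; not the Menes xml::Node class.
struct Node {
    std::string name;
    std::string text;
    std::vector<Node> children;
};

// Result of a path step: wraps a pointer that is null when nothing matched.
class Selection {
public:
    explicit Selection(const Node* node) : node_(node) {}
    explicit operator bool() const { return node_ != nullptr; }
    const Node* get() const { return node_; }

private:
    const Node* node_;
};

// One path step: select the first child called `name`.
// An empty selection stays empty, so a long chain never dereferences null.
Selection operator/(Selection from, const std::string& name) {
    if (!from) return Selection(nullptr);
    for (const Node& child : from.get()->children) {
        if (child.name == name) return Selection(&child);
    }
    return Selection(nullptr);
}

// Entry point so that a chain can start from a plain node, as in *document/"page".
Selection operator/(const Node& from, const std::string& name) {
    return Selection(&from) / name;
}
```

With a Node called doc, the expression doc / "page" / "section" / "list" yields a Selection that is empty if any step along the way has no matching child. Each step returns a value the next step can consume, which is what lets the slashes chain.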




Small mistake: templates got eaten.
[info]saurik
2004-06-04 06:45 pm UTC (link)
Note that the templates got lost in that code:

ext::Handle<xml::Document> document(xml::Parse(siteMap));
ext::Handle<xml::Element> list(*document/"page"/"section"/"list");

comment = *document/"comment()";


Re: Small mistake: templates got eaten.
[info]douglaswth
2004-06-04 06:55 pm UTC (link)
Ack, fixed it.


sitemappers
[info]wiz
2004-06-04 07:05 pm UTC (link)
Hey Douglas!
I'm confused as to why people are moving this way... using XML as intermediary database formats... The site map is stored in the runtime state of your searchengine, right? Couldn't there be a simple function within the searchengine that would output the sitemap if requested, rather than parsing the XML representation of its state? Why not generate sitemaps from within the searchengine before it finishes parsing some site and serializes its data to some xml format?

I guess I'm confused as to how a searchengine doesn't #include everything a sitemapper needs, and thus is really just a sitemapper that also maintains indexes of content as well as filenames.


Re: sitemappers
[info]douglaswth
2004-06-04 08:03 pm UTC (link)
Hello Seth!

The SiteMapper is a program that I recently created for just my website; it creates the Sitemap page according to some rules that are specific to my site. The sitemap has an ordered tree structure that needs to be maintained across updates, while my Search Engine's index is just a list of webpages which are only ordered by how their hyperlinks were thrown in a queue and where the indexer started.

What my SiteMapper does first is parse the old sitemap so it gets the order and structure. Then it parses the index file that my Search Engine creates and uses, so it can update the pages already in the sitemap structure and keep any new pages that match my criteria. Finally, it goes through the accepted new pages, adds them into the tree structure, and rewrites the sitemap file.
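
The update and append steps described above could look roughly like the sketch below. This is an assumption-laden illustration, not the actual SiteMapper source: Page, Branch, IndexEntry, and updateSitemap are hypothetical names, "updating" a page is assumed to mean refreshing its title, the site-specific criteria are passed in as a caller-supplied predicate, and parsing and rewriting the files are left out.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
struct Page {
    std::string url;
    std::string title;
};

struct Branch {               // one branch of the sitemap tree
    std::string name;
    std::vector<Page> pages;  // kept in their existing order
};

struct IndexEntry {           // one page from the search engine's index
    std::string url;
    std::string title;
    std::string branch;       // assumed: which branch the page belongs to
};

using Sitemap = std::vector<Branch>;

// `sitemap` is the already-parsed old sitemap; `accept` stands in for the
// site-specific criteria. URLs in `index` are assumed to be unique.
template <typename Accept>
void updateSitemap(Sitemap& sitemap, const std::vector<IndexEntry>& index,
                   Accept accept) {
    // Look up existing pages by URL without disturbing their order.
    std::map<std::string, Page*> known;
    for (Branch& branch : sitemap)
        for (Page& page : branch.pages)
            known[page.url] = &page;

    // Refresh pages already in the sitemap; collect acceptable new ones.
    std::vector<IndexEntry> fresh;
    for (const IndexEntry& entry : index) {
        auto found = known.find(entry.url);
        if (found != known.end())
            found->second->title = entry.title;
        else if (accept(entry))
            fresh.push_back(entry);
    }

    // Append each new page after the existing pages of its branch.
    // Entries naming a branch that does not exist are dropped in this sketch.
    for (const IndexEntry& entry : fresh) {
        for (Branch& branch : sitemap) {
            if (branch.name == entry.branch) {
                branch.pages.push_back(Page{entry.url, entry.title});
                break;
            }
        }
    }
}
```

Because existing pages are only modified in place and new ones are only pushed onto the end of a branch, the order read from the old sitemap survives every run.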


Re: sitemappers
[info]wiz
2004-06-04 08:27 pm UTC (link)
Cool. Only confused about one thing. The first step is to parse "the old sitemap". This is one previously generated by the SiteMapper and nonexistent if it hasn't been run yet?
Or one that's been generated by the search engine?

If it can do it all the first time, without a previous index of some kind, I'm confused as to why you wouldn't regenerate the whole thing each time. I guess it'd be less efficient, but it's probably the lazy man's (i.e., me) solution ;)


Re: sitemappers
[info]douglaswth
2004-06-04 09:18 pm UTC (link)
The old sitemap was either previously generated by the SiteMapper or first created by hand. The SiteMapper needs the previous sitemap to get the order that pages are in the tree structure; new pages are always added after the other pages in their branch. The order of the sitemap tree is my order of the subdomains at the root level and chronological within.

The Search Engine and the SiteMapper are separate things. The Search Engine regenerates its index every time it indexes. The SiteMapper needs its previous information and only uses the Search Engine's index to get new information and changes.


