Tuesday, March 10, 2009

Zettair is back alive

Hooray! The Zettair folks down in Oz have revived development. An "in-development" version is now online.

Let's see what we have, courtesy of a brutal diff -rc across the directories :-) Disclaimer: this is not necessarily a complete list, at least for the autoconf part, where the diff was fairly long and boring.
Alternatively, I might have missed something in the set of whitelist changes, or simply not realized it was important... :)
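
For anyone who wants to repeat the comparison without wading through the full diff output, here is a minimal Python sketch that lists which files differ between two unpacked source trees. The function name changed_files and the directory arguments are made up for illustration. Note that filecmp does a shallow comparison by default (files with the same size and modification time are taken as equal), so treat the result as a starting point rather than a replacement for diff -rc.

```python
import filecmp
import os


def changed_files(old_root, new_root):
    """Recursively compare two directory trees.

    Returns a dict with three sorted lists of relative paths:
    files present in both trees but different, entries only in the
    old tree, and entries only in the new tree.
    """
    report = {"changed": [], "only_old": [], "only_new": []}

    def walk(cmp, prefix):
        # diff_files: same name in both directories, contents differ
        report["changed"] += [os.path.join(prefix, n) for n in cmp.diff_files]
        report["only_old"] += [os.path.join(prefix, n) for n in cmp.left_only]
        report["only_new"] += [os.path.join(prefix, n) for n in cmp.right_only]
        # descend into subdirectories that exist on both sides
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(old_root, new_root), "")
    for key in report:
        report[key].sort()
    return report
```

Calling changed_files("zettair-old", "zettair-new") (both directory names hypothetical) returns relative paths, which makes it easy to skip the autoconf noise and go straight to files such as index.h.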

The configure script gained what looks like a 64-bit "kfreebsd" target. It was probably inherited from a newer version of autotools, but it is an interesting beast nonetheless: if I read it right, it is a GNU userland running on top of a FreeBSD kernel. The other tweaks are for DragonFly BSD. I have heard about the HAMMER filesystem, but haven't tried the OS yet. Might be interesting. It all depends on the hardware support.

index.h:

The INDEX_NEW_VOCAB option is gone; INDEX_NEW_PARSEBUF (dictates how large the postings hashtable is) has appeared.

INDEX_NEW_NO_OFFSETS - build the index without offsets. Probably not very interesting for me...

INDEX_NEW_QTHREADS - how many query threads to allow. Now I wonder whether this allows for simultaneous querying and indexing - that'd be sweet. My experiments with the previous version, running separate processes, were not terribly successful. With threads it might indeed be a different story.
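
To make the option names above a bit more concrete: a common C convention is to pass a bitmask of flags plus an options struct, where each flag says which struct field should be honoured. The Python sketch below assumes the INDEX_NEW_* names work that way. The numeric values, the defaults, and the resolve_options helper are all invented for illustration and are not taken from index.h.

```python
from enum import IntFlag


class IndexNew(IntFlag):
    # Values are made up for illustration; the real ones live in index.h.
    PARSEBUF = 1 << 0    # caller supplies the postings hashtable size
    NO_OFFSETS = 1 << 1  # build the index without offsets
    QTHREADS = 1 << 2    # caller supplies the number of query threads


# Hypothetical defaults used when a flag is not set.
DEFAULTS = {"parsebuf": 1 << 20, "qthreads": 1}


def resolve_options(flags, opt):
    """Merge caller options into the defaults, honouring only flagged fields."""
    settings = dict(DEFAULTS)
    if flags & IndexNew.PARSEBUF:
        settings["parsebuf"] = opt["parsebuf"]
    if flags & IndexNew.QTHREADS:
        settings["qthreads"] = opt["qthreads"]
    # NO_OFFSETS carries no value of its own; it is a plain on/off switch.
    settings["offsets"] = not (flags & IndexNew.NO_OFFSETS)
    return settings
```

The point of the pattern is that a field in the options struct is ignored unless its flag is set, so new options can appear (and old ones disappear) without breaking callers that never mention them.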

some other similar parameter changes to index_load (as opposed to index_new)

index_cleanup is gone.

INDEX_SEARCH_DYNAMIC_RANK - provide the text describing the rank to be used as well as the rank parameters. Interesting, probably a bit too advanced for my needs.

autogenerated code for the metrics is obviously now parametrized.

mutexes for various pieces (for multithreading)

vocab_vector structure is simplified (looks like thanks to getting rid of multiple files)

a lot of multithreading-oriented code
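
As a reminder of why the mutexes matter, here is a small Python sketch of the general idea: several query threads read a shared postings table while one thread keeps adding to it, with a single lock guarding the table. This is not Zettair's code or its locking scheme; SharedIndex, run_demo, and the term "zettair" are hypothetical names used only to show the pattern.

```python
import threading


class SharedIndex:
    """Toy in-memory postings table guarded by one mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._postings = {}

    def add(self, term, docno):
        with self._lock:
            self._postings.setdefault(term, []).append(docno)

    def query(self, term):
        with self._lock:
            # Copy under the lock so the caller gets a stable snapshot.
            return list(self._postings.get(term, []))


def run_demo(n_docs=200, n_query_threads=4):
    index = SharedIndex()
    consistent = []

    def indexer():
        for docno in range(n_docs):
            index.add("zettair", docno)

    def searcher():
        snapshot = index.query("zettair")
        # A consistent snapshot is always a prefix 0..k-1 of the docnos.
        consistent.append(snapshot == list(range(len(snapshot))))

    threads = [threading.Thread(target=indexer)]
    threads += [threading.Thread(target=searcher) for _ in range(n_query_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(index.query("zettair")), all(consistent)
```

One coarse lock is the simplest possible design. Separate mutexes for separate pieces, as the diff seems to show, trade that simplicity for less contention between threads.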

a hacko-fix in makeindex_append_docno to get around the fact that zettair can't handle docnos bigger than a pagesize
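
What the workaround actually does is not covered here, so the following is only a guess at the general shape such a guard could take: clamp an oversized docno so that it fits in one page. Everything in it is an assumption, including the 4096-byte page size, the byte reserved for a C-style terminator, and the name clamp_docno.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; not taken from Zettair


def clamp_docno(docno: bytes, page_size: int = PAGE_SIZE):
    """Return (docno_that_fits, was_truncated).

    One byte is reserved for a terminator, as a C string would need
    (an assumption made for this sketch).
    """
    if len(docno) + 1 <= page_size:
        return docno, False
    return docno[: page_size - 1], True
```

Truncation is the crudest option, since two long docnos sharing a prefix would collide. Rejecting the document or hashing the tail would be alternatives.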

vocabulary operations seem to be simplified - previously the memory management was more sophisticated.

This completes the list. Again, I very probably missed something important; if so, drop me a note in the comments.
