Neptune Web

Implementing Generic Site Search using mnoGoSearch

I wanted to provide a quick but accurate search on a website. In the past we've used programs such as htdig and swish, which, while good, became outdated or were difficult to set up the way we wanted. My latest attempt uses mnogosearch.

Installation

For our purposes we used the latest version of mnogosearch available at the time (3.3.9), downloaded from here:

wget http://www.mnogosearch.org/Download/mnogosearch-3.3.9.tar.gz

After unpacking, I configured it to use only the mysql database backend and set a prefix under which everything will be installed:

./configure --with-mysql --prefix=/neptune/mnogosearch-3.3.9

Note that on Red Hat Enterprise Linux AS release 4 on the x86_64 architecture, the configure script wasn't able to find the required mysql libraries. In that case, provide the library path in the LDFLAGS environment variable and invoke configure like this:

LDFLAGS=-L/usr/lib64/mysql ./configure --with-mysql --prefix=/neptune/mnogosearch-3.3.9
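If the same build has to run on several hosts, the library path workaround can be wrapped in a small helper. This is only a sketch: the function name mysql_ldflags and the idea of probing a list of candidate directories are our own assumptions, not something mnogosearch provides. /usr/lib64/mysql is the path from the note above; /usr/lib/mysql in the usage comment is an assumed 32-bit counterpart.

```shell
#!/bin/sh
# Hypothetical helper (not part of mnogosearch): print a -L linker flag for
# the first candidate directory that exists. Prints nothing if none exists.
mysql_ldflags() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "-L$dir"
            return 0
        fi
    done
    return 0
}

# Usage, from the unpacked source directory:
#   LDFLAGS="$(mysql_ldflags /usr/lib64/mysql /usr/lib/mysql)" \
#       ./configure --with-mysql --prefix=/neptune/mnogosearch-3.3.9
```

When none of the directories exist, LDFLAGS ends up empty and configure behaves as in the plain invocation.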

Setup

The next step is to create a mysql database that will be used by the site search. In mysql you simply need to create a database and grant a user of your choice full access to it. In the example below we create a database called "mnogosearch_neptuneweb" and give full access to the user "neptuneuser" with the password "neptunepass". Full database access is needed so that the mnogosearch indexer can create the database tables.

mysql> create database mnogosearch_neptuneweb;
mysql> grant all on mnogosearch_neptuneweb.* to neptuneuser@localhost identified by 'neptunepass';

Configuration

We'll need two configuration files for the search to work: one for the indexer and another for the search page.

The indexer configuration file, called "indexer.conf", can be as simple as the following example:

DBAddr mysql://neptuneuser:neptunepass@localhost/mnogosearch_neptuneweb/?dbmode=single
Server http://www.neptuneweb.com/
Disallow *.gif *.jpg *.jpeg *.png *.css *.js *.flv *.swf
Section body 1 150000 html
Section title 2 255
LocalCharset UTF-8
RemoteCharset UTF-8
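Since the same file is likely to be copied for other sites, it can help to generate it from a few parameters. The sketch below is an assumption on our part: write_indexer_conf is a hypothetical helper, and it simply prints the example configuration above with the database user, password, database name and site URL substituted. It does no escaping, so it assumes the password contains no characters that are special in a URL.

```shell
#!/bin/sh
# Hypothetical helper: print an indexer.conf like the example above.
# Arguments: database user, database password, database name, site URL.
# No URL escaping is done on the user name or password.
write_indexer_conf() {
    db_user=$1
    db_pass=$2
    db_name=$3
    site_url=$4
    cat <<EOF
DBAddr mysql://${db_user}:${db_pass}@localhost/${db_name}/?dbmode=single
Server ${site_url}
Disallow *.gif *.jpg *.jpeg *.png *.css *.js *.flv *.swf
Section body 1 150000 html
Section title 2 255
LocalCharset UTF-8
RemoteCharset UTF-8
EOF
}

# Usage:
#   write_indexer_conf neptuneuser neptunepass mnogosearch_neptuneweb \
#       http://www.neptuneweb.com/ > indexer.conf
```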

The other configuration file is called "search.htm" and is basically a set of html fragments used to build the search and results page. It is recommended to make a copy of the search.htm-dist file provided with mnogosearch (installed as /neptune/mnogosearch-3.3.9/etc/search.htm-dist) and adapt it to your needs. The only modification required to make it work is setting the DBAddr parameter so that it points to the correct mnogosearch database. It has to match the same parameter in the indexer.conf file.
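A mismatch between the two DBAddr values is easy to introduce when copying files between sites, so a quick check can be useful. The following is a sketch under assumptions: dbaddr_of and dbaddr_matches are hypothetical helper names, and the check assumes that in both files DBAddr sits on its own line with the address as the second word.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first DBAddr line in a file.
dbaddr_of() {
    awk '$1 == "DBAddr" { print $2; exit }' "$1"
}

# Hypothetical check: succeed only if both files have a DBAddr line and
# the two values are identical.
dbaddr_matches() {
    a=$(dbaddr_of "$1")
    b=$(dbaddr_of "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage:
#   dbaddr_matches indexer.conf /path/to/search.htm || echo "DBAddr mismatch"
```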

Running it

To index the site you need to invoke the indexer program three times, as shown below: first to drop any old database tables, then to create the new database structure, and finally to index the website:

/neptune/mnogosearch-3.3.9/sbin/indexer -Edrop indexer.conf
/neptune/mnogosearch-3.3.9/sbin/indexer -Ecreate indexer.conf
/neptune/mnogosearch-3.3.9/sbin/indexer -a -d indexer.conf
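For repeated runs, for example from cron, the three calls can be wrapped in one function that stops at the first failure. This is a minimal sketch: reindex_site is a hypothetical name, and it assumes the indexer exits with a non-zero status when a step fails, which we have not verified for every case.

```shell
#!/bin/sh
# Hypothetical wrapper around the three indexer calls shown above.
# $1: path to the indexer binary, $2: path to indexer.conf.
# Assumes a failed step returns a non-zero exit status; the function then
# stops so that later steps do not run against a half-built database.
reindex_site() {
    indexer=$1
    conf=$2
    "$indexer" -Edrop "$conf" || return 1
    "$indexer" -Ecreate "$conf" || return 1
    "$indexer" -a -d "$conf" || return 1
}

# Usage:
#   reindex_site /neptune/mnogosearch-3.3.9/sbin/indexer indexer.conf
```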

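On the search page we need to invoke the mnogosearch search.cgi program to return results. Below is an example of PHP code for the search page. It takes the search.htm configuration template and outputs the search form and search results.

// Pass the request's query string through to search.cgi via the environment.
$queryString = $_SERVER["QUERY_STRING"];
putenv("QUERY_STRING=" . $queryString);
// Template to render and the URL the search form should submit to.
putenv("UDMSEARCH_TEMPLATE=" . "/path/to/search.htm");
putenv("UDMSEARCH_SELF=" . $_SERVER["PHP_SELF"]);
$cmd = "/neptune/mnogosearch-3.3.9/bin/search.cgi";
exec($cmd, $searchOutput);
// Start at index 1 to skip the first output line, which is presumably the
// CGI header line rather than page content.
for ($searchIndex = 1; $searchIndex < count($searchOutput); $searchIndex++) {
    // exec() strips the line ending from each line, so add it back.
    print $searchOutput[$searchIndex] . "\n";
}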
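Before wiring this into a page, it can be handy to run search.cgi from the command line and look at the html only. The filter below is a sketch of a more general alternative to skipping a fixed number of lines: it drops everything up to and including the first blank line, which is where CGI headers end. The name strip_cgi_headers is hypothetical, and the q parameter in the usage comment is our assumption about the query parameter name.

```shell
#!/bin/sh
# Hypothetical filter: drop CGI response headers (everything up to and
# including the first blank line) and pass the body through unchanged.
# A carriage return on the blank line is tolerated.
strip_cgi_headers() {
    awk 'body { print; next } /^\r?$/ { body = 1 }'
}

# Usage (q=neptune is an assumed query string):
#   QUERY_STRING='q=neptune' UDMSEARCH_TEMPLATE=/path/to/search.htm \
#       /neptune/mnogosearch-3.3.9/bin/search.cgi | strip_cgi_headers
```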

The following are two examples of rendering the results on the page:

Conclusions

It takes a bit of work to set up and configure the search this way, but once you have done it, it's just a matter of making a copy next time. The mnogosearch program is also very powerful and lets you build a highly customized search engine. Some advanced features worth noting are indexing database content directly, using external parsers, categorizing content by site section, and indexing content in multiple languages.

 
