Adjusting the mnoGoSearch results
With the indexer, mnoGoSearch fills a word-count table, in my case with about 75,000 rows. All uniquely indexed words are stored in lower case. That is of no use to me in this form.
What I actually need are two tables/lists: one for searching, as before, and one for building a keyword list later.
For that, however, lower-case words (verbs etc.) must stay lower case and capitalised nouns must stay capitalised.
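The two-list idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not something mnoGoSearch provides: the function name build_word_lists and the word pattern are hypothetical, and the text would have to come from the crawled pages themselves, not from the lower-cased word table.

```python
import re
from collections import Counter

# Hypothetical word pattern: runs of letters (including umlauts), no digits.
WORD_RE = re.compile(r"[^\W\d_]+")


def build_word_lists(text):
    """Return (search_counts, keyword_counts) for one piece of text.

    search_counts  : words folded to lower case, as the indexer stores them
    keyword_counts : words counted exactly as written, so that capitalised
                     nouns stay capitalised
    """
    words = WORD_RE.findall(text)
    search_counts = Counter(word.lower() for word in words)
    keyword_counts = Counter(words)
    return search_counts, keyword_counts


sample = "Der Verstärker verstärkt das Signal. Das Signal ist stark."
search_counts, keyword_counts = build_word_lists(sample)
print(sorted(keyword_counts))
```

Note that a word at the start of a sentence ("Das") is capitalised without being a noun, so a keyword list built this way would still need manual cleaning.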
lower case / upper case problem
Question from 2007 (version 3.3.5):
( http://www.mnogosearch.org/board/message.php?id=19629 )
.
After indexing a foreign web site, when I search for a word which contains the lowercase "ı" character (for example: "abcdefgh"), it finds all the pages that contain "abcdefgh" except those that contain the uppercase form of this word, "ABCDEFGH".
So the problem is that I have many sites using uppercase words. How can I handle this kind of site during the search process, or maybe during the indexing process?
.
mnoGoSearch converts "LATIN CAPITAL LETTER I" to "LATIN SMALL LETTER I".
.
You will most likely need to modify the sources by changing the lower case conversion according to, for example, foreign rules:
.
"LATIN CAPITAL LETTER I" -> "LATIN SMALL LETTER DOTLESS I"
"LATIN CAPITAL LETTER I WITH DOT ABOVE" -> "LATIN SMALL LETTER I"
The file to be modified is: src/unidata.c
Whether this still applies to the 2011 version 3.3.12 remains to be checked.
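To make the two quoted mapping rules concrete, here is a small Python sketch. It is not the code in src/unidata.c and says nothing about how mnoGoSearch implements its conversion; the table DOTLESS_RULES and the function name are my own assumptions for illustration.

```python
# Assumed mapping, taken from the two rules quoted above:
#   LATIN CAPITAL LETTER I (U+0049)                -> LATIN SMALL LETTER DOTLESS I (U+0131)
#   LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130) -> LATIN SMALL LETTER I (U+0069)
DOTLESS_RULES = {
    0x0049: "\u0131",
    0x0130: "i",
}


def lower_with_dotless_rules(text):
    """Lower-case text, applying the two special rules before the default ones."""
    # The special letters are replaced first; everything else goes through
    # the ordinary lower-case conversion.
    return text.translate(DOTLESS_RULES).lower()


print(lower_with_dotless_rules("KIZ"))
```

With the default conversion, "I" becomes "i", which is exactly the behaviour described in the answer above.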
.
Helpful links - Jan 2013
The Typo3 extension from April 2009 - no changes since then
typo3.org/extension-manuals/mnogosearch/2.2.2/view/toc/0/
.
typo3.org/extension-manuals/mnogosearch/2.2.2/view/1/4/
.
www.mnogosearch.org/doc33/index.html
.
www.mnogosearch.org/doc33/msearch-indexing.html
www.mnogosearch.org/winhelp/ch04.html
.
An installation description from Slovenia
http://matrix.uni-mb.si/projekti/objava/vsebina/120/
You can list all parameters of indexer with:
/opt/mnogosearch/sbin/indexer -h
.
The contents of the mnogosearch 3.3.12 tar file as HTML
fossies.org/unix/www/mnogosearch-3.3.12.tar.gz/
.
Another option for the indexer
# Uncomment this line if you want to generate misspelled
# search word suggestions. You need to run "indexer -Ewrdstat"
# before using this feature.
#
The indexer options of version 3.3.9 - help output
/opt/mnogosearch/sbin/indexer: invalid option -- 'H'
indexer from mnogosearch-3.3.9-mysql-DB2-solid-SAPDB-ibase-ctlib-freetds-oracle8-oracle8i
www.mnogosearch.org (C)1998-2009, LavTech Corp.
Usage: indexer [OPTIONS] [configfile]
Indexing options (I have not tried them):
-a | reindex all documents even if not expired (may be limited using the -t, -u, -s, -c, -y and -f options)
-m | reindex expired documents even if not modified (may be limited using the -t, -u, -c, -s, -y and -f options)
-e | index 'most expired' (oldest) documents first
-o | index documents with less depth (hops value) first
-r | do not try to reduce remote servers load by randomising url fetch list before indexing (-r recommended for very big number of URLs)
-n n | index only n documents and exit
-c n | index only n seconds and exit
-q | quick startup (do not add Server URLs); -qq even quicker
-b | block starting more than one indexer instances
-i | insert new URLs (URLs to insert must be given using -u or -f)
-p n | sleep n seconds after each URL
-w | do not ask for confirmation when clearing documents from the database
-N n | run N threads
-----------------------
Subsection control options (may be combined):
-s status | limit indexer to documents matching status (HTTP Status code)
-t tag | limit indexer to documents matching tag
-g category | limit indexer to documents matching category
-y content-type | limit indexer to documents matching content-type
-L language | limit indexer to documents matching language
-u pattern | limit indexer to documents with URLs matching pattern (supports SQL LIKE wildcard '%')
--seed=number | limit indexer to documents with the given seed (0-255)
-D n | work with the n-th database only (i.e. with the n-th DBAddr)
-f filename | read URLs to be indexed/inserted/cleared from file (with -a or -C option, supports SQL LIKE wildcard '%'; has no effect when combined with -m option)
-f - | use STDIN instead of file as URL list
Logging options:
-l | do not log to stdout/stderr
-v n | verbose level, 0-5
Misc. options:
-C | clear database and exit
-S | print statistics and exit
-j t | set current time for statistic (use with -S), YYYY-MM[-DD[ HH[:MM[:SS]]]] or time offset, e.g. 1d12h (see Period in indexer.conf)
-I | print referers and exit
-R | calculate popularity rank
-Ecreate | create SQL table structure and exit
-Edrop | drop SQL table structure and exit
-Eblob | create fast search index
-Ewordstat | create statistics for misspelled word suggestions
-Esqlmon | run interactive SQL monitor
-Esqlmon --exec=stmt | execute the given SQL statement
-F pattern | print compile configuration and exit, e.g. -F '*'
-h, -? | print help page and exit
-hh | print more help and exit
-d configfile | use given configfile instead of default one. This option is useful when running indexer as an interpreter, e.g.: #!/usr/local/sbin/indexer -d
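Since several of the options listed above can be combined, a small helper that only assembles the argument list may help to avoid typing mistakes. This is a sketch under my own assumptions: the function indexer_command and its parameter names are hypothetical, the binary path is the one used on this page, and only the flags -a, -n, -u and -v from the help text are covered. It does not start the indexer.

```python
import shlex

INDEXER = "/opt/mnogosearch/sbin/indexer"  # path as used on this page


def indexer_command(reindex_all=False, max_docs=None, url_pattern=None,
                    verbose=None, configfile=None):
    """Build the argument list for one indexer run (nothing is executed)."""
    cmd = [INDEXER]
    if reindex_all:
        cmd.append("-a")                 # reindex all documents
    if max_docs is not None:
        cmd += ["-n", str(max_docs)]     # index only n documents and exit
    if url_pattern is not None:
        cmd += ["-u", url_pattern]       # SQL LIKE pattern, '%' as wildcard
    if verbose is not None:
        if not 0 <= verbose <= 5:
            raise ValueError("verbose level must be between 0 and 5")
        cmd += ["-v", str(verbose)]
    if configfile is not None:
        cmd.append(configfile)           # Usage: indexer [OPTIONS] [configfile]
    return cmd


cmd = indexer_command(reindex_all=True, max_docs=100,
                      url_pattern="http://example.com/%")
# Show the command as it could be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

The list returned here could be passed to subprocess.run on a machine where the indexer is installed; the example URL is a placeholder.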
.
Please post bug reports and suggestions at www.mnogosearch.org/bugs/ - but that has long been dead.
.
Including include files - how?
Claudio.Strizzolo wrote in 2009:
Basically I just added the following lines:
Affix it iso-8859-1 /path/italian.affix
Spell it iso-8859-1 /path/italian.dict
Affix en iso-8859-1 /path/english.aff
Spell en iso-8859-1 /path/british.dict
to a mnogosearch configuration file that I had specified in the Extra Configuration [IncludeFile] field of mnogosearch configuration, through the Ext manager.
Claudio
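Before putting such lines into an include file, it can be useful to check that every language has both an Affix and a Spell line. The following Python sketch does that for the four lines quoted above. It assumes the four-column layout shown there (command, language, charset, path); the names SpellEntry, parse_spell_lines and unpaired_languages are my own and are not part of mnoGoSearch or the Typo3 extension.

```python
from typing import NamedTuple


class SpellEntry(NamedTuple):
    command: str   # "Affix" or "Spell"
    lang: str      # e.g. "it"
    charset: str   # e.g. "iso-8859-1"
    path: str      # file name as written in the config line


def parse_spell_lines(text):
    """Collect the Affix/Spell lines of a config fragment; skip everything else."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] in ("Affix", "Spell"):
            entries.append(SpellEntry(*parts))
    return entries


def unpaired_languages(entries):
    """Return the languages that have an Affix line or a Spell line, but not both."""
    affix = {e.lang for e in entries if e.command == "Affix"}
    spell = {e.lang for e in entries if e.command == "Spell"}
    return sorted(affix ^ spell)


config = """\
Affix it iso-8859-1 /path/italian.affix
Spell it iso-8859-1 /path/italian.dict
Affix en iso-8859-1 /path/english.aff
Spell en iso-8859-1 /path/british.dict
"""
entries = parse_spell_lines(config)
print(len(entries), unpaired_languages(entries))
```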