[Pkg-dspam-misc] Bug#366478: dspam: Documentation and Integration

Tue May 9 00:30:50 UTC 2006

Package: dspam
Version: 3.6.4-4
Severity: important

Ok, I installed dspam, dspam-webfrontend, dspam-doc and libdspam7-drv-sqlite3.
Now what? I have experience using an older version of dspam on a Red Hat
box, but I am at a loss as to how to proceed with the customized Debian
version (sarge + some sid). README.Debian is of no help: it doesn't even
tell you which options the program was compiled with.

Dspam bugs, dspam documentation bugs, and dspam debian package bugs all 
compound to make a real mess.

The configuration I am trying to use is exim->maildrop->dspam, and I tried
both standalone and server modes.

There is no link from the DSPAM website wiki to the Debian package maintainers' web page, and that page has no useful information either. It just says that there is great documentation on the DSPAM site, which is definitely not true.

The problems I have had point to some areas that need to be documented.
   - How do you tell dspam to initialize the database for a user?
     Apparently this is automatic (it wasn't in older versions) if you
     use one of the simple databases (Berkeley DB or SQLite): the database
     is created when dspam actually does something, which it usually
     doesn't. The docs need to say this.
     The database was automagically created when I ran dspam_stats
     (for a real or bogus user), so typos create new users. But no database
     was created when I tried the "dspam" or "dspam_corpus" commands.

   - Training fails silently; in fact, it verbosely claims to be
     doing something:
        dspam_corpus --spam <mailbox>
        command: '/usr/bin/dspam' --class=spam --source=corpus  --user 'whitis'
        /usr/bin/dspam_corpus: 2788 messages, 00:00:27 elapsed, 103.26 msgs./sec.
     This is true whether the training is done as an ordinary user or as root.

     Meanwhile /var/spool/dspam/data/local/whitis/whitis.sdb is only
     8192 bytes and dspam_stats reports all zeros.
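One rough way to catch this silent failure is to record the size of the per-user database before and after a training run. The helper below is a hypothetical sketch (the name db_grew is made up for illustration) and relies on GNU stat from coreutils. SQLite grows its file in whole pages, so an unchanged size after a single message proves little, but an unchanged 8192-byte file after 2788 messages strongly suggests nothing was written.

```shell
#!/bin/sh
# Hypothetical helper (not part of dspam): run a training command and report
# whether the given database file changed size. Needs GNU stat (coreutils).
db_grew() {
    db=$1
    shift
    before=$(stat -c %s "$db" 2>/dev/null || echo 0)   # 0 if the file is missing
    "$@"                                               # the training command itself
    after=$(stat -c %s "$db" 2>/dev/null || echo 0)
    echo "$db: $before -> $after bytes"
    [ "$after" -ne "$before" ]                         # status 0 only if the size changed
}
```

Usage: db_grew /var/spool/dspam/data/local/whitis/whitis.sdb dspam_corpus --spam <mailbox>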

   - Many of the utilities don't run as an ordinary user;
     dspam_stats, for example, should run for ordinary users.

   - The web frontend doesn't run (you see the text of the CGI scripts).
     Since apache2 splits its configuration across directories instead of
     a monolithic file, it should be possible to make http://localhost/dspam/
     a CGI directory on package install. Maybe
     /etc/apache2/sites-available/default needs an "Include
     /etc/apache2/sites-available/default.d/" line to pull in a
     directory of per-package snippets. I tried symlinking the
     /etc/dspam/dspam-apache2.conf file into /etc/apache2/sites-enabled,
     but apache died on the SuexecUserGroup statement. Apparently the
     dspam package failed to install files in /etc/apache2/mods-enabled/
     to load suexec (should this be a separate Debian package, so it
     doesn't conflict with other attempts to load the module?).
     Note that suexec doesn't appear to play nicely with others anyway,
     since you can't set it in a per-directory context
     (e.g. /dspam running as dspam, /analog running as analog, etc.).
     This might be a step in the right direction:
       echo "LoadModule suexec_module /usr/lib/apache2/modules/mod_suexec.so" >> /etc/apache2/mods-enabled/suexec.load
     Given the limitations of apache suexec, dspam probably needs to
     create a virtual host on a different port number:
        <VirtualHost localhost:1234>
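A minimal sketch of the separate-port virtual host idea follows. Everything in it is an illustrative assumption rather than what the package ships: the function name write_dspam_vhost, port 1234, the CGI directory /var/www/dspam and the dspam user/group. Apache's suexec refuses to run scripts outside its compiled-in document root, so the CGI directory has to sit under that root (check the value your suexec was built with).

```shell
#!/bin/sh
# Sketch: write a separate virtual host for the DSPAM CGI frontend on its own
# port, so SuexecUserGroup applies to that vhost only.
# All defaults below are illustrative assumptions, not package defaults.
write_dspam_vhost() {
    site_file=$1
    cgi_dir=${2:-/var/www/dspam}   # must lie under suexec's compiled-in docroot
    port=${3:-1234}
    cat > "$site_file" <<EOF
Listen $port
<VirtualHost localhost:$port>
    SuexecUserGroup dspam dspam
    DocumentRoot $cgi_dir
    <Directory $cgi_dir>
        Options +ExecCGI
        AddHandler cgi-script .cgi
        DirectoryIndex dspam.cgi
    </Directory>
</VirtualHost>
EOF
}
```

On Debian's apache2 this would be enabled (as root) with something like: write_dspam_vhost /etc/apache2/sites-available/dspam && a2enmod suexec && a2ensite dspam && /etc/init.d/apache2 reload. Using a2enmod avoids hand-writing a suexec.load file into mods-enabled.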

  - It appears that you intend people to run dspam in client/server mode.
    README.Debian needs to point the user to /etc/default/dspam to
    enable the server. And the dspam configuration files need to be
    initialized so the client and server can actually talk. It appears
    that secret authentication tokens are needed, but that is clear as mud.
    The package could write a randomly generated token into the files when
    it creates them, e.g. (using the non-blocking /dev/urandom and
    stripping md5sum's trailing "  -"):
         token=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -d' ' -f1)
    Presumably the token needs to be set in two places in the config
    file, but where is the second place?
    "dspamc --user whitis --classify" says "... unable to authenticate client".
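A sketch of what the package could do at install time, assuming the directive names used in the sample dspam.conf: ServerPass.<ident> on the server side and ClientIdent "<secret>@<ident>" on the client side, with Relay1 being just the sample identity name. Those names are an assumption to verify against the installed /etc/dspam/dspam.conf; the script only prints the two matching lines rather than editing the file, and gen_dspam_secret is a hypothetical name.

```shell
#!/bin/sh
# Sketch: generate a shared client/server secret and print the two
# dspam.conf lines that must agree. Directive names are assumed from the
# sample dspam.conf; "Relay1" is only the sample identity name.
gen_dspam_secret() {
    # 512 bytes from the non-blocking /dev/urandom, hashed to 32 hex digits;
    # cut drops md5sum's trailing "  -" filename field.
    dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -d' ' -f1
}

ident=${1:-Relay1}
secret=$(gen_dspam_secret)
printf 'ServerPass.%s\t"%s"\n' "$ident" "$secret"   # server side
printf 'ClientIdent\t"%s@%s"\n' "$secret" "$ident"  # client (dspamc) side
```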

  - Does /etc/dspam/dspam.conf need world read permission so that dspamc
    can read the file?

  - How does an individual user set options like opt-in?
    Do ~/.dspam files work? That depends on how dspam was compiled and on
    the configuration files. The DSPAM docs mention that much, but they
    don't actually tell you which options enable or disable .dspam
    files. So how does an ordinary user set opt-in? It certainly isn't
    adequately documented on the DSPAM website.

       - touch /var/dspam/opt-in/local/user.whitis
         permission denied

       - dspam_admin add preference whitis optin yes

         "Program mode requires special privileges, e.t. root or Trusted User"

         dspam_admin sort of works as an ordinary user, but if you give
         an incorrect option like "add preference whitis bogons yes"
         you get an error message like this:
           Unable to open file for writing:
             /var/spool/dspam/data/local/whitis/whitis.prefs.bak:
         That doesn't inspire confidence. And which is correct: "optin",
         "OptIn", "optIn" or "opt-in"?

       - touch ~/.dspam
         Whether this works depends on whether homedirs is set, but the
         DSPAM site doesn't bother to tell you where this is set, so you
         can't even check it. And strace shows that the program does not
         check ~/.dspam at all, which is probably a serious mistake, since
         permissions leave an ordinary user no other way to set
         opt-in/opt-out.

  - How does a user choose between:
     - not running dspam at all
     - not running dspam as an Exim filter rule, so they can
       run it from within maildrop (called from .forward).
       This is trickier if you have a filter rule installed, since it
       appears that if you opt out, dspam simply fails silently even
       when it wasn't invoked by the MTA.
     - running dspam as an Exim filter rule.
    There appears to be a serious design flaw in dspam: it ignores all
    commands based on opt-in/opt-out status, rather than checking that
    status only when invoked with a special option (such as
    "dspam --check-optinout") from an MTA filter rule, as opposed to
    maildrop, procmail, a manual command, dspam_corpus, or some other
    program that is using it to classify mail.
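The split proposed above can be approximated outside dspam with a small wrapper that only the MTA filter calls. Everything here is hypothetical: the function names, and the opt-in directory, which is taken from the opt-in path mentioned elsewhere in this report. It does not change dspam's own internal opt-in check, so it only illustrates the proposed design: direct invocations (maildrop, procmail, dspam_corpus, manual runs) would keep calling /usr/bin/dspam and never go through this check.

```shell
#!/bin/sh
# Hypothetical MTA-side wrapper: consult opt-in status only on the MTA path.
# OPTIN_DIR is assumed from the opt-in path observed in this report.
OPTIN_DIR=${OPTIN_DIR:-/var/spool/dspam/data/opt-in/local}

# Succeeds if the opt-in marker file for the given user exists.
user_opted_in() {
    [ -e "$OPTIN_DIR/user.$1" ]
}

# Called from the MTA as: mta_filter <user> [dspam options...] < message
mta_filter() {
    user=$1
    shift
    if user_opted_in "$user"; then
        exec /usr/bin/dspam --user "$user" "$@"   # hand the message to dspam
    else
        exec cat                                  # pass the message through untouched
    fi
}
```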

  - Running dspam as an ordinary user gives a permission error on
    /var/spool/dspam/data/local/whitis.

  - Since DSPAM appears to be set up for opt-in, the dspam package could
    install dspam support that works as soon as a user opts in, at least
    if a suitable parameter is set somewhere, using something like
    update-alternatives (i.e. we need to control whether there is a
    symbolic link into the Exim (and other MTA) filter directories).

  - There is no MTA integration, not even with the default MTA, Exim.
    Given Exim's split configuration files, the package could include
    the necessary setup file for the filter rule.

  - The problems of the program failing silently were the same whether
    using the SQLite or the hash database backend.

  - If you really intend people to use client/server mode, doesn't that
    require changing "dspam" to "dspamc" in the sample Exim configs?

  - dspam does not seem to work in either standalone or client/server mode.

  - with email notifications on: 
   "Unable to open file for reading: firstrun.txt: No such file or directory"

  - Creating /var/spool/dspam/data/opt-in/local/user.whitis
    made a difference, even though an ordinary user can't do that
    and /var/run/debug/ clearly shows that it read the preference
    set with dspam_admin:
       29813: [05/06/2006 13:21:13] Loading preference 'optin' = 'yes'
    Now, when run as whitis, I get
        Unable to create directory: /var/spool/dspam/data/local/whitis: Permission denied
    And as root, it actually classifies the spam:
       dspam --user whitis --classify --debug --stdout

  - What does optin mean in the dspam_admin preferences?
    Does it mean that you are opted in, as you would expect, or that
    you must touch the file in the opt-in/local directory?

  - Fixing permission problems:
     chown whitis /var/spool/dspam/data/local/whitis
     chmod o+rx /var/spool/dspam/
     chmod o+rx /var/spool/dspam/data/
     chmod o+rx /var/spool/dspam/data/local/
    But when I do an operation (as whitis) that requires writing to the
    database, such as dspam_corpus, I get:
      query error: attempt to write a readonly database: see sql.errors for more details
      Unable to open file for writing: /var/log/dspam//sql.errors: Permission denied

     chown whitis:dspam /var/spool/dspam/data/local/whitis/*
     chmod ug+rw /var/spool/dspam/data/local/whitis/*

    But the program later creates whitis.sdb-journal without group write
    permission, which could cause trouble, so again:
     chmod ug+rw /var/spool/dspam/data/local/whitis/*
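One possible workaround, offered as an assumption rather than something the package does: SQLite creates its -journal file next to the database, so the directory itself must be writable by both the user and the dspam group, and making it setgid lets newly created files such as whitis.sdb-journal inherit the dspam group. Whether those files are also group-writable still depends on the umask of the creating process (e.g. 007). The function name fix_user_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: give a per-user dspam directory shared owner/group
# access. The setgid bit makes files created later inherit the group;
# group write on those files still depends on the creator's umask.
fix_user_dir() {
    dir=$1
    owner=$2
    group=$3
    chown "$owner:$group" "$dir" "$dir"/* || return 1
    chmod ug+rw "$dir"/* || return 1
    chmod 2770 "$dir"        # rwx for owner and group, nothing for others, setgid
}
```

Usage (as root): fix_user_dir /var/spool/dspam/data/local/whitis whitis dspam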

  - Performance
    (3000 MHz AMD Athlon 64, 1 GB of dual-channel PC3200 RAM)
    With dspam_corpus, I am seeing very low performance of about 0.25
    messages per second. This improved somewhat after I killed the unused
    dspam daemon. The problem seems to be related to SQLite database
    locking: running dspam on a test message under strace shows that the
    program pauses on an lseek() on the database file shortly after
    opening it.
    Now, processing a message sometimes takes about a tenth of a second,
    but at other times (dspam_corpus running in the background, no inbound
    mail filtering) it takes 1-2 seconds. A quick benchmark with formail
    on a mailbox of 217 messages shows a throughput of 4.5 messages per second.
    dspam_corpus still shows rates well under 0.5 msgs per second,
    but it was in the middle of a 200 MB spam folder when I killed the
    daemon, so the long-term average may be skewing the figure. Even so,
    dspam_corpus is still reporting only 0.2 messages per second.
    CPU load is under 10% and disk activity is negligible (the entire
    database is in the disk cache). Apparently, training operations
    (database writes) are much slower than read-only classify operations.
    With dspam_corpus running in the background:
       cat /tmp/spam1 |  time dspam --user whitis --class=spam --source=corpus
    takes 4.26 seconds. But:
       cat /home/whitis/mail/prism | time formail -s dspam --user whitis --class=innocent --source=corpus
    was killed after an hour (it made dspam_corpus run slower too).
       cat /tmp/spam1 |  time dspam --user whitis --class=innocent --source=corpus
    took 7.5 seconds, and retraining that message as spam took 7.9 seconds.
    Classifying the same message took 0.84 seconds.

    Running a script that runs dspam_corpus on most of my mailboxes (spam
    and innocent, probably about 2 GB) has taken over 24 hours.

    Corpus training was very slow (and mail filtering was slow, too) on
    my old box, but that box was about a tenth as fast and was using the
    Berkeley DB library (which also crashed).

    Note that dspam was somewhat troublesome when I used it before:
      - slow training
      - repeated database (Berkeley DB) corruption caused mail bounces
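To put these rates side by side, a small awk-based sketch (rate and eta_hours are hypothetical helper names) converts a message count and elapsed time into a rate, and a rate into hours for a given mailbox. Using only figures from this report: 2788 messages in 27 seconds is the 103.26 msgs/sec that dspam_corpus printed while the database stayed at 8192 bytes, whereas at the 0.25 msgs/sec seen once training actually writes, the same 2788 messages would take about 3.1 hours.

```shell
#!/bin/sh
# Hypothetical helpers for the throughput figures quoted above.
# rate <messages> <seconds>        -> messages per second
rate() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'
}
# eta_hours <messages> <msgs/sec>  -> hours needed at that rate
eta_hours() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r / 3600 }'
}

rate 2788 27          # prints 103.26
eta_hours 2788 0.25   # prints 3.1
```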

I am currently looking at other spam filters. SpamAssassin is able to train
5-100 msgs per second (I'm not sure why the spread is so wide, but there appears to be some fixed overhead on small mailboxes), but its speed also appears to depend on content.

-- System Information:

Debian Release: 3.1
  APT prefers unstable
    odd, since /etc/apt/apt.conf says:   APT::Default-Release "stable";
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.8-2-386
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)

Versions of packages dspam depends on:
ii  adduser                       3.63       Add and remove users and groups
ii  libc6                         2.3.6-7    GNU C Library: Shared libraries
ii  libdspam7                     3.6.4-4    DSPAM is a scalable and statistica
ii  libldap2                      2.1.30-8   OpenLDAP libraries
ii  procmail                      3.22-11    Versatile e-mail processor

-- no debconf information