[Teammetrics-discuss] Observations from current run of `archiveparser.py`.

Andreas Tille andreas at an3as.eu
Sat Feb 4 22:33:43 UTC 2012


Hi Sukhbir,

On Sun, Feb 05, 2012 at 12:44:12AM +0530, Sukhbir Singh wrote:
> ...
> (As of writing this, there are 22 messages that have been flagged with
> the keywords, but they are all OK)

Seems I might have been too harsh in my first SPAM fighting effort.
Please change this so that we rather leave some SPAM undetected, as
long as all real mails end up in our listarchives table.

> That is fine. But once a message is flagged as spam, it *does not*
> populate the `listarchives` table. We just populate `listspam` and
> move on.

In principle this is right if we can be sure that a message is really
SPAM.  Please remove from our list of potential SPAM keywords all those
keywords which generate false positives.
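
A minimal sketch of how such pruning could look, assuming the keywords
are kept as a plain Python list and that we have a few hand-checked
genuine messages to test against.  The function names and the sample
keywords are hypothetical and not taken from `archiveparser.py`:

```python
def matched_keywords(text, keywords):
    """Return the keywords that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def prune_keywords(keywords, genuine_messages):
    """Drop every keyword that matches at least one known-genuine message."""
    noisy = set()
    for text in genuine_messages:
        noisy.update(matched_keywords(text, keywords))
    return [kw for kw in keywords if kw not in noisy]


# Hypothetical keyword list and hand-checked genuine messages.
spam_keywords = ["casino", "free", "lottery"]
genuine = ["This package is free software under the GPL."]

# "free" matched a genuine message, so it is dropped from the list.
pruned = prune_keywords(spam_keywords, genuine)
```

The flagged messages from the log file could serve as the
`genuine_messages` input once they have been checked by hand.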

> Because we have worked hard on this (and aimed for
> perfection), I want to save every message possible. And right now, we
> are losing lots of messages.

You are definitely right that this should not happen.

> What I suggest is this -- populate `listspam` *and also* populate
> `listarchives`. That way we serve both purposes: help the spam
> fighting effort and not lose any messages. I had discussed this
> earlier but you were not comfortable about it, but given that we are
> losing so many genuine messages, I thought I would bring this up again.
> Have a look at the log file and you will make up your mind hopefully
> about this!

I'm not fully convinced that just dropping those keywords from our list
of SPAMy keywords would not have a similarly helpful effect.  But I will
not insist on my opinion if you are not comfortable with this - so if
you really feel better having all messages in the listarchives table I
will not stop you from doing so.  However, in this case it does not make
any sense to keep copies in the listspam table as well.  We should rather
add a spam flag to the listarchives table instead of stupidly copying data.
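
A rough sketch of the flag idea, under some assumptions: it uses an
in-memory SQLite database only to stay self-contained, the schema is
heavily simplified, and the column name `is_spam` as well as both
helper functions are hypothetical:

```python
import sqlite3

# Hypothetical, simplified schema; the real listarchives table has
# more columns and does not live in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE listarchives (
        message_id TEXT PRIMARY KEY,
        subject    TEXT,
        is_spam    INTEGER NOT NULL DEFAULT 0
    )
    """
)


def store_message(conn, message_id, subject, is_spam):
    """Store every message exactly once; the spam verdict is only a flag."""
    conn.execute(
        "INSERT INTO listarchives (message_id, subject, is_spam) "
        "VALUES (?, ?, ?)",
        (message_id, subject, int(is_spam)),
    )


def count_messages(conn, include_spam=False):
    """Count archived messages, leaving out flagged ones by default."""
    query = "SELECT COUNT(*) FROM listarchives"
    if not include_spam:
        query += " WHERE is_spam = 0"
    return conn.execute(query).fetchone()[0]


store_message(conn, "<1@example.org>", "Re: upload of new package", False)
store_message(conn, "<2@example.org>", "Cheap pills", True)
```

With this layout no message is lost, a wrongly flagged message can be
rescued by flipping a single value, and the statistics simply filter
on the flag.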

> PS: Summer of Code 2012 was just announced. Who knows, we might get a
> student :)!

I do not think that this will work for the same topic again.  If you
have some other ideas, feel free to propose them.

Kind regards

       Andreas.
 

-- 
http://fam-tille.de


