I have a search engine. Data arrive at its input, then go into the database and get indexed.
The data become visible to users as soon as they land in the database, even before indexing.
Killing spam "in advance" is possible in principle: for example, before the data enter the database I can run all of it through a blacklist of regular expressions, which would eliminate most of the spam. Drawback: in this case the data become slightly less fresh, and the filter will also run slowly because of PCRE.
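A minimal sketch of such an entry filter, for illustration only. It uses Python's built-in `re` module rather than PCRE, and the patterns and the name `is_blacklisted` are made up for the example, not taken from a real blacklist.

```python
import re

# Hypothetical blacklist patterns; real ones would be tuned on actual spam.
BLACKLIST = [
    r"earn \$?\d+ (a|per) day",
    r"no experience (needed|required).{0,40}high income",
    r"send (a )?(registration|entry) fee",
]

# Compile once so the patterns are not re-parsed for every posting.
COMPILED = [re.compile(p, re.IGNORECASE) for p in BLACKLIST]


def is_blacklisted(text: str) -> bool:
    """Return True if any blacklist pattern matches anywhere in the text."""
    return any(rx.search(text) for rx in COMPILED)
```

Every posting is checked against every pattern, so the cost grows with the size of the blacklist, which is the slowness mentioned above.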
Here is another, I would say more elegant, variant: killing the data when they are already in the index. The charm of the method is that I only need to take the attributes of spam as CRC32 sums of the normal forms of the words in the various parts of the data (heading, text) and compare them with the same CRC32 sums of the data which I have selected manually and which are obviously spam. In this case, instead of a blacklist, it is possible to simply make a "list of suspects", i.e. a list which then has to be raked through manually to fish the spam out of it.
Drawback:
The spam is caught only after people could already have seen it, after it could have gone out in mailings, etc.
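The CRC32 comparison described above can be sketched roughly as follows. This is an illustration under assumptions: `normalize` only lowercases words and stands in for a real morphological normalizer, and the names `signature` and `spam_overlap` are invented for the example. In practice the heading and the text would each be compared separately.

```python
import re
import zlib


def normalize(word: str) -> str:
    # Stand-in for real lemmatization (assumption): lowercase only.
    return word.lower()


def signature(text: str) -> set:
    """CRC32 sum of each normalized word in the text, as a set."""
    words = re.findall(r"\w+", text)
    return {zlib.crc32(normalize(w).encode("utf-8")) for w in words}


def spam_overlap(candidate: str, known_spam: list) -> float:
    """Highest share of the candidate's word sums found in one known spam sample."""
    cand = signature(candidate)
    if not cand:
        return 0.0
    return max(
        (len(cand & signature(sample)) / len(cand) for sample in known_spam),
        default=0.0,
    )
```

Postings whose overlap exceeds some hand-picked threshold would go onto the "list of suspects" for manual review rather than being deleted outright.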
It is possible to act in another way, i.e., as always, to combine both variants. On entry into the database, filter by the blacklist, but instead of deleting, pour the matches into a holding tank invisible to users; then once a day sort through the holding tank by hand and by CRC32 sums of the normal forms of words, comparing them with the same sums of the spam attributes.
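As a rough sketch of this combined scheme, assuming postings are plain strings and both checks are supplied as callbacks (the class `Store` and its method names are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Store:
    visible: List[str] = field(default_factory=list)     # what users can see
    quarantine: List[str] = field(default_factory=list)  # hidden holding tank

    def accept(self, posting: str, looks_suspicious: Callable[[str], bool]) -> None:
        """Route on entry: suspicious postings are hidden, not deleted."""
        target = self.quarantine if looks_suspicious(posting) else self.visible
        target.append(posting)

    def daily_review(self, is_spam: Callable[[str], bool]) -> List[str]:
        """Drop confirmed spam, release the rest; return what was dropped."""
        dropped = [p for p in self.quarantine if is_spam(p)]
        self.visible.extend(p for p in self.quarantine if not is_spam(p))
        self.quarantine.clear()
        return dropped
```

Here `looks_suspicious` would be the blacklist check and `is_spam` the manual or CRC32-based verdict; nothing is lost before a review has looked at it.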
Who can help answer my question?
I will say right away that Akismet will not save the day here: I need to filter fairly harmless-looking vacancies and resumes, part of which are scams, i.e. this is not the Viagra/Cialis spam that everyone has got used to.
For those who do not understand which CRC32 sums are being talked about, please read:
How my search engine works: fighting exact and fuzzy duplicates <http://users.livejournal.com/_yukko_/370337.html>
Outcome of solving the problem:
The right design for the content receiver, which apparently will be implemented sooner or later, looks like this:
1. All content initially goes into the holding tank;
2. While it sits in the holding tank, sets of shingles are computed for it; these are used later, and the shingle base is built at the same time;
3. Using the shingle sets built for obvious spam postings, spam postings are found in the holding tank;
4. Spam postings in the holding tank are marked, and a search index is built from all the other postings;
5. Postings for which the search index has been built leave the holding tank together with the index, which gives an advantage over the current implementation: the content appears in the database with its index already generated!
6. Postings that remain in the holding tank are checked for false positives; these are processed by hand, which in turn builds up the initial base of shingles for false-positive attributes;
7. After a false positive is processed, its "spamming" label is removed, and steps 4 and 5 are carried out for that posting.
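Steps 1 to 5 above could look roughly like this. It is only a sketch under assumptions: a shingle is taken to be the CRC32 of three consecutive lowercased words, the index is a plain in-memory word-to-posting map, and the names `shingles`, `process` and the 0.5 threshold are invented for illustration. Steps 6 and 7 remain manual.

```python
import re
import zlib


def shingles(text: str, k: int = 3) -> set:
    """CRC32 of every run of k consecutive lowercased words."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        runs = [" ".join(words)] if words else []
    else:
        runs = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {zlib.crc32(r.encode("utf-8")) for r in runs}


def process(holding_tank: list, spam_shingles: set, threshold: float = 0.5):
    """Mark likely spam; index and release everything else."""
    index = {}     # word -> set of positions in `released`
    released = []  # postings that leave the holding tank with their index
    marked = []    # postings held back for manual false-positive review
    for posting in holding_tank:
        sh = shingles(posting)
        share = len(sh & spam_shingles) / len(sh) if sh else 0.0
        if share >= threshold:
            marked.append(posting)
            continue
        doc_id = len(released)
        released.append(posting)
        for word in set(re.findall(r"\w+", posting.lower())):
            index.setdefault(word, set()).add(doc_id)
    return released, index, marked
```

For example, with `spam = shingles("pay a small fee and earn money fast")`, a posting that repeats that sentence with one extra word shares most of its shingles with the spam base and stays marked, while an unrelated vacancy is indexed and released.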
