Can you define what chat spam is?
I.e., can you define a set of rules that, when applied to one or more lines of chat, will tell you whether or not a person is chat spamming?
Intro
In theory, we all know what it is. It's this:

But how do we know this? We look for key elements:
- Capital letters (the internet's equivalent to yelling)
- Repeating things (in case you didn't catch it the 1st time)
- Keyboard mashing (like you're playing SCGMD4)
- Bad words (90% of urban dictionary)
- Insults (could be a sarcastic compliment)
- etc...
That's all fine and dandy but what about the marginal cases?
Look At All The Caps I'm Using! Is This Considered Spam?
or how bout
The FBI, the CIA, and the MIB went to the YMCA for KFC and TLC.
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaamn daniel
Probably not worth a kick, right? (maybe that last one)
Parameters
If we were coding a Moderator Bot (the plan actually), what information can we get from a line of chat to help us classify it? Here's some I thought of:
- general
  - length
  - time since last message
- caps
  - number of caps
  - max number of consecutive caps
  - percentage of message that is caps
- repetition
  - same as last message?
  - percent similarity to previous message
  - max consecutive same message
  - max consecutive same letter
  - max nonconsecutive same letter
  - any pattern whatsoever
Now we just need a function to plug these variables into...
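Here's a rough Python sketch of how most of those parameters could be pulled out of a single line. Everything in it is my own assumption: the name `extract_features`, the dictionary keys, and the choice of `difflib.SequenceMatcher` for "percent similarity". It leaves out "max consecutive same message" and "any pattern whatsoever", since those need more history or a real pattern detector, and it says nothing about how to weigh the numbers.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import groupby


def extract_features(message, previous="", seconds_since_last=None):
    """Turn one chat line into the measurements listed above (hypothetical keys)."""
    letters = [c for c in message if c.isalpha()]
    caps = [c for c in letters if c.isupper()]
    # Longest run of consecutive uppercase letters.
    max_caps_run = max(
        (len(list(g)) for is_up, g in groupby(message, key=str.isupper) if is_up),
        default=0,
    )
    # Longest run of one repeated character, ignoring case.
    max_same_run = max((len(list(g)) for _, g in groupby(message.lower())), default=0)
    # How often the most common non-space character appears anywhere in the line.
    counts = Counter(c for c in message.lower() if not c.isspace())
    return {
        "length": len(message),
        "seconds_since_last": seconds_since_last,
        "caps_count": len(caps),
        "max_caps_run": max_caps_run,
        "caps_ratio": len(caps) / len(letters) if letters else 0.0,
        "same_as_last": message == previous,
        "similarity_to_last": SequenceMatcher(None, previous, message).ratio(),
        "max_same_char_run": max_same_run,
        "max_same_char_total": max(counts.values(), default=0),
    }
```

Feeding it example #1 from further down (`extract_features("DON'T STOP NOWWW.", "DON'T STOP, LOAD!", 3)`) gives a caps ratio of 1.0 and a longest caps run of 5, which is exactly the kind of line where the numbers alone don't settle whether it's spam.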
The Punishment
Congrats, you have detected that someone is spamming! What do we do?
- Permanently ban 'em
- Regularly ban 'em
- Kick 'em
- Mute 'em
- Warn 'em
- Ignore 'em
- Join 'em
A good system would probably detect the severity of the spam, and act accordingly.
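If the bot produced a severity score, picking from the list above could be a simple lookup. A rough Python sketch: the 0-to-1 score, the cutoffs, the repeat-offender bump, and the `choose_action` name are all made up for illustration, and "join 'em" is left out.

```python
# Actions from the list above, mildest first. The cutoffs are invented.
ACTIONS = [
    (0.2, "ignore"),
    (0.4, "warn"),
    (0.6, "mute"),
    (0.8, "kick"),
    (0.95, "ban"),
]


def choose_action(severity, prior_offenses=0):
    """Map a severity score in [0, 1] to an action, escalating for repeat offenders."""
    adjusted = min(1.0, severity + 0.1 * prior_offenses)
    for upper_bound, action in ACTIONS:
        if adjusted < upper_bound:
            return action
    return "permanent ban"
```

With these numbers, `choose_action(0.5)` is a mute, but the same message from someone with two prior offenses becomes a kick.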
Another option is to alter their text:
If they have too many caps, convert everything to lowercase.
IM ANGRY >>> im angry
If they spam a character, replace it with a single char.
I love booooooooooooobs!!!!!!!!!! >>> I love bobs!
If they use bad words, replace them with something more politically correct.
loving stuff >>> frickin' poop
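The three alterations above are easy to sketch in Python. This is only an illustration: the `soften` name, the 70% caps cutoff, the "3 or more in a row" rule, and the word list passed in are my assumptions, not anything a real bot does.

```python
import re


def soften(message, max_caps_ratio=0.7, replacements=None):
    """Lowercase shouty text, collapse spammed characters, swap listed words."""
    letters = [c for c in message if c.isalpha()]
    # Too many caps: convert everything to lowercase.
    if letters and sum(c.isupper() for c in letters) / len(letters) > max_caps_ratio:
        message = message.lower()
    # A character repeated 3+ times in a row becomes a single character.
    message = re.sub(r"(.)\1{2,}", r"\1", message)
    # Whole-word, case-insensitive replacement from a caller-supplied word list.
    for bad, polite in (replacements or {}).items():
        message = re.sub(rf"\b{re.escape(bad)}\b", polite, message, flags=re.IGNORECASE)
    return message
```

`soften("IM ANGRY")` returns `"im angry"`, and normal double letters like the "oo" in "too" are left alone because only runs of three or more get collapsed.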
But it's important to remember that some people just need to vent for a few seconds, and then they'll go back to being civilized. For example, on a very difficult challenge server, you die just before the end and you spam in chat:
and then you get banned for spamming, even though you just spent the last hour there completely fine.
Real Examples
I have about 200,000 lines of chat logged from Falling Tiles, with plenty of cases of spamming and maybe-not-so-spamming. Here are a few of them, numbered for... idk, discussion purposes?
#1 - caps, similar to previous message
17:55:41 Zach505 33613 DON'T STOP, LOAD!
17:55:44 Zach505 33613 DON'T STOP NOWWW.
#2 - caps, obscenity, keyboard spam, you get the idea
17:55:50 Sarg3 41072 mondayS
17:55:51 Sarg3 41072 mondayS
17:55:52 Sarg3 41072 mondayStasduohsagdSAf
#3
21:30:06 Jeetlor 24924 ARE YOU SERIOUS NOOOO
21:30:08 Jeetlor 24924 ARE YOU SERIOUS NOOOO
#4 - correction of previous statement
22:01:14 Honno 42358 sugar /cmd joi
22:01:15 Honno 42358 sugar /cmd join
#5
23:06:19 T-MEX 17904 forget this
23:06:20 T-MEX 17904 forget thisssssss
#6 - funny, but caps and repeating self
17:22:36 Brickitect 24378 THAT MOMENT WHEN YOU WANNA TAKE A ONE loving HOUR NAP. BUT IT ENDS UP BEING 5! rip sleep schedule. *cries internally*
17:22:40 Brickitect 24378 THAT MOMENT WHEN YOU WANNA TAKE A ONE loving HOUR NAP. BUT IT ENDS UP BEING 5! rip sleep schedule. *cries internally*
#7 - patterned repetition (would have to research text pattern detection algorithms)
19:05:26 Racerboy 2245 \: ^ )
19:05:28 Racerboy 2245 \: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )
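For back-to-back repeats like #7, a regex backreference might already be enough, no fancy algorithm needed. A minimal Python sketch, assuming "pattern" means a chunk of at least 2 characters repeated 3 or more times in a row (the function name and both limits are mine):

```python
import re

# A lazily matched chunk followed by at least two more copies of itself.
REPEAT = re.compile(r"(.+?)\1{2,}", re.DOTALL)


def find_repeated_unit(message, min_unit=2):
    """Return (unit, count) for the first chunk repeated 3+ times in a row, else None."""
    for match in REPEAT.finditer(message):
        unit = match.group(1)
        # Single repeated characters ("hmmmmm") are left to the same-letter check.
        if len(unit) >= min_unit:
            return unit, len(match.group(0)) // len(unit)
    return None
```

It also catches `help help help help` from #8 and the `?!?!?!` tail in #13. Backreference matching can get slow on very long strings, so this assumes chat lines are capped at a sane length.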
#8 - slightly different from previous statements
03:37:08 Tetris Trance 67477 hep
03:37:09 Tetris Trance 67477 help.
03:37:10 Tetris Trance 67477 help.
03:37:14 Tetris Trance 67477 help help help help
03:37:15 Tetris Trance 67477 help help help help!
03:37:16 Tetris Trance 67477 help help help help!!
03:37:16 Tetris Trance 67477 help help help help!!!
#9 - only a few caps and messages aren't the same as the last, but clearly spam
05:40:49 PC 34125 Nicki Minaj - Beez In The Trap
05:40:57 PC 34125 Nicki Minaj - Beez
05:40:58 PC 34125 Nicki Minaj - Beez In The Trap
05:41:00 PC 34125 Nicki Minaj - Beez
05:41:07 PC 34125 Nicki Minaj - Beez In The Trap
05:41:08 PC 34125 Nicki Minaj - Beez
05:41:11 PC 34125 Nicki Minaj - Beez In The Trap
05:41:12 PC 34125 Nicki Minaj - Beez
05:41:15 PC 34125 Nicki Minaj - Beez In The Trap
05:41:16 PC 34125 Nicki Minaj - Beez
05:41:17 PC 34125 Nicki Minaj - Beez In The Trap
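#9 slips past a "same as last message?" check because the two lines alternate. One way around that is to compare each message against the user's last few messages instead of just the previous one. A hedged Python sketch: the `RecentMessages` class, the window of 5, and the 0.8 similarity cutoff are assumptions, and `difflib.SequenceMatcher` is just one convenient similarity measure.

```python
from collections import deque
from difflib import SequenceMatcher


class RecentMessages:
    """Keeps the last few messages per user and counts near-duplicates."""

    def __init__(self, window=5, threshold=0.8):
        self.window = window
        self.threshold = threshold
        self.history = {}  # user id -> deque of that user's recent messages

    def count_similar(self, user_id, message):
        """Record the message and return how many recent ones resemble it."""
        recent = self.history.setdefault(user_id, deque(maxlen=self.window))
        similar = sum(
            SequenceMatcher(None, old.lower(), message.lower()).ratio() >= self.threshold
            for old in recent
        )
        recent.append(message)
        return similar
```

Running the first five lines of #9 through `count_similar` gives 0, 0, 1, 1, 2: the count keeps climbing even though no message matches the one right before it.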
#10 - just repeating one character for dramatic effect
12:16:50 Xenomorph 6745 hmmmmmmmmmmmmmmmmmmmmmmmmmmmm mmmmmmmmmmmmmmmmmmmmmmmmmmmmm mmmmmmmmmmmmmmmmmmmm
#11 - Sugar
22:12:09 Sugar 34476 trotor
22:12:18 Sugar 34476 trogtor
22:12:20 Sugar 34476 trogtor make
22:12:22 Sugar 34476 trogtor make sever
#12 - keyboard mash (how do you detect this?!)
17:29:24 321 race 40821 tperoyepotht4[khrptojhhtrjhrthrthjthtohjtohtohtrjhrtpohjrthjohjorjphjrtphjrtpjhrtpohrtp
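I don't have a real answer for keyboard mashing, but one crude guess is that mashed text has long stretches with no vowels and long "words" with no spaces. A Python sketch of that idea, assuming English chat; the thresholds, the `looks_like_mash` name, and counting "y" as a vowel are all arbitrary choices that would need tuning against real logs.

```python
import re

VOWELS = set("aeiouy")  # treating "y" as a vowel is a simplification
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-xz]+")


def looks_like_mash(message, max_consonant_run=6, max_token=20, min_vowel_ratio=0.25):
    """Guess whether a line is keyboard mashing (heuristic, English only)."""
    # Collapse repeated characters so "hmmmmm" is judged as "hm", not as a mash.
    text = re.sub(r"(.)\1+", r"\1", message.lower())
    letters = [c for c in text if c.isalpha()]
    if len(letters) < 8:
        return False  # too short to judge
    longest_run = max((len(r) for r in CONSONANT_RUN.findall(text)), default=0)
    longest_token = max(len(t) for t in text.split())
    vowel_ratio = sum(c in VOWELS for c in letters) / len(letters)
    # Either a long vowel-free stretch, or one huge token that is mostly consonants.
    return longest_run > max_consonant_run or (
        longest_token > max_token and vowel_ratio < min_vowel_ratio
    )
```

It flags #12 because of the long vowel-free stretch in the middle, and leaves #3, #10, and #14 alone. Usernames, acronyms, URLs, and other languages would probably trip it up.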
#13 - funny, but caps and repetition
17:06:02 Walter H. White 38953 HOW DO I MAKE METHAMPHETAMINE?!?!?!??!?!?!?!?!??!?!?!?!?!?!?!??!?!?!?!??!?!?!?!?!?!?!??!!?!?!??!?!?!?!?!
#14 - slightly annoying, but i wouldn't consider it spamming
00:07:55 Frosty1995 94560 Dont You Hate It If People Talk Like This?
#15 - caps, sarcastic compliment
It really boils down to the intention. Is it possible to detect maliciousness?
If chat spam is kind of a gray area, how would a chat moderator bot work?
Comment on how you approach this, how it would handle some of the examples, examples of your own, ideas, clarifications, improvements, how lame this sounds, how way too big this seems, stories of times you had to make a tough call...
...or you could just start spamming and see where that gets you.