What is chat spam?


Can you define what chat spam is?

i.e., can you define a set of rules that, when applied to one or more lines of chat, will tell you whether or not a person is chat spamming?



Intro
In theory, we all know what it is. It's this:



But how do we know this? We look for key elements:
  • Capital letters (the internet's equivalent to yelling)
  • Repeating things (in case you didn't catch it the 1st time)
  • Keyboard mashing (like you're playing SCGMD4)
  • Bad words (90% of urban dictionary)
  • Insults (could be a sarcastic compliment)
  • etc...

That's all fine and dandy but what about the marginal cases?

Quote from: Whirlwind
Look At All The Caps I'm Using! Is This Considered Spam?
or how bout
Quote from: Whirlwind
The FBI, the CIA, and the MIB went to the YMCA for KFC and TLC.
orrr
Quote from: Whirlwind
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaamn daniel

Probably not worth a kick, right? (maybe that last one)


Parameters
If we were coding a Moderator Bot (the plan actually), what information can we get from a line of chat to help us classify it? Here are some I thought of:

  • General
    • Length
    • Time since last message
  • Caps
    • Number of caps
    • Max number of consecutive caps
    • Percentage of message that is caps
  • Repetition
    • Same as last message?
    • Percent similarity to previous message
    • Max consecutive same message
    • Max consecutive same letter
    • Max nonconsecutive same letter
    • Any pattern whatsoever

Now we just need a function to plug these variables into...
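
As a rough starting point, most of the per-message parameters above can be pulled out with the Python standard library. This is only a sketch under assumptions: the function name chat_features, its arguments, and the dictionary keys are made up for illustration, the caps percentage is taken over letters only, and "max consecutive same message" is left out because it needs a longer message history than one previous line.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import groupby


def max_consecutive_caps(text):
    """Longest unbroken run of uppercase letters (spaces and punctuation break a run)."""
    best = run = 0
    for ch in text:
        run = run + 1 if ch.isupper() else 0
        best = max(best, run)
    return best


def max_consecutive_same(text):
    """Longest run of one character repeated back-to-back, ignoring case."""
    return max((len(list(group)) for _, group in groupby(text.lower())), default=0)


def chat_features(message, previous="", seconds_since_last=None):
    """Collect the parameters from the list above into a dict (hypothetical key names)."""
    letters = [ch for ch in message if ch.isalpha()]
    caps = sum(ch.isupper() for ch in letters)
    # Character counts across the whole message, ignoring case and whitespace.
    counts = Counter(ch for ch in message.lower() if not ch.isspace())
    return {
        "length": len(message),
        "seconds_since_last": seconds_since_last,
        "caps": caps,
        "max_consecutive_caps": max_consecutive_caps(message),
        # Assumption: ratio is measured against letters, not the whole message.
        "caps_ratio": caps / len(letters) if letters else 0.0,
        "same_as_last": message == previous,
        # difflib ratio: 0.0 means nothing in common, 1.0 means identical.
        "similarity": SequenceMatcher(None, previous, message).ratio(),
        "max_consecutive_same": max_consecutive_same(message),
        "max_same_char": max(counts.values(), default=0),
    }
```

Feeding it a line together with the same player's previous line gives the numbers; what thresholds to apply to them is still the open question.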


The Punishment
Congrats, you have detected that someone is spamming! What do we do?

  • Permanently ban 'em
  • Regularly ban 'em
  • Kick 'em
  • Mute 'em
  • Warn 'em
  • Ignore 'em
  • Join 'em

A good system would probably detect the severity of the spam, and act accordingly.

Another option is to alter their text:
If they have too many caps, convert everything to lowercase.
Quote from: Whirlwind
IM ANGRY    >>>    im angry
If they spam a character, replace it with a single char.
Quote from: Whirlwind
I love booooooooooooobs!!!!!!!!!!    >>>    I love bobs!
If they use bad words, replace them with something more politically correct.
Quote from: Whirlwind
loving stuff    >>>    frickin' poop
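
The first two text alterations can be sketched in a few lines. The function name soften and both thresholds (caps_threshold, max_run) are assumptions picked for illustration, not tested values. The bad-word replacement is left out because it needs a word list.

```python
import re


def soften(message, caps_threshold=0.7, max_run=3):
    """Lowercase shouty messages and collapse long character runs to one character."""
    letters = [ch for ch in message if ch.isalpha()]
    if letters:
        caps_ratio = sum(ch.isupper() for ch in letters) / len(letters)
        if caps_ratio > caps_threshold:
            message = message.lower()
    # A character followed by max_run or more copies of itself (so a run
    # longer than max_run) is replaced by a single copy.
    return re.sub(r"(.)\1{%d,}" % max_run, r"\1", message)
```

With these defaults, "IM ANGRY" comes out as "im angry" and the second example above collapses to "I love bobs!", while a short run like "hmmm" is left alone.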

But it's important to remember that some people just need to vent for a few seconds, and then they'll go back to being civilized. For example, a very difficult challenge server where you die just before the end and you spam in chat:
Quote from: Whirlwind
FUUUUUUUUUUUUUUUUUUUUUUUUUUUUUCK ME
FUUUUUUUUUUUUUUUUUUUUUUUUUUUU UCK ME
FUUUUUUUUUUUUUUUUUUUUUUUUUUUU UCK ME
and then you get banned for spamming, even though you just spent the last hour there completely fine.

Real Examples
I have about 200,000 lines of chat logged from Falling Tiles, with plenty of cases of spamming and maybe-not-so-spamming. Here are a few of them, numbered for... idk, discussion purposes?

#1 - caps, similar to previous message
Quote
17:55:41 Zach505 33613   DON'T STOP, LOAD!
17:55:44 Zach505 33613   DON'T STOP NOWWW.
#2 - caps, obscenity, keyboard spam, you get the idea
Quote
17:55:50 Sarg3 41072   mondayS
17:55:51 Sarg3 41072   mondayS
17:55:52 Sarg3 41072   mondayStasduohsagdSAf
#3
Quote
21:30:06 Jeetlor 24924   ARE YOU SERIOUS NOOOO
21:30:08 Jeetlor 24924   ARE YOU SERIOUS NOOOO
#4 - correction of previous statement
Quote
22:01:14 Honno 42358   sugar /cmd joi
22:01:15 Honno 42358   sugar /cmd join
#5
Quote
23:06:19 T-MEX 17904   forget this
23:06:20 T-MEX 17904   forget thisssssss
#6 - funny, but caps and repeating self
Quote
17:22:36 Brickitect 24378   THAT MOMENT WHEN YOU WANNA TAKE A ONE loving HOUR NAP.  BUT IT ENDS UP BEING 5!  rip sleep schedule. *cries internally*
17:22:40 Brickitect 24378   THAT MOMENT WHEN YOU WANNA TAKE A ONE loving HOUR NAP.  BUT IT ENDS UP BEING 5!  rip sleep schedule. *cries internally*
#7 - patterned repetition (would have to research text pattern detection algorithms)
Quote
19:05:26 Racerboy 2245   \: ^ )
19:05:28 Racerboy 2245   \: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )\: ^ )
#8 - slightly different from previous statements
Quote
03:37:08 Tetris Trance 67477   hep
03:37:09 Tetris Trance 67477   help.
03:37:10 Tetris Trance 67477   help.
03:37:14 Tetris Trance 67477   help help help help
03:37:15 Tetris Trance 67477   help help help help!
03:37:16 Tetris Trance 67477   help help help help!!
03:37:16 Tetris Trance 67477   help help help help!!!
#9 - only a few caps and messages aren't the same as the last, but clearly spam
Quote
05:40:49 PC 34125   Nicki Minaj - Beez In The Trap
05:40:57 PC 34125   Nicki Minaj - Beez
05:40:58 PC 34125   Nicki Minaj - Beez In The Trap
05:41:00 PC 34125   Nicki Minaj - Beez
05:41:07 PC 34125   Nicki Minaj - Beez In The Trap
05:41:08 PC 34125   Nicki Minaj - Beez
05:41:11 PC 34125   Nicki Minaj - Beez In The Trap
05:41:12 PC 34125   Nicki Minaj - Beez
05:41:15 PC 34125   Nicki Minaj - Beez In The Trap
05:41:16 PC 34125   Nicki Minaj - Beez
05:41:17 PC 34125   Nicki Minaj - Beez In The Trap
#10 - just repeating one character for dramatic effect
Quote
12:16:50 Xenomorph 6745   hmmmmmmmmmmmmmmmmmmmmmmmmmmmm mmmmmmmmmmmmmmmmmmmmmmmmmmmmm mmmmmmmmmmmmmmmmmmmm
#11 - Sugar
Quote
22:12:09 Sugar 34476   trotor
22:12:18 Sugar 34476   trogtor
22:12:20 Sugar 34476   trogtor make
22:12:22 Sugar 34476   trogtor make sever
#12 - keyboard mash (how do you detect this?!)
Quote
17:29:24 321 race 40821   tperoyepotht4[khrptojhhtrjhrthrthjthtohjtohtohtrjhrtpohjrthjohjorjphjrtphjrtpjhrtpohrtp
#13 - funny, but caps and repetition
Quote
17:06:02 Walter H. White 38953   HOW DO I MAKE METHAMPHETAMINE?!?!?!??!?!?!?!?!??!?!?!?!?!?!?!??!?!?!?!??!?!?!?!?!?!?!??!!?!?!??!?!?!?!?!
#14 - slightly annoying, but i wouldn't consider it spamming
Quote
00:07:55 Frosty1995 94560   Dont You Hate It If People Talk Like This?
#15 - caps, sarcastic compliment
Quote
16:47:45 TechBlaze 28626   RAVEN FOR BEST SERVER HOSTING 10/10 RIGHT HERE
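
For the simplest kind of patterned repetition (like #7, or the "help help help help" line in #8), one cheap trick is to check whether the whole message is a single unit repeated back-to-back. This is a sketch, not a general pattern detector: the name repeating_unit and the min_repeats threshold are made up for illustration, and it will not catch alternating messages like #9 or keyboard mashing like #12.

```python
def repeating_unit(message, min_repeats=3):
    """Return (unit, count) if the message is one unit repeated back-to-back, else None."""
    text = message.strip()
    if not text:
        return None
    # Second candidate adds a trailing space so that space-separated
    # repeats such as "help help help" also line up as whole units.
    for candidate in (text, text + " "):
        # Searching for a string inside its own doubling, starting at index 1,
        # finds its smallest period; a hit before the end means it repeats.
        period = (candidate + candidate).find(candidate, 1)
        count = len(candidate) // period
        if period < len(candidate) and count >= min_repeats:
            return candidate[:period].strip(), count
    return None
```

A moderator bot could treat the returned count as one more parameter, next to caps and similarity, rather than as a verdict on its own.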

It really boils down to the intention. Is it possible to detect maliciousness?



TL;DR

If chat spam is kind of a gray area, how would a chat moderator bot work?

Comment on how you approach this, how it would handle some of the examples, examples of your own, ideas, clarifications, improvements, how lame this sounds, how way too big this seems, stories of times you had to make a tough call...

...or you could just start spamming and see where that gets you.

no automated moderator will be perfect; as long as you can catch same/patterned repetition and chatting too fast, it will get most of the cases. look at blockbot and how simple she was, yet effective 70-80% of the time.
as for gibberish, you could do a word/space check (how long each word is, whether there are any spaces, and how many actual words sit inside a long word if someone's typing without spaces) and just cull the extreme cases (which incidentally also covers extreme letter repetition)
« Last Edit: April 18, 2016, 07:58:13 PM by Conan »
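
The word/space check described above could look roughly like this. Everything named here is an assumption for illustration: the function names, the 30-character cutoff, the "3 known words" rule, and the idea that the caller supplies its own vocabulary set (without one, any over-long word is flagged).

```python
def count_known_words(token, vocabulary):
    """Greedy left-to-right count of vocabulary words found inside token."""
    # Longest words first, so "know" wins over "no" at the same position.
    words = sorted((w for w in vocabulary if w), key=len, reverse=True)
    count = i = 0
    while i < len(token):
        match = next((w for w in words if token.startswith(w, i)), None)
        if match:
            count += 1
            i += len(match)
        else:
            i += 1
    return count


def is_extreme_gibberish(message, max_token_length=30, vocabulary=None, min_known_words=3):
    """Flag only the extreme cases: a very long 'word' not built from known words."""
    for token in message.split():
        if len(token) <= max_token_length:
            continue
        if vocabulary and count_known_words(token.lower(), vocabulary) >= min_known_words:
            continue  # probably real words typed without spaces
        return True
    return False
```

Because one character held down also produces a single over-long word, this covers extreme letter repetition as well, as the post says.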

"Look At All The Caps I'm Using! Is This Considered Spam?"

this is spam

"The FBI, the CIA, and the MIB went to the YMCA for KFC and TLC."

this is not

if strcontains("is this considered spam?")

i love how you just label #11 "sugar" lmao

The only reliable way to create a bot like this is to make it into an AI. Feed it a bunch of patterns considered spam and let it find those patterns in chat. This, obviously, is not realistically achievable.

If they just take up one line in all caps cause they're mad I don't really care but if it's multiple lines of crap or their messages aren't real sentences then it's a problem.

Quote
If they just take up one line in all caps cause they're mad I don't really care but if it's multiple lines of crap or their messages aren't real sentences then it's a problem.
This, I don't mind caps just as long as they don't press enter 20 times

Quote
i love how you just label #11 "sugar" lmao
Well, did you make the server?

imo capitals don't define spam, it's just repeating the same characters/words/sentences over and over

I consider chat spam a repeating message whose purpose is to take up space in the chat window and consequently annoy players. Language, capitalization, and length are not defining factors. Those might be used for trolling, but chat spam should be limited to repetition and length. Whether it is the same character multiple times or a message, the only exceptions I can think of are corrections and people making sure the piece was heard.

Quote
The only reliable way to create a bot like this is to make it into an AI. Feed it a bunch of patterns considered spam and let it find those patterns in chat. This, obviously, is not realistically achievable.
especially if it turns self-aware



muting is probably the most straightforward of punishments

no chance of getting kicked or banned by accident but you still are not able to spam in chat (for a while)

or you can make it escalate, three chat spam detections mean you get a kick
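
A minimal sketch of that escalation, assuming the bot calls record() each time it detects spam from a player. The class name, the "mute"/"kick" return values, and resetting the count after a kick are all illustrative choices, not something any existing bot is known to do.

```python
from collections import defaultdict


class Escalation:
    """Mute on each detection; kick once a player reaches kick_after detections."""

    def __init__(self, kick_after=3):
        self.kick_after = kick_after
        self.strikes = defaultdict(int)  # player id -> detections so far

    def record(self, player_id):
        """Register one spam detection and return the action to take."""
        self.strikes[player_id] += 1
        if self.strikes[player_id] >= self.kick_after:
            # Assumption: the count starts over after a kick.
            self.strikes[player_id] = 0
            return "kick"
        return "mute"
```

Letting strikes expire after a while would help with the "just venting for a few seconds" case, at the cost of tracking timestamps too.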

To be honest, I thought once of giving the idea of an automoderator that would dispose of obvious rule violations, but I was afraid of being called lazy and dropped the idea.

Quote
I consider chat spam a repeating message whose purpose is to take up space in the chat window and consequently annoy players. Language, capitalization, and length are not defining factors. Those might be used for trolling, but chat spam should be limited to repetition and length. Whether it is the same character multiple times or a message, the only exceptions I can think of are corrections and people making sure the piece was heard.
This 100%

Quote
I consider chat spam a repeating message whose purpose is to take up space in the chat window and consequently annoy players. Language, capitalization, and length are not defining factors. Those might be used for trolling, but chat spam should be limited to repetition and length. Whether it is the same character multiple times or a message, the only exceptions I can think of are corrections and people making sure the piece was heard.
What about this?
Quote
What the forget did you just loving say about me, you little bitch? I’ll have you know I graduated top of my class in the Navy Seals, and I’ve been involved in numerous secret raids on Al-Quaeda, and I have over 300 confirmed kills. I am trained in gorilla warfare and I’m the top sniper in the entire US armed forces. You are nothing to me but just another target. I will wipe you the forget out with precision the likes of which has never been seen before on this Earth, mark my loving words. You think you can get away with saying that stuff to me over the Internet? Think again, forgeter. As we speak I am contacting my secret network of spies across the USA and your IP is being traced right now so you better prepare for the storm, maggot. The storm that wipes out the pathetic little thing you call your life. You’re loving dead, kid. I can be anywhere, anytime, and I can kill you in over seven hundred ways, and that’s just with my bare hands. Not only am I extensively trained in unarmed combat, but I have access to the entire arsenal of the United States Marine Corps and I will use it to its full extent to wipe your miserable ass off the face of the continent, you little stuff. If only you could have known what unholy retribution your little “clever” comment was about to bring down upon you, maybe you would have held your loving tongue. But you couldn’t, you didn’t, and now you’re paying the price, you goddamn idiot. I will stuff fury all over you and you will drown in it. You’re loving dead, kiddo.
Someone makes a script to post various versions of this over several lines of chat. Its purpose is to take up space in the chat window and annoy people. There's hardly any detectable pattern. The only thing it is is long, but what if someone is just telling a long story?



Quote
especially if it turns self-aware

*bot learns to mute people by killing them*