When AI goes wrong…

Ok so as I’m writing this I’m currently on a temporary ban from Twitter… and the reasons for this, I believe, are more complex than one would expect. Full disclosure: I’m assuming the ban was in part due to AI. That is an assumption, but it’s the only realistic explanation I have. You might be thinking, who cares… but bear with me…

In the digital world when your account is blocked what impact does it have?

So the good news, I went cycling and have been having some jokes with friends… the world did not end. However, I did mid conversation get suddenly cut off from talking to lots of generally awesome people (ok there’s a few bad apples but really most people are cool) which kind of sucked.

At first I didn’t really understand why, I hadn’t seen the email and just saw my account get locked. I then noticed an email:

When I got to: ‘CVE-2006-3392 is a classic directory traversal vulnerability in Webmin’ I was really confused; this tweet is basically a high-level CVE description. There’s absolutely nothing malicious or, to my knowledge, in violation of the terms and conditions about it. But there was an image on that post… I honestly remembered that I hadn’t posted exploit code, so I went hunting through my screenshot history to find the image:

This is a directory listing showing what I suspected was a threat actor’s open directory (but it could be a researcher’s). I cropped the IP address because in this instance I hadn’t done any further enrichment. I am still frankly baffled as to how/why this breaks the ToS. It’s literally a list of file names.

So I had started the appeal process, but honestly I weighed up the balance and decided to just follow the content removal wizard. This left me with a 12-hour ban (my yapping moved to Signal rapidly).

Moderation that becomes censorship

So what’s the issue here?

This wasn’t malicious, it was pretty standard CTI Twitter usage.
I’m a paying customer so I’m kind of miffed that I’m banned for posting a CVE description.
I’m a very loud proponent of not using cyber security skills for illegal purposes.
I suspect this was due to someone maliciously reporting this benign post.
I suspect an LLM was used to review the content and it linked a CVE with file names to come up with the answer ‘THIS MUST BE CRIME’ which it is not.

So the issues I see:

Potentially this makes sharing intel on Twitter risky…
If, as I suspect, an LLM is being used for moderation, that seems like a terrible idea.
Social media reporting and appeal processes seem to be quite poor.
If we remove the social media part and think about what else AI might be used for (e.g. policing, court work, legal work etc.) what risks are there?

Rules for something but not others

So I got banned probably because of said exploit and my image ‘looked like code’, however you don’t get banned for posting PoCs e.g. (check this awesome work out by Tom about the lack of Edge memory protections for key materials):

and then the stuff I won’t screenshot, the racist, homophobic/transphobic and other actually harmful and nasty things some people post….

Social media has loads of pros and cons. Sorting out the moderation is however really the nuts and bolts of an online community platform. Racism being ok but benign nerd cyber intel causing a ban is insane!

Touching Grass

This didn’t ruin my day, but it was annoying to be cut off from friends (ok the convo moved to Signal but that isn’t the point). It shows that when we use systems like this for comms, a malicious report or a misidentified (probably by AI) bit of content can create negative impact.

So, how do we trust systems for comms when the platforms themselves have vulnerabilities in their business logic that someone (or something) can leverage to cause negative impact? (Perhaps it was just automation, but that seems odd given the content I post on a regular basis.)

How do we trust systems that fundamentally seem unreliable and open to abuse? How does this fit into a human-digital hybrid world? How do we trust AI when we can already see how fragile things are?

So the world, if AI takes over, is likely to be less Skynet and more of a janky mess! Perhaps us mere humans still have some value!

