Building @BugHuntBot, an XSS Payload Twitterbot Inspired by Peter Kim

If XSS sounds like the racier cousin of CSS, check out this previous entry, which will walk you through the basics.

If you're into web security literature, you might've come across Peter Kim's great hacking walkthrough, The Hacker's Playbook: A Practical Guide to Penetration Testing.

In the book, Peter (Mr. Kim) discusses a short script he uses to scrape XSS payloads from r/xss, a subreddit devoted to publishing and discussing XSS vulnerabilities. Here's the script, for reference.

#!/usr/bin/env python
#Reddit XSS
#Author: Cheetz
import urllib2, sys  
import logging, os, re, sys, urllib, string  
from optparse import OptionParser  
from urlparse import urlparse

class Lookup:  
        def run(self,url):
                request = urllib2.Request(url)
                request.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20091221 Firefox/3.5.7 (.NET CLR 3.5.30729)')
                response = urllib2.urlopen(request)
                resolve_response = response.read()
                #print resolve_response
                self.regex(resolve_response)
        def regex(self,resolve_response):
                file = open("output_xss.txt", 'a')
                n = re.compile(r'href=\"http.*?>', re.IGNORECASE)
                result = n.findall(resolve_response)
                for a in result:
                        if ("reddit" not in a):
                                remove_string = 'href="'
                                a = a.replace(remove_string,"")
                                b = a.split('"')
                                a = b[0]
                                file.write(a)
                                file.write('\n')

                p = re.compile(r'count=(\d+)&after=(.*?)\"', re.IGNORECASE)
                link = p.findall(resolve_response)
                next_string = ""+link[0][0]+"&after="+link[0][1]

if __name__ == '__main__':  
        url = ""
        app = Lookup()
        app.run(url)

And here's a snippet of what the script outputs if you run it: a pretty substantial list of XSS payloads downloaded to a file named output_xss.txt.

;alert%28String.fromCharCode%2871,117,105,100,111,90,32,88,83,83%29%29//\%27;alert%28String.fromCharCode%2871,117,105,100,111,90,32,88,83,83%29%29//%22;alert%28String.fromCharCode%2871,117,105,100,111,90,32,88,83,83%29%29//\%22;alert%28String.fromCharCode%2871,117,105,100,111,90,32,88,83,83%29%29//--%3E%3C/SCRIPT%3E%22%3E%27%3E%3CSCRIPT%3Ealert%28String.fromCharCode%2871,117,105,100,111,90,32,88,83,83%29%29%3C/SCRIPT%3E
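
To make that noise a little more legible, here's a minimal sketch (not part of the original script) that decodes one percent-encoded fragment from the sample above and recovers the text the payload would show in an alert box. The variable names are illustrative, and the character codes are parsed rather than evaluated, so nothing here actually runs the payload.

```javascript
// One percent-encoded fragment copied from the sample output.
const encoded = 'alert%28String.fromCharCode%2871,117,105,100,111,90,32,88,83,83%29%29';

// Undo the URL encoding to see the JavaScript the payload tries to inject.
const decoded = decodeURIComponent(encoded);

// Extract the numeric character codes and convert them to text.
// The payload is treated as a plain string; it is never executed.
const codes = decoded.match(/\d+/g).map(Number);
const message = String.fromCharCode.apply(null, codes);

console.log(decoded); // alert(String.fromCharCode(71,117,105,100,111,90,32,88,83,83))
console.log(message); // GuidoZ XSS
```

So each entry in that sample boils down to the same trick: build a string from character codes to dodge quote filtering, then pass it to alert().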

It's a neat tool (though one that, as you can see, produces some noise). If you're scratching your head, wondering why a bunch of XSS payloads might be useful, there are a couple of reasons.

  1. XSS payloads can be incorporated into open-source scanners, increasing their ability to effectively mimic malicious attacks.
  2. A general understanding of common XSS filter bypasses can inform better web application security architecture and data sanitization processes (e.g. OWASP's XSS Filter Evasion Cheat Sheet).
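
As a rough illustration of the second point, one common defense is to encode untrusted text before writing it into HTML. The escapeHtml helper below is a hypothetical sketch, not something taken from the book or the cheat sheet, and it only covers the HTML body and quoted-attribute contexts. Real applications generally lean on a vetted templating or encoding library instead.

```javascript
// Replace the characters that let text break out of an HTML text node
// or a quoted attribute value. The ampersand must be handled first so
// the entities produced by later replacements are not double-encoded.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

console.log(escapeHtml('<script>alert("x")</script>'));
```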

But both of those really boil down to the same simple truth: When security researchers share techniques, tools, and payloads, they can better perform the difficult tests they need to in order to ensure critical web applications are secure and the web as a whole prospers. Open source security is one of the most promising ways forward for keeping the Internet an open, secure, and distributed system.

So with all of that in mind, I've decided on a fun project that'll mix my twin passions for web development and infosec — spreading the XSS gospel through the evangelism of twitterbots!

Building the Bot

There is a staggering array of tools for building Twitter bots, in a wide variety of languages and libraries, but as an unapologetic Javascripter and thoroughly lapsed Rubyist, I've settled on Node.js, which, thanks to its vibrant community, offers a great selection of API wrappers to choose from.

Since the Twitter API is very straightforward and will represent one of the easier parts of this application, let's start with something theoretically less direct — pulling the contents of Mr. Kim's scraping.

There are several ways to go about doing this. We could translate Big Daddy K's Python script into a Javascript framework tailor-made for scraping websites, like Phantomjs or its extension Casperjs. Phantom and Casper aren't actually node modules per se, but rather parallel systems that also happen to be installed via npm. Another package, Spookyjs, is required to drive a Phantomjs script from within Node, but that means tracking data and logic through three separate Javascript scopes, along with a few weird workarounds and patch jobs.

I'm a good programmer, by which I mean I'm an incredibly lazy human being. I tap into this otherwise-not-very-great quality to benefit software engineering and society as a whole. In this instance, that laziness is telling me that there's a simpler solution than a three-headed hydra of a JS application. In fact, it's staring me straight in the face — incorporating K Diddy's script itself!

It turns out that node has an incredibly simple solution for running python scripts. It's called python-shell. After installing it via npm install --save python-shell and putting our script under a python directory, where the package expects to find it, it only takes a quick glance through the documentation to discover the snippet to do what we want.

var PythonShell = require('python-shell');

// run a simple script
PythonShell.run('', function (err, results) {
  // script finished
});
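
For comparison, here's a minimal sketch of the same idea using only Node's built-in child_process module instead of python-shell. This is a swapped-in technique, not what the post uses: the runAndCollect helper is a hypothetical name, and it blocks until the child process exits, which is fine for a weekly job but not for anything latency-sensitive.

```javascript
const { spawnSync } = require('child_process');

// Run a command synchronously and return its exit code plus its
// stdout split into non-empty lines.
function runAndCollect(command, args) {
  const result = spawnSync(command, args, { encoding: 'utf8' });
  if (result.error) {
    // The process could not be started at all (e.g. command not found).
    throw result.error;
  }
  return {
    code: result.status,
    lines: result.stdout.split('\n').filter(Boolean)
  };
}

// Demo: run the current Node binary as a stand-in for a Python interpreter.
const demo = runAndCollect(process.execPath, ['-e', 'console.log("a"); console.log("b")']);
console.log(demo.code, demo.lines);
```

Pointing it at a Python script would mean passing the interpreter as the command and the script's path as the only argument.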

There's a lot of support in python-shell for doing things like fiddling with the script's input and output via stdin and stdout, changing how the data is encoded and transmitted, etc., but again, I'm so very tired. Is there any solution for a hardworkin' bit-jockey like myself, one that doesn't require adding a persistent database layer or anything else that will add an extra helping of complexity?

Actually, we already have a sort of database: the output_xss.txt file the script creates and writes to every time it runs. If we can schedule when it runs, making sure the file is there before the Twitterbot attempts to draw from it, then we can just read from it and write to it directly. That sounds much more doable.

We already have the code to download the contents and create the output_xss.txt file, courtesy of Special K. Now we just need to read the file, tweet the contents, and schedule the whole mess.

Reading the file is simple with node's built-in fs module. Looking through the node API documentation, we see this code will open our file and log its contents:

fs.readFile('output_xss.txt', 'utf8', function (err, data) {
  if (err) throw err;
  console.log(data);
});

In our case, we don't want to log the data, we want to tweet it — in pieces. But baby steps!

First, let's split up the content based on the \n character, so we can deal with the payloads line-by-line:

xssPayloads = data.split("\n");

Then we want to make sure we strip out any empty strings, just in case the split() picks up some blank lines:

var goodPayloads = [];

// keep only the non-empty lines
for (var index in xssPayloads) {
  if (xssPayloads[index]) {
    goodPayloads.push(xssPayloads[index]);
  }
}

xssPayloads = goodPayloads;
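
The split-and-filter steps above can also be collapsed into one small function. This is a sketch rather than the post's code: parsePayloads is a hypothetical name, and unlike the loop above it also trims whitespace, so files with Windows-style line endings don't leave a stray \r on each payload.

```javascript
// Turn the raw contents of the payload file into an array of
// non-empty, whitespace-trimmed lines.
function parsePayloads(data) {
  return data
    .split('\n')
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line.length > 0; });
}

console.log(parsePayloads('first\n\n  second  \r\nthird\n'));
```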

Now that we've got all good array values, we can display the last item with the following code:

var payloadIndex = xssPayloads.length - 1;
console.log(xssPayloads[payloadIndex]);

Pretty soon we'll start to add the code for actually tweeting this snippet, but for now, we'll just move on to deleting it, in order to make sure we don't push the same information twice. Luckily, it couldn't be easier. We'll just delete the last item in the array, and rewrite the output_xss.txt file.

The first part is crazy easy. Removing the last item in an array looks like this:

xssPayloads.pop();

For rewriting the modified array to output_xss.txt, we'll have to use the fs module once again. Luckily, it's super simple. First we need to join all the items in the xssPayloads array into a single string we can write into the file. Still within your fs.readFile() callback, enter:

var joinedPayloads = "";

// rebuild the file contents, one payload per line
for (index in xssPayloads) {
  joinedPayloads += '\n' + xssPayloads[index];
}

fs.writeFile('output_xss.txt', joinedPayloads, function (err) {
  if (err) throw err;
});
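
Putting the read, pop, and rewrite steps together, the whole "take the next payload" operation might look something like the sketch below. It's an illustration under a few assumptions: takeLastPayload is a hypothetical name, it uses the synchronous fs calls for brevity where the post uses callbacks, and it returns null when the file is missing or empty so the caller can skip a tweet instead of crashing.

```javascript
const fs = require('fs');

// Read the payload file, remove the last non-empty line, write the
// remaining lines back, and return the removed line. Returns null
// when the file does not exist or has nothing left in it.
function takeLastPayload(filePath) {
  if (!fs.existsSync(filePath)) return null;

  const payloads = fs.readFileSync(filePath, 'utf8')
    .split('\n')
    .filter(Boolean);
  if (payloads.length === 0) return null;

  const next = payloads.pop();
  fs.writeFileSync(filePath, payloads.join('\n'));
  return next;
}
```

Calling takeLastPayload('output_xss.txt') once per scheduled tweet would then hand back a different payload each time until the file runs dry.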

Alright, only two more things left: We need to add the ability to tweet and the ability to schedule both tweeting and pulling scraping data!

Tweeting first. Node has several fantastic wrappers for the Twitter API, both RESTful and streaming. In this post we're going to use one I've had success with before that supports both sides of the API: twit.

After installing twit with npm install --save twit, we're going to create two files, tweet.js and config.js. These are where we're going to put our actual Twitter logic and our secret keys/tokens, respectively. config.js will look like this:

module.exports = {
    consumer_key:         '...'
  , consumer_secret:      '...'
  , access_token:         '...'
  , access_token_secret:  '...'
};

If you're using git, remember to add config.js to your .gitignore file. You definitely do not want to post your credentials anywhere public!
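
Another way to keep secrets out of the repository is to read them from environment variables. The sketch below is an alternative to config.js, not what the post does: the loadTwitterConfig function and the TWITTER_* variable names are assumptions made up for illustration, while the four returned keys match the ones shown in config.js above.

```javascript
// Build a credentials object from environment variables, failing
// loudly if any of them is missing. The variable names on the right
// are hypothetical; pick whatever fits your deployment.
function loadTwitterConfig(env) {
  const required = {
    consumer_key: 'TWITTER_CONSUMER_KEY',
    consumer_secret: 'TWITTER_CONSUMER_SECRET',
    access_token: 'TWITTER_ACCESS_TOKEN',
    access_token_secret: 'TWITTER_ACCESS_TOKEN_SECRET'
  };

  const config = {};
  Object.keys(required).forEach(function (key) {
    const value = env[required[key]];
    if (!value) {
      throw new Error('Missing environment variable ' + required[key]);
    }
    config[key] = value;
  });
  return config;
}
```

In the bot itself this would be called as loadTwitterConfig(process.env), and the result passed wherever the config.js object would otherwise go.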

Adapting the API documentation to our use case of posting a tweet, here's the final picture of what the tweet.js file looks like:

module.exports = {
    postTweet: function (content) {
        var Twit = require('twit');
        var twitterInfo = require('./config.js');
        var T = new Twit(twitterInfo);
        // post the content as a new status update
        T.post('statuses/update', { status: content }, function(err, data, response) {
            console.log(data);
        });
    }
};

With everything commented out except the code that loads the tweet.js file and a call to Twitter.postTweet("Hello World"), we test it out and find it works. Hello @BugHuntBot!
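
If you'd like to exercise the tweeting logic without touching the real API, one option is to pass the client in rather than constructing it inside postTweet. The sketch below is a hypothetical variation on tweet.js (makeTweeter is an invented name); the only thing it assumes about the client is a post(path, params, callback) method, which is the shape of the twit call used above.

```javascript
// Wrap any client exposing post(path, params, callback) in an object
// with a postTweet method. Errors are handed to the caller's callback
// instead of being swallowed.
function makeTweeter(client) {
  return {
    postTweet: function (content, callback) {
      client.post('statuses/update', { status: content }, function (err, data) {
        if (err) return callback(err);
        callback(null, data);
      });
    }
  };
}

// Demo with a stub client that records calls instead of hitting Twitter.
const sent = [];
const stubClient = {
  post: function (path, params, callback) {
    sent.push({ path: path, params: params });
    callback(null, { text: params.status });
  }
};

makeTweeter(stubClient).postTweet('Hello World', function (err, data) {
  console.log(err, data);
});
```

With the real library, the stub would be replaced by a twit instance built from the credentials in config.js.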

Now it's just a hop, skip and a jump to scheduling it and we're off to the races! Since we're hacking this quick and dirty (to start, at least), the important thing for us is that the bot doesn't run out of links to tweet. With that in mind, let's shoot for the bot to post every hour and pull data once a week. Good thing for us, node (as ever) has the perfect module for that: node-schedule.

node-schedule uses a very simple syntax for setting up recurring jobs. It's as simple as defining a rule object and passing that object to your scheduleJob() function.

node-schedule also supports setting recurrence rules via object literals. This snippet will pull our data every Sunday at 10:30 p.m.

var j = schedule.scheduleJob({hour: 22, minute: 30, dayOfWeek: 0}, function(){
  PythonShell.run('', function (err, results) {});
});

Once we write the code scheduling the tweet.js tweet, which also includes the fs code to read and rewrite the output of Mr. K's script, we're finished! There are some extra features we could (and will) add later, but this is a perfectly respectable MVP for one evening's work.
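
As a rough sketch of how the two jobs might be wired together, the function below registers an hourly tweet job using a rule object and a weekly scrape job using an object literal. It is not the final index.js: scheduleBot, tweetNext, and refreshPayloads are hypothetical names, and the schedule module is passed in as a parameter so the wiring can be tried with a stand-in. The calls it relies on, RecurrenceRule and scheduleJob, are node-schedule's own.

```javascript
// Register both recurring jobs on the given scheduler and return
// the job handles so they can be cancelled later if needed.
function scheduleBot(schedule, tweetNext, refreshPayloads) {
  // Rule object: fire at minute 0 of every hour.
  const hourly = new schedule.RecurrenceRule();
  hourly.minute = 0;
  const tweetJob = schedule.scheduleJob(hourly, tweetNext);

  // Object literal: Sundays (dayOfWeek 0) at 22:30.
  const scrapeJob = schedule.scheduleJob(
    { hour: 22, minute: 30, dayOfWeek: 0 },
    refreshPayloads
  );

  return { tweetJob: tweetJob, scrapeJob: scrapeJob };
}
```

In the real bot this would be called as scheduleBot(require('node-schedule'), ...) with one function that tweets the next payload and another that reruns the scraper.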

You can find the final version of the index.js, tweet.js and the rest on Github.

Running the Bot

With both our scheduled jobs set up (for pulling data and posting tweets), the only thing left to do is run the script. One of the best fault-tolerant ways of keeping a node script running for a long time is forever, a node package easily installed in the usual way (though it's important to install it globally):

sudo npm install -g forever

Then all it takes is... forever start index.js and we're in business!

Parting Thoughts

In his book, Peter Kim warns users not to click on these links, because that could be construed as attacks on the site. I have to make the same statement here. However, I still think there is value in sharing (in a common medium) information that is vital to freelance security researchers. Users on the /r/xss subreddit are told explicitly in the forum and its rules to not post anything illegal or malicious. It is my hope that their restraint will preserve the utility of BugHuntBot as a research tool for freelance penetration testers interested in collecting and sharing code.

If you have questions or corrections, feel free to reach out to me at [email protected]. And of course, if you'd like a more general introduction to pentesting, check out my book.

Thanks for reading and, as ever, happy hunting!