You are what you include: Large-scale evaluation of Remote JavaScript inclusions

Today, I am back to Belgium, after spending one week in the US. I was in Raleigh, NC, to attend the 19th ACM conference on Computer and Communication Security and to present our paper titled You Are What You Include: Large-scale Evaluation of Remote JavaScript Inclusions. KU Leuven had an all time record this year, since we had 4 (!) full papers accepted.

In this post, I want to summarize the findings of our remote inclusions study so that you can get a glimpse of the size of the problem and hopefully get curious enough to read our 12-page paper 🙂 So, about a year ago we were concerned that web-site administrators are including JavaScript code from remote-sources without too much thinking. This can lead to issues because:

  1. The remotely included code can be buggy and you are thus introducing vulnerabilities to your own site, when you choose to include it
  2. The remote host can be malicious and use its scripts to attack your users and exfiltrate data from your site
  3. The remote host can be targeted by an attacker, as a way of reaching a harder to get target (e.g. your page)

This lead us to conduct the largest, to-date, web crawl with a focus on remote JavaScript inclusions. Based on the top 10,000 Alexa sites, we crawled over 3,300,000 pages and recorded approximately 8.5 million remote inclusions. The findings that I want to share with you, in this blog post are the following:

  1. Even though most sites of the Alexa top 10,000 include code from up to 15 remote hosts, there are sites that include code from up to 295 remote hosts. Assuming that only one of these hosts is enough to fully compromise your script-including site, trusting almost 300 of those is, at the very least, worrisome
  2. As far as remote inclusions as concerned, Google is king, owning 5 out of the top 10 most included scripts found in our study
  3. Certain web-tracking and market-research companies (like addthis.com and scorecardresearch.com) have crept their way into the 10 most popular remotely-included JavaScript scripts. This should raise some eyebrows, since these are sites that most users have never directly visited and are not aware of their existence.
  4. A large percentage of JavaScript providers seem to not be too interested in keeping their software and servers up-to-date. This can be problematic, when a motivated attacker targets them and if you include code from them, they are essentially the weakest link in your security chain.
  5. In our logs of remote JavaScript inclusions, we found the following, previously unknown, vulnerabilities:
    • Script inclusions from locahost: We found over 130 script inclusions which were requesting their “remote” JavaScript from localhost or 127.0.0.1. What this means, is that a user’s browser will try to fetch the necessary JavaScript code from the user’s own machine. Couple this, with multi-user systems, where users have the ability to run web-server processes (think smartphones) and you have a recipe for disaster. A malicious user can poison the vulnerable page by providing malicious JavaScript for all these requests. We called these Cross-User Scripting attacks
    • Script inclusions from private-network IP addresses: Same as above, but now the site tries to include code from hosts such as “192.168.1.1”. This means that the attacker now just needs to be in the same local network (Cross-network Scripting).
    • Script inclusions from non-registered domains: This is one of my favorites. We found 56 domains that were supposedly providers for JavaScript, but they were not registered!!! This means that an attacker can simply register a domain and start providing malicious JavaScript to anyone who requests it. We registered two of these and recorded about 85,000 requests for JavaScript in two weeks! We called these Stale Domain-name-based Inclusions since these are domains that were once full sites and even-though they expired, other sites kept on requesting scripts from them.
    • Script inclusions from non-responding IP addresses: Addressing a remote host directly by its IP address means that if that host gets assigned a new IP address, you should update your records. We found some interesting cases of sites requesting JavaScript from IP addresses that did not even have a web server listening on the appropriate port. While harder to exploit, an attacker who gets hold of a particular IP address will be able to inject code on the victim pages (Stale IP-address-based Inclusions)
    • Script inclusions from typosquatting domains: That’s probably my favorite one 🙂 We found some unregistered domains that were actually mistypes of the intended domain (e.g. googlesyndicatio.com, missing the final n). We realized that the developers messed-up when writing the script inclusion and requested code from the wrong domain. By registering googlesyndicatio.com, we actually recorded more than 163,000 requests for JavaScript in two weeks! This goes to show that developers, like all others, are also prone to misspell domains. These misspellings however have the potential to cause much greater damage than a user’s mistypes in her own browser. We called this attack Typosquatting Cross-site Scripting(TXSS).

In total, we found that there are many ways for an application to be attacked based on a remote JavaScript inclusions. In addition to all the findings mentioned above, we also studied script-inclusions over time and we evaluated the practicality of two straightforward countermeasures, i.e., coarse-grained sandboxing and local script copies, and showed how feasible (or not) is each one, given the current popular scripts of the web.

Check out our full paper for all the juicy details 🙂
Till next time

Nick Nikiforakis

This entry was posted in Paper summaries. Bookmark the permalink.

2 Responses to You are what you include: Large-scale evaluation of Remote JavaScript inclusions

Leave a Reply

Your email address will not be published. Required fields are marked *