You are what you include: Large-scale evaluation of Remote JavaScript inclusions

Today, I am back in Belgium after spending a week in the US. I was in Raleigh, NC, to attend the 19th ACM Conference on Computer and Communications Security and to present our paper, titled You Are What You Include: Large-scale Evaluation of Remote JavaScript Inclusions. KU Leuven had an all-time record this year, since we had 4 (!) full papers accepted.

In this post, I want to summarize the findings of our remote inclusions study so that you can get a glimpse of the size of the problem and hopefully get curious enough to read our 12-page paper :) About a year ago, we became concerned that website administrators were including JavaScript code from remote sources without giving it much thought. This can lead to issues because:

  1. The remotely included code can be buggy, so by including it you introduce vulnerabilities into your own site
  2. The remote host can be malicious and use its scripts to attack your users and exfiltrate data from your site
  3. The remote host can be targeted by an attacker as a way of reaching a harder-to-reach target (e.g. your page)

This led us to conduct the largest web crawl to date with a focus on remote JavaScript inclusions. Based on the top 10,000 Alexa sites, we crawled over 3,300,000 pages and recorded approximately 8.5 million remote inclusions. The findings that I want to share with you in this blog post are the following:

  1. Even though most sites of the Alexa top 10,000 include code from up to 15 remote hosts, there are sites that include code from up to 295 remote hosts. Since a single one of these hosts is enough to fully compromise your script-including site, trusting almost 300 of them is, at the very least, worrisome
  2. As far as remote inclusions are concerned, Google is king, owning 5 out of the top 10 most included scripts found in our study
  3. Certain web-tracking and market-research companies (like addthis.com and scorecardresearch.com) have worked their way into the 10 most popular remotely-included JavaScript scripts. This should raise some eyebrows, since these are sites that most users have never directly visited and whose existence they are not even aware of.
  4. A large percentage of JavaScript providers do not seem too interested in keeping their software and servers up-to-date. This becomes a problem when a motivated attacker targets them: if you include code from them, they are essentially the weakest link in your security chain.
  5. In our logs of remote JavaScript inclusions, we found the following, previously unknown, vulnerabilities:
    • Script inclusions from localhost: We found over 130 script inclusions which were requesting their “remote” JavaScript from localhost or 127.0.0.1. What this means is that a user’s browser will try to fetch the necessary JavaScript code from the user’s own machine. Couple this with multi-user systems where users have the ability to run web-server processes (think smartphones) and you have a recipe for disaster. A malicious user can poison the vulnerable page by providing malicious JavaScript for all these requests. We called these Cross-User Scripting attacks
    • Script inclusions from private-network IP addresses: Same as above, but now the site tries to include code from hosts such as “192.168.1.1”. This means that the attacker now just needs to be in the same local network (Cross-network Scripting).
    • Script inclusions from non-registered domains: This is one of my favorites. We found 56 domains that were supposedly providers of JavaScript, but they were not registered!!! This means that an attacker can simply register a domain and start providing malicious JavaScript to anyone who requests it. We registered two of these and recorded about 85,000 requests for JavaScript in two weeks! We called these Stale Domain-name-based Inclusions, since these are domains that were once full sites and, even though they expired, other sites kept on requesting scripts from them.
    • Script inclusions from non-responding IP addresses: Addressing a remote host directly by its IP address means that if that host gets assigned a new IP address, you should update your records. We found some interesting cases of sites requesting JavaScript from IP addresses that did not even have a web server listening on the appropriate port. While harder to exploit, an attacker who gets hold of a particular IP address will be able to inject code into the victim pages (Stale IP-address-based Inclusions)
    • Script inclusions from typosquatting domains: That’s probably my favorite one :) We found some unregistered domains that were actually mistypes of the intended domain (e.g. googlesyndicatio.com, missing the final n). We realized that the developers had messed up when writing the script inclusion and requested code from the wrong domain. By registering googlesyndicatio.com, we actually recorded more than 163,000 requests for JavaScript in two weeks! This goes to show that developers, like everyone else, are prone to misspelling domains. These misspellings, however, have the potential to cause much greater damage than a user’s mistypes in her own browser. We called this attack Typosquatting Cross-site Scripting (TXSS).
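If you want to look for some of these patterns in your own pages, the checks that need no network access can be sketched in a few lines of Python. This is only an illustration and not the tooling we used for the crawl: the function names, the KNOWN_PROVIDERS list and the "last two labels" shortcut for picking out the domain are all assumptions made for the example. Spotting unregistered domains or non-responding IP addresses would additionally need DNS and network lookups, which are left out.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical list of providers a site intends to include from (illustrative only).
KNOWN_PROVIDERS = {"googlesyndication.com", "google-analytics.com"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def classify_script_src(src: str) -> str:
    """Give a rough label to the host that a <script src=...> points at."""
    host = urlsplit(src).hostname
    if host is None:
        return "relative-or-invalid"
    if host == "localhost":
        return "localhost"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # not an IP literal, so treat it as a domain name
    if ip is not None:
        if ip.is_loopback:
            return "localhost"
        if ip.is_private:
            return "private-network"
        return "ip-literal"
    if any(host == p or host.endswith("." + p) for p in KNOWN_PROVIDERS):
        return "known-provider"
    # Simplification: take the last two labels as the domain (wrong for e.g. co.uk).
    tail = ".".join(host.split(".")[-2:])
    if any(0 < edit_distance(tail, p) <= 2 for p in KNOWN_PROVIDERS):
        return "possible-typo"
    return "other-remote"


if __name__ == "__main__":
    for src in ("http://127.0.0.1/a.js", "//192.168.1.1/x.js",
                "http://pagead2.googlesyndicatio.com/show_ads.js"):
        print(src, "->", classify_script_src(src))
```

The threshold of two edits is an arbitrary choice for the example; a real check would want a proper public-suffix list and a much longer list of intended providers.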

In total, we found that there are many ways for an application to be attacked through its remote JavaScript inclusions. In addition to all the findings mentioned above, we also studied script inclusions over time and evaluated the practicality of two straightforward countermeasures, i.e., coarse-grained sandboxing and local script copies, and showed how feasible (or not) each one is, given the current popular scripts of the web.
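To make the local-copy idea a bit more concrete: if you serve your own copy of a provider's script, you will at some point want to know whether the provider's version has changed since you copied it. The snippet below is a minimal sketch of that comparison, not the method from the paper. It assumes you already hold both versions as bytes (fetching is left out), and the function names are made up for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a script's bytes."""
    return hashlib.sha256(data).hexdigest()


def remote_has_changed(local_copy: bytes, fetched_remote: bytes) -> bool:
    """True if the provider's script differs from the copy you serve yourself."""
    return sha256_hex(local_copy) != sha256_hex(fetched_remote)


if __name__ == "__main__":
    local = b"function track(){ /* version copied last month */ }"
    remote = b"function track(){ /* version served today */ }"
    print(remote_has_changed(local, remote))
```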

Check out our full paper for all the juicy details :)
Till next time

Nick Nikiforakis

Posted in Uncategorized | Leave a comment

Breaking McAfee’s Social Protection

On my usual daily visit to Slashdot, I read that McAfee introduced a new application called “McAfee Social Protection” for Facebook. In a nutshell, you install their plugin, allow their application to control quite a bit of your Facebook account and then you can start uploading pictures “safely”. Here’s a video of it in action.

Continue reading

Google AdChoices…

They say a picture is worth a thousand words. How about two pictures?

AdChoices "pledge"

So, the important points of the above text are:

“It’s our goal to make these ads as relevant and useful as possible for you. Google doesn’t create categories, or show ads, based on sensitive topics such as race, religion, sexual orientation, or health.”

Sounds reasonable. Let’s see the ad that actually got me here.

An AdSense ad about dating, based on religion, while watching a video-clip from a popular Christian band

One can claim that the ad is targeting single people, but it is actually targeting the intersection of Christian and Single, thus effectively targeting both.

Google, don’t be evil.

To Google Chrome: Relax less…

I’ve been recently reading Michal Zalewski’s “The Tangled Web”, a book which tries to map the whole security landscape around browsers and Web applications in about 300 pages… it does a pretty good job :)

Now, in Chapter 9, he talks about “Content Isolation Logic” and in one specific section he touches on the document.domain property of the DOM of a page. In short, when two pages, foo.example.com and bar.example.com, want to communicate through JavaScript, by default they cannot, since the Same Origin Policy allows access only when the protocol, domain and port fully match. Since “foo.example.com” !== “bar.example.com”, the check fails and thus the two pages can’t communicate. Because this is somewhat too rigid, a developer can choose to relax the domain of his page from foo.example.com to example.com. In JavaScript, this is a simple assignment to the document.domain property: Continue reading
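The default check described above (protocol, domain and port must all match) can be sketched as a simple tuple comparison. This is a rough Python model for illustration only, not how any browser implements it: the function names and the default-port table are assumptions, and the document.domain relaxation is deliberately not modeled.

```python
from urllib.parse import urlsplit

# Assumed default ports, used when the URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Reduce a URL to its (protocol, domain, port) triple."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    """True only when protocol, domain and port all match."""
    return origin(a) == origin(b)


if __name__ == "__main__":
    # Different subdomains of example.com do not share an origin.
    print(same_origin("http://foo.example.com/a", "http://bar.example.com/b"))
```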

El cheapo hosting, le open redirect…

Did you know that if you use a popular cheap web hosting product and you haven’t changed the default error pages of your sites, you are most likely hosting an open redirect? If not, read on :)

Suppose for a second that you are a loyal customer of babyhow.com, an eshop selling stuff for your baby. One beautiful day you receive an email letting you know that babyhow.com has moved its business and providing you with a link to read more:

Continue reading

Bluehost.com made me feel blue…

Two years ago I decided to get a personal site. I was after two things: flexibility and low cost. I didn’t want to get a VPS, but I also didn’t want the hosting packages of one domain and 350MB of space. So I found IPage.com, a shared hosting provider that was giving me unlimited hosted sites, unlimited databases, unlimited bandwidth and unlimited disk space for about 35 euros a year… that in my book was a great deal! So I went ahead and bought it. In that year I was generally happy with them. My pages were occasionally a bit slow but still fast enough for my sites’ needs. The problem was that the 35 euro price was an introductory price and the next year IPage asked me for triple that amount… Since I didn’t feel like that was a good thing (now that you are a customer we’ll suck you dry), I decided to look elsewhere. A colleague at work recommended Bluehost.com. Bluehost offered me the same things as IPage plus SSH access for about 50 euros per year. I went for that and I was quite happy… until this week… Continue reading

Stored XSS on Statcounter!!!

Stored XSS on popular Web statistics framework Statcounter. Log yourselves out of Statcounter and if possible disable JavaScript for the domain (possible in Chrome, not sure about Firefox)…  Will give more details when Statcounter fixes it. The only reason I am saying it here is because my Statcounter logs just started popping alert boxes!

What do you call?

Joke I just made up:

What do you call a woman who first says to you “I love you” but ten minutes later she adds “I actually don’t, but don’t feel bad because I say that to all men”?

Continue reading

Firefox and Self-XSS

I still remember the good old days when I would just write “javascript:alert(document.cookie)” in my address bar and the browser would happily show me the JavaScript-accessible cookie values for the current domain. These were simpler days…

In mid-2011, the developers of Firefox decided that allowing the “javascript” directive in the URL bar was being abused by attackers to conduct self-XSS attacks more than it was being used for legitimate purposes. If you are not familiar with self-XSS attacks, fear not… they are quite easy to explain. Continue reading

If he was good enough…

Stanford's Course on AI

Since the beginning of October I’ve been following the online AI course from Stanford, taught by Sebastian Thrun and Peter Norvig. In the last two months, I’ve given up a great part of my free time to watch videos, do quizzes, read clarifications on the AI page on Reddit and complete assignments. I will not say it was not worth it. It definitely was. I’ve learned so much and I already have ideas on how to use Artificial Intelligence (specifically Machine Learning) in my own field (Computer Security).

Last night, I noticed a link on the course website that led me to a YouTube video of the latest Google+ Hangout, where two AI professors, along with Sal Khan, the founder of Khan Academy, and a handful of students from some universities in the US were talking about the future of education and how these new ways of teaching are “reinventing education”.

I was listening to their discussion when the following comment by Prof. Thrun really jumped out of the page and hit me on the head…

Continue reading
