crossbrowsertracking NDSS17


(Cross-)Browser Fingerprinting via OS and Hardware Level Features

Yinzhi Cao, Lehigh University, yinzhi.cao@lehigh.edu
Song Li, Lehigh University, sol315@lehigh.edu
Erik Wijmans†, Washington University in St. Louis, erikwijmans@wustl.edu

Abstract. In this paper, we propose a browser fingerprinting technique that can track users not only within a single browser but also across different browsers on the same machine. Specifically, our approach utilizes many novel OS and hardware level features, such as those from graphics cards, CPU, and installed writing scripts. We extract these features by asking browsers to perform tasks that rely on corresponding OS and hardware functionalities. Our evaluation shows that our approach can successfully identify 99.24% of users, as opposed to 90.84% for the state of the art in single-browser fingerprinting on the same dataset. Further, our approach can achieve a higher uniqueness rate than the only cross-browser approach in the literature with similar stability.

I. INTRODUCTION

Web tracking is a debatable technique used to remember and recognize past website visitors. On the one hand, web tracking can authenticate users, and in particular a combination of different web tracking techniques can be used for multi-factor authentication to strengthen security. On the other hand, web tracking can also be used to deliver personalized service; if the service is undesirable, e.g., unwanted targeted ads, such tracking is a violation of privacy. No matter whether we like web tracking or whether it is used legitimately in the current web, more than 90% of Alexa Top 500 websites [39] adopt web tracking, and it has drawn much attention from the general public and the media [6].

Web tracking has been evolving quickly. The first-generation tracking technique adopts stateful, server-set identifiers, such as cookies and evercookie [21].
After that, the second-generation tracking technique, called fingerprinting, emerged, moving from stateful identifiers to stateless ones: instead of setting a new identifier, the second-generation technique explores stateless identifiers, such as plug-in versions and the user agent, that already exist in browsers. The second-generation technique is often used together with the first to restore lost cookies. Both first- and second-generation tracking are confined to a single browser, and nowadays people are developing third-generation tracking techniques that try to achieve cross-device tracking [16].

† The author contributed to the paper when he was a REU student at Lehigh University.

Permission to freely reproduce all or part of this paper for noncommercial purposes is granted provided that copies bear this notice and the full citation on the first page. Reproduction for commercial purposes is strictly prohibited without the prior written consent of the Internet Society, the first-named author (for reproduction of an entire paper only), and the author’s employer if the paper was prepared within the scope of employment. NDSS ’17, 26 February - 1 March 2017, San Diego, CA, USA. Copyright 2017 Internet Society, ISBN 1-1891562-46-0. http://dx.doi.org/10.14722/ndss.2017.23152

The focus of the paper is a 2.5-generation technique, in between the second and the third, which can fingerprint a user not only in the same browser but also across different browsers on the same machine. The practice of using multiple browsers is common and is promoted by US-CERT [42] and other technical people [12]: according to our survey,¹ 70% of studied users have installed and regularly used at least two browsers on the same computer. The proposed 2.5-generation technique, on the positive side, can be used as part of stronger multi-factor user authentication even across browsers.
From another angle, just like much existing research on new cyber attacks, the proposed 2.5-generation tracking can also help to improve existing privacy-preserving works, and we will briefly discuss the defense against our cross-browser tracking in Section VII.

Now, let us put aside the good, the bad and the ugly usages of web tracking, and look at the technique itself. To fingerprint different browsers installed on the same machine, one simple approach is to reuse existing features that fingerprint a single browser. However, because many existing features are browser specific, the cross-browser stable ones are not unique enough for fingerprinting even when combined. That is why the only cross-browser fingerprinting work, Boda et al. [14], adopts the IP address as a main feature. However, the IP address, as a network-level feature, is excluded from modern browser fingerprinting in the famous Panopticlick test [5] and many other related works [10, 20, 26, 32, 34, 36]. The reason is that an IP address changes if it is allocated dynamically, if the machine is connected via a mobile network, or if a laptop switches locations, such as from home to office; it is also unavailable behind an anonymous network or a proxy.

In the paper, we propose a (cross-)browser fingerprinting technique based on many novel OS and hardware level features, e.g., those from the graphics card, CPU, audio stack, and installed writing scripts. Specifically, because many such OS and hardware level functions are exposed to JavaScript via browser APIs, we can extract features by asking the browser to perform certain tasks through these APIs. The extracted features can be used for both single- and cross-browser fingerprinting.

¹ More details about our experiment can be found in Appendix A.
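To make the distinction between single- and cross-browser fingerprinting concrete, the following is a minimal sketch, not the implementation used in this work. It assumes features have already been collected as string values; the feature names, the list of browser-specific features, and the djb2-style hash are all hypothetical choices made only for illustration.

```typescript
// Illustrative sketch only: feature names, values, and the browser-specific
// list below are assumptions, not the paper's actual feature schema.
type FeatureMap = Record<string, string>;

// Assumed list of features whose values depend on the browser itself
// rather than on the underlying OS and hardware.
const BROWSER_SPECIFIC: string[] = ["userAgent", "plugins"];

// djb2-style string hash, reduced to 32 bits and printed as 8 hex digits.
// Chosen only to keep the sketch dependency-free.
function djb2(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    // hash * 33 + character code, truncated to an unsigned 32-bit value.
    hash = ((hash << 5) + hash + text.charCodeAt(i)) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Single-browser mode hashes every feature; cross-browser mode drops the
// browser-specific ones so that only OS and hardware level values remain.
function fingerprint(features: FeatureMap, crossBrowser: boolean): string {
  const keys = Object.keys(features)
    .filter((k) => !crossBrowser || BROWSER_SPECIFIC.indexOf(k) === -1)
    .sort();
  const canonical = keys.map((k) => k + "=" + features[k]).join("|");
  return djb2(canonical);
}

// Two hypothetical browsers on the same machine.
const chromeFeatures: FeatureMap = {
  userAgent: "Chrome", plugins: "pdf-viewer",
  cpuCores: "8", screenRatio: "16:9", gpuRender: "a41f",
};
const firefoxFeatures: FeatureMap = {
  userAgent: "Firefox", plugins: "",
  cpuCores: "8", screenRatio: "16:9", gpuRender: "a41f",
};

console.log(fingerprint(chromeFeatures, true) === fingerprint(firefoxFeatures, true));
console.log(fingerprint(chromeFeatures, false) === fingerprint(firefoxFeatures, false));
```

In this sketch the two browsers share a cross-browser fingerprint because their OS and hardware level values agree, while their single-browser fingerprints differ. A real system would also have to cope with features that are only partly browser independent.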
Let us take WebGL, a 3D component implemented in the browser canvas object, as an example. While canvas, especially the 2D part, has been used in single-browser fingerprinting [9, 32], WebGL was considered “too brittle and unreliable” even for a single browser by a very recent study called AmIUnique [26]. The reason for this conclusion is that AmIUnique selects a random WebGL task and does not restrict many variables, such as canvas size and anti-aliasing, which affect the fingerprinting results.

In contrast to the conclusion drawn by AmIUnique, we show that WebGL can be used not only for single- but also for cross-browser fingerprinting. Specifically, we ask the browser to render more than 20 tasks with carefully selected computer graphics parameters, such as texture, anti-aliasing, light, and transparency, and then extract features from the outputs of these rendering tasks.

Our principal contribution is being the first to use many novel OS and hardware features, especially computer graphics ones, in both single- and cross-browser fingerprinting. In particular, our approach with new features can successfully fingerprint 99.24% of users, as opposed to 90.84% for AmIUnique, i.e., the state of the art, on the same dataset for single-browser fingerprinting. Moreover, our approach can achieve 83.24% uniqueness with 91.44% cross-browser stability, while Boda et al. [14], excluding the IP address, only have 68.98% uniqueness with 84.64% cross-browser stability.

Our secondary contribution is that we make several interesting observations for single- and cross-browser fingerprinting. For example, we find that the current measurement of screen resolution, e.g., the one done in AmIUnique, Panopticlick [5, 17] and Boda et al. [14], is unstable, because the resolution changes in Firefox and IE when the user zooms the web page in or out. Therefore, we take the zoom level into consideration and normalize the width and height in the screen resolution. As another example, we find that both DataURL and JPEG formats are unstable across different browsers, because these formats are lossy and are implemented differently in multiple browsers and on the server side as well. Therefore, we need to adopt lossless formats for server-client communications in cross-browser fingerprinting.

Our work is open-source and available at https://github.com/Song-Li/cross_browser/, and a working demo is at http://www.uniquemachine.org.

The rest of the paper is organized as follows. We first present all the features, including old ones adopted and modified from AmIUnique and new ones proposed by us, in Section II. Then, we introduce the design of our browser fingerprinting, including the overall architecture, rendering tasks, and mask generation, in Section III. After that, we talk about our implementation in Section IV and data collection in Section V. We evaluate our approach and present the results in Section VI. Next, we discuss the defense against our fingerprinting in Section VII, some ethics issues in Section VIII, and related work in Section IX. Our paper concludes in Section X.

II. FINGERPRINTABLE FEATURES

In this section, we introduce the fingerprintable features used in this paper. We start from features used in prior works, and then introduce some features that need modification, especially for cross-browser fingerprinting. Next, we present our newly proposed features.

Although there are no restrictions on features for single-browser fingerprinting, our cross-browser features need to reflect the information and operation of the level below the browser, i.e., the OS and hardware level. For example, both vertex and fragment shaders expose the behaviors of the GPU and its driver in the OS; the number of virtual cores is a CPU feature; the installed writing scripts are OS-level features. The reason is that features at the OS and hardware level are relatively more stable across browsers: all browsers run on top of the same OS and hardware.

Note that if an operation, especially the output of the operation, is contributed to by both the browser and the underlying (OS and hardware) levels, we can use it for single-browser fingerprinting, but need to get rid of the browser factor in cross-browser fingerprinting. For example, when we render an image as a texture on a cube, the texture mapping is a GPU operation but the image decoding is a browser one. Therefore, we can only use PNG, a lossless format, for cross-browser fingerprinting. As another example, the dynamic compression of audio signals is performed by both the browser and the underlying audio stack, and we need to extract the underlying features. Now let us introduce the features used in the paper.

A. Prior Fingerprintable Features

In this part of the section, we introduce fingerprintable features that we adopted from the state of the art. There are 17 features presented in Table I of the AmIUnique paper [26], and we use all of them for our single-browser fingerprinting. More details can be found in their paper. Because many of these features are browser specific, we adopt a subset of 4 features for cross-browser fingerprinting, namely screen resolution, color depth, list of fonts, and platform. Some of these features need modifications and are introduced below.

B. Old Features with Major Modifications

One prior feature, screen resolution, needs refactoring for both single- and cross-browser fingerprinting. Then, we introduce another fingerprintable feature, the number of CPU virtual cores. Lastly, two prior features need major modifications for cross-browser fingerprinting.

Screen Resolution. The current measurement of screen resolution is via the “screen” object under JavaScript. However, we find that many browsers, especially Firefox and IE, change the resolution value in proportion to the zoom level. For example, if the user enlarges the webpage with “ctrl++” in Firefox or IE, the reported screen resolution is inaccurate. We believe that the zoom level needs to be considered in both single- and cross-browser fingerprinting. Specifically, we pursue two separate directions. First, we adopt existing work [13] on the detection of zoom levels based on the size of a div tag and the device pixel ratio, and then adjust the screen resolution correspondingly. Second, because the former method is not always reliable, as acknowledged by the inventors, we adopt a new feature, i.e., the ratio between screen width and height, which does not change with the zoom level.

In addition to screen resolution, we also find that some other properties, such as availHeight, availWidth, availLeft, availTop, and screenOrientation, are useful in both single- and cross-browser fingerprinting. The first four represent the screen area available to the browser, excluding system areas such as the top menu and the toolbar of Mac OS. The last one shows the position of the screen, e.g., whether the screen is landscape or portrait, and whether the screen is upside down.

Number of CPU Virtual Cores. The core number can be obtained through a new browser feature called hardwareConcurrency, which provides the capability information for Web Workers. Many browsers now support this feature, but some, especially early versions of browsers, do not. If it is not supported, there exists a side channel [1] to obtain the number. Specifically, one can monitor the finishing time of a payload while increasing the number of Web Workers. When the finishing time increases significantly at a certain number of Web Workers, the limit of hardware concurrency has been reached, which makes it possible to fingerprint the number of cores. Note that some browsers, such as Safari, cut the number of cores available to Web Workers by half, and we need to double the number for cross-browser fingerprinting.

The number of cores is known by the inventor to be fingerprintable [2], and this is one of the reasons that they call it hardwareConcurrency rather than cores. However, the feature has never been used or measured in prior work on browser fingerprinting.

[...] obtain the font list. Instead, we adopt the side-channel method mentioned by Nikiforakis et al. [36], where the width and height of a certain string are measured to determine the font type. Note that not all fonts are cross-browser fingerprintable, because some fonts are web specific and provided by browsers, and we need to apply a mask, shown in Section III-C, to select a subset. It is also worth noting that Fifield et al. [20] provide a subset of 43 fonts for fingerprinting; however, their work is based on single-browser fingerprinting and is not applicable in our cross-browser scenario.

C. Newly-proposed Atomic Fingerprintable Features

In this and the next subsection, we introduce our newly proposed fingerprintable features. We first start with atomic features, and by atomic, we mean that the browser exposes either an API or a component directly to JavaScript. Then, we will introduce composite features, which usually require more than one API and component to collaborate.

Line, curve, and anti-aliasing. Line and curve are 2D features supported by both Canvas (2D part) and WebGL. Anti-aliasing is a computer graphics technique used to diminish aliasing by smoothing jaggies, i.e., jagged or stair-stepped lines, in either a single line/curve object or the edge of a computer graphics model. There are many existing algorithms [4] for anti-aliasing, such as the first-principles approach, the signal processing approach, and mipmapping, which make anti-aliasing fingerprintable.

Vertex shader. A vertex shader, rendered by the GPU and the driver, converts each vertex in a 3D model to its coordinate in a 2D clip-space. In WebGL, a vertex shader may accept data in 3 ways: