Sunday, May 19, 2019

User Authentication Through Mouse Dynamics

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 8, NO. 1, JANUARY 2013

Chao Shen, Student Member, IEEE, Zhongmin Cai, Member, IEEE, Xiaohong Guan, Fellow, IEEE, Youtian Du, Member, IEEE, and Roy A. Maxion, Fellow, IEEE

Abstract: Behavior-based user authentication with pointing devices, such as mice or touchpads, has been gaining attention. As an emerging behavioral biometric, mouse dynamics aims to address the authentication problem by verifying computer users on the basis of their mouse operating styles. This paper presents a simple and efficient user authentication approach based on a fixed mouse-operation task. For each sample of the mouse-operation task, both traditional holistic features and newly defined procedural features are extracted for accurate and fine-grained characterization of a user's unique mouse behavior. Distance-measurement and eigenspace-transformation techniques are applied to obtain feature components for efficiently representing the original mouse feature space. Then a one-class learning algorithm is employed in the distance-based feature eigenspace for the authentication task. The approach is evaluated on a dataset of 5550 mouse-operation samples from 37 subjects. Extensive experimental results are included to demonstrate the efficacy of the proposed approach, which achieves a false-acceptance rate of 8.74%, and a false-rejection rate of 7.69% with a corresponding authentication time of 11.8 seconds. Two additional experiments are provided to compare the current approach with other approaches in the literature. Our dataset is publicly available to facilitate future research.

Index Terms: Biometric, mouse dynamics, authentication, eigenspace transformation, one-class learning.

Manuscript received March 28, 2012; revised July 16, 2012; accepted September 06, 2012. Date of publication October 09, 2012; date of current version December 26, 2012. This work was supported in part by the NSFC (61175039, 61103240, 60921003, 60905018), in part by the National Science Fund for Distinguished Young Scholars (60825202), in part by 863 High Tech Development Plan (2007AA01Z464), in part by the Research Fund for Doctoral Program of Higher Education of China (20090201120032), and in part by Fundamental Research Funds for Central Universities (2012jdhz08). The work of R. A. Maxion was supported by the National Science Foundation under Grant CNS-0716677. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors, and do not necessarily reflect the views of the National Science Foundation. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Sviatoslav Voloshynovskiy.

C. Shen, Z. Cai, X. Guan, and Y. Du are with the MOE Key Laboratory for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China. R. A. Maxion is with the Dependable Systems Laboratory, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213 USA. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIFS.2012.2223677

I. INTRODUCTION

The quest for a reliable and convenient security mechanism to authenticate a computer user has existed since the inadequacy of the conventional password mechanism was realized, first by the security community, and then gradually by the public [31].
As data are moved from traditional localized computing environments to the new Cloud Computing paradigm (e.g., Box.net and Dropbox), the need for better authentication has become more pressing. Recently, several large-scale password leakages exposed users to an unprecedented risk of disclosure and abuse of their information [47], [48]. These incidents seriously shook public confidence in the security of the current information infrastructure; the inadequacy of password-based authentication mechanisms is becoming a major concern for the entire information society.

Of various potential solutions to this problem, a particularly promising technique is mouse dynamics. Mouse dynamics measures and assesses a user's mouse-behavior characteristics for use as a biometric. Compared with other biometrics such as face, fingerprint and voice [20], mouse dynamics is less intrusive, and requires no specialized hardware to capture biometric information. Therefore it is suitable for the current Internet environment. When a user tries to log into a computer system, mouse dynamics only requires her to provide the login name and to perform a certain sequence of mouse operations. Extracted behavioral features, based on mouse movements and clicks, are compared to a legitimate user's profile. A match authenticates the user; otherwise her access is denied. Furthermore, a user's mouse-behavior characteristics can be continually analyzed during her subsequent usage of a computer system for identity monitoring or intrusion detection. Yampolskiy et al. provide a review of the field [45].

Mouse dynamics has attracted more and more research interest over the last decade [2]-[4], [8], [14]-[17], [19], [21], [22], [33], [34], [39]-[41], [45], [46]. Although previous research has shown promising results, mouse dynamics is still a newly emerging technique, and has not reached an acceptable level of performance (e.g., the European standard for commercial biometric technology, which requires a 0.001% false-acceptance rate and a 1% false-rejection rate [10]). Most existing approaches for mouse-dynamics-based user authentication result in a low authentication accuracy or an unreasonably long authentication time. Either of these may limit applicability in real-world systems, because few users are willing to use an unreliable authentication mechanism, or to wait for several minutes to log into a system. Moreover, previous studies have favored using data from real-world environments over experimentally controlled environments, but this realism may cause unintended side-effects by introducing confounding factors (e.g., effects due to different mouse devices) that may affect experimental results. Such confounds can make it difficult to attribute experimental outcomes solely to user behavior, and not to other factors along the long path of mouse behavior, from hand to computing environment [21], [41].

It should also be noted that most mouse-dynamics research used data from both the impostors and the legitimate user to train the classification or detection model. However, in the scenario of mouse-dynamics-based user authentication, usually only the data from the legitimate user are readily available, since the user would choose her specific sequence of mouse operations and would not share it with others. In addition, no datasets are published in previous research, which makes third-party verification of previous work difficult and precludes objective comparisons between different approaches.

A. Overview of Approach

Faced with the above challenges, our study aims to develop a mouse-dynamics-based user authentication approach, which can perform user authentication in a short period of time while maintaining high accuracy.
By using a controlled experimental environment, we have isolated inherent behavioral characteristics as the primary factors for mouse-behavior analysis. The overview of the proposed approach is shown in Fig. 1. It consists of three major modules: (1) mouse-behavior capture, (2) feature construction, and (3) training/classification. The first module serves to create a mouse-operation task, and to capture and interpret mouse-behavior data. The second module is used to extract holistic and procedural features to characterize mouse behavior, and to map the raw features into distance-based features by using various distance metrics. The third module, in the training phase, applies kernel PCA on the distance-based feature vectors to compute the predominant feature components, and then builds the user's profile using a one-class classifier. In the classification phase, it determines the user's identity using the trained classifier in the distance-based feature eigenspace.

B. Purpose and Contributions of This Paper

This paper is a significant extension of an earlier and much shorter version [40]. The main purpose and major contributions of this paper are summarized as follows:

We address the problem of unintended side-effects of inconsistent experimental conditions and environmental variables by restricting users' mouse operations to a tightly-controlled environment. This isolates inherent behavioral characteristics as the principal factors in mouse behavior analysis, and substantially reduces the effects of external confounding factors.

Instead of the descriptive statistics of mouse behaviors usually adopted in existing work, we propose newly-defined procedural features, such as movement speed curves, to characterize a user's unique mouse-behavior characteristics in an accurate and fine-grained manner. These features could lead to a performance boost both in authentication accuracy and authentication time.
We apply distance metrics and kernel PCA to obtain a distance-based eigenspace for efficiently representing the original mouse feature space. These techniques partially handle behavioral variability, and make our proposed approach stable and robust to variability in behavior data.

We employ one-class learning methods to perform the user authentication task, so that the detection model is built solely on the data from the legitimate user. One-class methods are more suitable for mouse-dynamics-based user authentication in real-world applications.

Fig. 1. Overview of approach.

We present a repeatable and objective evaluation procedure to investigate the effectiveness of our proposed approach through a series of experiments. As far as we know, no earlier work made informed comparisons between different features and results, due to the lack of a standard test protocol. Here we provide comparative experiments to further examine the validity of the proposed approach.

A public mouse-behavior dataset is established (see Section III for availability), not only for this study but also to foster future research. This dataset contains high-quality mouse-behavior data from 37 subjects. To our knowledge, this study is the first to publish a shared mouse-behavior dataset in this field.

This study develops a mouse-dynamics-based user authentication approach that performs user authentication in a short time while maintaining high accuracy. It has several desirable properties:
1. it is easy to comprehend and implement;
2. it requires no specialized hardware or equipment to capture the biometric data;
3. it requires only about 12 seconds of mouse-behavior data to provide good, steady performance.

The remainder of this paper is organized as follows: Section II describes related work. Section III presents the data-collection process. Section IV describes the feature-construction process. Section V discusses the classification techniques for mouse dynamics.
Section VI presents the evaluation methodology. Section VII presents and analyzes experimental results. Section VIII offers a discussion and possible extensions of the current work. Finally, Section IX concludes.

II. BACKGROUND AND RELATED WORK

In this section, we provide background on mouse-dynamics research, and various applications for mouse dynamics (e.g., authentication versus intrusion detection). Then we focus on applying mouse dynamics to user authentication.

A. Background of Mouse Dynamics

Mouse dynamics, a behavioral biometric for analyzing behavior data from pointing devices (e.g., mouse or touchpad), provides user authentication in an accessible and convenient manner [2]-[4], [8], [14]-[17], [19], [21], [22], [33], [34], [39]-[41], [45], [46]. Since Everitt and McOwan [14] first investigated in 2003 whether users could be distinguished by the use of a signature written by mouse, several different techniques and uses for mouse dynamics have been proposed.

Most researchers focus on the use of mouse dynamics for intrusion detection (sometimes called identity monitoring or reauthentication), which analyzes mouse-behavior characteristics throughout the course of interaction. Pusara and Brodley [33] proposed a reauthentication scheme using mouse dynamics for user verification. This study presented positive findings, but cautioned that their results were only preliminary. Gamboa and Fred [15], [16] were some of the earliest researchers to study identity monitoring based on mouse movements. Later on, Ahmed and Traore [3] proposed an approach combining keystroke dynamics with mouse dynamics for intrusion detection. Then they considered mouse dynamics as a standalone biometric for intrusion detection [2]. Recently, Zheng et al. [46] proposed angle-based metrics of mouse movements for reauthentication systems, and explored the effects of environmental factors (e.g., different machines).
Yet only recently have researchers come to the use of mouse dynamics for user authentication (sometimes called static authentication), which analyzes mouse-behavior characteristics at particular moments. In 2007, Gamboa et al. [17] extended their approaches in identity monitoring [15], [16] into web-based authentication. Later on, Kaminsky et al. [22] presented an authentication scheme using mouse dynamics for identifying online game players. Then, Bours and Fullu [8] proposed an authentication approach by requiring users to make use of the mouse for tracing a maze-like path. Most recently, a full survey of the existing work in mouse dynamics pointed out that mouse-dynamics research should focus on reducing authentication time and taking the effect of environmental variables into account [21].

B. User Authentication Based on Mouse Dynamics

The primary focus of previous research has been on the use of mouse dynamics for intrusion detection or identity monitoring. It is difficult to transfer previous work directly from intrusion detection to authentication, however, because a rather long authentication period is typically required to collect sufficient mouse-behavior data to enable reasonably accurate verification. To our knowledge, few papers have targeted the use of mouse dynamics for user authentication, which will be the central concern of this paper.

Hashia et al. [19] and Bours et al. [8] presented some preliminary results on mouse dynamics for user authentication. They both asked participants to perform fixed sequences of mouse operations, and they analyzed behavioral characteristics of mouse movements to authenticate a user during the login stage. Distance-based classifiers were established to compare the verification data with the registration data. Hashia et al. collected data from 15 participants using the same computer, while Bours et al.
collected data from 28 subjects using different computers; they achieved equal-error rates of 15% and 28% respectively.

Gamboa et al. [17] presented a web-based user authentication system based on mouse dynamics. The system displayed an on-screen virtual keyboard, and required users to use the mouse to enter a paired username and pin-number. The extracted feature space was reduced to a best subspace through a greedy search process. A statistical model based on the Weibull distribution was built on training data from both legitimate and impostor users. Based on data collected from 50 subjects, the researchers reported an equal-error rate of 6.2%, without explicitly reporting authentication time. The test data were also used for feature selection, which may lead to an overly optimistic estimate of authentication performance [18].

Recently, Revett et al. [34] proposed a user authentication system requiring users to use the mouse to operate a graphical, combination-lock-like GUI. A small-scale evaluation involving 6 subjects yielded an average false-acceptance rate and false-rejection rate of around 3.5% and 4% respectively, using a distance-based classifier. However, experimental details such as experimental apparatus and testing procedures were not explicitly reported. Aksari et al. [4] presented an authentication framework for verifying users based on a fixed sequence of mouse movements. Features were extracted from nine movements among seven squares displayed consecutively on the screen. They built a classifier based on scaled Euclidean distance using data from both legitimate users and impostors. The researchers reported an equal-error rate of 5.9% over 10 users' data collected from the same computer, but authentication time was not reported.

It should be noted that the above two studies were performed on a small number of users (only 6 users in [34], and 10 users in [4]), which may be insufficient to evaluate definitively the performance of these approaches.

The results of the above studies have been mixed, possibly due to the realism of the experiments, possibly due to a lack of real differences among users, or possibly due to experimental errors or imperfect data. A careful reading of the literature suggests that (1) most approaches have resulted in low performance, or have used a small number of users, but since these studies do not tend to be replicated, it is hard to pin the discrepancies on any one thing; and (2) no research group provided a shared dataset.

In our study, we control the experimental environment to increase the likelihood that our results will be free from experimental confounding factors, and we attempt to develop a simple and efficient user authentication approach based on mouse dynamics. We also make our data available publicly.

III. MOUSE DATA ACQUISITION

In this study, we collect mouse-behavior data in a controlled environment, so as to isolate behavioral characteristics as the principal factors in mouse behavior analysis. We offer here considerable detail regarding the conduct of data collection, because these particulars can best reveal potential biases and threats to experimental validity [27]. Our data set is available (footnote 1).

A. Controlled Environment

In this study, we set up a desktop computer and developed a Windows application as a uniform hardware and software platform for the collection of mouse-behavior data. The desktop was an HP workstation with a Core 2 Duo 3.0 GHz processor and 2 GB of RAM. It was equipped with a 17-inch HP LCD monitor (set at 1280 × 1024 resolution) and a USB optical mouse, and ran the Windows XP operating system. Most importantly, all system parameters relating to the mouse, such as speed and sensitivity configurations, were fixed.

The Windows application, written in C, prompted a user to conduct a mouse-operation task.
During data collection, the application displayed the task in a full-screen window on the monitor, and recorded (1) the corresponding mouse operations (e.g., mouse-single-click), (2) the positions at which the operations occurred, and (3) the timestamps of the operations. The Windows-event clock was used to timestamp mouse operations [28]; it has a resolution of 15.625 milliseconds, corresponding to 64 updates per second. When collecting data, each subject was invited to perform a mouse-operations task on the same desktop computer free of other subjects; data collection was performed one by one on the same data-collection platform. These conditions make hardware and software factors consistent throughout the process of data collection over all subjects, thus removing unintended side-effects of unrelated hardware and software factors.

B. Mouse-Operation Task Design

To reduce behavioral variations due to different mouse-operation sequences, all subjects were required to perform the same sequence of mouse operations. We designed a mouse-operation task, consisting of a fixed sequence of mouse operations, and made these operations representative of a typical and diverse combination of mouse operations. The operations were selected according to (1) two elementary operations of mouse clicks: single click and double click; and (2) two basic properties of mouse movements: movement direction and movement distance [2], [39]. As shown in Fig. 2, movement directions are numbered from 1 to 8, and each of them is selected to represent one of eight 45-degree ranges over 360 degrees. In addition, three distance intervals are considered to represent short-, middle- and long-distance mouse movements. Table I shows the directions and distances of the mouse movements used in this study. During data collection, every two adjacent movements were separated by either a single click or a double click.
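As a minimal sketch of the direction numbering described above, the following Python function maps a movement's displacement to one of eight 45-degree sectors. The function name `direction_sector`, the choice that sector 1 starts at 0 degrees, and the counterclockwise numbering with the y axis pointing up are assumptions made for illustration only; the actual layout is the one defined by Fig. 2.

```python
import math


def direction_sector(dx, dy):
    """Return a direction number in 1..8 for a movement (dx, dy).

    Assumption (not taken from the paper): sector 1 covers angles in
    [0, 45) degrees and numbers increase counterclockwise, with the
    y axis pointing up.
    """
    # atan2 gives an angle in (-180, 180]; fold it into [0, 360).
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    # Each sector spans 45 degrees; the extra "% 8" guards against a
    # floating-point result of exactly 360.0.
    return int(angle // 45.0) % 8 + 1
```

Screen coordinates usually grow downward, so a caller working with raw pixel positions would likely pass `-dy` to keep the counterclockwise convention assumed here.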
As a whole, the designed task consists of 16 mouse movements, 8 single clicks, and 8 double clicks.

It should be noted that our task may not be unique. However, the task was carefully chosen to induce users to perform a wide variety of mouse movements and clicks that were both typical and diverse in an individual's repertoire of daily mouse behaviors.

Footnote 1: The mouse-behavior dataset is available from http://nskeylab.xjtu.edu.cn/projects/mousedynamics/behavior-data-set/.

Fig. 2. Mouse movement directions (sectors 1 to 8, each covering one 45-degree range of angles).

TABLE I. MOUSE MOVEMENTS IN THE DESIGNED MOUSE-OPERATION TASK

C. Subjects

We recruited 37 subjects, many from within our lab, but some from the university at large. Our sample of subjects consisted of 30 males and 7 females. All of them were right-handed users, and had been using a mouse for a minimum of two years.

D. Data-Collection Process

All subjects were required to participate in two rounds of data collection per day, and waited at least 24 hours between collections (ensuring that some day-to-day variation existed within the data). In each round, each subject was invited, one by one, to perform the same mouse-operation task 10 times. A mouse-operation sample was obtained when a subject performed the task one time, in which she first clicked a start button on the screen, then moved the mouse to click subsequent buttons prompted by the data-collection application.

Additionally, subjects were instructed to use only the external mouse device, and they were advised that no keyboard would be needed. Subjects were told that if they needed a break or needed to stretch their hands, they were to do so after they had accomplished a full round. This was intended to prevent artificially anomalous mouse operations in the middle of a task. Subjects were admonished to focus on the task, as if they were logging into their own accounts, and to avoid distractions, such as talking with the experimenter, while the task was in progress. Any error in the operating process (e.g., single-clicking a button when double-clicking was required) caused the current task to be reset, requiring the subject to redo it.

Subjects took between 15 days and 60 days to complete data collection. Each subject accomplished 150 error-free repetitions of the same mouse-operation task. The task took between 6.2 seconds and 21.3 seconds, with an average of 11.8 seconds over all subjects. The final dataset contained 5550 samples from 37 subjects.

TABLE II. MOUSE DYNAMICS FEATURES

IV. FEATURE CONSTRUCTION

In this section, we first extract a set of mouse-dynamics features, and then we use distance-measurement methods to obtain feature-distance vectors for reducing behavioral variability. Next, we utilize an eigenspace transformation to extract principal feature components as classifier input.

A. Feature Extraction

The data collected in Section III are sequences of mouse operations, including left-single-clicks, left-double-clicks, and mouse-movements. Mouse features were extracted from these operations, and were typically organized into a vector to represent the sequence of mouse operations in one execution of the mouse-operation task. Table II summarizes the derived features in this study. We characterized mouse behavior based on two basic types of mouse operations: mouse click and mouse movement.
Each mouse operation was then analyzed individually, and translated into several mouse features. Our study divided these features into two categories:

Holistic features: features that characterize the overall properties of mouse behaviors during interactions, such as single-click and double-click statistics.

Procedural features: features that depict the detailed dynamic processes of mouse behaviors, such as the movement speed and acceleration curves.

Most traditional features are holistic features, which suffice to obtain a statistical description of mouse behavior, such as the mean value of click times. They are easy to compute and comprehend, but they only characterize general attributes of mouse behavior. In our study, the procedural features characterize in-depth procedural details of mouse behavior. This information more accurately reflects the efficiency, agility and motion habits of individual mouse users, and thus may lead to a performance boost for authentication. Experimental results in Section VII demonstrate the effectiveness of these newly-defined features.

B. Distance Measurement

The raw mouse features cannot be used directly by a classifier, because of high dimensionality and behavioral variability. Therefore, distance-measurement methods were applied to obtain feature-distance vectors and to mitigate the effects of these issues.

In the calculation of distance measurement, we first used the Dynamic Time Warping (DTW) distance [6] to compute the distance vector of procedural features. The reasons for this choice are that (1) procedural features (e.g., movement speed curve) of two data samples are not likely to consist of exactly the same number of points, whether these samples are generated by the same or by different subjects; and (2) DTW distance can be applied directly to measure the distance between the procedural features of two samples without deforming either or both of the two sequences in order to get an equal number of points.

Next, we applied Manhattan distance to calculate the distance vector of holistic features. The reasons for this choice are that (1) this distance is independent between dimensions, and can preserve physical interpretation of the features since its computation is the absolute value of cumulative difference; and (2) previous research in related fields (e.g., keystroke dynamics) reported that the use of Manhattan distance for statistical features could lead to a better performance [23].

1) Reference Feature Vector Generation: We established the reference feature vector for each subject from her training feature vectors. Let $\{x_i\}$, $i = 1, \ldots, n$, be the training set of feature vectors for one subject, where $x_i$ is a $d$-dimensional mouse feature vector extracted from the $i$th training sample, and $n$ is the number of training samples. Consider how the reference feature vector is generated for each subject:

Step 1: we computed the pairwise distance vector of procedural features and holistic features between all pairs of training feature vectors $x_i$ and $x_j$. We used DTW distance to calculate the distance vector of procedural features, $PD_{ij}$, for measuring the similarity between the procedural components of the two feature vectors, and we applied Manhattan distance to calculate the distance vector of holistic features, $HD_{ij}$:

$$PD_{ij} = \mathrm{DTW}\left(x_i^{P}, x_j^{P}\right), \qquad HD_{ij} = \left| x_i^{H} - x_j^{H} \right| \tag{1}$$

where $i, j = 1, \ldots, n$, $i \neq j$, and $x_i^{P}$ represents the procedural components of $x_i$, and $x_i^{H}$ represents the holistic components.

Step 2: we concatenated the distance vectors of holistic features and procedural features together to obtain a distance vector for the training feature vectors $x_i$ and $x_j$ by

$$D_{ij} = \left[ HD_{ij},\; PD_{ij} \right] \tag{2}$$

Step 3: we normalized the vector $D_{ij}$ to get a scale-invariant feature vector

$$\tilde{D}_{ij} = \frac{D_{ij} - \mu}{\sigma} \tag{3}$$

where $\mu$ is the mean of all pairwise distance vectors from the training set, and $\sigma$ is the corresponding standard deviation.

Step 4: for each training feature vector, we calculated the arithmetic mean distance between this vector and the remaining training vectors, and found the reference feature vector with minimum mean distance:

$$x_{\mathrm{ref}} = \arg\min_{x_i} \frac{1}{n-1} \sum_{j \neq i} \operatorname{dist}(x_i, x_j) \tag{4}$$

where $\operatorname{dist}(x_i, x_j)$ is the distance between $x_i$ and $x_j$ obtained from the normalized distance vector $\tilde{D}_{ij}$.

2) Feature-Distance Vector Calculation: Given the reference feature vector for each subject, we then computed the feature-distance vector between a new mouse feature vector and the reference vector. Let $x_{\mathrm{ref}}$ be the reference feature vector for one subject; then for any new feature vector $x_{\mathrm{new}}$ (either from the legitimate user or an impostor), we can compute the corresponding distance vector $z$ by (1), (2) and (3).

In this paper, we used all mouse features in Table II to generate the feature-distance vector. There are 10 click-related features, 16 distance-related features, 16 time-related features, 16 speed-related features, and 16 acceleration-related features, which were taken together and then transformed to a 74-dimensional feature-distance vector that represents each mouse-operation sample.

C. Eigenspace Computation: Training and Projection

It is usually undesirable to use all components in the feature vector as input for the classifier, because much of the data will not provide a significant degree of uniqueness or consistency. We therefore applied an eigenspace-transformation technique to extract the principal components as classifier input.

1) Kernel PCA Training: Kernel principal component analysis (KPCA) [37] is one approach to generalizing linear PCA to nonlinear cases using kernel methods. In this study, the purpose of KPCA is to obtain the principal components of the original feature-distance vectors. The calculation process is illustrated as follows.

For each subject, the training set represents a set of feature-distance vectors drawn from her own data. Let $z_i$ be the $i$th feature-distance vector in the training set, and $N$ be the number of such vectors. We first mapped the measured vectors into the hyperdimensional feature space by the nonlinear mapping $\Phi: z_i \mapsto \Phi(z_i)$. Then we can obtain the mean and sample covariance of such a training set by

$$\bar{\Phi} = \frac{1}{N} \sum_{i=1}^{N} \Phi(z_i) \tag{5}$$

$$C = \frac{1}{N} \sum_{i=1}^{N} \left( \Phi(z_i) - \bar{\Phi} \right) \left( \Phi(z_i) - \bar{\Phi} \right)^{T} \tag{6}$$

Here we denote the mapped point centered by the corresponding mean as $\tilde{\Phi}(z_i) = \Phi(z_i) - \bar{\Phi}$. The principal components were then computed by solving the eigenvalue problem

$$\lambda v = C v \tag{7}$$

where $\lambda \geq 0$ and $v \neq 0$. Then, by defining a kernel matrix

$$K_{ij} = \tilde{\Phi}(z_i) \cdot \tilde{\Phi}(z_j) \tag{8}$$

we computed an eigenvalue problem for the coefficients $\alpha$ that is now solely dependent on the kernel function:

$$N \lambda \alpha = K \alpha \tag{9}$$

For details, readers can refer to B. Scholkopf et al. [37].

Generally speaking, the first few eigenvectors correspond to large eigenvalues and carry most of the information in the training samples. Therefore, for the sake of providing the principal components to represent mouse behavior in a low-dimensional eigenspace, and for memory efficiency, we ignored small eigenvalues and their corresponding eigenvectors, using a threshold value $T$:

$$W_m = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{N} \lambda_i} \geq T \tag{10}$$

where $W_m$ is the accumulated variance of the first $m$ largest eigenvalues with respect to all eigenvalues. In this study, $T$ was chosen as 0.95 for all subjects, with a range from 0 to 1. Note that we used the same $T$ for different subjects, so $m$ may be different from one subject to another. Specifically, in our experiments, we observed that the number of principal components for different subjects varied from 12 to 20, and on average, 17 principal components were identified under the threshold of 0.95.
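The feature-construction steps of this section (Steps 1 to 4 for the reference vector, and the variance threshold used to pick the number of principal components) can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the sample layout (a dict with `holistic` values and `procedural` curves), all function names, the absolute-difference local cost inside DTW, and the use of the sum of normalized components as the scalar distance in Step 4 are choices made here for illustration.

```python
import math


def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D curves (abs-difference cost)."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def distance_vector(x, y):
    """Steps 1-2: per-feature Manhattan distances followed by per-curve DTW distances."""
    hd = [abs(p - q) for p, q in zip(x["holistic"], y["holistic"])]
    pd = [dtw_distance(c1, c2) for c1, c2 in zip(x["procedural"], y["procedural"])]
    return hd + pd


def fit_normalizer(train):
    """Mean and standard deviation of all pairwise distance vectors (used in Step 3)."""
    pairs = [distance_vector(train[i], train[j])
             for i in range(len(train)) for j in range(len(train)) if i != j]
    dim = len(pairs[0])
    mu = [sum(p[k] for p in pairs) / len(pairs) for k in range(dim)]
    sigma = []
    for k in range(dim):
        var = sum((p[k] - mu[k]) ** 2 for p in pairs) / len(pairs)
        sigma.append(math.sqrt(var) if var > 0 else 1.0)  # avoid division by zero
    return mu, sigma


def normalize(d, mu, sigma):
    """Step 3: scale-invariant distance vector."""
    return [(v - m) / s for v, m, s in zip(d, mu, sigma)]


def reference_index(train, mu, sigma):
    """Step 4: index of the sample with the minimum mean distance to the others.

    Assumption: the scalar distance is the sum of the normalized components.
    """
    def mean_distance(i):
        dists = [sum(normalize(distance_vector(train[i], train[j]), mu, sigma))
                 for j in range(len(train)) if j != i]
        return sum(dists) / len(dists)
    return min(range(len(train)), key=mean_distance)


def num_components(eigenvalues, threshold=0.95):
    """Smallest m whose m largest eigenvalues reach the accumulated-variance threshold."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    acc = 0.0
    for m, lam in enumerate(vals, start=1):
        acc += lam
        if acc / total >= threshold:
            return m
    return len(vals)
```

Given a subject's training samples, `fit_normalizer` and `reference_index` would be run once at enrollment; a new sample is then turned into a feature-distance vector with `normalize(distance_vector(new, train[ref]), mu, sigma)`.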
2) Kernel PCA Projection

For the selected subject, taking the $m$ largest eigenvalues and the associated eigenvectors, the transform matrix can be constructed to project an original feature-distance vector into a point in the $m$-dimensional eigenspace by (11). As a result, each subject's mouse behavior can be mapped into a manifold trajectory in such a parametric eigenspace. It is well known that $m$ is usually much smaller than the dimensionality of the original feature space. That is to say, eigenspace analysis can dramatically reduce the dimensionality of input samples. In this way, we used the extracted principal components of the feature-distance vectors as input for subsequent classifiers.

V. CLASSIFIER IMPLEMENTATION

This section explains the classifier that we used, and introduces two other widely-used classifiers. Each classifier analyzes mouse-behavior data, and discriminates between a legitimate user and impostors.

A. One-Class Classifier Overview

User authentication is still a challenging task from the pattern-classification perspective. It is a two-class (legitimate user versus impostors) problem. In the scenario of mouse-dynamics-based user authentication, a login user is required to provide the user name and to perform a specific mouse-operation task which would be secret, like a password. Each user would choose her own mouse-operation task, and would not share that task with others. Thus, when building a model for a legitimate user, the only behavioral samples of her specific task are her own; other users' (considered as impostors in our scenario) samples of this task are not readily available. In this scenario, therefore, an appropriate solution is to build a model based only on the legitimate user's data samples, and use that model to detect impostors. This type of problem is known as one-class classification [43] or novelty/anomaly detection [25], [26].
We thus focused our attention on this type of problem, especially because in a real-world situation we would not have impostor renditions of a legitimate user's mouse operations anyway.

B. Our Classifier: One-Class Support Vector Machine

Traditional one-class classification methods are often unsatisfying, frequently missing some true positives and producing too many false positives. In this study, we used a one-class Support Vector Machine (SVM) classifier, introduced by Schölkopf et al. [36], [38]. One-class SVMs have been successfully applied to a number of real-life classification problems, e.g., face authentication, signature verification and keystroke authentication [1], [23].

In our context, given $n$ training samples belonging to one subject, $x_1, \dots, x_n$, each sample has $m$ features (corresponding to the principal components of the feature-distance vector for that subject). The aim is to find a hyperplane that separates the data points by the largest margin. To separate the data points from the origin, one needs to solve the following dual quadratic programming problem [36], [38]:

$$\min_{\alpha}\ \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j\,k(x_i,x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le \frac{1}{\nu n},\ \ \sum_{i}\alpha_i = 1 \tag{12}$$

where $\alpha$ is the vector of nonnegative Lagrangian multipliers to be determined, $\nu$ is a parameter that controls the trade-off between maximizing the number of data points contained by the hyperplane and the distance of the hyperplane from the origin, and $k$ is the kernel function. We allow for nonlinear decision boundaries. Then the decision function

$$f(x) = \operatorname{sgn}\Big(\sum_{i}\alpha_i\,k(x_i,x) - \rho\Big) \tag{13}$$

will be positive for the examples from the training set, where $\rho$ is the offset of the decision function.

In essence, we viewed the user authentication problem as a one-class classification problem. In the training phase, the learning task was to build a classifier based on the legitimate subject's feature samples. In the testing phase, the test feature sample was projected into the same high-dimensional space, and the output of the decision function was recorded. We used a radial basis function (RBF) kernel in our evaluation, after comparative studies of linear, polynomial, and sigmoid kernels based on classification accuracy. The SVM parameter $\nu$ and kernel parameter $\gamma$ (using LibSVM [11]) were set to 0.06 and 0.004 respectively. The decision function would generate $+1$ if the authorized user's test set is input; otherwise it is a false rejection case. On the contrary, $-1$ should be obtained if the impostors' test set is the input; otherwise a false acceptance case occurs.

C. Other Classifiers: Nearest Neighbor and Neural Network

In addition, we compared our classifier with two other widely-used classifiers, KNN and neural network [12].

For KNN, in the training phase, the nearest neighbor classifier estimated the covariance matrix of the training feature samples, and saved each feature sample. In the testing phase, the nearest neighbor classifier calculated the Mahalanobis distance from the new feature sample to each of the samples in the training data. The average distance from the new sample to the $k$ nearest feature samples from the training data was used as the anomaly score. After multiple tests with $k$ ranging from 1 to 5, we kept the best-performing value; the results are detailed in Section VII.

For the neural network, in the training phase a network was built with an input layer, one output node, and a layer of hidden nodes. The network weights were randomly initialized between 0 and 1. The classifier was trained to produce a 1.0 on the output node for every training feature sample. We trained for 1000 epochs using a learning rate of 0.001. In the testing phase, the test sample was run through the network, and the output of the network was recorded. Denote by $s$ the output of the network; intuitively, if $s$ is close to 1.0, the test sample is similar to the training samples, and if $s$ is close to 0.0, it is dissimilar.

VI. EVALUATION METHODOLOGY

This section explains the evaluation methodology for mouse behavior analysis. First, we summarize the dataset collected in Section III. Next, we set up the training and testing procedure for our one-class classifiers. Then, we show how classifier performance was calculated. Finally, we introduce a statistical testing method to further analyze experimental results.

A. Dataset

As discussed in Section III, samples of mouse-behavior data were collected when subjects performed the designed mouse-operation task in a tightly-controlled environment. All 37 subjects produced a total of 5550 mouse-operation samples. We then calculated feature-distance vectors, and extracted principal components from each vector as input for the classifiers.

B. Training and Testing Procedure

Consider a scenario as mentioned in Section V-A. We started by designating one of our 37 subjects as the legitimate user, and the rest as impostors. We trained the classifier and tested its ability to recognize the legitimate user and impostors as follows:

Step 1: We trained the classifier to build a profile of the legitimate user on a randomly-selected half of the samples (75 out of 150 samples) from that user.

Step 2: We tested the ability of the classifier to recognize the legitimate user by calculating anomaly scores for the remaining samples generated by the user. We designated the scores assigned to each sample as genuine scores.

Step 3: We tested the ability of the classifier to recognize impostors by calculating anomaly scores for all the samples generated by the impostors. We designated the scores assigned to each sample as impostor scores.

This process was then repeated, designating each of the other subjects as the legitimate user in turn. In the training phase, 10-fold cross validation [24] was employed to choose parameters of the classifiers.
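The three-step procedure above, together with the false-acceptance and false-rejection rates that the resulting scores feed, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names (`split_half`, `fit_centroid`, `anomaly_score`, `collect_scores`, `far_frr`) are hypothetical, the data are synthetic, and a toy centroid-distance scorer stands in for the one-class SVM so that the example stays self-contained. It assumes the convention that a score above the threshold flags an impostor.

```python
import math
import random


def split_half(samples, rng):
    """Randomly split one subject's samples into training and testing halves."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def fit_centroid(train):
    """Toy stand-in for classifier training: the mean training vector."""
    dim = len(train[0])
    return [sum(s[d] for s in train) / len(train) for d in range(dim)]


def anomaly_score(centroid, sample):
    """Toy anomaly score: Euclidean distance to the centroid."""
    return math.dist(centroid, sample)


def collect_scores(data, legit_id, rng):
    """Steps 1-3: train on half of the legitimate user's samples, then
    score the user's remaining samples and every impostor sample."""
    train, held_out = split_half(data[legit_id], rng)
    model = fit_centroid(train)
    genuine = [anomaly_score(model, s) for s in held_out]
    impostor = [anomaly_score(model, s)
                for subject, samples in data.items() if subject != legit_id
                for s in samples]
    return genuine, impostor


def far_frr(genuine, impostor, threshold):
    """FAR: fraction of impostor scores accepted (score <= threshold).
    FRR: fraction of genuine scores rejected (score > threshold)."""
    far = sum(s <= threshold for s in impostor) / len(impostor)
    frr = sum(s > threshold for s in genuine) / len(genuine)
    return far, frr


# Synthetic two-dimensional data for three hypothetical subjects,
# 150 samples each; the threshold of 3.0 is arbitrary.
rng = random.Random(0)
data = {
    subject: [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(150)]
    for subject, center in (("s1", 0.0), ("s2", 6.0), ("s3", -6.0))
}
genuine, impostor = collect_scores(data, "s1", rng)
far, frr = far_frr(genuine, impostor, threshold=3.0)
print(len(genuine), len(impostor), far, frr)
```

Repeating `collect_scores` with each subject in turn as `legit_id`, and averaging the resulting rates, mirrors the rotation over subjects described above.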
Since we used a random sampling method to divide the data into training and testing sets, and we wanted to account for the effect of this randomness, we repeated the above procedure 50 times, each time with independently selected samples drawn from the entire dataset.

C. Calculating Classifier Performance

To convert these sets of classification scores of the legitimate user and impostors into aggregate measures of classifier performance, we computed the false-acceptance rate (FAR) and false-rejection rate (FRR), and used them to generate an ROC curve [42]. In our evaluation, for each user, the FAR is calculated as the ratio between the number of false acceptances and the number of test samples of impostors; the FRR is calculated as the ratio between the number of false rejections and the number of test samples of legitimate users. Then we computed the average FAR and FRR over all subjects.

Whether or not a mouse-operation sample generates an alarm depends on the threshold for the anomaly scores. An anomaly score over the threshold indicates an impostor, while a score under the threshold indicates a legitimate user. In many cases, to make a user authentication scheme deployable in practice, minimizing the possibility of rejecting a true user (lower FRR) is sometimes more important than lowering the probability of accepting an impostor [46]. Thus we adjusted the threshold according to the FRR for the training data. Since calculation of the FRR requires only the legitimate user's data, no impostor data was used for determining the threshold. Specifically, the threshold is treated as a variable over a fixed interval, and is chosen to give a relatively low FRR using 10-fold cross validation on the training data. After multiple tests, we observe that setting the threshold to a value of 0.1 yields a low FRR on average (see footnote 2). Thus, we show results with a threshold value of 0.1 throughout this study.

Footnote 2: Note that for different classifiers, there are different threshold intervals. For instance, the threshold interval for the neural network detector is [0, 1], and for the one-class SVM it is a different interval. For uniform presentation, we mapped all of the intervals to a common range.

D. Statistical Analysis of the Results

To evaluate the performance of our approach, we developed a statistical test using the half total error rate (HTER) and confidence-interval (CI) evaluation [5]. The HTER test aims to statistically evaluate the performance for user authentication, and is defined by combining the false-acceptance rate (FAR) and false-rejection rate (FRR):

$$\mathrm{HTER} = \frac{\mathrm{FAR} + \mathrm{FRR}}{2} \tag{14}$$

Confidence intervals are computed around the HTER as $\mathrm{HTER} \pm \sigma \cdot Z_{\alpha/2}$, where $\sigma$ and $Z_{\alpha/2}$ are computed by [5]

$$\sigma = \sqrt{\frac{\mathrm{FAR}(1-\mathrm{FAR})}{4\,NI} + \frac{\mathrm{FRR}(1-\mathrm{FRR})}{4\,NG}} \tag{15}$$

$$Z_{\alpha/2} = \begin{cases} 1.645 & \text{for } 90\% \text{ CI} \\ 1.960 & \text{for } 95\% \text{ CI} \\ 2.576 & \text{for } 99\% \text{ CI} \end{cases} \tag{16}$$

where NG is the total number of genuine scores, and NI is the total number of impostor scores.

VII. EXPERIMENTAL RESULTS AND ANALYSIS

Extensive experiments were carried out to verify the effectiveness of our approach. First, we performed the authentication task using our approach, and compared it with two widely-used classifiers. Second, we examined our primary results concerning the effect of eigenspace-transformation methods on classifier performance. Third, we explored the effect of sample length on classifier performance, to investigate the trade-off between security and usability. Two additional experiments are provided to compare our method with other approaches in the literature.

A. Experiment 1: User Authentication

In this section, we conducted a user authentication experiment, and compared our classifier with two widely-used ones as mentioned in Section V-C. The data used in this experiment consisted of 5550 samples from 37 subjects. Fig. 3 and Table III show the ROC curves and average FARs and FRRs of the authentication experiment for each of three classifiers, with standard deviations in parentheses. Table III also includes the average authentication time, which is the sum of the average time needed to collect the data and the average time needed to make the authentication decision (note that since the latter of these two times is always less than 0.003 seconds in our classifiers, we ignore it in this study).

Fig. 3. ROC curves for the three different classifiers used in this study: one-class SVM, neural network, and nearest neighbor.

TABLE III. FARs AND FRRs OF USER AUTHENTICATION EXPERIMENT (WITH STANDARD DEVIATIONS IN PARENTHESES)

Our first observation is that the best performance has a FAR of 8.74% and an FRR of 7.96%, obtained by our approach (one-class SVM). This result is promising and competitive, and the behavioral samples are captured over a much shorter period of time compared with previous work. It should be noted that our result does not yet meet the European standard for commercial biometric technology, which requires near-perfect accuracy of 0.001% FAR and 1% FRR [10]. But it does demonstrate that mouse dynamics could provide valuable information in user authentication tasks. Moreover, with a series of incremental improvements and investigations (e.g., outlier handling), it seems possible that mouse dynamics could be used as, at least, an auxiliary authentication technique, such as an enhancement for conventional password mechanisms.

Our second observation is that our approach has substantially better performance than all other classifiers considered in our study. This may be due to the fact that SVMs can convert the problem of classification into quadratic optimization in the case of relative insufficiency of prior knowledge, and still maintain high accuracy and stability. In addition, the standard deviations of the FAR and FRR for our approach are much smaller than those for other classifiers, indicating that our approach may be more robust to variable behavior data and different parameter selection procedures.

Our third observation is that the average authentication time in our study is 11.8 seconds, which is impressive and achieves an acceptable level of performance for a practical application. Some previous approaches may lead to low availability due to a relatively-long authentication time. However, an authentication time of 11.8 seconds in our study shows that we can perform mouse-dynamics analysis quickly enough to make it applicable to authentication for most login processes. We conjecture that the significant decrease of authentication time is due to procedural features providing more detailed and fine-grained information about mouse behavior, which could enhance performance.

Finally, we conducted a statistical test, using the HTER and CI evaluation as mentioned in Section VI-D, to statistically evaluate the performance of our approach. Table IV summarizes the results of this statistical evaluation at different confidence levels. The result shows that the proposed approach provides the lowest HTER in comparison with the other two classifiers used in our study; the 95% confidence interval is given in Table IV.

TABLE IV. HTER PERFORMANCE AND CONFIDENCE INTERVAL AT DIFFERENT CONFIDENCE LEVELS

B. Experiment 2: Effect of Eigenspace Transformation

This experiment examined the effect of eigenspace-transformation methods on classifier performance. The data used were the same as in Experiment 1. We applied a one-class SVM classifier in three evaluations, with the inputs respectively set to be the original feature-distance vectors (without any transformations), the projection of feature-distance vectors by PCA, and the projection of feature-distance vectors by KPCA. Fig. 4 and Table V show the ROC curves and average FARs and FRRs for each of three feature spaces, with standard deviations in parentheses.

Fig. 4. ROC curves for three different feature spaces: the original feature space, the projected feature space by PCA, and the projected feature space by KPCA.

TABLE V. FARs AND FRRs FOR THREE DIFFERENT FEATURE SPACES (WITH STANDARD DEVIATIONS IN PARENTHESES)

As shown in Fig. 4 and Table V, the authentication accuracy for the feature space transformed by KPCA is the best, followed by the accuracies for the feature space transformed by PCA and the original one. Specifically, direct classification in the original feature space (without transformations) produces a FAR of 15.45% and an FRR of 15.98%. This result is not encouraging compared to results previously reported in the literature. However, as mentioned in Experiment 1, the samples may be subject to more behavioral variability compared with previous work, because previous work analyzed mouse behaviors over a longer period of observation. Moreover, we observe that the authentication results with PCA and with KPCA are much better than for direct classification. This result is a demonstration of the effectiveness of the eigenspace transformation in dealing with variable behavior data. Furthermore, we find that the performance of KPCA is slightly superior to that of PCA. This may be due to the nonlinear variability (or noise) existing in mouse behaviors, which KPCA can reduce by using kernel transformations [29]. It is also of note that the standard deviations of FAR and FRR based on the feature spaces transformed by KPCA and PCA are smaller than those of the original feature space (without transformations), indicating that the eigenspace-transformation technique enhances the stability and robustness of our approach.

C. Experiment 3: Effect of Sample Length

This experiment explored the effect of sample length on classifier performance, to investigate the trade-off between security (authentication accuracy) and usability (authentication time). In this study, the sample length corresponds to the number of mouse operations needed to form one data sample. Each original sample consists of 32 mouse operations. To explore the effect of sample length on the performance of our approach, we derived new datasets with different sample lengths by applying bootstrap sampling techniques [13] to the original dataset, to make derived datasets containing the same numbers of samples as the original dataset. The new data samples were generated in the form of multiple consecutive mouse samples from the original dataset. In this way, we considered classifier performance as a function of the sample length using all bootstrap samples derived from the original dataset. We conducted the authentication experiment again (using one-class SVM) on six derived datasets, with sample lengths of up to 800 operations.

TABLE VI. FARs AND FRRs OF DIFFERENT SAMPLE LENGTHS

Table VI shows the FARs and FRRs at varying sample lengths, using a one-class SVM classifier. The table also includes the authentication time in seconds. The FAR and FRR obtained using a sample length of 32 mouse operations are 8.74% and 7.96% respectively, with an authentication time of 11.8 seconds. As the number of operations increases, the FAR and FRR drop to 6.7% and 6.68% for a data sample comprising 80 mouse operations, corresponding to an authentication time of 29.88 seconds. Therefore, we may conclude that classifier performance almost certainly gets better as the sample length increases. Note that 60 seconds may be an upper bound for authentication time, but the corresponding FAR of 4.69% and FRR of 4.46% are still not low enough to meet the needs of the European standard for commercial biometric technology [10]. We find that after observing 800 mouse operations, our approach can obtain a FAR of 0.87% and an FRR of 0.69%, which is very close to the European standard, but with a corresponding authentication time of about 10 minutes. This long authentication time may limit applicability in real systems. Thus, a trade-off must be made between security and user acceptability, and more investigations and improvements should be performed to secure a place for mouse dynamics in more pragmatic settings.

D. Comparison

User authentication through mouse dynamics has attracted growing interest in the research community. However, there is no shared dataset or baseline algorithm for measuring and determining what factors affect performance. The unavailability of an accepted common dataset (such as the FERET database in face recognition [32]) and standard evaluation methodology has been a limitation in the development of mouse dynamics. Most researchers trained their models on different feature sets and datasets, but none of them made informed comparisons among different mouse feature sets and different results. Thus two additional experiments are offered here to compare our approach with those in the literature.

1) Comparison 1: Comparison With Traditional Features

As stated above, we constructed the feature space based on mouse clicks and mouse movements, consisting of holistic features and procedural features. To further examine the effectiveness of the features constructed in this study, we provide a comparative experiment. We chose the features used by Gamboa et al. [17], Aksari and Artuner [4], Hashia et al. [19], Bours and Fullu [8], and Ahmed and Traore [2], because they were among the most frequently cited, and they represented a relatively diverse set of mouse-dynamics features. We then used a one-class SVM classifier to conduct the authentication experiment again on our same dataset with both the feature set defined in our study, and the feature sets used in other studies. Hence, the authentication accuracies of different feature sets can be compared. Fig. 5 and Table VII show the ROC curves and average FARs and FRRs for each of six feature sets, with standard deviations in parentheses.

Fig. 5. ROC curves for six different feature sets: the feature set in our study, and the feature sets in other studies.

TABLE VII. RESULTS OF COMPARISON WITH SOME TRADITIONAL FEATURES (WITH STANDARD DEVIATIONS IN PARENTHESES)

Note to Table VII: the approach in [2] was initially applied to intrusion detection, and we extracted the parts of its features closely related to mouse operations in our dataset. The reason for this decision is that we want to examine whether the features employed in intrusion detection can be used in user authentication.

We can see that the average error rates for the feature set from our approach are much lower than those of the feature sets from the literature. We conjecture that this may be due to the procedural features providing fine-grained information about mouse behavior, but it may also be due, in part, to (1) partial adoption of features defined in previous approaches because of different data-collection environments; (2) using different types of thresholds on the anomaly scores; (3) using less enrollment data than was used in previous experiments. The improved performance based on using our features also indicates that our features may allow more accurate and detailed characterization of a user's unique mouse behavior than was possible with previously used features. Another thing to note from Table VII is that the standard deviations of error rates for features in our study are smaller than those for traditional features, suggesting that our features might be more stable and robust to variability in behavior data.

One may also wonder how much of the authentication accuracy of our approach is due to the use of procedural features or holistic features. We tested our method using procedural features and holistic features separately, and the set of procedural features was the choice that proved to perform better.
Specifically, we observe that the authentication accuracy obtained using the set of procedural features is much better than that for the set of holistic features, which have a FAR of 19.58% and an FRR of 17.96%. In combination with the result when using all features, it appears that procedural features may be more stable and discriminative than holistic features, which suggests that the procedural features contribute more to the authentication accuracy.

The results here only provide preliminary comparative results and should not be used to conclude that a certain set of mouse features is always better than others. Each feature set has its own unique advantages and disadvantages under different conditions and applications, so further evaluations and comparisons on more realistic and challenging datasets are needed.

2) Comparison 2: Comparison With Previous Work

Most previous approaches have either resulted in poor performance (in terms of authentication accuracy or time), or have used data of limited size. In this section, we show a qualitative comparison of our experimental results and settings against results of previous work (listed in Table VIII).

Revett et al. [34] and Aksari and Artuner [4] considered mouse dynamics as a standalone biometric, and obtained an authentication accuracy of EER around 4% and 5.9% respectively, with a relatively-short authentication time or small number of mouse operations. But their results were based on a small pool of users (6 users in [34] and 10 users in [4]), which may be insufficient to obtain a good, steady result. Our study relies on an improved user authentication methodology and far more users, leading us to achieve a good and robust authentication performance.

Ahmed and Traore [2] achieved a high authentication accuracy, but as we mentioned before, it might be difficult to use such a method for user authentication since the authentication time or the number of mouse operations needed to verify a user's identity is too high to be practical for real systems.

Additionally, Hashia et al. [19] and Bours and Fullu [8] could perform user authentication in a relatively-short time, but they reported unacceptably high error rates (EER of 15% in [19], and EER of 26.8% in [8]). In our approach we can make an authentication decision with a reasonably short authentication time while maintaining high accuracy. We employ a one-class classifier, which is more appropriate for mouse-dynamics-based user authentication. As mentioned in Experiment 3, we can make an authentication decision in less than 60 seconds, with corresponding error rates of FAR 4.9% and FRR 4.46%. Although this result could be improved, we believe that, at our current performance level, mouse dynamics suffice to be a practical auxiliary authentication mechanism.

TABLE VIII. COMPARISON WITH PREVIOUS WORK

Notes to Table VIII: Authentication time was not explicitly reported in [4], [8], [17]; instead, they required the user to accomplish a number of mouse operations for each authentication (15 clicks and 15 movements for [17]; 10 clicks and 9 movements for [4]; 18 short movements without pauses for [8]). Authentication time was not explicitly stated in [2]; however, it can be inferred from the data-collection process. For example, it is stated in [2] that an average of 12 hours 55 minutes of data were captured from each subject, representing an average of 45 sessions. We therefore assume that the average session length is (12 × 60 + 55)/45 ≈ 17.22 minutes ≈ 1033 seconds.

In summary, Comparison 1 shows that our proposed features outperform some traditional features used in previous studies, and may be more stable and robust to variable behavior data. Comparison 2 indicates that our approach is competitive with existing approaches in authentication time while maintaining high accuracy. More detailed statistical studies on larger and more realistic datasets are desirable for further evaluations.

VIII. DISCUSSION AND EXTENSION FOR FUTURE WORK

Based on the findings from this study, we take away some messages, each of which may suggest a trajectory for future work. Additionally, our work highlights the need for shared data and resources.

A. Success Factors of Our Approach

The presented approach achieved a short authentication time and relatively-high accuracy for mouse-dynamics-based user authentication. However, it is quite hard to point out one or two things that may have made our results better than those of previous work, because (1) past work favored realism over experimental control, (2) evaluation methodologies were inconsistent among previous work, and (3) there have been no public datasets on which to perform comparative evaluations.

Experimental control, however, is likely to be responsible for much of our success. Most previous work does not reveal any particulars in controlling experiments, while our work is tightly controlled. We made every effort to control experimental confounding factors to prevent them from having unintended influence on the subject's recorded mouse behavior. For example, the same desktop computer was used for data collection for all subjects, and all system parameters relating to the mouse were fixed. In addition, every subject was provided with the same instructions. These settings suggest strongly that the differences in subjects were due to individually detectable mouse-behavior differences among subjects, and not to environmental variables or experimental conditions. We strongly advocate the control of potential confounding factors in future experiments.
The reason is that controlled experiments are necessary to reveal causal connections among experimental factors and classifier performance, while realistic but uncontrolled experiments may introduce confounding factors that could influence experimental outcomes, which would make it hard to tell whether the results of those evaluations actually reflect detectable differences in mouse behavior among test subjects, or differences among computing environments.

We had more subjects (37), more repetitions of the operation task (150), and more comprehensive mouse operations (2 types of mouse clicks, 8 movement directions, and 3 movement distance ranges) than most studies did. Larger subject pools, however, sometimes make things harder: when there are more subjects there is a higher possibility that two subjects will have similar mouse behaviors, resulting in more classification errors.

We proposed the use of procedural features, such as the movement speed curve and acceleration curve, to provide more fine-grained information about mouse behavior than some traditional features. This may allow one to accurately describe a user's unique mouse behavior, thus leading to a performance improvement for mouse-dynamics-based user authentication.

We adopted methods for distance measurement and eigenspace transformation for obtaining principal feature components to efficiently represent the original mouse feature space. These methods not only overcome within-class variability of mouse behavior, but also preserve between-class differences of mouse behavior. The improved authentication accuracies demonstrate the efficacy of these methods.

Finally, we used a one-class learning algorithm to perform the authentication task, which is more appropriate for mouse-dynamics-based user authentication in real applications.

In general, until there is a comparative study that stabilizes these factors, it will be hard to be definitive about the precise elements that made this work successful.

B. Opportunities for Improvement

While previous studies showed promising results in mouse dynamics, none of them have been able to meet the requirement of the European standard for commercial biometric technology. In this work, we determined that mouse dynamics may achieve a pragmatically useful level of accuracy, but with an impractically long authentic
