Contextual Collaborative Filtering via Hierarchical Matrix Factorization (2012)


Erheng Zhong∗, Wei Fan† and Qiang Yang∗

∗ Hong Kong University of Science and Technology. {ezhong,qyang}@cse.ust.hk.
† IBM T.J. Watson Research Center. weifan@us.ibm.com.

Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php. Copyright © SIAM. Unauthorized reproduction of this article is prohibited.

Abstract

Matrix factorization (MF) has been demonstrated to be one of the most competitive techniques for collaborative filtering. However, state-of-the-art MF methods do not consider contextual information, where ratings can be generated under different environments. For example, users select items in various situations, such as happy mood vs. sad, mobile vs. stationary, movies vs. books, etc. Under different contexts, the preferences of users are inherently different. The problem is that MF methods uniformly decompose the rating matrix, and thus they are unable to factorize it for different contexts. To amend this problem and improve recommendation accuracy, we introduce a "hierarchical" factorization model that considers the local context when performing matrix factorization. The intuition is that, as ratings are generated from heterogeneous environments, certain user and item pairs tend to be more similar to each other than others, and hence ought to receive more collaborative information from each other. To take contextual information into account, the proposed "contextual collaborative filtering" approach splits the rating matrix hierarchically by grouping similar users and items together, and factorizes each sub-matrix locally under different contexts. By building an ensemble model, the approach further avoids over-fitting with less parameter tuning. We analyze and demonstrate that the proposed method is a model-averaging gradient boosting model, and that its error rate can be bounded. Experimental results show that it outperforms three state-of-the-art algorithms on a number of real-world datasets (MovieLens, Netflix, etc.). The source code and datasets are available for download¹.

¹ http://www.cse.ust.hk/~ezhong/code/sdm12hmf.zip

1 Introduction

Recommender systems are often based on collaborative filtering, where we observe m users, n items, and a rating matrix R = {(u_k, i_k, r_k)}, k = 1, …, |R|, with real-valued ratings r_k. In this formulation, r_k represents the preference of user u_k for item i_k. The objective is to predict unobserved ratings based on users' past preferences. Among different learning methods, matrix factorization (MF) is generally considered to be a competitive method, and it has attracted much interest since Netflix announced its competition winner. Given the observed data, MF models the relationship between users and items by decomposing the rating matrix into low-rank factor matrices, uncovering the latent relationship between users and items. Specifically, the latent factors indicate both users' interest distributions and items' memberships over latent topics. Despite their successful applications, MF methods suffer from one major drawback: the rating values in the matrix are assumed to be generated uniformly, such that a user (or an item) generates all of its ratings using the same factor vector, without taking specific contexts into account. In other words, MF methods do not consider two facts: (1) a user's ratings can be influenced by multiple factors, such as mood, environment, time of day, etc.; (2) items with similar properties would receive similar ratings, such as the ratings of comedy and dramatic movies. To address the first problem, SVD++ [11] adds constraints from implicit feedback on users' factor vectors, and M3F [15] introduces context-dependent rating prediction by allowing each item to select a new topic for each new interaction. However, these solutions inadvertently introduce heavy computational costs and become rather inefficient when the dataset is large. In addition, none of these approaches solves the second problem.

In this paper, we propose a new matrix factorization model, Random Partition Matrix Factorization (RPMF), based on a tree structure constructed with an efficient random partition technique adopted from Random Decision Trees (RDT) [4], to provide a fast solution to the problems discussed above. While previous matrix factorization models generate their data uniformly, the proposed contextual RPMF model generates data by region in a hierarchical manner. In other words, the rating values generated under similar contexts or by similar items are partitioned locally into the same node of a decision tree. Afterwards, we apply a low-rank approximation to the current sub-rating-matrix at each node. Therefore, the ratings at tree nodes give rise to different factor vectors that handle specific contexts, and ratings on similar items at the same node can exert more influence on each other.
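The benefit of factorizing locally rather than globally can be seen with a minimal NumPy illustration (this is not the paper's algorithm; it uses truncated SVD as the low-rank approximation and a synthetic two-context rating matrix). Splitting the columns into their two "contexts" and approximating each block separately can only lower the total squared error, since any global rank-k approximation restricted to a block is itself a rank-k approximation of that block:

```python
import numpy as np

rng = np.random.default_rng(0)

def rank_k_error(M, k):
    """Squared Frobenius error of the best rank-k approximation (via SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    approx = (U[:, :k] * s[:k]) @ Vt[:k]
    return float(np.sum((M - approx) ** 2))

# Two synthetic "contexts": the same users rate differently in each block.
A = rng.random((20, 1)) @ rng.random((1, 15))        # context 1: rank-1 ratings
B = rng.random((20, 1)) @ rng.random((1, 15)) + 3.0  # context 2: shifted ratings
R = np.hstack([A, B])                                # one global rating matrix

global_err = rank_k_error(R, 1)                      # one factorization for all
local_err = rank_k_error(A, 1) + rank_k_error(B, 1)  # one per context
print(global_err, local_err)
```

Here the per-context (local) error is strictly smaller, mirroring the paper's intuition that ratings from similar contexts should share factor vectors.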
As such, the latent features implied in the low-rank matrices are explored to partition the rating matrix at the current node. In addition, we take advantage of the efficiency of random partition and its generalizability, and exploit an ensemble to effectively reduce the problem of over-fitting. As discussed in Section 4, this process is a model-averaging gradient boosting model, and thus has a good performance guarantee. As analyzed, the prediction model is not sensitive to the rank of the latent matrices, a problem associated with many state-of-the-art matrix factorization methods. In the prediction process, when a new user-item pair arrives, we pass the pair from the root to a leaf, smoothing the predictions at each node on this decision path. We then combine the predictions from each tree in the ensemble to generate the final prediction. This process is efficient because very few iterations are required for the gradient algorithm to converge towards the leaf level. We also demonstrate that its complexity is comparatively smaller than that of other state-of-the-art methods. A brief summary of the paper's contributions is as follows:

1. A novel hierarchical matrix factorization method with three properties: decomposition under different contexts, no over-fitting, and low time complexity.

2. A new generalization bound for matrix factorization which can handle multi-class ratings, and a formal analysis to guarantee the performance of the proposed method.

3. Evaluations on real datasets that demonstrate improved performance over state-of-the-art methods: Yahoo! Music Recommendation, Netflix, MovieLens and EachMovie.

2 Background and Related Works

We introduce the preliminaries and background in this section. The notations used in the paper are summarized in Table 1.

Table 1: Definition of notations

  Notation       Description
  R              Rating matrix
  I              Index matrix to indicate non-zeros in R
  U              Factor matrix of users
  V              Factor matrix of items
  m              Number of users
  n              Number of items
  k              Number of latent factors
  FE             Ensemble model
  N              Number of trees
  h              Height of trees
  {1, ..., Q}    Rating scope

2.1 Matrix Factorization Models

Matrix factorization methods [11] represent the state of the art for collaborative filtering tasks. They learn a constrained decomposition of the rating matrix as a product of two low-rank latent matrices: R ≈ UV^T, where R ∈ N^{m×n} is the rating matrix, U ∈ R^{m×d} is the latent matrix representing users, and V ∈ R^{n×d} is the parallel definition for items. The basic matrix factorization is as follows:

(2.1)    arg min_{U,V} ||(R − UV^T) ⊙ I||_F^2 + λ · (||U||_F + ||V||_F)

where I is the index matrix indicating the non-zero elements in R, ⊙ denotes element-wise multiplication, || · ||_F is the Frobenius norm, used to regularize the complexity of U and V, and λ is the trade-off parameter. In practice, we use stochastic gradient descent (SGD) to compute U and V. To obtain accurate results, SVD++ [11] integrates implicit feedback and a preference bias:

(2.2)    R_ij = b_ij + (U_i + N(i)^{-1/2} Σ_{v∈N(i)} Y_v) V_j^T

where b_ij is the bias for user i on item j, N(i) is the number of implicit feedbacks obtained from user i, and Y_v is the feedback factor vector of item v. However, these methods need to set the rank of the latent matrices, d, carefully to avoid over-fitting. [20] explores a Bayesian method to avoid this and uses MCMC to estimate the parameters. To take advantage of kernel methods, [12] proposes a non-linear matrix factorization based on Gaussian process latent variable models (GP-LVM). The common drawback of these methods is that their computational complexity is too high when the dataset is large, which makes them impractical.

To utilize auxiliary knowledge, such as rating data from other domains, [23] considers collaborative matrix factorization (CMF). Let R_s denote the rating matrix from another domain, and suppose the two domains share the same users. CMF aims to minimize the following objective function:

    min_{U,V,W} α ||(R − UV^T) ⊙ I||_F^2 + (1 − α) ||(R_s − UW^T) ⊙ I||_F^2 + R(U, V, W)

where W is the latent matrix for items in the source domain, R(U, V, W) is the regularization term, and α is the parameter that adjusts the relative effects of the two domains. Some recently proposed cross-domain MF methods [13, 1] have similar objectives but employ different regularization terms. Obviously, these methods transfer knowledge globally: for each rating in R, the same regularization (R_s − UW^T)^2 is imposed.
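To make the basic objective (2.1) and its SGD solution concrete, here is a minimal NumPy sketch (not the paper's code). All names and hyperparameters (d, lam, lr, epochs) are illustrative, and the common squared-Frobenius variant of the regularizer is used in the updates:

```python
import numpy as np

rng = np.random.default_rng(0)

def mf_sgd(ratings, m, n, d=5, lam=0.02, lr=0.05, epochs=200):
    """Fit R ≈ U V^T by SGD over the observed entries only.

    `ratings` is a list of (user, item, value) triples, i.e. the non-zero
    entries selected by the index matrix I in Eq. (2.1).
    """
    U = 0.1 * rng.standard_normal((m, d))
    V = 0.1 * rng.standard_normal((n, d))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ V[i]          # residual on one observed rating
            U[u] += lr * (err * V[i] - lam * U[u])
            V[i] += lr * (err * U[u] - lam * V[i])
    return U, V

# Toy data: 3 users, 4 items, a handful of observed ratings in {1, ..., 5}.
obs = [(0, 0, 5.0), (0, 1, 3.0), (1, 1, 4.0),
       (1, 2, 1.0), (2, 0, 4.0), (2, 3, 2.0)]
U, V = mf_sgd(obs, m=3, n=4)
rmse = float(np.sqrt(np.mean([(r - U[u] @ V[i]) ** 2 for u, i, r in obs])))
print(f"training RMSE: {rmse:.3f}")
```

Unobserved entries of U V^T then serve as the predicted ratings; as the surrounding text notes, the rank d must be chosen carefully, since a larger d fits the observed entries more closely at the risk of over-fitting.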
The methods proposed in [14, 9] incorporate relationship information between users, in order to make

Algorithm 1 RPMF
1: Input: Rating Matrix: R, Number of Latent Factors: d, Number of Trees:
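Algorithm 1 is cut off in this excerpt. Based only on the description in the Introduction (a local factorization at each node, a random partition driven by the latent features, boosting-style residual fitting down the tree, and summing predictions along the root-to-leaf path), a hypothetical sketch of one RPMF tree might look like the following. The function names, the sign-based split rule, and all hyperparameters are assumptions for illustration, not the paper's actual Algorithm 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def factorize(triples, m, n, d, lr=0.05, lam=0.02, epochs=100):
    """One local MF fit (SGD, as in Eq. (2.1)) on the triples at a node."""
    U = 0.1 * rng.standard_normal((m, d))
    V = 0.1 * rng.standard_normal((n, d))
    for _ in range(epochs):
        for u, i, r in triples:
            err = r - U[u] @ V[i]
            U[u] += lr * (err * V[i] - lam * U[u])
            V[i] += lr * (err * U[u] - lam * V[i])
    return U, V

def grow(triples, m, n, d, height):
    """Fit the node, then pick a random latent feature and split the
    residual ratings by its sign (a stand-in for the random partition)."""
    U, V = factorize(triples, m, n, d)
    node = {"U": U, "V": V, "children": None}
    if height > 0 and len(triples) > 1:
        f = rng.integers(d)  # randomly chosen latent feature
        resid = [(u, i, r - U[u] @ V[i]) for u, i, r in triples]
        left = [t for t in resid if U[t[0], f] * V[t[1], f] >= 0]
        right = [t for t in resid if U[t[0], f] * V[t[1], f] < 0]
        if left and right:
            node["children"] = (f,
                                grow(left, m, n, d, height - 1),
                                grow(right, m, n, d, height - 1))
    return node

def predict(node, u, i):
    """Sum local predictions along the root-to-leaf path (boosting-style)."""
    p = node["U"][u] @ node["V"][i]
    if node["children"]:
        f, lchild, rchild = node["children"]
        child = lchild if node["U"][u, f] * node["V"][i, f] >= 0 else rchild
        p += predict(child, u, i)
    return float(p)

obs = [(0, 0, 5.0), (0, 1, 3.0), (1, 1, 4.0), (1, 2, 1.0), (2, 0, 4.0)]
tree = grow(obs, m=3, n=3, d=3, height=2)
print([round(predict(tree, u, i), 2) for u, i, _ in obs])
```

An ensemble would repeat `grow` N times with independent random splits and average the N per-tree predictions, which is the model-averaging aspect the paper analyzes in Section 4.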