TSCluWin
2020-03-01 136浏览
- 1.DASFAA 2016TSCluWin:Trajectory Stream Clustering over Sliding Window Jiali Mao, Qiuge Song, Cheqing Jin, Zhigang Zhang, Aoying Zhou April 17, 2016
- 2.1 Background 2 Synopsis DataStructures:TF & EF 3 Trajectory Stream ClusteringAlgorithm:TSCluWin 4 Experiment Evaluation 2
- 3.1 Background 3
- 4.Positioning devicies Real-time analysis, e.g., Clustering Extracting the similar movement behavior Live-traffic reports Streaming trajectory data 4
- 5. Huge amounts of trajectories are updated rapidly 5
- 6. Trajectory incremental DASFAA 2010) clustering (e.g.,TCMM, Expired records cannot be discarded upon the arrival of new ones, which leads to concept drift, and thus degrade the quality of clustering result Data stream clustering over the sliding-window model (e.g., SWClustering, Knowl. Inf. Syst. 2008) Each tuple must be a “full” entry, while each tuple in the trajectory data stream is only a “part” of an entry. 6
- 7. Unbounded data volume vs. limited memory space Trajectory evolves over time Influence of obsolete data Sliding window model Synopsis data structure
- 8.2 Synopsis DataStructures:TF & EF 8
- 9.9
- 10.10
- 11.11
- 12.12
- 13.3 Trajectory Stream ClusteringAlgorithm:TSCluWin 13
- 14.Experiment Result Visualization Slide Window Processing Macro-cluster Creation Micro-cluster Maintenance Cluster for Micro-clusters in userspecified time horizon New Trajectories Absorbing Obsolete Data Eliminating Trajectory Line Segments Stream 14 Nearest Microclusters Merging Representative line segment generating
- 15.Receive new data Lx Find the most similar EF h for Lx ? Most similar EF with the followingconstraints:1. 2. N Y N If Y Eliminate Expired Records and Merge EFs 15
- 16. Eraser Criterion Remove the expired EFs Filter out EFs with the earlier updated time and fewer participating trajectory line segments Merging Conditions Spatial proximity (distance threshold ) Temporal closeness ( time tolerance threshold )
- 17.4 Experiment Evaluation 17
- 18. A real world dataset about the trajectories of the taxis in Shanghai, China. This dataset contains the GPS logs of about 30,000 taxis during three months (October, November, December) in 2013, covering 93% main road network of Shanghai. 18
- 19.33 microclusters 3 macroclusters Nanjing Pudong East Road Airport Bund Hongqiao Airport (from 10:30 to 11:00 a.m) (from 10:40 to 11:00 a.m.) 19 (from 10:30 to 11:00 a.m)
- 20. SSQ(the sum of square distance) SSQ= MinLns =4 d=5000
- 21.21
- 22.22
- 23. We propose a two-step method to cluster evolving trajectory streams over the sliding-window model, called TSCluWin We define two synopsis data structures (EF and TF) to maintain the most recent cluster changes of trajectory stream in memory We conduct a comprehensive series of experiments on a real dataset to manifest the efficiency and the effectiveness of our proposal, as well as the superiority to other congeneric approach 23
- 24.Thank You!