TSCluWin

2020-03-01 136浏览

  • 1.DASFAA 2016TSCluWin:Trajectory Stream Clustering over Sliding Window Jiali Mao, Qiuge Song, Cheqing Jin, Zhigang Zhang, Aoying Zhou April 17, 2016
  • 2.1 Background 2 Synopsis DataStructures:TF & EF 3 Trajectory Stream ClusteringAlgorithm:TSCluWin 4 Experiment Evaluation 2
  • 3.1 Background 3
  • 4.Positioning devicies Real-time analysis, e.g., Clustering Extracting the similar movement behavior Live-traffic reports Streaming trajectory data 4
  • 5. Huge amounts of trajectories are updated rapidly 5
  • 6. Trajectory incremental DASFAA 2010) clustering (e.g.,TCMM, Expired records cannot be discarded upon the arrival of new ones, which leads to concept drift, and thus degrade the quality of clustering result  Data stream clustering over the sliding-window model (e.g., SWClustering, Knowl. Inf. Syst. 2008) Each tuple must be a “full” entry, while each tuple in the trajectory data stream is only a “part” of an entry. 6
  • 7. Unbounded data volume vs. limited memory space  Trajectory evolves over time  Influence of obsolete data Sliding window model Synopsis data structure
  • 8.2 Synopsis DataStructures:TF & EF 8
  • 9.9
  • 10.10
  • 11.11
  • 12.12
  • 13.3 Trajectory Stream ClusteringAlgorithm:TSCluWin 13
  • 14.Experiment Result Visualization Slide Window Processing Macro-cluster Creation Micro-cluster Maintenance Cluster for Micro-clusters in userspecified time horizon New Trajectories Absorbing Obsolete Data Eliminating Trajectory Line Segments Stream 14 Nearest Microclusters Merging Representative line segment generating
  • 15.Receive new data Lx Find the most similar EF h for Lx ? Most similar EF with the followingconstraints:1. 2. N Y N If Y Eliminate Expired Records and Merge EFs 15
  • 16.  Eraser Criterion Remove the expired EFs  Filter out EFs with the earlier updated time and fewer participating trajectory line segments  Merging Conditions  Spatial proximity (distance threshold )  Temporal closeness ( time tolerance threshold )
  • 17.4 Experiment Evaluation 17
  • 18. A real world dataset about the trajectories of the taxis in Shanghai, China.  This dataset contains the GPS logs of about 30,000 taxis during three months (October, November, December) in 2013, covering 93% main road network of Shanghai. 18
  • 19.33 microclusters 3 macroclusters Nanjing Pudong East Road Airport Bund Hongqiao Airport (from 10:30 to 11:00 a.m) (from 10:40 to 11:00 a.m.) 19 (from 10:30 to 11:00 a.m)
  • 20. SSQ(the sum of square distance)  SSQ= MinLns =4 d=5000
  • 21.21
  • 22.22
  • 23. We propose a two-step method to cluster evolving trajectory streams over the sliding-window model, called TSCluWin  We define two synopsis data structures (EF and TF) to maintain the most recent cluster changes of trajectory stream in memory  We conduct a comprehensive series of experiments on a real dataset to manifest the efficiency and the effectiveness of our proposal, as well as the superiority to other congeneric approach 23
  • 24.Thank You!