A Fuzzy Window Mechanism for Information Differentiation in Mining Data Streams

  • Joong Hyuk Chang (장중혁) (School of Computer and IT Engineering, Daegu University)
  • Received : 2011.08.08
  • Accepted : 2011.09.08
  • Published : 2011.09.30

Abstract

Because the elements of a data stream are generated continuously and may change over time, many techniques have been proposed to differentiate the importance of data elements according to their generation time. Conventional techniques are effective at producing analysis results that focus on recent information in a data stream, but they are limited in how flexibly they can differentiate the importance of information in other ways. Information differentiation based on fuzzy sets can compensate for this limitation. Fuzzy sets have been widely used in various data mining fields because they overcome the sharp boundary problem of crisp approaches and yield analysis results that better reflect the requirements of real-world applications. This paper proposes a fuzzy window mechanism that adopts fuzzy sets and can be used efficiently to differentiate the importance of information in mining data streams. Basic concepts, including fuzzy calendars, are described first, followed by the details of mining weighted patterns over data streams using the fuzzy window technique.
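
The idea of weighting stream elements by a fuzzy membership in time, rather than by a crisp in-or-out window, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the trapezoidal membership function, the names `trapezoid` and `fuzzy_weighted_support`, and the choice to normalize the weighted count by the total weight. The paper's actual definitions of fuzzy calendars and weighted patterns may differ.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical representation: (generation time, set of items).
Transaction = Tuple[float, FrozenSet[str]]


def trapezoid(t: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 1 on [b, c], linear on (a, b) and (c, d), else 0."""
    if b <= t <= c:
        return 1.0
    if a < t < b:
        return (t - a) / (b - a)
    if c < t < d:
        return (d - t) / (d - c)
    return 0.0


def fuzzy_weighted_support(
    stream: Iterable[Transaction],
    itemset: FrozenSet[str],
    window: Tuple[float, float, float, float],
) -> float:
    """Support of `itemset`, with each transaction weighted by its
    membership in the fuzzy time window (an assumed definition)."""
    total = 0.0
    matched = 0.0
    for time, items in stream:
        weight = trapezoid(time, *window)
        total += weight
        if itemset <= items:  # itemset is contained in the transaction
            matched += weight
    return matched / total if total > 0 else 0.0


stream = [
    (1.0, frozenset({"a", "b"})),
    (2.0, frozenset({"a"})),
    (3.0, frozenset({"a", "b"})),
    (4.0, frozenset({"b"})),
]

# Fuzzy window: full weight on [2, 3], fading out toward 0 and 5.
fuzzy = fuzzy_weighted_support(stream, frozenset({"a"}), (0.0, 2.0, 3.0, 5.0))
# Crisp window: only transactions with time in [2, 3] count at all.
crisp = fuzzy_weighted_support(stream, frozenset({"a"}), (2.0, 2.0, 3.0, 3.0))
```

With the fuzzy window, the transactions at times 1 and 4 still contribute with weight 0.5 each, whereas the crisp window discards them entirely. This is the sharp boundary effect that the abstract says fuzzy sets are meant to avoid.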
