• Title/Summary/Keyword: data field selection

Feature Selection Methodology in Quality Data Mining

  • Soo, Nam-Ho;Halim, Yulius
    • Proceedings of the Korean Operations and Management Science Society Conference / 2004.05a / pp.698-701 / 2004
  • In much of the literature, data mining is described as a way to exploit data warehouses and collected data. Its most common applications are in marketing and research, largely because the data available in those fields is usually plentiful. Data mining can also be extended to the production process. Whereas data mining in marketing studies customers and products, data mining in production studies the so-called 4M1E: man, machine, materials, method (recipe), and environment. All of these elements matter to the production process, which determines product quality. Because the ultimate aim of data mining in production is product quality, it is commonly called quality data mining. The variables examined in quality data mining can number in the hundreds or more, so extracting information from the data warehouse can take a long time. A feature selection methodology is proposed to help the analysis reach the best performance in a relatively short time. Because the method relies on simple, readily available statistical tools, it can speed up the mining.

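The abstract above does not say which statistical tools the methodology uses. As one hypothetical illustration of feature selection with a simple statistic, the sketch below ranks process variables by the absolute Pearson correlation with a quality measure and keeps the top few. The function names, variable names, and toy numbers are all assumptions made for this example, not data or code from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, target, top_k):
    """Return the top_k column names ordered by |correlation| with target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Made-up toy process data (illustrative only).
process_data = {
    "temperature": [200, 210, 220, 230, 240],
    "operator_id": [3, 1, 2, 1, 3],
    "pressure": [5.0, 4.8, 5.1, 4.9, 5.0],
}
defect_rate = [0.10, 0.12, 0.14, 0.16, 0.18]

selected = rank_features(process_data, defect_rate, top_k=1)
```

A filter like this is cheap because each variable is scored independently, but it only detects linear, single-variable effects; interactions between variables would need a different screening step.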

Improving an Ensemble Model Using Instance Selection Method (사례 선택 기법을 활용한 앙상블 모형의 성능 개선)

  • Min, Sung-Hwan
    • Journal of Korean Society of Industrial and Systems Engineering / v.39 no.1 / pp.105-115 / 2016
  • Ensemble classification combines individually trained classifiers to yield more accurate predictions than the individual models. Ensemble techniques are very useful for improving the generalization ability of classifiers. The random subspace ensemble technique is a simple but effective method for constructing ensemble classifiers; it randomly draws a subset of the features for each classifier in the ensemble. The instance selection technique selects critical instances while removing irrelevant and noisy instances from the original dataset. The instance selection and random subspace methods are both well known in the field of data mining and have proven very effective in many applications. However, few studies have focused on integrating the two. Therefore, this study proposed a new hybrid ensemble model that integrates instance selection and random subspace techniques using genetic algorithms (GAs) to improve the performance of a random subspace ensemble model. GAs are used to select optimal (or near-optimal) instances, which are used as input data for the random subspace ensemble model. The proposed model was applied to both Kaggle credit data and corporate credit data, and the results were compared with those of other models in terms of classification accuracy, levels of diversity, and average classification rates of the base classifiers in the ensemble. The experimental results demonstrated that the proposed model outperformed the other models, including the single model, the instance selection model, and the original random subspace ensemble model.
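The random subspace idea described in this abstract can be sketched in a few lines. The example below is a minimal illustration only: it uses a nearest-centroid base classifier and majority voting, both chosen here for brevity, and it omits the GA-based instance selection entirely. All function names and the toy data are assumptions, not taken from the paper.

```python
import random
from collections import Counter

def train_centroids(rows, labels, feats):
    """Per-class mean vector, computed only over the chosen feature indices."""
    sums, counts = {}, Counter(labels)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(centroids, feats, row):
    """Label of the closest class centroid in the chosen feature subspace."""
    def dist(center):
        return sum((row[f] - center[i]) ** 2 for i, f in enumerate(feats))
    return min(centroids, key=lambda label: dist(centroids[label]))

def fit_random_subspace(rows, labels, n_models, n_feats, seed=0):
    """Train n_models base classifiers, each on its own random feature subset."""
    rng = random.Random(seed)
    n_total = len(rows[0])
    ensemble = []
    for _ in range(n_models):
        feats = rng.sample(range(n_total), n_feats)
        ensemble.append((feats, train_centroids(rows, labels, feats)))
    return ensemble

def predict(ensemble, row):
    """Majority vote over the base classifiers."""
    votes = Counter(predict_centroid(c, f, row) for f, c in ensemble)
    return votes.most_common(1)[0][0]

# Made-up toy data with four features (illustrative only).
rows = [[0, 1, 0, 1], [1, 0, 1, 0], [9, 10, 9, 10], [10, 9, 10, 9]]
labels = ["good", "good", "bad", "bad"]
ensemble = fit_random_subspace(rows, labels, n_models=5, n_feats=2)
```

An instance selection step, as studied in the paper, would sit in front of `fit_random_subspace` and pass it only a chosen subset of `rows` and `labels`.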

A Comparative Analysis of Status on the Selection and Evaluation of Field Manager in Apartment and Office Building Project (공동주택과 오피스 건설현장의 조직원선정 및 평가실태 비교분석)

  • Lee Jung-Yong;Kim Byeong-Lae;Son Chang-Baek
    • Proceedings of the Korean Institute Of Construction Engineering and Management / autumn / pp.290-293 / 2003
  • Field organization management in the construction industry is a very important factor in productivity improvement and cost reduction. However, evaluation methods for existing field organizations have not yet been established, and efforts to improve those organizations have been insufficient. The purpose of this study is to provide basic data for the reasonable selection and evaluation of field managers by analyzing the operating status of existing field organizations in apartment and office building projects, through interviews and questionnaires covering 22 construction companies. The study presents a proposal for the selection and evaluation of field managers, aimed at improving productivity and reducing cost through efficient construction progress in apartment and office building projects.

A Study on Selection of Cross-Docking Center based on Existing Logistics Network (기존 물류 네트워크 기반에서 크로스 - 도킹 거점선정에 관한 연구)

  • Lee, In-Chul;Lee, Myeong-Ho;Kim, Nae-Heon
    • IE interfaces / v.19 no.1 / pp.26-33 / 2006
  • Many firms consider adopting a cross-docking system to reduce inventory and lead time. However, most studies concentrate on the design of the cross-docking system itself. This study presents a method for selecting a cross-docking center within an existing logistics network. We define the center selection problem as it applies in the logistics field by describing the operating environment for cross-docking, the selection criteria for the center, and the main constraints on transportation planning in a multi-level logistics network. We also define a simulation model that can analyze cross-docking volume from several angles, and we develop a methodology for selecting the cross-docking center. The simulation model specifies the algorithm and influencing factors of the cross-docking system, the decision criteria, the policy parameters, and the input data. In addition, by simulating and evaluating scenarios with practical data from the logistics field, the study analyzes the effect of increasing the number of docks used for simultaneous receiving and shipping, as well as the efficiency of overnight transportation and cross-docking.

Improving Field Crop Classification Accuracy Using GLCM and SVM with UAV-Acquired Images

  • Seung-Hwan Go;Jong-Hwa Park
    • Korean Journal of Remote Sensing / v.40 no.1 / pp.93-101 / 2024
  • Accurate field crop classification is essential for various agricultural applications, yet existing methods face challenges due to diverse crop types and complex field conditions. This study aimed to address these issues by combining support vector machine (SVM) models with multi-seasonal unmanned aerial vehicle (UAV) images, texture information extracted from Gray Level Co-occurrence Matrix (GLCM), and RGB spectral data. Twelve high-resolution UAV image captures spanned March-October 2021, while field surveys on three dates provided ground truth data. We focused on data from August (-A), September (-S), and October (-O) images and trained four support vector classifier (SVC) models (SVC-A, SVC-S, SVC-O, SVC-AS) using visual bands and eight GLCM features. Farm maps provided by the Ministry of Agriculture, Food and Rural Affairs proved efficient for open-field crop identification and served as a reference for accuracy comparison. Our analysis showcased the significant impact of hyperparameter tuning (C and gamma) on SVM model performance, requiring careful optimization for each scenario. Importantly, we identified models exhibiting distinct high-accuracy zones, with SVC-O trained on October data achieving the highest overall and individual crop classification accuracy. This success likely stems from its ability to capture distinct texture information from mature crops. Incorporating GLCM features proved highly effective for all models, significantly boosting classification accuracy. Among these features, homogeneity, entropy, and correlation consistently demonstrated the most impactful contribution. However, balancing accuracy with computational efficiency and feature selection remains crucial for practical application. Performance analysis revealed that SVC-O achieved exceptional results in overall and individual crop classification, while soybeans and rice were consistently classified well by all models. Challenges were encountered with cabbage due to its early growth stage and low field cover density. The study demonstrates the potential of utilizing farm maps and GLCM features in conjunction with SVM models for accurate field crop classification. Careful parameter tuning and model selection based on specific scenarios are key for optimizing performance in real-world applications.
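To make the texture features named in this abstract concrete, the sketch below builds a gray level co-occurrence matrix for a tiny image and derives homogeneity and entropy from it. It is an illustration under stated assumptions, not the study's pipeline: it uses a single pixel offset (one step to the right), a non-symmetric matrix, the squared-difference form of homogeneity, and base-2 entropy. Other conventions exist, and the abstract does not say which one the authors used. The resulting values would be the kind of inputs an SVM classifier receives alongside the RGB bands.

```python
import math

def glcm(image, levels, d_row=0, d_col=1):
    """Normalized co-occurrence matrix for one pixel offset (d_row, d_col)."""
    counts = [[0] * levels for _ in range(levels)]
    n_rows, n_cols = len(image), len(image[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]

def homogeneity(p):
    """Sum of p(i, j) / (1 + (i - j)^2); equals 1.0 for a perfectly flat image."""
    return sum(v / (1 + (i - j) ** 2)
               for i, row in enumerate(p) for j, v in enumerate(row))

def entropy(p):
    """Shannon entropy (base 2) of the co-occurrence probabilities."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)

# Two made-up 2-level images: a flat patch and a checkerboard.
flat = [[1, 1, 1], [1, 1, 1]]
checker = [[0, 1, 0], [1, 0, 1]]

flat_features = (homogeneity(glcm(flat, 2)), entropy(glcm(flat, 2)))
checker_features = (homogeneity(glcm(checker, 2)), entropy(glcm(checker, 2)))
```

The flat patch gives maximal homogeneity and zero entropy, while the checkerboard gives lower homogeneity and higher entropy, which is the contrast that lets texture separate smooth canopy from rough canopy.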

Habitat selection in the lesser cuckoo, an avian brood parasite breeding on Jeju Island, Korea

  • Yun, Seongho;Lee, Jin-Won;Yoo, Jeong-Chil
    • Journal of Ecology and Environment / v.44 no.2 / pp.106-114 / 2020
  • Background: Determining patterns of habitat use is key to understanding animal ecology. Approximately 1% of bird species use brood parasitism as their breeding strategy, in which they exploit the parental care of other species (hosts) by laying eggs in their nests. Brood parasitism may complicate the habitat requirements of brood parasites because they need habitats that support both their hosts and their own breeding conditions. Brood parasitism, through changes in the reproductive roles of sexes or individuals, may further diversify habitat use patterns among individuals. However, patterns of habitat use in avian brood parasites have rarely been characterized. In this study, we categorized the habitat preference of a population of brood parasitic lesser cuckoos (Cuculus poliocephalus) breeding on Jeju Island, Korea. By using compositional analyses together with radio-tracking and land cover data, we determined patterns of habitat use and their sexual and diurnal differences. Results: We found that the lesser cuckoo had a relatively large home range and that its overall habitat composition (second-order selection) was similar to that of the study area; open areas such as field and grassland habitats accounted for 80% of the home range. Nonetheless, their habitat, comprising 2.54 different habitats per hectare, could be characterized as a mosaic. We also found sexual differences in habitat composition and selection in the core-use area of home ranges (third-order selection). In particular, the forest habitat was preferentially utilized by females but underutilized by males. However, there was no diurnal change in the pattern of habitat use. Both sexes preferred field habitats at the second-order selection. At the third-order selection, males preferred field habitats followed by grasslands, and females preferred grasslands followed by forest habitats. Conclusions: We suggest that the field and grassland habitats represent the two most important areas for the lesser cuckoo on Jeju Island. Nevertheless, this study shows that habitat preference may differ between sexes, likely due to differences in sex roles, sex-based energy demands, and potential sexual conflict.

Back Analysis of Tunnel for multi-step Construction (시공 단계를 고려한 터널의 역해석에 관한 연구)

  • 김선명;윤지선
    • Proceedings of the Korean Geotechnical Society Conference / 2000.11a / pp.479-484 / 2000
  • Reliable estimation of system parameters and accurate prediction of system behavior are important for designing tunnels safely and economically. Back analysis using field measurement data is therefore useful for evaluating the geotechnical parameters of a tunnel. In back analysis, the choice of initial values and the uncertainty of the field measurements significantly influence the result. In this paper, to overcome the uncertainty of field measurements, we performed the back analysis using displacement data obtained at each step of excavation and support.

Efficient crosswell EM Tomography using localized nonlinear approximation

  • Kim Hee Joon;Song Yoonho;Lee Ki Ha;Wilt Michael J.
    • Geophysics and Geophysical Exploration / v.7 no.1 / pp.51-55 / 2004
  • This paper presents a fast and stable imaging scheme using the localized nonlinear (LN) approximation of integral equation (IE) solutions for inverting electromagnetic data obtained in a crosswell survey. The medium is assumed to be cylindrically symmetric about a source borehole, and to maintain the symmetry a vertical magnetic dipole is used as a source. To find an optimum balance between data fitting and the smoothness constraint, we introduce an automatic selection scheme for the Lagrange multiplier, which is sought at each iteration with a least misfit criterion. In this selection scheme, the IE algorithm is attractive for saving computing time because Green's functions, whose calculation is the most time-consuming part of IE methods, can be reused throughout the inversion process. The inversion scheme using the LN approximation has been tested on both synthetic and field data to show its stability and efficiency. The inverted image derived from the field data, collected in a pilot experiment of water-flood monitoring in an oil field, compares well with that derived by a 2.5-dimensional inversion scheme.
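The idea of choosing the Lagrange multiplier at each iteration by a least misfit criterion can be illustrated on a toy problem. The sketch below is not the paper's EM inversion: it fits a single parameter of a made-up nonlinear forward model, and a simple damping term stands in for the smoothness constraint. At every iteration it tries a fixed list of candidate multipliers, runs the forward model for each trial update, and keeps the update with the smallest data misfit. The forward model, candidate list, and all names are assumptions made for this example.

```python
import math

T = [0.5, 1.0, 1.5]  # hypothetical measurement positions

def forward(m):
    """Toy nonlinear forward model: predicted data for parameter m."""
    return [math.exp(m * t) for t in T]

def misfit(m, data):
    """Sum of squared differences between observed and predicted data."""
    return sum((d - f) ** 2 for d, f in zip(data, forward(m)))

def invert(data, m0=0.0, lambdas=(0.01, 0.1, 1.0, 10.0, 100.0), n_iter=50):
    """Damped Gauss-Newton iteration with a per-iteration multiplier search."""
    m = m0
    history = []  # multiplier chosen at each accepted iteration
    for _ in range(n_iter):
        pred = forward(m)
        resid = [d - f for d, f in zip(data, pred)]
        jac = [t * f for t, f in zip(T, pred)]  # d/dm exp(m*t) = t*exp(m*t)
        jtr = sum(j * r for j, r in zip(jac, resid))
        jtj = sum(j * j for j in jac)
        # Evaluate the true (nonlinear) misfit of each candidate update.
        trials = [(misfit(m + jtr / (jtj + lam), data), lam) for lam in lambdas]
        best_misfit, best_lam = min(trials)
        if best_misfit >= misfit(m, data):
            break  # no candidate improves the fit
        m += jtr / (jtj + best_lam)
        history.append(best_lam)
    return m, history

observed = forward(1.0)  # synthetic data generated with m = 1.0
m_est, chosen = invert(observed)
```

Far from the solution, a weakly damped step overshoots because the linearization is poor, so a large multiplier gives the smaller misfit; close to the solution, smaller multipliers tend to win. The cost is one forward evaluation per candidate per iteration, which is why reusing expensive intermediate quantities, as the abstract notes for Green's functions, matters in a real inversion.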

Cancer-Subtype Classification Based on Gene Expression Data (유전자 발현 데이터를 이용한 암의 유형 분류 기법)

  • Cho Ji-Hoon;Lee Dongkwon;Lee Min-Young;Lee In-Beum
    • Journal of Institute of Control, Robotics and Systems / v.10 no.12 / pp.1172-1180 / 2004
  • Gene expression data, a product of high-throughput technology, has recently become widely available, and the studies built on it (so-called bioinformatics) now occupy an important position in biological and medical research. The microarray is a revolutionary technology that lets us monitor several thousand genes simultaneously and thus gain insight into phenomena in the human body (e.g. the mechanism of cancer progression) at the molecular level. To obtain useful information from such gene expression measurements, it is essential to analyze the data with appropriate techniques. However, the high dimensionality of the data can cause problems such as the curse of dimensionality and singular matrices in computation, which makes it difficult to apply conventional data analysis methods. Developing methods that can treat such data effectively has therefore become a challenging issue in computational biology. This research focuses on gene selection and classification for cancer subtype discrimination based on gene expression (microarray) data.

Integrated AHP and DEA method for technology evaluation and selection: application to clean technology (기술 평가 및 선정을 위한 AHP와 DEA 통합 활용 방법: 청정기술에의 적용)

  • Yu, Peng;Lee, Jang Hee
    • Knowledge Management Research / v.13 no.3 / pp.55-77 / 2012
  • Selecting promising technologies is becoming increasingly difficult as their number and complexity grow. In this study, we propose a hybrid AHP/DEA-AR method and a hybrid AHP/DEA-AR-G method that evaluate the efficiency of technology alternatives from ordinal rating data, collected through a survey of technology experts in a given field, and select the efficient alternatives as promising technologies. The proposed methods normalize the rating data and use AHP to derive weights, which improves the credibility of the analysis; they then use DEA-AR and DEA-AR-G to evaluate the efficiency of the technology alternatives, avoiding the problems of the basic DEA models. We applied the proposed methods to clean technology and compared them with the basic DEA models. The comparison shows that both proposed methods are effective at identifying the most efficient technology, and that the hybrid AHP/DEA-AR method is much easier to use in the technology selection process.

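The AHP step mentioned in this abstract turns pairwise comparisons of criteria into a weight vector. The sketch below uses the row geometric mean method, a common approximation to the principal-eigenvector weights; the abstract does not say which derivation the authors used, so this choice is an assumption. The comparison matrix is made up for illustration, and the DEA-AR stage is not shown.

```python
import math

def ahp_weights(matrix):
    """Priority weights from a pairwise comparison matrix (row geometric means)."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Made-up comparison matrix for three hypothetical criteria.
# Entry [i][j] states how much more important criterion i is than criterion j,
# so the matrix is reciprocal: [j][i] == 1 / [i][j].
comparisons = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]

weights = ahp_weights(comparisons)
```

This example matrix is perfectly consistent (each entry equals the ratio of two underlying weights), so the method recovers those weights exactly. With real survey judgments the matrix is usually somewhat inconsistent, and a consistency check is normally run before the weights are used.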