• Title/Summary/Keyword: negative sampling


On Some Distributions Generated by Riff-Shuffle Sampling

  • Son M.S.;Hamdy H.I.
    • International Journal of Contents
    • /
    • v.2 no.2
    • /
    • pp.17-24
    • /
    • 2006
  • The work presented in this paper is divided into two parts. The first part presents finite urn problems which generate truncated negative binomial random variables. Some combinatorial identities that arose from the negative binomial sampling and truncated negative binomial sampling are established. These identities are constructed and serve important roles when we deal with these distributions and their characteristics. Other important results including cumulants and moments of the distributions are given in somewhat simple forms. Second, the distributions of the maximum of two chi-square variables and the distributions of the maximum correlated F-variables are then derived within the negative binomial sampling scheme. Although multinomial theory applied to order statistics and standard transformation techniques can be used to derive these distributions, the negative binomial sampling approach provides more information and deeper insight regarding the nature of the relationship between the sampling vehicle and the probability distributions of these functions of chi-square variables. We also provide an algorithm to compute the percentage points of these distributions. We supplement our findings with exact simple computational methods where no interpolations are involved.
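The negative binomial sampling scheme mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common "number of failures before the r-th success" parameterization and a simple right truncation. The function names and the truncation rule are illustrative assumptions, not the authors' urn construction.

```python
from math import comb


def neg_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(K = k): exactly k failures occur before the r-th success."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


def truncated_neg_binomial_pmf(k: int, r: int, p: float, k_max: int) -> float:
    """Same distribution conditioned on K <= k_max (assumed right truncation)."""
    if k < 0 or k > k_max:
        return 0.0
    # Renormalize by the probability mass that survives the truncation.
    norm = sum(neg_binomial_pmf(j, r, p) for j in range(k_max + 1))
    return neg_binomial_pmf(k, r, p) / norm


if __name__ == "__main__":
    r, p = 3, 0.4
    mean = sum(k * neg_binomial_pmf(k, r, p) for k in range(500))
    print(f"untruncated mean ~ {mean:.4f}, theory r(1-p)/p = {r * (1 - p) / p:.4f}")
```

Under this parameterization the untruncated mean is r(1-p)/p, which the numerical sum reproduces.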


Improving passage retrieval via negative sampling from semantic feature space (의미론적 feature 공간상에서의 negative sampling을 통한 검색 성능 개선)

  • Jeong-Doo Lee;Beomseok Hong;Wonseok Choi;Youngsub Han;Byoung-Ki Jeon;Seung-Hoon Na
    • Annual Conference on Human and Language Technology
    • /
    • 2022.10a
    • /
    • pp.146-149
    • /
    • 2022
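  • Recently, methods for obtaining good negative samples have been applied to retrieval tasks and have brought large performance gains. However, most methods for obtaining good negative samples incur a high computational cost. In this paper, to obtain effective negative samples at low computational cost, we use Mixed Gaussian Recurrent Chain (MGRC) sampling to obtain semantically similar features in the feature space and use them as negative samples, achieving better performance than the existing baseline model.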


The Role of Negative Binomial Sampling In Determining the Distribution of Minimum Chi-Square

  • Hamdy H.I.;Bentil Daniel E.;Son M.S.
    • International Journal of Contents
    • /
    • v.3 no.1
    • /
    • pp.1-8
    • /
    • 2007
  • The distributions of the minimum correlated F-variable arise in many applied statistical problems, including simultaneous analysis of variance (SANOVA), equality of variance, selection and ranking of populations, and reliability analysis. In this paper, the negative binomial sampling technique is employed to derive the distributions of the minimum of chi-square variables and hence the distributions of the minimum correlated F-variables. The work presented in this paper is divided into two parts. The first part is devoted to developing some combinatorial identities arising from negative binomial sampling. These identities are constructed and justified because they serve an important purpose when we deal with these distributions or their characteristics. Other important results, including cumulants and moments of these distributions, are also given in somewhat simple forms. Second, the distributions of the minimum chi-square variable and hence the distribution of the minimum correlated F-variables are derived within the negative binomial sampling framework. Although multinomial theory applied to order statistics and standard transformation techniques can be used to derive these distributions, the negative binomial sampling approach provides more information regarding the nature of the relationship between the sampling vehicle and the probability distributions of these functions of chi-square variables. We also provide an algorithm to compute the percentage points of the distributions. The computational methods we adopted are exact, and no interpolations are involved.

Evaluation of Sample Testing Scheme for Designated Aquatic Animals (수산동물 지정검역물에 대한 표본검사 계획 검토)

  • Pak, Son-Il
    • Journal of Veterinary Clinics
    • /
    • v.29 no.1
    • /
    • pp.58-62
    • /
    • 2012
  • To protect the aquatic animal health of importing countries from the potential risks associated with exotic diseases introduced through international trade of live aquatic animals, inspection of designated commodities at ports of entry is a critical component of the safeguarding system. The only way to be 100% confident that no fish in a shipment are infected with a specific agent is to test every fish in the imported commodity with a perfect diagnostic test. In the majority of cases this is unrealistic, since the group of interest may be very large, particularly for aquatic animals, or only imperfect tests are available. It is therefore more common to test a fixed proportion of a group using preplanned sampling schemes. However, decisions based on the results of testing a sample leave a considerable chance that infected groups are misclassified as uninfected, depending on the sampling strategy employed. The objective of this study was to determine the probability that one or more fish in an imported group are infected but test negative after the samples are inspected. This question is critical for government authorities examining whether a sampling plan is sufficient for its intended purpose. At a fixed population size, the maximum number of infected fish when all tests are negative decreased as the sampling fraction increased. The probability of including at least one undetected but infected fish in a group with negative tests increased with the number of fish tested or the true prevalence. The risk was much lower when a highly sensitive test was assumed; when test sensitivity increased from 0.9 to 0.99, this risk was dramatically reduced to about a tenth or a fourth for prevalence ranging from 2 to 10%, given sample sizes ranging from 10 to 200. Based on the preliminary analysis, the author concluded that the current sampling plan, which tests 4-8% of the import proposal for human consumption, can still yield a high rate of false-negative results. Therefore, from the quarantine inspection point of view, an enforced commodity-specific sampling design that accounts for the cost of testing with an imperfect test at the specified design prevalence is urgently needed.
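The kind of calculation described in this abstract can be sketched as follows. This is a simplified model, not the author's: it assumes a large consignment (so each sampled fish is infected independently with probability equal to the prevalence), perfect test specificity, and independent test results. The function names are hypothetical.

```python
def prob_all_negative(n: int, prevalence: float, sensitivity: float) -> float:
    """Probability that all n sampled fish test negative.

    A fish tests negative if it is uninfected (1 - prevalence) or
    infected but missed (prevalence * (1 - sensitivity)).
    """
    return (1.0 - prevalence * sensitivity) ** n


def prob_missed_infection(n: int, prevalence: float, sensitivity: float) -> float:
    """Probability that the sample holds >= 1 infected fish yet all tests are negative."""
    all_negative = prob_all_negative(n, prevalence, sensitivity)
    none_infected = (1.0 - prevalence) ** n
    return all_negative - none_infected


if __name__ == "__main__":
    for se in (0.9, 0.99):
        risk = prob_missed_infection(n=10, prevalence=0.02, sensitivity=se)
        print(f"sensitivity={se}: risk of a missed infection = {risk:.5f}")
```

With a perfect test (sensitivity 1.0) the missed-infection probability is zero, and it shrinks as sensitivity rises, which is the qualitative pattern the abstract reports.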

Non-negative Unbiased MSE Estimation under Stratified Multi-stage Sampling

  • Kim, Kyuseong
    • Journal of the Korean Statistical Society
    • /
    • v.30 no.4
    • /
    • pp.637-644
    • /
    • 2001
  • We investigated two kinds of mean square error (MSE) estimators of the homogeneous linear estimator (HLE) for the population total under stratified multi-stage sampling. One is studied when the second-stage variance component is estimable, and the other is found in case it is not estimable. The proposed estimators are necessary forms of non-negative unbiased MSE estimators of the HLE.


An Optimal Scheme of Inclusion Probability Proportional to Size Sampling

  • Kim Sun Woong
    • Communications for Statistical Applications and Methods
    • /
    • v.12 no.1
    • /
    • pp.181-189
    • /
    • 2005
  • This paper suggests a method of inclusion probability proportional to size sampling that provides a non-negative and stable variance estimator. The sampling procedure is quite simple and flexible, since a sampling design is easily obtained using mathematical programming. This scheme appears to be preferable to Nigam, Kumar and Gupta's (1984) method, which uses balanced incomplete block designs. A comparison is made with their method through an example in the literature.

Interval Estimation of Population Proportion in a Double Sampling Scheme (이중표본에서 모비율의 구간추정)

  • Lee, Seung-Chun;Choi, Byong-Su
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.6
    • /
    • pp.1289-1300
    • /
    • 2009
  • The double sampling scheme is effective in reducing the sampling cost. However, doubly sampled data are contaminated by two types of error, namely false-positive and false-negative errors. These make the statistical analysis more difficult and require more sophisticated analysis tools. For instance, the Wald method for the interval estimation of a proportion would not work well. In fact, it is well known that the Wald confidence interval behaves very poorly in many sampling schemes. In this note, the properties of the Wald interval are investigated in terms of the coverage probability and the expected width. An alternative confidence interval based on the Agresti-Coull approach is recommended.
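To show why the Wald interval can misbehave, here is a minimal sketch comparing it with the Agresti-Coull interval for an ordinary single binomial sample. It does not implement the paper's double-sampling adaptation, which corrects for misclassification. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def agresti_coull_interval(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Agresti-Coull interval: add z^2/2 successes and z^2/2 failures, then apply Wald."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_tilde = n + z * z
    p_tilde = (x + z * z / 2) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half


if __name__ == "__main__":
    # With zero observed successes the Wald interval has zero width.
    print("Wald:         ", wald_interval(0, 20))
    print("Agresti-Coull:", agresti_coull_interval(0, 20))
```

When no successes are observed, the Wald interval collapses to a single point, whereas the Agresti-Coull interval keeps a positive width. Neither function clips its endpoints to [0, 1].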

A Probabilistic Sampling Method for Efficient Flow-based Analysis

  • Jadidi, Zahra;Muthukkumarasamy, Vallipuram;Sithirasenan, Elankayer;Singh, Kalvinder
    • Journal of Communications and Networks
    • /
    • v.18 no.5
    • /
    • pp.818-825
    • /
    • 2016
  • Network management and anomaly detection are challenging in high-speed networks because of the high volume of packets that has to be analysed. Flow-based analysis is a scalable method that reduces the high volume of network traffic by dividing it into flows. As sampling methods are extensively used in flow generators such as NetFlow, the impact of sampling on the performance of flow-based analysis needs to be investigated. Monitoring using sampled traffic is a well-studied research area; however, the impact of sampling on flow-based anomaly detection is poorly researched. This paper investigates flow sampling methods and shows that these methods have a negative impact on flow-based anomaly detection. Therefore, we propose an efficient probabilistic flow sampling method that can preserve the flow traffic distribution. The proposed sampling method takes into account two flow features: destination IP address and octets. The destination IP addresses are sampled based on the number of received bytes. Our method provides efficiently sampled traffic that has the required traffic features for both flow-based anomaly detection and monitoring. The proposed sampling method is evaluated using a number of generated flow-based datasets. The results show an improvement in preserved malicious flows.

The Effect of Word-of-Mouth on Purchase Intention: A Case Study of Low-Cost Carriers in Indonesia

  • SOELASIH, Yasintha;SUMANI, Sumani
    • The Journal of Asian Finance, Economics and Business
    • /
    • v.8 no.4
    • /
    • pp.433-440
    • /
    • 2021
  • This study tests the effect of word-of-mouth (WOM) on purchase intention for low-cost carrier (LCC) flights in Indonesia, mediated by positive and negative perceptions. WOM is one of the communication mixes that airlines can carry out. It is a form of communication between passengers after using a flight. Airlines expect a positive perception of WOM to form. If a positive perception of WOM has formed, a purchase intention will arise. The study population was LCC flight passengers in Indonesia, with 387 respondents. For indicators and variables, validity and reliability tests were conducted using CFA, CR, and AVE tools. Sampling locations were Soekarno-Hatta and Kualanamu airports. Samples were collected through purposive sampling, and the analytical tool used was structural equation modeling (SEM) with Lisrel. The results showed that WOM influenced purchase intention through positive and negative perceptions of WOM. A positive perception of WOM has a direct effect, while a negative perception of WOM has the opposite effect. In conclusion, the mediation of perceptions influences purchase intention, whether in the same direction or the opposite one, and WOM is an antecedent of purchase intention.

Self-Sampling Versus Physicians' Sampling for Cervical Cancer Screening - Agreement of Cytological Diagnoses

  • Othman, Nor Hayati;Zaki, Fatma Hariati Mohamad;Hussain, Nik Hazlina Nik;Yusoff, Wan Zahanim Wan;Ismail, Pazuddin
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.17 no.7
    • /
    • pp.3489-3494
    • /
    • 2016
  • Background: A major problem with cervical cancer screening in countries that have no organized national screening program for cervical cancer is sub-optimal participation. Implementation of a self-sampling method may increase the coverage. Objective: We determined the agreement of cytological diagnoses made on samples collected by women themselves (self-sampling) versus samples collected by physicians (physician sampling). Materials and Methods: We invited women volunteers to undergo two procedures: cervical self-sampling using the Evalyn brush and physician sampling using a Cervex brush. The women were shown a video presentation on how to take their own cervical samples before the procedure. The samples taken by physicians were taken as per routine testing (gold standard). All samples were subjected to Thin Prep monolayer smears. The diagnoses were made according to the Bethesda classification. The results from these two sampling methods were analysed and compared. Results: A total of 367 women were recruited into the study, ranging from 22 to 65 years of age. There was a significant good agreement of the cytological diagnoses made on the samples from the two sampling methods, with a Kappa value of 0.568 (p=0.040). Using the cytological smears taken by physicians as the gold standard, the sensitivity of self-sampling was 71.9% (95% CI: 70.9-72.8), the specificity was 86.6% (95% CI: 85.7-87.5), the positive predictive value was 74.2% (95% CI: 73.3-75.1) and the negative predictive value was 85.1% (95% CI: 84.2-86.0). Self-sampling smears (22.9%) allowed detection of micro-organisms better than physicians' samples (18.5%). Conclusions: This study shows that samples taken by women themselves (self-sampling) and by physicians have good diagnostic agreement. Self-sampling could be the method of choice in countries in which the coverage of women attending clinics for cervical cancer screening is poor.