This paper investigates the relationship between the fuzziness of a classifier's output and its misclassification rate on a group of samples. For a given trained classifier that outputs a membership vector, we demonstrate experimentally that samples assigned higher fuzziness by the classifier carry a greater risk of misclassification. We then propose a fuzziness-category-based divide-and-conquer strategy that separates high-fuzziness samples from low-fuzziness samples, and a dedicated technique is applied to the high-fuzziness samples to improve classifier performance. The rationale of the approach is explained theoretically, and its effectiveness is demonstrated experimentally.
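The fuzziness of a membership vector can be quantified with the standard entropy-style measure used in this line of work; a minimal sketch, assuming the natural-log form (some papers use base 2, which only rescales the values):

```python
import math

def fuzziness(mu, eps=1e-12):
    """Entropy-style fuzziness of a membership vector mu (grades in [0, 1]).
    Zero for crisp outputs (all grades 0 or 1), maximal when every grade is 0.5."""
    total = 0.0
    for m in mu:
        m = min(max(m, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += m * math.log(m) + (1.0 - m) * math.log(1.0 - m)
    return -total / len(mu)

# A confident prediction has low fuzziness; an ambiguous one has high fuzziness.
crisp = fuzziness([0.99, 0.01, 0.01])
ambiguous = fuzziness([0.50, 0.45, 0.55])
assert crisp < ambiguous
```

Under the paper's observation, samples whose vectors look like the second case would be routed to the dedicated high-fuzziness treatment.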
Countering cyber threats, especially attack detection, is a challenging area of research in the field of information assurance. Intruders use polymorphic mechanisms to disguise the attack payload and evade detection techniques. Many supervised and unsupervised learning approaches from machine learning and pattern recognition have been used to increase the efficacy of intrusion detection systems (IDSs). Supervised learning approaches use only labeled samples to train a classifier, but obtaining sufficient labeled samples is cumbersome and requires the effort of domain experts, whereas unlabeled samples can easily be obtained in many real-world problems. Semi-supervised learning (SSL) addresses this issue by combining a large amount of unlabeled samples with the labeled samples to build a better classifier. This paper proposes a novel fuzziness-based semi-supervised learning approach that exploits unlabeled samples, assisted by a supervised learning algorithm, to improve classifier performance for IDSs. A single-hidden-layer feed-forward neural network (SLFN) is trained to output a fuzzy membership vector, and the unlabeled samples are categorized into low-, mid-, and high-fuzziness groups using this fuzzy quantity. The classifier is then retrained after incorporating each category separately into the original training set. Experimental results on the NSL-KDD intrusion detection dataset show that unlabeled samples in the low- and high-fuzziness groups make the major contributions to improving performance relative to existing classifiers such as naive Bayes, support vector machines, and random forests.
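The categorization step described above can be sketched as a simple partition of the unlabeled pool by fuzziness rank; the quantile cut-offs used here are illustrative assumptions, not the paper's exact thresholds:

```python
def categorize_by_fuzziness(fuzz_values, low_q=1/3, high_q=2/3):
    """Partition sample indices into low/mid/high fuzziness groups by rank.
    The (1/3, 2/3) quantile cut-offs are illustrative; the actual thresholding
    rule would come from the paper's method."""
    ranked = sorted(range(len(fuzz_values)), key=lambda i: fuzz_values[i])
    n = len(ranked)
    lo_cut, hi_cut = int(n * low_q), int(n * high_q)
    return {
        "low": ranked[:lo_cut],       # most confident predictions
        "mid": ranked[lo_cut:hi_cut],
        "high": ranked[hi_cut:],      # most ambiguous predictions
    }

groups = categorize_by_fuzziness([0.1, 0.9, 0.5, 0.2, 0.8, 0.6])
# Each group would then be merged with the labeled set and the SLFN retrained.
```

Retraining once per group, as the abstract describes, lets one measure which group's pseudo-labeled samples actually help.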
We investigate essential relationships between the generalization capability and the fuzziness of fuzzy classifiers (viz., classifiers whose outputs are vectors of membership grades of a pattern to the individual classes). The study makes a claim, and offers sound evidence for the observation, that higher fuzziness of a fuzzy classifier may imply better generalization, especially for classification data exhibiting complex boundaries. This observation runs counter to a commonly accepted position in "traditional" pattern recognition. The relationship, which obeys the conditional maximum entropy principle, is experimentally confirmed. Furthermore, it can be explained by the fact that samples located close to classification boundaries are harder to classify correctly than samples positioned far from the boundaries. This relationship is expected to provide some guidelines for improving the generalization of fuzzy classifiers.
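The near-boundary argument can be illustrated numerically: under a sigmoidal membership function over a linear boundary (an assumed toy model, not the paper's experimental setup), points close to the boundary receive grades near 0.5 and hence high fuzziness:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuzziness2(mu, eps=1e-12):
    """Two-class fuzziness of a single membership grade mu."""
    mu = min(max(mu, eps), 1.0 - eps)
    return -(mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu))

# Signed distance to a boundary at x = 0; nearer points are fuzzier.
near = fuzziness2(sigmoid(0.1))  # 0.1 units from the boundary
far = fuzziness2(sigmoid(3.0))   # 3.0 units from the boundary
assert near > far
```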
The first aim is to emphasize the use of fuzziness in data analysis to capture information that has traditionally been disregarded, at a cost to the precision of the conclusions. Fuzziness can enter the data analysis process at various stages, but the main target in this paper is fuzziness in the data themselves. Depending on the nature of the fuzzy data, or the purpose for which they are handled, different approaches should be applied. We attempt to contribute to the clarification of this difference, focusing on the so-called ontic approach in contrast to the epistemic approach. The second aim is to underline the need to consider robust methods that reduce the misleading impact of outliers in fuzzy data analysis. We propose trimming as a general and intuitive method to discard outliers. We exemplify this approach with the ontic fuzzy trimmed mean/variance and highlight the differences with the epistemic case. All the discussions and developments are illustrated by means of a case study concerning the perception of lengths by men and women.
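The trimming idea carries over directly from the scalar case; a minimal scalar sketch (the fuzzy version replaces numbers with fuzzy sets and the absolute ordering with a distance-based ranking, which is omitted here):

```python
def trimmed_mean(values, alpha=0.1):
    """Discard the alpha fraction of most extreme observations on each side,
    then average the rest. Scalar analogue of the fuzzy trimmed mean."""
    xs = sorted(values)
    k = int(len(xs) * alpha)
    kept = xs[k:len(xs) - k] if k > 0 else xs
    return sum(kept) / len(kept)

# A single gross outlier barely moves the trimmed mean.
robust = trimmed_mean([1, 2, 3, 4, 100], alpha=0.2)
```

The design point is the same one the abstract makes: the estimator ignores the most extreme observations rather than letting them dominate the average.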
Scientists want to comprehend and control complex systems. Their success depends on their ability to also face the challenges of the corresponding computational complexity. A promising research line is artificial intelligence (AI). Within AI, fuzzy logic plays a significant role because it is a suitable model of the human capability to compute with words, which is relevant when we make decisions in complex situations. The concept of a fuzzy set pervades natural information systems (NISs), such as living cells, the immune system, and the nervous system. This paper describes the fuzziness of NISs, in particular of the human nervous system. Moreover, it traces three pathways for processing fuzzy logic with molecules and their assemblies. The fuzziness of the molecular world is useful for the development of chemical artificial intelligence (CAI). CAI will help to face the challenges posed by both natural and computational complexity.
The quality of the new data used in the sequential learning phase of the online sequential extreme learning machine algorithm (OS-ELM) has a significant impact on the performance of OS-ELM. This paper proposes a novel data-filtering mechanism for OS-ELM from the perspective of fuzziness, yielding a fuzziness-based online sequential extreme learning machine algorithm (FOS-ELM). In FOS-ELM, when new data arrive, a fuzzy classifier first picks out the meaningful data according to the fuzziness of each sample: the new samples with high output fuzziness are selected and then used in sequential learning. Experimental results on eight binary classification problems and three multiclass classification problems show that FOS-ELM, updated with the high-output-fuzziness samples, has better generalization performance than OS-ELM. Since the unimportant data are discarded before sequential learning, FOS-ELM also saves memory and achieves higher computational efficiency. In addition, FOS-ELM can handle data one by one or chunk by chunk, with fixed or varying chunk sizes. The relationship between the fuzziness of new samples and model performance is also studied, which is expected to provide some useful guidelines for improving the generalization ability of online sequential learning algorithms.
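The filtering step can be sketched as follows, assuming the entropy-style fuzziness measure over the classifier's membership outputs; the fixed threshold is an illustrative assumption, not the paper's exact selection rule:

```python
import math

def fuzziness(mu, eps=1e-12):
    """Entropy-style fuzziness of a membership vector mu (grades in [0, 1])."""
    total = 0.0
    for m in mu:
        m = min(max(m, eps), 1.0 - eps)
        total += m * math.log(m) + (1.0 - m) * math.log(1.0 - m)
    return -total / len(mu)

def select_high_fuzziness(chunk_outputs, threshold=0.5):
    """Indices of arriving samples whose output fuzziness exceeds `threshold`;
    only these would be fed to the sequential ELM update. The threshold value
    is illustrative."""
    return [i for i, mu in enumerate(chunk_outputs) if fuzziness(mu) > threshold]

# A confident sample is discarded; an ambiguous one is kept for the update.
kept = select_high_fuzziness([[0.9, 0.1], [0.5, 0.5]])
```

Discarding the low-fuzziness samples before the update is what gives FOS-ELM its memory and runtime savings.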
Hyperspectral image classification with a limited number of training samples, without loss of accuracy, is desirable, as collecting such data is often expensive and time-consuming. However, classifiers trained with limited samples usually end up with a large generalization error. To overcome this problem, we propose a fuzziness-based active learning framework (FALF), in which we implement the idea of selecting optimal training samples to enhance generalization performance for two different kinds of classifiers, discriminative and generative (e.g., SVM and KNN). The optimal samples are selected by first estimating the boundary of each class and then calculating the fuzziness-based distance between each sample and the estimated class boundaries. Samples that lie at smaller distances from the boundaries and have higher fuzziness are chosen as candidates for the training set. Through detailed experimentation on three publicly available datasets, we show that, when trained with the proposed sample selection framework, both classifiers achieve higher classification accuracy and lower processing time with a small amount of training data, as opposed to the case where the training samples are selected randomly. Our experiments demonstrate the effectiveness of the proposed method, which compares favorably with state-of-the-art methods.
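The selection criterion can be sketched as a joint ranking over fuzziness and boundary distance; the combined score below is a simple illustrative stand-in for the paper's exact ranking rule:

```python
def select_candidates(fuzz, dist, k):
    """Pick k unlabeled samples that are fuzzy AND close to a class boundary.
    `fuzz` and `dist` are per-sample fuzziness values and estimated boundary
    distances; the additive score is an illustrative assumption."""
    scores = [f - d for f, d in zip(fuzz, dist)]  # high fuzziness, small distance
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Sample 1 is confident and far from any boundary, so it is never picked first.
picked = select_candidates(fuzz=[0.9, 0.1, 0.6], dist=[0.2, 0.9, 0.1], k=2)
```

Labeling only such near-boundary, high-fuzziness samples is what lets the framework match random selection's accuracy with far fewer training points.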
The present study aims to clarify the necessity and effectiveness of considering fuzziness in modelling fish habitat preference, and the advantages that would be achieved by considering it. For this purpose, genetic algorithm (GA) optimized habitat preference models under three different levels of fuzzification were compared with regard to their ability to predict the habitat use of Japanese medaka dwelling in agricultural canals in Japan. Field surveys were conducted in these canals to establish a relationship between fish habitat preference and the physical environment: water depth, current velocity, lateral cover ratio, and percent vegetation coverage. The habitat preference models employed for testing the fuzzy-based approach were a category model, a fuzzy habitat preference model, and a fuzzy habitat preference model with fuzzy inputs. All models were developed from 50 different initial conditions. The effectiveness of fuzzification in fish habitat modelling was assessed by comparing the mean square error and standard deviation of the models, and the fluctuation in the habitat preference curves evaluated by each model. As a result, the effect of fuzzification appeared as smoother curves and was found to reduce fluctuation in the habitat preference curves in proportion to the level of fuzzification. The smooth curves are appropriate for expressing uncertainty in the habitat preference of the fish, whereby the fuzzy habitat preference model with fuzzy inputs achieves the best prediction ability among the models. In conclusion, the present study revealed two advantages of fuzzification: reduced fluctuation in habitat preference evaluation and improved prediction ability of the model. Therefore, the consideration of fuzziness is appropriate for representing fish habitat preference under natural conditions.
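A fuzzified preference curve can be sketched as a membership-weighted blend of per-category preference values over one environmental variable; the triangular memberships, category bounds, and preference values below are all illustrative assumptions:

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership: 0 outside (a, c), peak 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_preference(x, categories):
    """Blend per-category preferences by triangular membership over an
    environmental variable (e.g. water depth). `categories` is a list of
    ((a, b, c), preference) pairs; all values here are illustrative."""
    num = sum(triangular(x, a, b, c) * p for (a, b, c), p in categories)
    den = sum(triangular(x, a, b, c) for (a, b, c), _ in categories)
    return num / den if den > 0 else 0.0

# Overlapping categories yield a smooth transition instead of a step,
# which is the "smoother curves" effect the study reports.
cats = [((0, 10, 20), 0.2), ((10, 20, 30), 0.8)]
mid = fuzzy_preference(15, cats)  # halfway between the two category peaks
```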
We introduce a new encryption notion called distance-based encryption (DBE) to apply biometrics in identity-based encryption. In this notion, a ciphertext encrypted under a vector and a threshold value can be decrypted with the private key of another vector if and only if the distance between the two vectors is less than or equal to the threshold. The adopted distance measure is the Mahalanobis distance, a generalization of the Euclidean distance that is widely used in the pattern recognition and image processing community. The primary application of this notion is to use biometric identities, such as faces, as public identities in identity-based encryption. In such an application, the biometric identity associated with a private key will usually not be exactly the same as the biometric identity used in the encryption phase, even when both come from the same user. DBE addresses this problem well, since the decryption condition does not require the identities to be identical, only within a small distance of each other. The closest existing notion to DBE is fuzzy identity-based encryption, but it measures biometric identities using an overlap distance (a variant of the Hamming distance) that is not widely accepted by the pattern recognition community, due to its long binary representations. In this paper, we study this new encryption notion and its constructions. We show how to generically and efficiently construct such a DBE from an inner product encryption (IPE) scheme with reasonably sized private keys and ciphertexts. We also propose a new IPE scheme with a very short private key for building DBE, addressing the need for compact keys. Finally, we study the encryption efficiency of DBE by splitting our IPE encryption algorithm into offline and online phases.
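The decryption condition compares a Mahalanobis distance against the threshold; a minimal sketch of that distance (with identity inverse covariance it reduces to the squared Euclidean distance, matching the "generalization of Euclidean distance" remark):

```python
def mahalanobis_sq(x, y, s_inv):
    """Squared Mahalanobis distance (x - y)^T S^{-1} (x - y), where s_inv is
    the inverse covariance matrix given as a list of rows."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * s_inv[i][j] * d[j] for i in range(n) for j in range(n))

def dbe_condition(x, y, s_inv, threshold):
    """Toy check of the DBE decryption condition: distance <= threshold.
    (The actual scheme evaluates this cryptographically, never in the clear.)"""
    return mahalanobis_sq(x, y, s_inv) <= threshold ** 2

identity = [[1.0, 0.0], [0.0, 1.0]]
ok = dbe_condition([1.0, 2.0], [1.5, 2.5], identity, threshold=1.0)
```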
The railway freight transportation planning problem under a mixed uncertain environment of fuzziness and randomness is investigated in this paper, in which the optimal paths, the amount of commodities passing through each path, and the frequency of services need to be determined. Based on the chance measure and critical values of random fuzzy variables, three chance-constrained programming models are constructed for the problem with respect to different criteria. Some equivalents of the objectives and constraints are also discussed in order to investigate the mathematical properties of the models. To solve the models, a potential-path searching algorithm, simulation algorithms, and a genetic algorithm are integrated into a hybrid algorithm that finds an optimal solution. Finally, some numerical examples are presented to illustrate the application of the models and the algorithm.
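The simulation component embedded in such a hybrid algorithm typically estimates a chance constraint by sampling the uncertain quantity; a minimal Monte Carlo sketch, where the cost generator and budget are illustrative assumptions rather than the paper's model:

```python
import random

def chance_within_budget(sample_cost, budget, trials=10000, seed=0):
    """Monte Carlo estimate of Pr{cost <= budget} for a random cost generator.
    `sample_cost` draws one realization of the uncertain transport cost; the
    GA would call an estimate like this when checking a chance constraint."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if sample_cost(rng) <= budget)
    return hits / trials

# Illustrative uncertain cost: uniform on [0, 1], budget 0.5 -> chance ~0.5.
p = chance_within_budget(lambda rng: rng.uniform(0.0, 1.0), budget=0.5)
```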