Deep learning methods employ multiple processing layers to learn hierarchical representations of data, and have produced state-of-the-art results in many domains. Recently, a variety of model designs and methods have blossomed in the context of natural language processing (NLP). In this paper, we review significant deep learning related models and methods that have been employed for numerous NLP tasks and provide a walk-through of their evolution. We also summarize, compare and contrast the various models and put forward a detailed understanding of the past, present and future of deep learning in NLP.
Natural language processing (NLP) is a theory-motivated range of computational techniques for the automatic analysis and representation of human language. NLP research has evolved from the era of punch cards and batch processing (in which the analysis of a sentence could take up to 7 minutes) to the era of Google and the likes of it (in which millions of webpages can be processed in less than a second). This review paper draws on recent developments in NLP research to look at the past, present, and future of NLP technology in a new light. Borrowing the paradigm of "jumping curves" from the field of business management and marketing prediction, this survey article reinterprets the evolution of NLP research as the intersection of three overlapping curves, namely the Syntactics, Semantics, and Pragmatics Curves, which will eventually lead NLP research to evolve into natural language understanding.
The prevalence of mobile phones, the internet-of-things technology, and networks of sensors has led to an enormous and ever increasing amount of data that are now more commonly available in a streaming fashion. Often, it is assumed, either implicitly or explicitly, that the process generating such a stream of data is stationary, that is, the data are drawn from a fixed, albeit unknown probability distribution. In many real-world scenarios, however, such an assumption is simply not true, and the underlying process generating the data stream is characterized by an intrinsic nonstationary (or evolving or drifting) phenomenon. The nonstationarity can be due, for example, to seasonality or periodicity effects, changes in the users' habits or preferences, hardware or software faults affecting a cyber-physical system, thermal drifts or aging effects in sensors. In such nonstationary environments, where the probabilistic properties of the data change over time, a non-adaptive model trained under the false stationarity assumption is bound to become obsolete in time, and perform sub-optimally at best, or fail catastrophically at worst.
Extreme learning machine (ELM), which was originally proposed for "generalized" single-hidden layer feedforward neural networks (SLFNs), provides efficient unified learning solutions for the applications of feature learning, clustering, regression and classification. In contrast to the common understanding and tenet that the hidden neurons of neural networks need to be iteratively adjusted during the training stage, ELM theories show that hidden neurons are important but need not be iteratively tuned. In fact, all the parameters of hidden nodes can be independent of training samples and randomly generated according to any continuous probability distribution. The resulting ELM networks satisfy universal approximation and classification capability. The fully connected ELM architecture has been extensively studied. However, ELM with local connections has not yet attracted much research attention. This paper studies the general architecture of locally connected ELM, showing that: 1) ELM theories are naturally valid for local connections, thus introducing local receptive fields to the input layer; 2) each hidden node in ELM can be a combination of several hidden nodes (a subnetwork), which is also consistent with ELM theories. ELM theories may shed light on the research of different local receptive fields, including true biological receptive fields whose exact shapes and formulas may be unknown to human beings. As a specific example of such general architectures, random convolutional nodes and a pooling structure are implemented in this paper. Experimental results on the NORB dataset, a benchmark for object recognition, show that compared with conventional deep learning solutions, the proposed local receptive fields based ELM (ELM-LRF) reduces the error rate from 6.5% to 2.7% and increases the learning speed up to 200 times.
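The core ELM idea described above (random, untuned hidden nodes combined with an analytically solved output layer) can be sketched as follows. This is a minimal, fully connected illustration in plain Python, not the ELM-LRF architecture and not the authors' code. The function names, the tanh activation, the uniform weight range, and the small ridge term added for numerical stability are all assumptions made to keep the example self-contained.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hidden(x, weights, biases):
    """Hidden-layer outputs for one input vector (tanh is an assumed activation)."""
    return [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
            for ws, b in zip(weights, biases)]


def train_elm(xs, ts, n_hidden=20, ridge=1e-6, seed=0):
    """Draw hidden parameters at random, then solve only for the output weights."""
    rng = random.Random(seed)
    dim = len(xs[0])
    # Hidden parameters are sampled once and never tuned.
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = [hidden(x, weights, biases) for x in xs]
    # Ridge-regularized normal equations: (H^T H + ridge * I) beta = H^T t
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(h[i] * t for h, t in zip(H, ts)) for i in range(n_hidden)]
    beta = solve(A, rhs)
    return weights, biases, beta


def predict(x, model):
    """Network output: a linear combination of the random hidden features."""
    weights, biases, beta = model
    return sum(b * h for b, h in zip(beta, hidden(x, weights, biases)))
```

Calling `train_elm` on a small dataset such as the four XOR points and then `predict` on each input shows the single-pass nature of training: no gradient steps are taken, and only the output weights are fitted.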
This article provides an overview of the mainstream deep learning approaches and research directions proposed over the past decade. It is important to emphasize that each approach has strengths and weaknesses, depending on the application and context in which it is being used. Thus, this article presents a summary of the current state of the deep machine learning field and some perspective into how it may evolve. Convolutional Neural Networks (CNNs) and Deep Belief Networks (DBNs) (and their respective variations) are focused on primarily because they are well established in the deep learning field and show great promise for future work.
Ensemble methods combine multiple models to obtain better performance than any single model. They have been used in multiple research fields such as computational intelligence, statistics and machine learning. This paper reviews traditional as well as state-of-the-art ensemble methods and thus can serve as an extensive summary for practitioners and beginners. The ensemble methods are categorized into conventional ensemble methods such as bagging, boosting and random forest, decomposition methods, negative correlation learning methods, multi-objective optimization based ensemble methods, fuzzy ensemble methods, multiple kernel learning ensemble methods and deep learning based ensemble methods. Variations, improvements and typical applications are discussed. Finally, this paper gives some recommendations for future research directions.
Over the last three decades, a large number of evolutionary algorithms have been developed for solving multi-objective optimization problems. However, there is a lack of an up-to-date and comprehensive software platform for researchers to properly benchmark existing algorithms and for practitioners to apply selected algorithms to solve their real-world problems. The demand for such a common tool becomes even more urgent when the source code of many proposed algorithms has not been made publicly available. To address these issues, we have developed a MATLAB platform for evolutionary multi-objective optimization, called PlatEMO, which includes more than 50 multi-objective evolutionary algorithms and more than 100 multi-objective test problems, along with several widely used performance indicators. With a user-friendly graphical user interface, PlatEMO enables users to easily compare several evolutionary algorithms at one time and collect statistical results in Excel or LaTeX files. More importantly, PlatEMO is completely open source, such that users are able to develop new algorithms on the basis of it. This paper introduces the main features of PlatEMO and illustrates how to use it for performing comparative experiments, embedding new algorithms, creating new test problems, and developing performance indicators. Source code of PlatEMO is now available at: http://bimk.ahu.edu.cn/index.php?s=/Index/Software/index.html.
In this article, we introduce some recent research trends within the field of adaptive/approximate dynamic programming (ADP), including variations on the structure of ADP schemes, the development of ADP algorithms, and applications of ADP schemes. For ADP algorithms, the point of focus is that iterative ADP algorithms can be sorted into two classes: one class requires an initial stable policy; the other does not. It is generally believed that the latter requires less computation, at the cost of losing the guarantee of system stability during the iteration process. In addition, many recent papers have provided convergence analysis associated with the algorithms developed. Furthermore, we point out some topics for future studies.
Taking a lead from the multi-faceted definitions and roles of the term "meme" in memetics, a plethora of potentially rich memetic computing methodologies, frameworks and operational meme-inspired algorithms have been developed with considerable success in several real-world domains in the last two decades. This article showcases several successful deployments of memetic computing methodologies for solving complex problems, from science and engineering to digital arts.
Time series prediction techniques have been used in many real-world applications such as financial market prediction, electric utility load forecasting, weather and environmental state prediction, and reliability forecasting. The underlying system models and time series data generating processes are generally complex for these applications and the models for these systems are usually not known a priori. Accurate and unbiased estimation of the time series data produced by these systems cannot always be achieved using well known linear techniques, and thus the estimation process requires more advanced time series prediction algorithms. This paper provides a survey of time series prediction applications using a novel machine learning approach: support vector machines (SVM). The underlying motivation for using SVMs is the ability of this methodology to accurately forecast time series data when the underlying system processes are typically nonlinear, non-stationary and not defined a priori. SVMs have also been proven to outperform other non-linear techniques including neural-network based non-linear prediction techniques such as multi-layer perceptrons. The ultimate goal is to provide the reader with insight into the applications using SVM for time series prediction, to give a brief tutorial on SVMs for time series prediction, to outline some of the advantages and challenges in using SVMs for time series prediction, and to provide a source for the reader to locate books, technical journals, and other online SVM research resources.
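Before a regression method such as a support vector machine can be applied to a time series, the series is commonly recast as a supervised learning problem. The abstract does not specify how this is done; the sketch below assumes the widely used sliding-window (lag) embedding, in which the previous `lag` observations form the input and the next observation is the target. The function name `make_windows` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], lag: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (input window, next value) pairs for a regressor."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must satisfy 1 <= lag < len(series)")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for start in range(len(series) - lag):
        inputs.append(list(series[start:start + lag]))  # the `lag` most recent values
        targets.append(series[start + lag])             # the value to be predicted
    return inputs, targets
```

The resulting pairs can then be passed to any regression model, including a support vector regressor; the choice of `lag` is itself a modeling decision that typically needs validation.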
"Big Data" as a term has been among the biggest trends of the last three years, leading to an upsurge of research, as well as industry and government applications. Data is deemed a powerful raw material that can impact multidisciplinary research endeavors as well as government and business performance. The goal of this discussion paper is to share the data analytics opinions and perspectives of the authors relating to the new opportunities and challenges brought forth by the big data movement. The authors bring together diverse perspectives, coming from different geographical locations with different core research expertise and different affiliations and work experiences. The aim of this paper is to evoke discussion rather than to provide a comprehensive survey of big data research.
The world continues to generate quintillions of bytes of data daily, leading to a pressing need for new efforts in dealing with the grand challenges brought by Big Data. Today, there is a growing consensus among the computational intelligence communities that data volume presents an immediate challenge pertaining to the scalability issue. However, when addressing volume in Big Data analytics, researchers in the data analytics community have largely taken a one-sided view of volume, namely the "Big Instance Size" factor of the data. The flip side of volume, which is the dimensionality factor of Big Data, has received much less attention. This article thus represents an attempt to fill this gap and places special focus on the relatively under-explored topic of "Big Dimensionality", wherein the explosion of features (variables) brings about new challenges to computational intelligence. We begin with an analysis of the origins of Big Dimensionality. The evolution of feature dimensionality in the last two decades is then studied using popular data repositories considered in the data analytics and computational intelligence research communities. Subsequently, the state-of-the-art feature selection schemes reported in the field of computational intelligence are reviewed to reveal the inadequacies of existing approaches in keeping pace with the emerging phenomenon of Big Dimensionality. Last but not least, the "curse and blessing of Big Dimensionality" are delineated and deliberated.
Given the initial state of an Unmanned Aerial Vehicle (UAV) system and the relative state of the system, the continuous inputs of each flight unit are made piecewise linear by a Control Parameterization and Time Discretization (CPTD) method. These approximate piecewise-linear control inputs are substituted for the continuous inputs. In this way, the multi-UAV formation reconfiguration problem can be formulated as an optimal control problem with dynamical and algebraic constraints. With strict constraints and mutual interference, multi-UAV formation reconfiguration in 3-D space is a complicated problem. The recent boom of bio-inspired algorithms has attracted many researchers to apply such intelligent approaches to complicated optimization problems in multi-UAV systems. In this paper, a Hybrid Particle Swarm Optimization and Genetic Algorithm (HPSOGA) is proposed to solve the multi-UAV formation reconfiguration problem, which is modeled as a parameter optimization problem. This new approach combines the advantages of Particle Swarm Optimization (PSO) and Genetic Algorithm (GA), which can find the time-optimal solutions simultaneously. The proposed HPSOGA is also compared with the basic PSO algorithm, and a series of experimental results shows that HPSOGA outperforms PSO in solving the multi-UAV formation reconfiguration problem under complicated environments.
The performance of brain-computer interfaces (BCIs) improves with the amount of available training data; the statistical distribution of this data, however, varies across subjects as well as across sessions within individual subjects, limiting the transferability of training data or trained models between them. In this article, we review current transfer learning techniques in BCIs that exploit shared structure between training data of multiple subjects and/or sessions to increase performance. We then present a framework for transfer learning in the context of BCIs that can be applied to any arbitrary feature space, as well as a novel regression estimation method that is specifically designed for the structure of a system based on the electroencephalogram (EEG). We demonstrate the utility of our framework and method on subject-to-subject transfer in a motor-imagery paradigm as well as on session-to-session transfer in one patient diagnosed with amyotrophic lateral sclerosis (ALS), showing that it is able to outperform other comparable methods on an identical dataset.
Emulating the human brain is one of the core challenges of computational intelligence, which entails many key problems of artificial intelligence, including understanding human language, reasoning, and emotions. In this work, computational intelligence techniques are combined with common-sense computing and linguistics to analyze sentiment data flows, i.e., to automatically decode how humans express emotions and opinions via natural language. The increasing availability of social data is extremely beneficial for tasks such as branding, product positioning, corporate reputation management, and social media marketing. The elicitation of useful information from this huge amount of unstructured data, however, remains an open challenge. Although such data are easily accessible to humans, they are not suitable for automatic processing: machines are still unable to effectively and dynamically interpret the meaning associated with natural language text in very large, heterogeneous, noisy, and ambiguous environments such as the Web. We present a novel methodology that goes beyond mere word-level analysis of text and enables a more efficient transformation of unstructured social data into structured information, readily interpretable by machines. In particular, we describe a novel paradigm for real-time concept-level sentiment analysis that blends computational intelligence, linguistics, and common-sense computing in order to improve the accuracy of computationally expensive tasks such as polarity detection from big social data. The main novelty of the paper consists in an algorithm that assigns contextual polarity to concepts in text and flows this polarity through the dependency arcs in order to assign a final polarity label to each sentence. 
Analyzing how sentiment flows from concept to concept through dependency relations allows for a better understanding of the contextual role of each concept in text, to achieve a dynamic polarity inference that outperforms state-of-the-art statistical methods in terms of both accuracy and training time.
This paper provides an introduction to and an overview of type-2 fuzzy sets (T2 FS) and systems. It does this by answering the following questions: What is a T2 FS and how is it different from a T1 FS? Is there new terminology for a T2 FS? Are there important representations of a T2 FS and, if so, why are they important? How and why are T2 FSs used in a rule-based system? What are the detailed computations for an interval T2 fuzzy logic system (IT2 FLS) and are they easy to understand? Is it possible to have an IT2 FLS without type reduction? How do we wrap this up and where can we go to learn more?
This article provides a general overview of the field now known as "evolutionary multi-objective optimization," which refers to the use of evolutionary algorithms to solve problems with two or more (often conflicting) objective functions. Using as a framework the history of this discipline, we discuss some of the most representative algorithms that have been developed so far, as well as some of their applications. Also, we discuss some of the methodological issues related to the use of multi-objective evolutionary algorithms, as well as some of the current and future research trends in the area.
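With two or more conflicting objectives there is usually no single best solution, so candidate solutions are compared by Pareto dominance. The sketch below illustrates that comparison and a simple non-dominated filter. It assumes every objective is to be minimized, and the function names are hypothetical; it is not taken from any particular algorithm discussed in the article.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (quadratic-time filter)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```

For example, among the objective vectors `(1, 5)`, `(2, 3)`, `(3, 4)`, `(4, 1)` and `(5, 5)`, the point `(3, 4)` is dominated by `(2, 3)` and `(5, 5)` is dominated by `(1, 5)`, so the filter keeps the other three as mutually incomparable trade-offs.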
The science of opinion analysis based on data from social networks and other forms of mass media has garnered the interest of the scientific community and the business world. Dealing with the increasing amount of information present on the Web is a critical task and requires efficient models developed by the emerging field of sentiment analysis. To this end, current research proposes an efficient approach to support emotion recognition and polarity detection in natural language text. In this paper, we show how to exploit the most recent technological tools and advances in Statistical Learning Theory (SLT) in order to efficiently build an Extreme Learning Machine (ELM) and assess the resultant model's performance when applied to big social data analysis. ELM represents a powerful learning tool, developed to overcome some issues in back-propagation networks. The main problem with ELM lies in training it when a large number of samples is available, where the generalization performance has to be carefully assessed. For this reason, we propose an ELM implementation that exploits the Spark distributed in-memory technology and show how to take advantage of the most recent advances in SLT in order to address the issue of selecting ELM hyperparameters that give the best generalization performance.
Sentilo is a model and a tool to detect holders and topics of opinion sentences. Sentilo implements an approach based on the neo-Davidsonian assumption that events and situations are the primary entities for contextualizing opinions, which makes it able to distinguish holders, main topics, and sub-topics of an opinion. It uses a heuristic graph mining approach that relies on FRED, a machine reader for the Semantic Web that leverages Natural Language Processing (NLP) and Knowledge Representation (KR) components jointly with cognitively-inspired frames. The evaluation results are excellent for holder detection (F1: 95%), very good for subtopic detection (F1: 78%), and good for topic detection (F1: 68%).