Companies increasingly adopt process-aware information systems (PAISs), which offer promising perspectives for more flexible enterprise computing. The emergence of different process support paradigms, together with the lack of methods for comparing approaches that enable PAIS changes, has made the selection of adequate process management technology difficult. This paper suggests a set of 18 change patterns and seven change support features to foster the systematic comparison of existing process management technology with respect to process change support. While the proposed patterns are all based on empirical evidence from several large case studies, the suggested change support features constitute typical functionalities provided by flexible PAISs. Based on the proposed change patterns and features, we provide a detailed analysis and evaluation of selected approaches from both academia and industry. The presented work will not only facilitate the selection of technologies for realizing flexible PAISs, but can also serve as a reference for implementing them.
The era of big data has resulted in the development and application of technologies and methods aimed at effectively using massive amounts of data to support decision-making and knowledge discovery activities. In this paper, the five Vs of big data (volume, velocity, variety, veracity, and value) are reviewed, as are new technologies, including NoSQL databases, that have emerged to accommodate the needs of big data initiatives. The role of conceptual modeling for big data is then analyzed, and suggestions are made for effective conceptual modeling efforts with respect to big data.
Analysis of trajectory data is the key to a growing number of applications aiming at global understanding and management of complex phenomena that involve moving objects (e.g. worldwide courier distribution, city traffic management, bird migration monitoring). Current DBMS support for such data is limited to the ability to store and query raw movement (i.e. the spatio-temporal position of an object). This paper explores how conceptual modeling could provide applications with direct support of trajectories (i.e. movement data that is structured into countable semantic units) as a first-class concept. A specific concern is to enable the enrichment of trajectories with semantic annotations, allowing users to attach semantic data to specific parts of a trajectory. Building on a preliminary requirements analysis and an application example, the paper proposes two modeling approaches, one based on a design pattern and the other on dedicated data types, and illustrates their differences in terms of implementation in an extended-relational context.
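The idea of trajectories as first-class values with attachable semantics can be illustrated with a minimal sketch. The types and field names below are hypothetical, not the data types proposed in the paper: a trajectory holds raw spatio-temporal fixes plus semantic annotations tied to time sub-intervals.

```python
from dataclasses import dataclass, field

@dataclass
class Fix:
    """A raw spatio-temporal position: time, x, y."""
    t: float
    x: float
    y: float

@dataclass
class Trajectory:
    """Trajectory as a first-class value: raw fixes plus semantic
    annotations attached to time sub-intervals (e.g. stops/moves)."""
    fixes: list
    annotations: list = field(default_factory=list)  # (t_from, t_to, tag)

    def annotate(self, t_from, t_to, tag):
        self.annotations.append((t_from, t_to, tag))

    def tags_at(self, t):
        """All semantic tags whose interval covers time t."""
        return [tag for a, b, tag in self.annotations if a <= t <= b]

traj = Trajectory([Fix(0, 0, 0), Fix(5, 1, 1), Fix(10, 5, 5)])
traj.annotate(0, 5, "stop: warehouse")
traj.annotate(5, 10, "move: delivery run")
```

Querying `traj.tags_at(2)` then returns the semantic unit covering that instant, which is the kind of direct trajectory support the paper argues DBMSs currently lack.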
Entity matching is a crucial and difficult task for data integration. Entity matching frameworks provide several methods, and combinations thereof, to effectively solve different match tasks. In this paper, we comparatively analyze 11 proposed frameworks for entity matching. Our study considers frameworks both with and without the use of training data to semi-automatically find an entity matching strategy for a given match task. Moreover, we consider support for blocking and for the combination of different match algorithms. We further study how the different frameworks have been evaluated. The study aims at exploring the current state of the art in research prototypes of entity matching frameworks and their evaluations. The proposed criteria should help identify promising framework approaches and enable categorizing and comparatively assessing additional entity matching frameworks and their evaluations.
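Blocking, one of the support features considered in the study, can be illustrated with a minimal sketch (the key function and records are hypothetical, not any specific framework's API): records are grouped by a blocking key so that only records sharing a key are compared, avoiding the full quadratic pair space.

```python
from collections import defaultdict
from itertools import combinations

def block_records(records, key):
    """Group records by a blocking key so only records sharing
    a key are ever compared."""
    blocks = defaultdict(list)
    for r in records:
        blocks[key(r)].append(r)
    return blocks

def candidate_pairs(records, key):
    """Yield candidate pairs from within each block only."""
    for block in block_records(records, key).values():
        yield from combinations(block, 2)

records = [
    {"name": "Jon Smith"}, {"name": "John Smith"},
    {"name": "Mary Jones"}, {"name": "Marie Jones"},
]
# hypothetical blocking key: first two letters of the name
pairs = list(candidate_pairs(records, key=lambda r: r["name"][:2].lower()))
# 2 candidate pairs instead of the 6 a full cross-comparison would produce
```

An expensive match algorithm (string similarity, a trained classifier, etc.) would then be applied only to these candidate pairs.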
This paper presents a new density-based clustering algorithm, ST-DBSCAN, which is based on DBSCAN. We propose three marginal extensions to DBSCAN related to the identification of (i) core objects, (ii) noise objects, and (iii) adjacent clusters. In contrast to existing density-based clustering algorithms, our algorithm can discover clusters according to the non-spatial, spatial, and temporal values of objects. We also present a spatial–temporal data warehouse system designed for storing and clustering a wide range of spatial–temporal data. We show an implementation of our algorithm using this data warehouse and present the data mining results.
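The core idea of clustering on both spatial and non-spatial values can be sketched with a simplified two-threshold neighbourhood (a sketch of the general technique, not the paper's exact algorithm): each point carries a spatial position and a non-spatial value, and neighbours must be close in both.

```python
import math

def neighbors(data, i, eps1, eps2):
    """Points within spatial radius eps1 AND within non-spatial
    (attribute) distance eps2 of point i — the two-threshold
    neighbourhood that distinguishes this from plain DBSCAN."""
    x, y, v = data[i]
    out = []
    for j, (xj, yj, vj) in enumerate(data):
        if j != i and math.hypot(x - xj, y - yj) <= eps1 and abs(v - vj) <= eps2:
            out.append(j)
    return out

def density_cluster(data, eps1, eps2, min_pts):
    """Simplified density-based clustering over (x, y, value)
    triples; returns one label per point (-1 = noise)."""
    labels = [None] * len(data)
    cluster = -1
    for i in range(len(data)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(data, i, eps1, eps2)
        if len(nbrs) < min_pts:
            labels[i] = -1            # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster   # border point, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(data, j, eps1, eps2)
            if len(jn) >= min_pts:    # core point: expand further
                seeds.extend(jn)
    return labels
```

Two spatially separated groups with similar attribute values come out as two clusters, while an isolated point with no qualifying neighbours is labelled noise.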
The research described in this paper analyzes two playful domains of language, humor and irony, in order to identify key value components for their automatic processing. In particular, we focus on describing a model for recognizing these phenomena in social media such as “tweets”. Our experiments are centered on five data sets retrieved from Twitter, taking advantage of user-generated tags such as “#humor” and “#irony”. The model, which is based on textual features, is assessed on two dimensions: representativeness and relevance. The results, apart from providing some valuable insights into the creative and figurative usages of language, are positive regarding humor and encouraging regarding irony.
Use of traditional k-means type algorithms is limited to numeric data. This paper presents a clustering algorithm based on the k-means paradigm that works well for data with mixed numeric and categorical features. We propose a new cost function and distance measure based on the co-occurrence of values. The measures also take into account the significance of an attribute towards the clustering process. We present a modified description of the cluster center to overcome the numeric-data-only limitation of the k-means algorithm and to provide a better characterization of clusters. The performance of this algorithm has been studied on real-world data sets. Comparisons with other clustering algorithms illustrate the effectiveness of this approach.
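The modified cluster centre can be illustrated with a simplified sketch. Note that the paper's co-occurrence-based distance is more involved; this sketch uses a plain frequency-based categorical distance only to convey the idea of mixing a numeric mean with a categorical value-frequency distribution.

```python
def center(cluster, num_idx, cat_idx):
    """Modified cluster centre: mean for numeric attributes,
    value-frequency distribution for categorical attributes."""
    c = {}
    for i in num_idx:
        c[i] = sum(r[i] for r in cluster) / len(cluster)
    for i in cat_idx:
        freq = {}
        for r in cluster:
            freq[r[i]] = freq.get(r[i], 0) + 1 / len(cluster)
        c[i] = freq
    return c

def distance(record, c, num_idx, cat_idx):
    """Squared distance to a mixed centre: Euclidean on numeric
    parts; on a categorical attribute, 1 minus the relative
    frequency of the record's value in the cluster (0 if every
    cluster member shares that value)."""
    d = sum((record[i] - c[i]) ** 2 for i in num_idx)
    d += sum((1 - c[i].get(record[i], 0.0)) ** 2 for i in cat_idx)
    return d
```

A record whose categorical value dominates the cluster is pulled closer than one whose value never occurs there, which is the behaviour the simple-matching distance of plain k-prototypes-style schemes cannot express.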
Healthcare processes require the cooperation of different organizational units and medical disciplines. In such an environment, optimal process support becomes crucial. Although healthcare processes change frequently, and the separation of flow logic from application code therefore seems promising, workflow technology has not yet been broadly used in healthcare environments. In this paper, we elaborate both the potential and the essential limitations of IT support for healthcare processes. We identify different levels of process support in healthcare and distinguish between organizational processes and the medical treatment process. To recognize the limitations of IT support, we adopt a broad socio-technical perspective based on the scientific literature and personal experience. Despite the limitations we identified, IT undeniably has huge potential to improve healthcare quality that has not yet been exploited by current IT solutions. In particular, we indicate how advanced process management technology can improve IT support for healthcare processes.
The increasing availability of large-scale trajectory data offers great opportunities for knowledge discovery in transportation systems using advanced data mining techniques. Nowadays, a large number of taxicabs in major metropolitan cities are equipped with GPS devices. Since taxis are on the road nearly 24 hours a day (with drivers changing shifts), they can act as reliable sensors for monitoring the behavior of traffic. In this article, we use GPS data from taxis to monitor the emergence of unexpected behavior in the Beijing metropolitan area, which has the potential to estimate and improve traffic conditions in advance. We adapt the likelihood ratio test (LRT) statistic, which has previously been used mostly in epidemiological studies, to describe traffic patterns. To the best of our knowledge, the use of the LRT in the traffic domain is not only novel but also results in accurate and rapid detection of anomalous behavior.
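LRT-based anomaly scoring can be sketched under a simple Poisson assumption (the link names and counts below are hypothetical, and the article's actual statistic may differ): a road link's observed vehicle count is scored against its historical baseline, and a large log-likelihood ratio flags unexpectedly heavy traffic.

```python
import math

def poisson_llr(count, baseline):
    """Log-likelihood ratio of an observed count under a Poisson
    model: rate = count (alternative) vs. rate = baseline (null).
    Large positive values signal an unexpectedly high count;
    counts at or below baseline score 0."""
    if count <= baseline or count == 0:
        return 0.0
    return count * math.log(count / baseline) - (count - baseline)

# hypothetical per-link vehicle counts vs. historical means
observed = {"link_a": 52, "link_b": 18, "link_c": 95}
baseline = {"link_a": 50, "link_b": 20, "link_c": 40}

scores = {k: poisson_llr(observed[k], baseline[k]) for k in observed}
flagged = max(scores, key=scores.get)
```

Here `link_c`, with more than double its baseline, dominates the scores, while `link_b` (below baseline) scores zero; in a spatial-scan setting the same ratio would be maximized over candidate regions rather than single links.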
Recent years have seen a surge of community detection studies in complex network analysis, since communities often play important roles in network systems. However, many real networks have more complex overlapping community structures. This paper proposes a novel algorithm to discover overlapping communities. Unlike conventional algorithms based on node clustering, the proposed algorithm is based on link clustering. Since links usually represent unique relations among nodes, link clustering discovers groups of links that share the same characteristics; nodes then naturally belong to multiple communities. The algorithm applies genetic operators to cluster links. An effective encoding scheme is designed, and the number of communities can be determined automatically. Experiments on both artificial and real networks validate the effectiveness and efficiency of the proposed algorithm.
Case handling is a new paradigm for supporting flexible and knowledge-intensive business processes. It is strongly based on data as the typical product of these processes. Unlike workflow management, which uses predefined process control structures to determine what should be done during a workflow process, case handling focuses on what can be done to achieve a business goal. In case handling, the knowledge worker in charge of a particular case actively decides on how the goal of that case is reached, and the role of a case handling system is to assist rather than guide her in doing so. In this paper, case handling is introduced as a new paradigm for supporting flexible business processes. It is motivated by comparing it to workflow management as the traditional way to support business processes. The main entities of case handling systems are identified and classified in a meta model. Finally, the basic functionality and usage of a case handling system are illustrated by an example.
In many research fields such as Psychology, Linguistics, Cognitive Science, and Artificial Intelligence, computing semantic similarity between words is an important issue. In this paper, a new semantic similarity metric is presented that exploits some notions of the feature-based theory of similarity and translates them into the information-theoretic domain, leveraging the notion of Information Content (IC). In particular, the proposed metric uses an intrinsic formulation of IC that quantifies IC values by scrutinizing how concepts are arranged in an ontological structure. In order to evaluate this metric, an online experiment asking the community of researchers to rank a list of 65 word pairs was conducted. The experiment's web setup allowed us to collect 101 similarity ratings and to differentiate native and non-native English speakers. Such a large and diverse dataset enables us to confidently evaluate similarity metrics by correlating them with human assessments. Experimental evaluations using WordNet indicate that the proposed metric, coupled with the notion of intrinsic IC, yields results above the state of the art. Moreover, the intrinsic IC formulation also improves the accuracy of other IC-based metrics. In order to investigate the generality of both the intrinsic IC formulation and the proposed similarity metric, a further evaluation using the MeSH biomedical ontology has been performed. Even in this case, significant results were obtained. The proposed metric and several others have been implemented in the Java WordNet Similarity Library.
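The intrinsic IC idea, estimating a concept's information content purely from its position in the taxonomy rather than from corpus counts, can be sketched as follows. The taxonomy is a toy example, and the combination rule shown is one plausible feature-inspired form (shared information counts positively, distinctive information negatively), not necessarily the paper's exact metric.

```python
import math

def intrinsic_ic(concept, hyponym_counts, total_concepts):
    """Intrinsic IC from taxonomy structure alone: leaf concepts
    (no hyponyms) are maximally informative (IC = 1); the root,
    which subsumes every concept, carries none (IC = 0)."""
    hypo = hyponym_counts[concept]
    return 1.0 - math.log(hypo + 1) / math.log(total_concepts)

def sim_feature_ic(ic_lcs, ic_c1, ic_c2):
    """Feature-model intuition in IC terms: information shared via
    the least common subsumer raises similarity; information
    distinctive to either concept lowers it."""
    return 3 * ic_lcs - ic_c1 - ic_c2

# toy taxonomy: entity > {animal > {cat, dog}, rock}; hyponym counts
hyponym_counts = {"entity": 4, "animal": 2, "cat": 0, "dog": 0, "rock": 0}
N = len(hyponym_counts)

ic = {c: intrinsic_ic(c, hyponym_counts, N) for c in hyponym_counts}
cat_dog = sim_feature_ic(ic["animal"], ic["cat"], ic["dog"])    # lcs = animal
cat_rock = sim_feature_ic(ic["entity"], ic["cat"], ic["rock"])  # lcs = entity
```

As expected, `cat` and `dog`, which share the informative subsumer `animal`, come out more similar than `cat` and `rock`, whose only common subsumer is the uninformative root.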
We consider a workflow graph as a model for the control flow of a business process and study the problem of workflow graph parsing, i.e., finding the structure of a workflow graph. More precisely, we want to find a decomposition of a workflow graph into a hierarchy of sub-workflows that are subgraphs with a single entry and a single exit of control. Such a decomposition is the crucial step, for example, to translate a process modeled in a graph-based language such as BPMN into a process modeled in a block-based language such as BPEL. For this and other applications, it is desirable that the decomposition be unique, modular, and as fine as possible, where modular means that a local change of the workflow graph can only cause a local change of the decomposition. In this paper, we provide a decomposition that is unique, modular, and finer than in previous work. We call it the refined process structure tree. It is based on and extends similar work for sequential programs by Tarjan and Valdes [ACM POPL ’80, 1980, pp. 95–105]. We give two independent characterizations of the refined process structure tree which we prove to be equivalent: (1) a simple descriptive characterization that justifies our particular choice of the decomposition and (2) a constructive characterization that allows us to compute the decomposition in linear time. The latter is based on the tree of triconnected components (elsewhere also known as the SPQR tree) of a biconnected graph.
Much research has been devoted over the years to investigating and advancing the techniques and tools used by analysts when they model. In contrast to what academics, software providers, and their resellers promote as what should be happening, the aim of this research was to determine whether practitioners still embrace conceptual modeling seriously. In addition, what are the most popular techniques and tools used for conceptual modeling? What are the major purposes for which conceptual modeling is used? The study found that the top six most frequently used modeling techniques and methods were ER diagramming, data flow diagramming, systems flowcharting, workflow modeling, UML, and structured charts. Modeling technique use was found to decrease significantly from smaller to medium-sized organizations, but then to increase significantly in larger organizations (proxying for large, complex projects). Technique use was also found to significantly follow an inverted U-shaped curve, contrary to some prior explanations. Additionally, an important contribution of this study was the identification of the factors that uniquely influence the decision of analysts to continue to use modeling, viz., communication (using diagrams) to/from stakeholders, internal knowledge (lack of) of techniques, user expectations management, understanding models’ integration into the business, and tool/software deficiencies. The highest ranked purposes for which modeling was undertaken were database design and management, business process documentation, business process improvement, and software development.
We report on a case study on control-flow analysis of business process models. We checked 735 industrial business process models from financial services, telecommunications, and other domains. We investigated these models for soundness (absence of deadlock and of lack of synchronization) using three different approaches: the business process verification tool Woflan, the Petri net model checker LoLA, and a recently developed technique based on SESE decomposition. We evaluate the various techniques used by these approaches in terms of their ability to accelerate the check. Our results show that industrial business process models can be checked in a few milliseconds, which enables tight integration of modeling with control-flow analysis. We also briefly compare the diagnostic information delivered by the different approaches and report some first insights from industrial applications.
Current Web service technology is evolving towards a simpler approach to define Web service APIs that challenges the assumptions made by existing languages for Web service composition. RESTful Web services introduce a new kind of abstraction, the resource, which does not fit well with the message-oriented paradigm of the Web service description language (WSDL). RESTful Web services are thus hard to compose using the Business Process Execution Language (WS-BPEL), due to its tight coupling to WSDL. The goal of the BPEL for REST extensions presented in this paper is twofold. First, we aim to enable the composition of both RESTful Web services and traditional Web services from within the same process-oriented service composition language. Second, we show how to publish a BPEL process as a RESTful Web service, by exposing selected parts of its execution state using the REST interaction primitives. We include a detailed example on how BPEL for REST can be applied to orchestrate a RESTful e-Commerce scenario and discuss how the proposed extensions affect the architecture of a process execution engine.
An international standard has now been established for evaluating the quality of software products. However, there is no equivalent standard for evaluating the quality of conceptual models. While a range of quality frameworks have been proposed in the literature, none of them has been widely accepted in practice and none has emerged as a potential standard. As a result, conceptual models continue to be evaluated in practice in an ad hoc way, based on common sense, subjective opinions, and experience. For conceptual modelling to progress from an “art” to an engineering discipline, quality standards need to be defined, agreed, and applied in practice. This paper conducts a review of research in conceptual model quality and identifies the major theoretical and practical issues which need to be addressed. We consider how conceptual model quality frameworks can be structured, how they can be developed, how they can be empirically validated, and how to achieve acceptance in practice. We argue that the current proliferation of quality frameworks is counterproductive to the progress of the field, and that researchers and practitioners should work together to establish a common standard (or standards) for conceptual model quality. Finally, we describe some initial efforts towards developing a common standard for data model quality, which may provide a model for future standardisation efforts.
Determining the number of clusters is an important part of cluster validity that has been widely studied in cluster analysis. Sum-of-squares based indices show promising properties in terms of determining the number of clusters. However, knee point detection is often required because most such indices are monotone in the number of clusters. Indices with a clear minimum or maximum value are therefore preferred. The aim of this paper is to revisit a sum-of-squares based index, the WB-index, whose minimum directly indicates the determined number of clusters. We shed light on the relation between the WB-index and two popular indices, the Calinski–Harabasz index and the Xu-index. According to a theoretical comparison, the Calinski–Harabasz index is shown to be affected by the data size and the level of data overlap. The Xu-index is theoretically close to the WB-index; however, it does not work well when the dimension of the data is greater than two. Here, we conduct a more thorough comparison of 12 internal indices and provide a summary of their experimental performance. Furthermore, we introduce the sum-of-squares based indices into automatic keyword categorization, where the indices are specially defined for determining the number of clusters.
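The WB-index computation can be sketched as follows (assuming a given labeling per candidate number of clusters; this is a minimal illustration, not the paper's full experimental setup): it is the ratio of within-cluster to between-cluster sum of squares, scaled by the number of clusters, and is minimized at the suggested clustering.

```python
def wb_index(points, labels, k):
    """WB-index = k * SSW / SSB, where SSW is the within-cluster
    and SSB the between-cluster sum of squares. Unlike monotone
    sum-of-squares indices, it attains a minimum at a good
    clustering, so no knee-point detection is needed."""
    n = len(points)
    dim = len(points[0])
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    ssw = ssb = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        cent = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        ssw += sum(sum((p[d] - cent[d]) ** 2 for d in range(dim))
                   for p in members)
        ssb += len(members) * sum((cent[d] - grand[d]) ** 2
                                  for d in range(dim))
    return k * ssw / ssb
```

For two well-separated groups, the labeling that respects the groups yields a far smaller WB value than one that mixes them, which is the behaviour the index's minimum exploits when scanning over candidate k.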