Premium Essay

Knowledge Discovery in Medical Databases Leveraging Data Mining

In: Computers and Technology

Submitted By hajas
Words 35271
Pages 142
Abstract

The goal of this master’s thesis is to identify and evaluate the data mining algorithms most commonly implemented in modern Medical Decision Support Systems (MDSS). Such systems are used in healthcare units all over the world, and these institutions store large amounts of medical data. This data may contain relevant medical information hidden in patterns buried among the records.

In this research, several popular MDSSs are analysed to determine which data mining algorithms they most commonly use. Three algorithms were identified: Naïve Bayes, Multilayer Perceptron and C4.5. Before the main analyses, the algorithms are calibrated by testing several configurations to find the best settings for each algorithm. A final comparison then ranks the algorithms by performance, based on a set of performance metrics. The analyses are conducted in WEKA on five UCI medical datasets: breast cancer, hepatitis, heart disease, dermatology and diabetes.
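The abstract does not name the performance metrics used in the comparison. As an illustration only, the sketch below computes several metrics commonly reported for two-class medical data (accuracy, precision, recall/sensitivity, specificity and F-measure) from a confusion matrix. The choice of metrics, the names `ConfusionMatrix` and `metrics`, and the example counts are assumptions and are not taken from the thesis.

```python
"""Two-class performance metrics computed from a confusion matrix.

Assumption: the thesis does not name its metrics here; accuracy, precision,
recall (sensitivity), specificity and F-measure are illustrative choices.
"""
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # positive records predicted positive
    fp: int  # negative records predicted positive
    fn: int  # positive records predicted negative
    tn: int  # negative records predicted negative


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def metrics(cm: ConfusionMatrix) -> dict[str, float]:
    total = cm.tp + cm.fp + cm.fn + cm.tn
    precision = _ratio(cm.tp, cm.tp + cm.fp)
    recall = _ratio(cm.tp, cm.tp + cm.fn)  # a.k.a. sensitivity
    return {
        "accuracy": _ratio(cm.tp + cm.tn, total),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),
        # Harmonic mean of precision and recall.
        "f_measure": _ratio(2 * precision * recall, precision + recall),
    }


# Invented counts for one classifier on a hypothetical test split.
print(metrics(ConfusionMatrix(tp=40, fp=10, fn=5, tn=45)))
```

With these invented counts the accuracy is 0.85 and the precision is 0.8. Returning 0.0 for an empty denominator is a design choice of this sketch. It keeps degenerate test splits from raising an error.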

The analyses showed that it is very difficult to name a single data mining algorithm as the most suitable for medical data, since the results obtained for the three algorithms were very similar. However, the final evaluation of the outcomes singled out Naïve Bayes as the best classifier for the given domain, followed by the Multilayer Perceptron and C4.5.
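Naïve Bayes assumes that attributes are conditionally independent given the class. Classifying a record therefore only requires multiplying a class prior by one conditional probability per attribute. The following is a minimal sketch for nominal attributes with Laplace (add-one) smoothing. It is not WEKA's `NaiveBayes` implementation, the class name `CategoricalNaiveBayes` is hypothetical, and the toy records are invented rather than drawn from the UCI datasets.

```python
"""Minimal categorical Naive Bayes with Laplace (add-one) smoothing.

Illustrative sketch only (assumption): nominal attributes, invented toy data,
hypothetical class name; not WEKA's NaiveBayes implementation.
"""
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    def fit(self, rows: list[tuple[str, ...]], labels: list[str]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(labels)  # class -> number of training rows
        self.n = len(labels)
        # counts[(attribute index, class)][value] -> co-occurrence count
        self.counts = defaultdict(Counter)
        # distinct values per attribute, used in the smoothing denominator
        self.values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def log_posterior(self, row: tuple[str, ...], label: str) -> float:
        """log P(class) + sum_i log P(x_i | class), up to a shared constant."""
        class_n = self.class_counts[label]
        score = math.log(class_n / self.n)
        for i, value in enumerate(row):
            seen = self.counts[(i, label)][value]  # 0 for unseen values
            score += math.log((seen + 1) / (class_n + len(self.values[i])))
        return score

    def predict(self, row: tuple[str, ...]) -> str:
        # Pick the class with the highest (unnormalised) log posterior.
        return max(self.class_counts, key=lambda c: self.log_posterior(row, c))


# Invented toy records: (fever, blood pressure) -> diagnosis
rows = [("yes", "high"), ("yes", "normal"), ("no", "normal"),
        ("no", "normal"), ("yes", "high"), ("no", "high")]
labels = ["sick", "sick", "healthy", "healthy", "sick", "healthy"]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("yes", "high")))
```

On this toy data, the record ("yes", "high") scores 0.5 × 0.8 × 0.6 = 0.24 for `sick` and 0.5 × 0.2 × 0.4 = 0.04 for `healthy`, so it is classified as `sick`. Summing logarithms instead of multiplying probabilities avoids numeric underflow when a dataset has many attributes.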

Keywords: Naïve Bayes, Multilayer Perceptron, C4.5, medical data mining, medical decision support

Chapter 1: Introduction to the Study
Introduction
Thesis Structure
Study Overview
Background of the research
Focus Area & Motivation
Aims and Objectives
Research Problems
Motivation and Challenges
Thesis Outline
Intellectual Challenge
Justification for the Research
Methodology
Conclusion

Chapter 1:…...

Similar Documents

Premium Essay

Data Mining

...Data Mining Jenna Walker Dr. Emmanuel Nyeanchi Information Systems Decision Making May 30, 2012 Abstract Businesses are using techniques such as data mining to create a competitive advantage and build customer loyalty. Data mining allows businesses to analyze customer information, such as demographics and purchase history, to better understand what customers need and what they will respond to. Data mining currently takes place in several industries, and it will only become more widespread given its wide range of benefits. The purpose of this paper is to research and examine data mining, its benefits to businesses, and the issues or concerns it will need to overcome. Real-world case studies of how data mining is used will also be presented for a deeper understanding. This study will show that despite its disadvantages, data mining is an important step for a business to better understand its customers, and is the future of business marketing and operational planning. Tools and Benefits of data mining Before examining the benefits of data mining, it is important to understand exactly what data mining is. Data mining is defined as “a process that uses statistical, mathematical, artificial intelligence, and machine-learning techniques to extract and identify useful information and subsequent knowledge from large databases, including data warehouses” (Turban & Volonino, 2011). The information identified using data mining includes patterns indicating......

Words: 1900 - Pages: 8

Premium Essay

Data Mining

...Data Mining 0. Abstract With the development of different fields, namely artificial intelligence, machine learning, statistics, databases, pattern recognition and neurocomputing, these fields have merged into a new technology: data mining. The ultimate goal of data mining is to obtain knowledge from large databases. It helps to discover previously unknown patterns. Most of the time this is followed by deeper manual evaluation to explain and correlate the results and establish new knowledge. It is often used in practice by governments, banks, insurance companies and medical researchers. The basic idea of data mining will be introduced. In this article, data mining tasks are divided into four types: predictive modeling, database segmentation, link analysis and deviation detection. A brief introduction will explain the differences among them. Next, current privacy, ethical and technical issues regarding data mining will be discussed. In addition, future development trends are described, especially the developing concept of sports data mining. Last but not least, different views on data mining, including its benefits, its drawbacks and our own views, are integrated into the discussion. 1. Introduction This century is the age of the digital world. We are no longer able to live without computing technology. Due to the information explosion, we have difficulty obtaining knowledge from large amounts of unorganized data. One of the solutions, Knowledge Discovery in Databases (KDD), is......

Words: 1700 - Pages: 7

Premium Essay

Data Mining

...A data warehouse is a subject-oriented, integrated, time-variant, nonupdateable collection of data used to support management decision-making and business intelligence (Hoffer, 2011). Business Intelligence (BI) is a term that describes a comprehensive, cohesive, and integrated set of tools and processes used to capture, collect, integrate, store and analyze data with the purpose of generating and presenting information to support decision making (Coronel, Morris, & Rob, 2013). Data Warehouse A data warehouse enables an organization to obtain information about future trends and track customer demands. The key terms that define a data warehouse are subject-oriented, integrated, time-variant, and nonupdateable, and each has its own meaning and importance in data warehousing. Subject-oriented – A data warehouse is organized around key subjects, which may include, but are not limited to, customers, patients, students, products, and time. Integrated – The data in the data warehouse are defined using consistent naming conventions, formats, structures, and related characteristics. This means the data warehouse holds one version of “the truth”. Time-variant – Data in the data warehouse contain a time dimension so they can be used to study trends and changes. Nonupdateable – Once data are loaded into the data warehouse, they cannot be updated by end users. Data warehousing is a process where organizations create and maintain data warehouses and extract......

Words: 1390 - Pages: 6

Premium Essay

Data Mining

...Data Mining Objectives: Highlight the characteristics of data mining operations, techniques and tools. A Brief Overview Online Analytical Processing (OLAP): OLAP is the dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data. Multi-dimensional OLAP supports common analyst operations, such as: ▪ Consolidation – aggregation of data, e.g. roll-ups from branches to regions. ▪ Drill-down – showing details, the reverse of consolidation. ▪ Slicing and dicing – pivoting, i.e. looking at the data from different viewpoints, e.g. X, Y, Z axes as salesman, Nth quarter and products, or region, Nth quarter and products. A Brief Overview Data Mining: Constructing an advanced architecture for storing information in a multi-dimensional data warehouse is just the first step in evolving from a traditional DBMS. To realize the value of a data warehouse, it is necessary to extract the knowledge hidden within the warehouse. Unlike OLAP, which reveals patterns that are known in advance, data mining uses machine learning techniques to find hidden relationships within data. So data mining is used to ▪ analyse data, ▪ use software techniques, ▪ find hidden and unexpected patterns and relationships in sets of data. Examples of Data Mining Applications: ▪ Identifying potential credit card customer groups ▪ Identifying buying patterns of customers. ▪ Predicting trends of......

Words: 1258 - Pages: 6

Premium Essay

Data Mining

...Data Mining Sherri White Dr. Edwin Otto CIS 500 Information System Decision Making September 2, 2012 Determine the benefits of data mining to the businesses when employing: 1. Predictive analytics to understand the behavior of customers Predictive analytics is business intelligence technology that produces a predictive score for each customer or other organizational element. Assigning these predictive scores is the job of a predictive model which has, in turn, been trained over your data, learning from the experience of your organization. Predictive analytics optimizes marketing campaigns and website behavior to increase customer responses, conversions and clicks, and to decrease churn. Each customer's predictive score informs actions to be taken with that customer. 2. Associations discovery in products sold to customers The way in which companies interact with their customers has changed dramatically over the past few years. A customer's continuing business is no longer guaranteed. As a result, companies have found that they need to understand their customers better, and to quickly respond to their wants and needs. In addition, the time frame in which these responses need to be made has been shrinking. It is no longer possible to wait until the signs of customer dissatisfaction are obvious before taking action. To succeed, companies must be proactive and anticipate what a customer desires. For example, in the old days, storekeepers would......

Words: 1909 - Pages: 8

Premium Essay

Data Warehousing and Data Mining

...Introduction 2 Assumptions 3 Data Availability 3 Overnight processing window 3 Business sponsor 4 Source system knowledge 4 Significance 5 Data warehouse 6 ETL: (Extract, Transform, Load) 6 Data Mining 6 Data Mining Techniques 7 Data Warehousing 8 Data Mining 8 Technology in Health Care 9 Diseases Analysis 9 Treatment strategies 9 Healthcare Resource Management 10 Customer Relationship Management 10 Recommended Solution 11 Corporate Solution 11 Technological Solution 11 Justification and Conclusion 12 References 14 Health Authority Data (Appendix A) 16 Data Warehousing Implementation (Appendix B) 19 Data Mining Implementation (Appendix B) 22 Technological Scenarios in Health Authorities (Appendix C) 26 Technology Tools 27 Data Management Technology Introduction The amount of information available to us is literally astonishing, and the value of data as an organizational asset is widely acknowledged. Nonetheless, as the volume of information rises, failing to manage this enormous amount of data and to swiftly acquire the information relevant to any particular question turns data into a distraction and a liability rather than an asset. This paradox drives the need for increasingly powerful and flexible data management systems. To achieve efficiency and a high level of productivity with large and complex datasets, operators need tools that streamline the tasks of managing the data and......

Words: 8284 - Pages: 34

Premium Essay

Data Mining

...Running Head: DATA MINING Assignment 4: Data Mining Submitted by: Submitted to: Course: Introduction Data Mining is also called Knowledge Discovery in Databases (KDD). It is a powerful technology with great potential to help companies focus on the most important information in their databases. With the increased use of technology, interest in data mining has grown rapidly. Data mining can be used to predict future behavior rather than focus only on past events. This is done by analyzing existing information that may be stored in a company's data warehouse or information warehouse. Companies are now using data mining techniques to assess their databases for trends, relationships, and outcomes in order to improve their overall operations and discover new ways to improve their customer service. Data mining provides multiple benefits to government, businesses, society and individuals (Data Mining, 2011). Benefits of data mining to the businesses when employing The advantage of data mining from a business point of view is that large volumes of apparently pointless data are filtered into important and valuable business information, which can be stored in data warehouses. While in the past the focus was on marketing utilities, services and products, the center of attention is now on customers: their choices, preferences, likes and dislikes, and possibly data mining is one of the most important......

Words: 1302 - Pages: 6

Free Essay

Data Mining

...A Statistical Perspective on Data Mining Ranjan Maitra∗ Abstract Technological advances have led to new and automated data collection methods. Datasets once at a premium are often plentiful nowadays and sometimes indeed massive. A new breed of challenges are thus presented – primary among them is the need for methodology to analyze such masses of data with a view to understanding complex phenomena and relationships. Such capability is provided by data mining which combines core statistical techniques with those from machine intelligence. This article reviews the current state of the discipline from a statistician’s perspective, illustrates issues with real-life examples, discusses the connections with statistics, the differences, the failings and the challenges ahead. 1 Introduction The information age has been matched by an explosion of data. This surfeit has been a result of modern, improved and, in many cases, automated methods for both data collection and storage. For instance, many stores tag their items with a product-specific bar code, which is scanned in when the corresponding item is bought. This automatically creates a gigantic repository of information on products and product combinations sold. Similar databases are also created by automated book-keeping, digital communication tools or by remote sensing satellites, and aided by the availability of affordable and effective storage mechanisms – magnetic tapes, data warehouses and so on. This has created a......

Words: 22784 - Pages: 92

Premium Essay

Data Mining

...Data Mining Professor Clifton Howell CIS500-Information Systems Decision Making March 7, 2014 Benefits of data mining to the businesses One of the benefits of data mining is the ability to use stored information to predict consumers' likely actions and needs in order to make better business decisions. We implement business intelligence technology that produces a predictive score for those consumers to estimate these possibilities. Predictive analytics is the business intelligence technology that produces a predictive score for each customer or other organizational element. Assigning these predictive scores is the job of a predictive model which has, in turn, been trained over your data, learning from the experience of your organization. (Impact, 2014) The usefulness of predictive scoring is obvious. However, with no predictive model and no means to score your consumers, the possibility of gaining a competitive edge and revenue is also predictable. To discover consumer buying patterns from a transaction database, association rule mining is used to make better business decisions. However, because users may only be interested in certain information from this database and do not want to invest a lot of time searching for what they need, association discovery helps limit the data to only what the end user needs. Association discovery uses algorithms to reduce the number of groupings of item sets or sequences in each......

Words: 1318 - Pages: 6

Premium Essay

Data Mining in Hospitals

...Original Contributions Data Mining Applications in Healthcare Hian Chye Koh and Gerald Tan Abstract Data mining has been used intensively and extensively by many organizations. In healthcare, data mining is becoming increasingly popular, if not increasingly essential. Data mining applications can greatly benefit all parties involved in the healthcare industry. For example, data mining can help healthcare insurers detect fraud and abuse, healthcare organizations make customer relationship management decisions, physicians identify effective treatments and best practices, and patients receive better and more affordable healthcare services. The huge amounts of data generated by healthcare transactions are too complex and voluminous to be processed and analyzed by traditional methods. Data mining provides the methodology and technology to transform these mounds of data into useful information for decision making. This article explores data mining applications in healthcare. In particular, it discusses data mining and its applications within healthcare in major areas such as the evaluation of treatment effectiveness, management of healthcare, customer relationship management, and the detection of fraud and abuse. It also gives an illustrative example of a healthcare data mining application involving the identification of risk factors associated with the onset of diabetes. Finally, the article highlights the limitations of data mining and discusses some future directions....

Words: 5507 - Pages: 23

Premium Essay

Report on Data Mining

...Report – Webcast 8/13/14 on Data Mining SAS (Statistical Analysis System) was originally developed as a project to analyze agricultural data from 1966-1976 at North Carolina State University. As demand for such software grew, SAS Institute was founded in 1976. SAS is a software suite that can mine, alter, manage and retrieve data from a variety of sources and perform statistical analysis on it. SAS provides a graphical point-and-click user interface for non-technical users and more advanced options through the SAS programming language. On August 13, 2014, SAS sponsored a web seminar titled “Analytically Speaking”; the topic of the webcast was data mining techniques. Michael Berry and Gordon Linoff were the featured speakers; they have written a leading introductory book on data mining titled “Data Mining Techniques”. They discussed much of the current data mining landscape, including new methods, new types of data and the importance of using the right analysis for your problem (as good analysis is wasted if it answers the wrong question). They also briefly discussed using ‘found data’: text data, social data and device data. Michael Berry is the Business Intelligence Director at TripAdvisor and co-founder of Data Miners Inc. Gordon Linoff is co-founder of Data Miners Inc. and a consultant to financial, media and pharmaceutical companies. Data mining is the analysis step of the “KDD” (Knowledge Discovery in Databases) process. Data mining is an interdisciplinary......

Words: 818 - Pages: 4

Premium Essay

Intro to Data Mining

...Data Mining: Concepts and Techniques (3rd ed.) Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University ©2011 Han, Kamber & Pei. All rights reserved. Adapted for CSE 347-447, Lecture 1b, Spring 2015. Introduction ▪ Why Data Mining? ▪ What Is Data Mining? ▪ A Multi-Dimensional View of Data Mining ▪ What Kind of Data Can Be Mined? ▪ What Kinds of Patterns Can Be Mined? ▪ What Technologies Are Used? ▪ What Kind of Applications Are Targeted? ▪ Major Issues in Data Mining ▪ A Brief History of Data Mining and Data Mining Society ▪ Summary. Why Data Mining? ▪ The Explosive Growth of Data: from terabytes to petabytes ▪ Data collection and data availability ▪ Automated data collection tools, database systems, Web, computerized society ▪ Major sources of abundant data ▪ Business: Web, e-commerce, transactions, stocks, … ▪ Science: Remote sensing, bioinformatics, scientific simulation, … ▪ Society and everyone: news, digital cameras, YouTube ▪ We are drowning in data, but starving for knowledge! ▪ “Necessity is the mother of invention”: data mining, the automated analysis of massive data sets. Evolution of Sciences: New Data Science Era ▪ Before 1600: Empirical science ▪ 1600-1950s: Theoretical science ▪ Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our......

Words: 3169 - Pages: 13

Premium Essay

Data Mining

...Data Mining Introduction to Management Information System 04-73-213 Section 5 Professor Mao March 22, 2011 Group 5: Carol DeBruyn, Jason Rekker, Matt Smith, Mike St. Denis Odette School of Business – The University of Windsor Table of Contents: Table of Contents (ii); Introduction (1); Data Mining (1); Text Mining (4); Conclusion (7); References (9). Introduction Every day, millions of transactions occur at thousands of businesses. Each transaction provides valuable data to these businesses. This valuable data is then stored in data warehouses and data marts for later reference. This stored data represents a large asset that, until the advent of data mining, had been largely unexploited. As companies attempt to gain a competitive advantage over each other, new data mining techniques have been developed. The most recent revolution in data mining has resulted in text mining. Prior to text mining, companies could only focus on leveraging their numerical data. Now companies are beginning to benefit from the textual data stored in data warehouses as well. Data Mining Data mining, which is also known as data discovery or knowledge discovery, is the procedure that gathers, analyzes and places useful information into perspective. This facilitates the analysis of data......

Words: 2331 - Pages: 10

Premium Essay

Data Mining

...Data Mining Teresa M. Tidwell Dr. Sergey Samoilenko Information Systems for Decision Making September 2, 2012 Data Mining The use of data mining helps companies identify information and knowledge in databases and data warehouses that would benefit the company. The information is often buried in databases, records, and files. With tools such as queries and algorithms, companies can access data, analyze it, and use it to increase their profit. The benefits of using data mining, its reliability, and privacy concerns will be discussed. Benefits of Data Mining 1. Predictive Analytics: This type of analysis uses customer data to build a specific model for the business. Existing information, such as a customer’s recent purchases and income, is used to predict future purchases and how much or what type of item will be purchased. The more variables used, the more likely the prediction is to be correct. Such variables include the customer ranking, based on the number and recency of purchases, and the average profit per customer purchase. Without data made available through web access and customer purchases, predictive analysis would be difficult to perform, and the company would be able neither to plan nor to predict how well it is performing. 2. Associations Discovery: This part of data mining helps the company discover the “relationships hidden in larger data sets”......

Words: 1443 - Pages: 6