1. Read the article “What is Data Mining?” Then, using Microsoft Word, answer the following questions. Be sure to include the question as part of your answer document.
a. The article discussed Customer Insights and Disney. How could Customer Insights be used on a cruise ship?
b. Data veracity is crucial in data mining. Imagine a team of workers surveying shoppers at large stores and malls about their shopping habits. How could a lack of data veracity result?
c. K-Nearest Neighbor is a great data mining technique. How do you think Amazon uses K-Nearest Neighbor?
2. An overview of the book “Ghost Map” was assigned as reading material. The document concluded with a copy of the original map John Snow drew for his 1854 analysis of the Broad Street pump in London. The red circle indicates the location of the pump. Study the map and notice the black squares and black rectangles. Write a one-paragraph analysis of Snow’s map. What did he demonstrate with his primitive data mining?
Downloaded from: https://en.wikipedia.org/wiki/The_Ghost_Map
The Ghost Map: The Story of London’s Most Terrifying Epidemic – and How it Changed
Science, Cities and the Modern World is a book by Steven Johnson in which he describes the
most intense outbreak of cholera in Victorian London. The book incorporated the idea of
community, dealing with the effects of an epidemic in a city of common values, language, and
traditions. The two central protagonists are Dr. John Snow, who created a map of the cholera
cases, and the Reverend Henry Whitehead, whose extensive knowledge of the local community
helped determine the initial cause of the outbreak. Dr. John Snow was a revered anesthetist who
carried out epidemiological work in Soho, London. Around the mid-1850s Snow figured out the
source of cholera contamination to be the drinking water from the Broad Street pump. The book
was released on 19 October 2006. The cholera outbreak from 1848-49 killed approximately
54,000-62,000 in London, and the outbreak from 1853-54 killed an estimated 31,000 in London.
Chapter 1: The Night Soil Men
The book opens by describing the scavengers of Victorian London, who collected and recycled
the city's waste, even though 19th-century London was the richest city in the world.
The author says the toilets of the time were patented and used more water than they needed to.
Because of the excess water, sewage was overflowing. The rest of the chapter describes how the
city was becoming more populated and more polluted.
Chapter 2: Eyes Sunk, Lips Dark Blue
We see the first case of Cholera, but it is not explicitly said that it is cholera. Henry Whitehead is
introduced in this chapter and is known as a man who likes to converse about politics and
science. He is a man of the people because he is also a priest that works in St. Luke’s church.
Johnson mentions that there is a ton of pollution in that area and there is a bunch of horse manure
there. He then mentions the water pump at Broad Street and how it was the most popular water
pump because it was colder than all of the other ones. Mr. G, a community tailor, is said to be
sick to his stomach, and at first he is thought to have food poisoning. Mr. G then develops many of
the symptoms that cholera causes. Within a few hours, Mr. G and a dozen other Soho residents
died. A few days go by and hundreds of Soho residents have died, and many are sick. Medical
officer John Rodgers goes from house to house speaking to the ill and he knew that they were in
the midst of a Cholera outbreak.
Chapter 3: The Investigator
This was the deadliest outbreak of cholera there has ever been. The chapter starts off with the
introduction of John Snow, a successful doctor who worked out how to use one of the earliest anesthetics. He
determined ether dosages and later found that chloroform was a better anesthetic than ether.
The chapter speaks about the many theories behind how people got cholera. The most popular
according to doctors is the miasma theory. William Farr, London’s sanitation commissioner and
chief demographer, believes this theory as well. By 1849, Snow proposed that cholera was
contracted through either contact with waste or ingesting contaminated water. Still, doctors
believed that the cause of cholera had not been figured out. John Snow was going through William
Farr’s mortality numbers of 1845 and the chapter ends with both Snow and Whitehead being
served water from Broad Street and Whitehead drinks from it.
Chapter 4: That is to say, Jo has not yet died
Soho is a ghost town and the Eley Brothers Factory is almost abandoned. On the other hand, the
Lion Brewery (which is not too far away from the factory) had no cases of cholera. The famous
patriarch of Waterstone is dying from cholera and people are speculating that the outbreak had to
do with the new sewage system being in contact with human corpses. Snow and Farr are
collaborating in researching cholera. They sample water from all over London. During this
investigation, another outbreak of cholera happens, and Snow sees this as an opportunity to
improve his research.
Chapter 5: All Smell is Disease
Soho begins to improve, but some were still dying. The ones that survived attributed it to the
Broad Street pump water that they have been drinking. Medical officers from the Board of
Health visited Soho and poured bleach and chloride all throughout the city. The president of the
board is introduced (Benjamin Hall) and the predecessor is also mentioned (Edwin Chadwick).
Chadwick caused many deaths because of his belief that disease was passed on by bad smell.
Chadwick introduced new sewage pipes that only made the problem worse. The author then
talks about the miasma theory and why it was so popular and how bad smells related to disease.
John Snow hypothesizes the Broad Street water pump was responsible for the outbreak.
Chapter 6: Building the Case
John Snow tries to perform his experiment by interviewing Mr. G who lives a little farther from
Broad Street (A place called Cross Street) and using the water pump would be a little
inconvenient. He finds out that the disease has taken his life and he can no longer interview him.
Snow then decides to interview people that lived in Broad Street but did not get sick due to not
drinking from the pump. Snow realized that workers from the Lion Brewery did not get sick
because they were paid partly with beer and had their own personal fountain. Whitehead is going
around talking to people and he went to mass and saw that St. Luke’s Scripture reader, James
Richardson, was not there. When Whitehead went to check on him, he saw that he was
sick with cholera. James Richardson mentions that he had drunk from the Broad Street pump.
Whitehead thinks there is a correlation between the water from the pump and the outbreak but
then later thinks he is silly. He continues to drink a glass of water from the pump. John Snow
gathers more research and more proof that the outbreak had to do with the Broad Street water
source. He comes across the problem of what the plan of action should be.
Chapter 7: The Pump Handle
The board of governors of St. James parish work to figure out how the community should deal
with this outbreak. John Snow provided his input and made very good claims to close down the
well. Ultimately, the Board votes for the well to be closed. Henry Whitehead didn’t like that they
closed the pump down and requests that they open it up again, but the Board declines. Whitehead
starts to do research on the elderly that survived the disease to disprove that the water pump was
what caused the cholera outbreak. Whitehead challenges John Snow’s theory when he learns that
some people argue that drinking the water helped cure the disease. He later sees John Snow’s
point and agrees with him. However, the Committee declines Snow’s proposal that the outbreak
was due to water contamination.
Chapter 8: Conclusion
John Snow makes a map of the cholera outbreak. Years went by and the miasma theory was
dying down. Unfortunately, John Snow had a stroke and wasn’t able to see the new sewage
system and the slow change from the miasma theory. During the 1880s a scientist by the name of
Robert Koch discovered the bacteria that cause cholera. The author speaks on how John Snow
and Henry Whitehead were pioneers who helped improve the understanding of disease.
Death is a relevant theme because the cholera epidemic was a dangerous time and many people
died in Victorian London. The book gives a detailed description of the number of dead bodies
that were just lying on the floor. It is a tragic way to see your people, treated just like dead animals.
However, death is what prompts a plan of action to find the problem and a
solution as fast as possible so that fewer people are affected. In this story, we see how death can be
used to find common ways of getting sick and eventually with the help from John Snow and
others, they were able to find the root of the problem and slow the epidemic down.
The author says that he has an interest in the scientific field but no background knowledge
in it. We see a lot of science in this book, and John Snow is seen as brilliant for his work
in improving anesthetics. A lot of research is done to try and find the root of the problem.
Cholera seems to have killed many, but John Snow and Henry Whitehead made sure to interview
as many people to find something in common and then hopefully come up with a conclusion.
This is the general scientific process of doing an experiment. During the end of the book we see
how science has improved and it is mentioned that people boiled their water before drinking it so
they did not drink contaminated water.
Growth as a Community
In the middle to end of the book we really see the community coming together and trying to find
a solution to the cholera outbreak. We see Board meetings that take into account all arguments
and shutting down the Broad Street pump saved many people because there would have been
more cholera outbreaks and even more people would have died. After the death of John Snow,
we see that the people are planning on making a new sewage system and it all has to do with the
more research found. The community made sure to come together and improve in order to help
the future generation live a better life.
Steven Johnson’s motivation for writing this non-fiction book was his interest in science and especially
the impact cholera had on a Victorian society. He liked how we know now that cholera is found
in the water and back then, it was thought to be in the air. The author wanted to give more credit
to the 19th century doctors that made it possible to come closer to a solution to cholera. He also
wanted to show that as a community, anything is possible and even the biggest problems can be
solved if we work together. He also wants us to listen to people that have done extensive
research and not ignore or push their ideas to the side.
Note to the reader by Dr. Smith: the map on the next page was produced by Dr. John Snow. The red
circle indicates the location of the Broad Street pump. The black squares and rectangles indicate where
someone died. The larger the area of the square or rectangle, the greater the number of deaths.
What is Data Mining?
Data mining is the exploration and analysis of large data to discover meaningful patterns and
rules. It’s considered a discipline under the data science field of study and differs from predictive
analytics because it describes historical data, while predictive analytics aims to predict future outcomes.
Additionally, data mining techniques are used to build machine learning (ML) models that power
modern artificial intelligence (AI) applications such as search engine algorithms and
Applications of Data Mining
DATABASE MARKETING AND TARGETING
Retailers use data mining to better understand their customers. Data mining allows them to better
segment market groups and tailor promotions to effectively drill down and offer customized
promotions to different consumers.
How to do Data Mining
The accepted data mining process involves six steps:
1. Business understanding
The first step is establishing what the goals of the project are and how data mining can help you reach
that goal. A plan should be developed at this stage to include timelines, actions, and role
2. Data understanding
Data is collected from all applicable data sources in this step. Data visualization tools are often
used in this stage to explore the properties of the data to ensure it will help achieve the business
3. Data preparation
Data is then cleansed, and missing data is included to ensure it is ready to be mined. Data
processing can take enormous amounts of time depending on the amount of data analyzed and
the number of data sources. Therefore, distributed systems are used in modern database
management systems (DBMS) to improve the speed of the data mining process rather than
burden a single system. They’re also more secure than having all an organization’s data in a
single data warehouse. It’s important to include failsafe measures in the data manipulation stage
so data is not permanently lost.
4. Data Modeling
Mathematical models are then used to find patterns in the data using sophisticated data tools.
5. Evaluation
The findings are evaluated and compared to business objectives to determine if they should be
deployed across the organization.
6. Deployment
In the final stage, the data mining findings are shared across everyday business operations. An
enterprise business intelligence platform can be used to provide a single source of the truth for
self-service data discovery.
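As a small illustration of the data preparation step described above, the sketch below fills in missing values with the average of the known values. The records, field names, and the choice of mean imputation are all assumptions made for this example, not something the article prescribes. The function returns new records instead of editing the originals, which is one simple way to keep raw data from being permanently lost.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing value.
raw_records = [
    {"customer": "A", "age": 34, "spend": 120.0},
    {"customer": "B", "age": None, "spend": 80.0},
    {"customer": "C", "age": 46, "spend": None},
    {"customer": "D", "age": 40, "spend": 100.0},
]

def fill_missing(records, field):
    """Return copies of the records with missing `field` values
    replaced by the mean of the values that are present."""
    known = [r[field] for r in records if r[field] is not None]
    fill_value = mean(known)
    # Build new dictionaries so the raw records are left untouched.
    return [
        {**r, field: fill_value if r[field] is None else r[field]}
        for r in records
    ]

cleaned = fill_missing(fill_missing(raw_records, "age"), "spend")

for record in cleaned:
    print(record)
```

Mean imputation is only one option. Dropping incomplete records or using a median may suit other data better.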
Benefits of Data Mining
Data Mining allows organizations to continually analyze data and automate both routine and
critical decisions without the delay of human judgment. Banks can instantly detect fraudulent
transactions, request verification, and even secure personal information to protect customers
against identity theft. Deployed within a firm’s operational algorithms, these models can collect,
analyze, and act on data independently to streamline decision making and enhance the daily
processes of an organization.
Accurate Prediction and Forecasting
Planning is a critical process within every organization. Data mining facilitates planning and
provides managers with reliable forecasts based on past trends and current
conditions. Macy’s implements demand forecasting models to predict the demand for each
clothing category at each store and route the appropriate inventory to efficiently meet the
Data mining allows for more efficient use and allocation of resources. Organizations can plan
and make automated decisions with accurate forecasts that will result in maximum cost
reduction. Delta embedded RFID chips in passengers’ checked baggage and deployed data mining
models to identify holes in their process and reduce the number of bags mishandled. This process
improvement increases passenger satisfaction and decreases the cost of searching for and rerouting lost baggage.
Firms deploy data mining models from customer data to uncover key characteristics and
differences among their customers. Data mining can be used to create personas and personalize
each touchpoint to improve overall customer experience. In 2017, Disney invested over one
billion dollars to create and implement “Magic Bands.” These bands have a symbiotic
relationship with consumers, working to increase their overall experience at the resort while
simultaneously collecting data on their activities for Disney to analyze to further enhance their
Challenges of Data Mining
While a powerful process, data mining is hindered by the increasing quantity
and complexity of big data. Where exabytes of data are collected by firms
every day, decision-makers need ways to extract, analyze, and gain insight
from their abundant repository of data.
The challenges of big data are prolific and penetrate every field that collects, stores, and analyzes
data. Big data is characterized by four major challenges: volume, variety, veracity, and velocity.
The goal of data mining is to mediate these challenges and unlock the data’s value.
Volume describes the challenge of storing and processing the enormous quantity of data
collected by organizations. This enormous amount of data presents two major challenges: first, it
is more difficult to find the correct data, and second, it slows down the processing speed of data
Variety encompasses the many different types of data collected and stored. Data mining tools
must be equipped to simultaneously process a wide array of data formats. Failing to focus an
analysis on both structured and unstructured data inhibits the value added by data mining.
Velocity details the increasing speed at which new data is created, collected, and stored. While
volume refers to increasing storage requirement and variety refers to the increasing types of data,
velocity is the challenge associated with the rapidly increasing rate of data generation.
Finally, veracity acknowledges that not all data is equally accurate. Data can be messy,
incomplete, improperly collected, and even biased. As with anything, the quicker data is collected,
the more errors will manifest within the data. The challenge of veracity is to balance the quantity
of data with its quality.
Over-fitting occurs when a model explains the natural errors within the sample instead of the
underlying trends of the population. Over-fitted models are often overly complex and utilize an
excess of independent variables to generate a prediction. Therefore, the risk of over-fitting is
heightened by the increase in volume and variety of data. Too few variables make the model
irrelevant, whereas too many variables restrict the model to the known sample data. The
challenge is to moderate the number of variables used in data mining models and balance its
predictive power with accuracy.
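The over-fitting trade-off above can be shown with a small sketch. The data is made up: the assumed true relationship is y = 2x, and the training samples carry a fixed error of plus or minus 1.5. A straight line is compared with a polynomial that passes through every training point exactly. Both models and all names here are illustrative choices.

```python
# Hypothetical training sample: true relation y = 2x plus fixed sample error.
train_xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_ys = [3.5, 2.5, 7.5, 6.5, 11.5, 10.5]

# New observations that follow the underlying trend (kept error-free for simplicity).
test_xs = [1.5, 2.5, 3.5, 4.5, 5.5]
test_ys = [3.0, 5.0, 7.0, 9.0, 11.0]

def fit_line(xs, ys):
    """Ordinary least squares with one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through every (xs, ys) point at x.
    With six points this is a degree-5 model that memorizes the sample."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def mse(predictions, actual):
    """Mean squared error between predictions and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actual)) / len(actual)

slope, intercept = fit_line(train_xs, train_ys)

line_train_mse = mse([slope * x + intercept for x in train_xs], train_ys)
line_test_mse = mse([slope * x + intercept for x in test_xs], test_ys)
poly_train_mse = mse([interpolate(train_xs, train_ys, x) for x in train_xs], train_ys)
poly_test_mse = mse([interpolate(train_xs, train_ys, x) for x in test_xs], test_ys)

print(f"line:       train MSE {line_train_mse:.3f}, test MSE {line_test_mse:.3f}")
print(f"polynomial: train MSE {poly_train_mse:.3f}, test MSE {poly_test_mse:.3f}")
```

The polynomial fits the sample perfectly because it explains the sample's errors, and that is why it does worse than the simple line on the new observations.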
Cost of Scale
As data velocity continues to increase data’s volume and variety, firms must scale these models
and apply them across the entire organization. Unlocking the full benefits of data mining with
these models requires significant investment in computing infrastructure and processing power.
To reach scale, organizations must purchase and maintain powerful computers, servers, and
software designed to handle the firm’s large quantity and variety of data.
Privacy and Security
The increased storage requirement of data has forced many firms to turn toward cloud computing
and storage. While the cloud has empowered many modern advances in data mining, the nature
of the service creates significant privacy and security threats. Organizations must protect their
data from malicious figures to maintain the trust of their partners and customers.
With data privacy comes the need for organizations to develop internal rules and constraints on
the use and implementation of a customer’s data. Data mining is a powerful tool that provides
businesses with compelling insights into their consumers. However, at what point do these
insights infringe on an individual’s privacy? Organizations must weigh this relationship with
their customers, develop policies to benefit consumers, and communicate these policies to the
consumers to maintain a trustworthy relationship.
Types of Data Mining
Data mining has two primary processes: supervised and unsupervised learning.
The goal of supervised learning is prediction or classification. The easiest way to conceptualize
this process is to look for a single output variable. A process is considered supervised learning if
the goal of the model is to predict the value of an observation. One example is spam filters,
which use supervised learning to classify incoming emails as unwanted content and
automatically remove these messages from your inbox.
Common analytical models used in supervised data mining approaches are:
Linear regressions predict the value of a continuous variable using one or more independent
inputs. Realtors use linear regressions to predict the value of a house based on square footage,
bed-to-bath ratio, year built, and zip code.
Logistic regressions predict the probability of a categorical variable using one or more
independent inputs. Banks use logistic regressions to predict the probability that a loan applicant
will default based on credit score, household income, age, and other personal factors.
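The two regression models above can be sketched in a few lines of plain Python. Everything here is a simplified illustration under stated assumptions: the house prices, credit scores, and default outcomes are invented, each model uses a single input, and the logistic model is fitted with basic gradient descent. Real applications would use many inputs and a statistics library.

```python
import math

# --- Linear regression: predict a continuous value (house price) ---
# Hypothetical data, made up for illustration.
square_feet = [1000, 1500, 2000, 2500, 3000]
prices = [200_000, 275_000, 350_000, 425_000, 500_000]

def fit_line(xs, ys):
    """Ordinary least squares with one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(square_feet, prices)
predicted_price = slope * 1800 + intercept
print(f"Predicted price for 1,800 sq ft: {predicted_price:,.0f}")

# --- Logistic regression: predict the probability of a category (default) ---
# Hypothetical applicants: credit score rescaled as (score - 650) / 100,
# and whether the loan defaulted (1) or not (0).
scaled_scores = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
defaulted = [1, 1, 0, 1, 0, 0, 0]

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight and bias by gradient descent on the log-loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias

weight, bias = fit_logistic(scaled_scores, defaulted)
p_low_score = sigmoid(weight * -1.5 + bias)   # applicant with a score of 500
p_high_score = sigmoid(weight * 1.5 + bias)   # applicant with a score of 800
print(f"P(default | score 500) = {p_low_score:.2f}")
print(f"P(default | score 800) = {p_high_score:.2f}")
```

The linear model returns a number on a continuous scale, while the logistic model returns a probability that can be turned into a yes or no decision with a cutoff such as 0.5.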