Lecture No. 1
Dated: 04-11-2024
What is Statistics
It is the science of drawing conclusions from real life data and is a tool for data based research.
It is also called quatitative analysis
.
It has applications in various disciplines such as:
- Agriculture
- Anthropology
- Astronomy
- Biology
- Economic
- Engineering
- Geology
- Genetics
- Medicine
- Physics
- Psychology
- Zoology
Nature of This Discipline
graph LR
A["Descriptive Statistics"] --> Probability
Probability --> C["Inferential Statistics"]
Meaning of Statistics
It comes from the latin
word status
which means political state.
It originally meant information useful to the state.
- It refers to "numerical facts systematically arranged". In this case, we always use the plural form of
statistics
. This is what the the wordstatistics
normal is referred to but the worddata
is used instead. - It is a discipline that involves procedures and techniques to collect, process and analyze numerical data to make inferences and research decisions against uncertainty which is incompleteness and instability of available data. In this case, the word is used as singular.
- It can also refer to numerical quantities calculated from sample observations. A single quantity that is collected, is called a
statistic
.
Characteristics of the Science of Statistics
It deals with
- Behavior of aggregates or large bodies of data of same kind, not isolated figures, ignoring what is happening to the individual or object of that aggregate.
- Variability which obscures the underlying patterns.
- Uncertainties as every process of gathering information involves deficiencies and variations.
- Characteristics of things which can be expressed numerically either by count or measurements.
- Aggregates which are subject to a number of random causes. e.g. the height of a person is subject to his race, age, diet, habits, climate etc.
- Statistical laws are valid for the average or in long run. Therefore,
statistical inference
is therefore made againstuncertainty
. - If the data collection and processing is not handled properly, the results could be misleading.
They way in Which Statistics Works
It assists in
- Summarizing the larger set of data in a form that is easily understandable.
- Efficient design of laboratories, field experiments, surveys.
- Effective planning in any field of inquiry.
- Making general conclusions that how much of a certain thing will happen in certain conditions.
Importance of Statistics in Various Fields
- A modern administrator uses statistical data to provide a factual basis for decision.
- A politician uses
statistics
to support his arguments. - A businessman, industrial or research worker all use
statistics
in their work.
Quote
“a social scientist without an adequate understanding of statistics, is often like the blind man groping in a dark room for a black cat that is not there”
The Meaning of Data
It comes from the latin
word datum
(singular) and data
means "those which are given".
Data
may therefore be thought as the results of observations
.
Observations
An observation
means any sort of numerical recording of information whether it is physical such as measuring height
, width
, a classification
such as heads
or tails
, or an answer to a question such as yes
or no
.
Variables
A characteristic which varies with an object or an individual such as age
.
The set of all possible values from which a variable
can take, is called domain
.
If the domain
consists of only one value then the variable
is called a constant
.
Quantitative and Qualitative Variables
Quantitative
The variables
which are numerical such as age
, height
etc are called quantitative variables
.
Qualitative
The variables
which are not numerical such as eye color
, poverty
, education
etc are called qualitative variables
.
Discrete and Continuous Variables
Discrete
These variables
take a discrete set of integers or whole numbers, as jumps or breaks.
Examples
- Number of rooms in a house
- Number of family members
Continuous
These variables
can take any fractional or integer values given within an interval
without gaps.
Examples
- Age
- Height
- Temperature
Measurement Scales
Measurement
Measurement
means assigning a number to observations or objects.
Scaling
It is the processing of measuring
.
Following are the scales
discussed.
Nominal Scale
Classification of observations into mutually exclusive qualitative categories.
Examples
- Students in a class are males or females, hence we can assign
1
and2
respectively. - Rainfall can be heavy, moderate or light, hence we can assign
1
,2
and3
respectively.
Ordinal or Ranking Scale
Similar to nominal scale but with addition of the property of ordering or ranking.
Examples
good
,fair
,poor
can be ranked1
,2
and3
.
Interval Scale
A measurement scale possessing a constant interval size but not a true zero point, is called interval scale
.
Examples
- \(20^\circ C\) (\(68^\circ F\)) and \(30^\circ C\) (\(86^\circ F\))
- \(5^\circ C\) (\(41^\circ F\)) and \(15^\circ C\) (\(59^\circ F\))
Both of the scales have equal intervals.
10 units for celsius
scale and 18 units for farenheit
.
Ratio Scale
Same as interval scale but it has a true meaningful zero point
.
Errors of Measurement
The true value can never be measured because of certain habits, practices, methods and instruments used for measurements.
This departure of recorded value from the true value is called error of measurement
.
If the observed value is \(x\) but the true value is \(x + \epsilon\) then the absolute error
is \((x + \epsilon) - \epsilon\)
The ratio of absolute error
to true value
\(\frac{(x + \epsilon) - \epsilon}{x + \epsilon}\) is called the relative error
.
Multiplying relative error
by 100 gives us percentage error
.
Example
If a student's weight is measured as \(60KG\) then we know for certain that his true weight is within \(60.5KG\) to \(59.5KG\) range.
Biased and Random Errors
Biased
An error
is called biased if the observed value is constantly or consistently higher or lower than the true value.
This happens due to to limitations of measuring instrument and is cumulative.
Being cumulative means that the more measurements we do, the error
adds up and its magnitude in overall result increases.
They are also called cumulative
or systematic errors
and are not revealed by repeating measurements.
Random or Unbiased
An error
is said to be unbiased or random when the deviations from the true value, either excesses or defects tend to occur equally often.
When measurements
happen often, they tend to cancel out in the long run.
Therefore, these errors
are compensating and are also called accidental errors
.