# Logistic Regression

Figure 1: Example of a Logistic Curve. The values of y cannot be less than 0 or greater than 1.

Logistic Regression, also known as Logit Regression or Logit Model, is a mathematical model used in statistics to estimate (guess) the probability of an event occurring having been given some previous data. Logistic Regression works with binary data, where either the event happens (1) or the event does not happen (0). So given some feature x it tries to find out whether some event y happens or not. So y can either be 0 or 1. In the case where the event happens, y is given the value 1. If the event does not happen, then y is given the value of 0. For example, if y represents whether a sports team wins a match, then y will be 1 if they win the match or y will be 0 if they do not. This is known as Binomial Logistic Regression. There is also another form of Logistic Regression which uses multiple values for the variable y. This form of Logistic Regression is known as Multinomial Logistic Regression.

Logistic Regression uses the logistic function to find a model that fits with the data points. The function gives an 'S' shaped curve to model the data. The curve is restricted between 0 and 1, so it is easy to apply when y is binary. Logistic Regression can then model events better than linear regression, as it shows the probability for y being 1 for a given x value. Logistic Regression is used in statistics and machine learning to predict values of an input from previous test data.

## Basics

Logistic regression is an alternative method to use other than the simpler Linear Regression. Linear regression tries to predict the data by finding a linear – straight line – equation to model or predict future data points. Logistic regression does not look at the relationship between the two variables as a straight line. Instead, Logistic regression uses the natural logarithm function to find the relationship between the variables and uses test data to find the coefficients. The function can then predict the future results using these coefficients in the logistic equation.

Logistic regression uses the concept of odds ratios to calculate the probability. This is defined as the ratio of the odds of an event happening to its not happening. For example, the probability of a sports team to win a certain match might be 0.75. The probability for that team to lose would be 1 – 0.75 = 0.25. The odds for that team winning would be 0.75/0.25 = 3. This can be said as the odds of the team winning are 3 to 1.[1]

The odds can be defined as:

${\displaystyle Odds={P(y=1|x) \over 1-P(y=1|x)}}$

The natural logarithm of the odds ratio is then taken in order to create the logistic equation. The new equation is know as the logit:

${\displaystyle Logit(P(x))=\ln \left({P(y=1|x) \over (1-P(y=1|x)}\right)}$

In Logistic regression the Logit of the probability is said to be linear with respect to x, so the logit becomes:

${\displaystyle Logit(P(x))=a+bx}$

Using the two equations together then gives the following:

${\displaystyle {P(y=1|x) \over 1-P(y=1|x)}=e^{a+bx}}$

This then leads to the probability:

${\displaystyle P(y=1|x)={e^{a+bx} \over 1+e^{a+bx}}={1 \over 1+e^{-(a+bx)}}}$ [2]

This final equation is the logistic curve for Logistic regression. It models the non-linear relationship between x and y with an ‘S’-like curve for the probabilities that y =1 - that event the y occurs. In this example a and b represent the gradients for the logistic function just like in linear regression. The logit equation can then be expanded to handle multiple gradients. This gives more freedom with how the logistic curve matches the data. The multiplication of two vectors can then be used to model more gradient values and give the following equation:

${\displaystyle Logit(P(x))=w_{0}x^{0}+w_{1}x^{1}+w_{2}x^{2}+...+w_{n}x^{n}=w^{T}x}$

In this equation w = [ w0 , w1 , w2 , ... , wn ] and represents the n gradients for the equation. The powers of x are given by the vector x = [ 1 , x , x2 , .. , xn ] . These two vectors give the new logit equation with multiple gradients. The logistic equation then can then be changed to show this:

${\displaystyle P(y=1|x)={1 \over 1+e^{-(w^{T}x)}}}$

This is then a more general logistic equation allowing for more gradient values.