# Reach vs Budget: Polynomial Regression on Facebook Post Data


Applying polynomial regression to understand the relationship between Facebook Post Reach and the Amount spent on the post

##### Introduction

Facebook has been a popular platform for promoting businesses and public figures for a while now, and paid promotions are the easiest way to get a large number of views for your post in a short time. This leads to a more important question: how much money should you invest per post to get the optimum number of views? Naturally, the more you invest and the longer you invest it for, the more views you get, roughly in proportion, up to a point where the number of views starts to saturate. This is the general trend.

It must be noted that the number of users who come in contact with your post may or may not be proportional to the number of people who get converted. The latter depends on the targeting parameters you set while configuring your Facebook Ad.

##### Objective

The objective of this task was to plot Post Reach against the Amount spent on promoting that post on Facebook, and to obtain a realistic estimate of the Reach that can be achieved by spending a specified Amount. This was carried out in Octave to take advantage of its prebuilt optimisation functions.

##### Terms

*Reach:* The number of unique people who have seen your content within a certain period

*Amount:* The amount of money paid to Facebook, which is spread evenly across a certain period for paid promotions

##### Data Set

Our data set comprised 109 post examples, with amounts spent ranging from 2 USD to 139 USD. The data set is plotted below.

##### Ideal Result

We want to obtain a range of values of Facebook Reach that corresponds to a given amount spent on a Facebook Post. It should look something like the following:

##### Obtaining Required Curves

We can use a straight line *y = θx + c* to define both the upper and lower limit curves, but that will not result in a very accurate model, as our data is not linear in nature. Therefore we use a polynomial equation to get a closer fit. The equation we have used is:

$$y = \theta_0 + \theta_1 x + \theta_2 \sqrt{x} + \theta_3 x^2$$

Here, *θ_{0} , θ_{1} , θ_{2} , θ_{3}* are our learning parameters, whose values we have to find for both the upper and the lower limit curves.
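As a minimal sketch of how this model can be evaluated, the snippet below builds the four terms of the equation and combines them with a parameter vector. It is written in Python purely for illustration (the original work was done in Octave); the function names `features` and `predict` and the parameter values in `example_theta` are hypothetical.

```python
import math

def features(x):
    """Return the four terms [1, x, sqrt(x), x^2] used by the model."""
    return [1.0, x, math.sqrt(x), x ** 2]

def predict(theta, x):
    """Evaluate y = theta0 + theta1*x + theta2*sqrt(x) + theta3*x^2."""
    return sum(t * f for t, f in zip(theta, features(x)))

# Hypothetical parameter values, chosen only to show the call.
example_theta = [0.0, 1.0, 2.0, 3.0]
print(predict(example_theta, 0.25))  # 1*0.25 + 2*0.5 + 3*0.0625 = 1.4375
```

Because the model is linear in the parameters, the same `features` list can be reused for both the upper and the lower curve; only the parameter vector changes.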

##### Normalisation of Data

We need to normalise our data to make sure our learning parameters don't blow up. This also makes the data easier to handle in general. Normalising data means constraining the data values to a certain region. Here, we have chosen the region 0≤X≤1. Therefore, we have to map all values of Facebook Reach and all values of Amount Spent to values between 0 and 1.

We use a min-max formula for normalising our values (both Facebook Reach and Amount Spent), where:

$$x_{(i)n} = \frac{x_{(i)} - x_{min}}{x_{max} - x_{min}}$$

$x_{(i)n}$ is the normalised value of $x_{(i)}$

$x_{max}$ is the maximum value of $x_{(i)}$

$x_{min}$ is the minimum value of $x_{(i)}$
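The min-max formula can be sketched in a few lines. This is an illustrative Python version, not the original Octave code; the function name `min_max_normalise` and the sample amounts are made up for the example.

```python
def min_max_normalise(values):
    """Map each value to (v - min) / (max - min), so results lie in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; min-max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative amounts in USD (made up, not the article's data set).
amounts = [2.0, 10.0, 50.0, 139.0]
normalised = min_max_normalise(amounts)
print(normalised)  # the smallest amount maps to 0.0 and the largest to 1.0
```

The same function would be applied separately to the Reach values and to the Amount values, since each has its own minimum and maximum.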

On normalising, we get the following data

##### Splitting of Data

Initially we had a single curve predicting the values, but later we realised it was more accurate to give a range, as post reach depends on a multitude of factors (duration of the post, groups targeted, locations, etc.). To get an upper and a lower limit, we split our data set across the line *y = x*.
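One way to picture the split is the short Python sketch below, which separates normalised (amount, reach) pairs by which side of the line they fall on. The function name `split_across_diagonal` and the sample points are hypothetical, and sending points that lie exactly on the line to the upper group is an assumption; the text does not say how such points were handled.

```python
def split_across_diagonal(points):
    """Split normalised (amount, reach) pairs into those above and below y = x."""
    upper = [(x, y) for x, y in points if y >= x]  # on or above the line (assumed)
    lower = [(x, y) for x, y in points if y < x]   # strictly below the line
    return upper, lower

# Made-up normalised points, only to show the behaviour.
points = [(0.1, 0.3), (0.2, 0.1), (0.5, 0.5), (0.8, 0.6)]
upper, lower = split_across_diagonal(points)
print(upper)  # [(0.1, 0.3), (0.5, 0.5)]
print(lower)  # [(0.2, 0.1), (0.8, 0.6)]
```

Each group is then fitted separately, giving one curve for the upper limit and one for the lower limit.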

##### Minimising Cost Function

*θ_{0} , θ_{1} , θ_{2} , θ_{3}* can take a multitude of values, so we need to find the values that are best suited to our data set. This is better explained through the example below.

Assume the data points plotted below are our data set, and we need to find which of the three lines most accurately fits it. All three lines can be plotted by *y = mX + c*, where *m* and *c* are our parameters.

Visually we can see that the green line is the closest to our data points. To obtain this mathematically, we compute the *cost* of each line with respect to each point and choose the line with the least total cost over all our points. For simplicity, let us consider the cost as the distance between the line and the point. We then compute the sum of this distance over all points for each line. The line with the least total distance is chosen as the best-suited line. Thus we tune our parameters to obtain the line with the lowest cost.
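The comparison of the three lines can be sketched as follows. This Python snippet takes "distance" to mean the vertical distance between a point and the line, which is an assumption; the data points, the slopes and intercepts, and the labels "red" and "blue" for the two other lines are all made up for illustration.

```python
def line_cost(m, c, points):
    """Sum of vertical distances |y - (m*x + c)| over all points."""
    return sum(abs(y - (m * x + c)) for x, y in points)

# Made-up data points and three candidate lines given as (m, c).
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
candidates = {"red": (0.5, 0.0), "green": (1.0, 0.0), "blue": (2.0, 0.0)}

costs = {name: line_cost(m, c, points) for name, (m, c) in candidates.items()}
best = min(costs, key=costs.get)
print(costs)
print(best)  # green
```

In practice a squared distance is often used in place of the absolute distance because it is smooth and easier to minimise, but the idea of picking the parameters with the lowest total cost is the same.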

##### Tuning our parameters

We have to change the values of *θ_{0} , θ_{1} , θ_{2} , θ_{3}* so that our cost function is minimised. We used an advanced optimisation function called *fminunc()* to get the optimal parameters.
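To give a feel for what the optimiser does, here is a small Python sketch that minimises a squared-error cost with plain gradient descent. This is a swapped-in technique, not `fminunc()` itself, and the cost function, the learning rate `alpha`, the iteration count, and the synthetic data generated from `true_theta` are all assumptions made for illustration.

```python
import math

def features(x):
    """Model terms [1, x, sqrt(x), x^2]."""
    return [1.0, x, math.sqrt(x), x ** 2]

def hypothesis(theta, x):
    return sum(t * f for t, f in zip(theta, features(x)))

def cost(theta, data):
    """Halved mean squared error: (1 / 2m) * sum((h(x) - y)^2)."""
    return sum((hypothesis(theta, x) - y) ** 2 for x, y in data) / (2 * len(data))

def gradient(theta, data):
    """Partial derivatives of the cost with respect to each parameter."""
    grad = [0.0] * len(theta)
    for x, y in data:
        err = hypothesis(theta, x) - y
        for j, fj in enumerate(features(x)):
            grad[j] += err * fj / len(data)
    return grad

def fit(data, alpha=0.5, iterations=5000):
    """Plain gradient descent starting from all-zero parameters."""
    theta = [0.0, 0.0, 0.0, 0.0]
    for _ in range(iterations):
        g = gradient(theta, data)
        theta = [t - alpha * gj for t, gj in zip(theta, g)]
    return theta

# Synthetic, already-normalised data generated from made-up parameters.
true_theta = [0.0, 0.3, 0.5, 0.2]
data = [(i / 20, hypothesis(true_theta, i / 20)) for i in range(21)]

theta = fit(data)
print(cost([0.0] * 4, data), cost(theta, data))  # cost before and after tuning
```

A quasi-Newton routine such as `fminunc()` typically reaches the minimum in far fewer steps and does not need a hand-picked learning rate, which is a good reason to prefer it over a hand-written loop.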

##### Plotting Polynomial Regression Curves

After obtaining our parameters, we plot the resulting curves against our original data set to make sure they fit it properly.

##### Understanding the result

As shown in the graph above, we can see an approximate upper and lower limit for the Reach your Post can achieve. This result is more accurate for amounts less than 60 USD, as most of our data set lies in that range.

Both lines can be plotted with the equation: *Y = θ_{1} + θ_{2}X + θ_{3}X^{0.5} + θ_{4}X^{2}*. This is the same model as before, with the parameters numbered from 1.

The resulting values of θ_{i} are:

| Parameter | Upper Line | Lower Line |
|---|---|---|
| θ_{1} | -0.010826 | -0.035043 |
| θ_{2} | 0.317787 | -0.125105 |
| θ_{3} | 0.467136 | 0.342039 |
| θ_{4} | 1.170555 | 0.657346 |

##### Result

Based on the data in our possession, we can summarise our findings with the following formula. If you spend an amount X (in USD), the Reach of the Post is given by:

$$(-0.035043 - 0.125105P + 0.342039P^{0.5} + 0.657346P^{2}) \times 6010 + 5 \le \text{Reach} \le (-0.010826 + 0.317787P + 0.467136P^{0.5} + 1.170555P^{2}) \times 6010 + 5$$

where *P = (X - 0.02) / 138.93*


For example, on investing 5 USD your Reach will be between 162 and 549 people.
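The bounds can be reproduced with a short script. This is a Python sketch rather than the original Octave code; it uses the parameter values from the table and the constants 0.02, 138.93, 6010 and 5 from the formula above, while the names `UPPER`, `LOWER` and `reach_bounds` are hypothetical.

```python
import math

# Parameters from the table, ordered as [theta1, theta2, theta3, theta4].
UPPER = [-0.010826, 0.317787, 0.467136, 1.170555]
LOWER = [-0.035043, -0.125105, 0.342039, 0.657346]

def reach_bounds(amount_usd):
    """Return (lower, upper) Reach estimates for a given spend in USD."""
    p = (amount_usd - 0.02) / 138.93  # normalise the amount
    terms = [1.0, p, math.sqrt(p), p ** 2]

    def curve(theta):
        normalised = sum(t * f for t, f in zip(theta, terms))
        return normalised * 6010 + 5  # map the normalised value back to Reach

    return curve(LOWER), curve(UPPER)

lower, upper = reach_bounds(5)
print(round(lower), round(upper))  # 162 549
```

The square root means the function only accepts amounts of at least 0.02 USD, and the estimates are most trustworthy below roughly 60 USD, where most of the data lies.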