How to Find the Equation of a Scatter Plot

By Samuel Markings; Updated April 25, 2017
Mathematical relationships between two variables can be obtained from scatter plots.

A scatter plot is a graph that shows the relationship between two sets of data. Sometimes it is helpful to use the data in a scatter plot to obtain a mathematical relationship between the two variables. The equation of a straight line that fits a scatter plot can be found by hand in two main ways: a graphical technique, or a calculation called linear regression.

Creating a Scatter Plot

Use graph paper to create a scatter plot. Draw the x- and y-axes so that they intersect, and label the origin. Give each axis a clear, correct title. Next, plot each data point on the graph. Any trend linking the two data sets should now be visible.

Line of Best Fit

Once the scatter plot has been created, and assuming the two data sets show a linear correlation, we can use a graphical method to find the equation of the line. Take a ruler and draw a straight line that passes as close as possible to all of the points. Try to leave as many points above the line as below it. Once the line has been drawn, use the standard method for finding the equation of a straight line.
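The "as many above as below" rule can be checked mechanically. The Python sketch below counts how many points fall above and below a candidate line y = mx + c. The function name `count_above_below`, the sample points and the candidate line are all hypothetical, chosen only for illustration. A balanced count is only a rough guide; on its own it does not guarantee a good fit.

```python
def count_above_below(points, m, c):
    """Count how many (x, y) points lie above and below the line y = m*x + c.

    Points lying exactly on the line are counted in neither total.
    """
    above = below = 0
    for x, y in points:
        predicted = m * x + c  # y-value of the line at this x
        if y > predicted:
            above += 1
        elif y < predicted:
            below += 1
    return above, below


# Hypothetical points and a hypothetical hand-drawn line y = x + 0.5
points = [(1, 1.2), (2, 2.9), (3, 3.1), (4, 4.8)]
print(count_above_below(points, 1.0, 0.5))  # (2, 2)
```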

Equation of Straight Line

Once a line of best fit has been drawn on a scatter plot, finding its equation is straightforward. The general equation of a straight line is:

y = mx + c

where m is the slope (gradient) of the line and c is the y-intercept, the value of y where the line crosses the y-axis. To find the gradient, pick two points on the line. For this example, assume that the two points are (1, 3) and (0, 1). The gradient is the difference in the y-coordinates divided by the difference in the x-coordinates:

m = (3 - 1) / (1 - 0) = 2 / 1 = 2
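The gradient step can be written as a small Python function. This is a minimal sketch; the name `slope` is our own choice, and each point is assumed to be an (x, y) tuple.

```python
def slope(p1, p2):
    """Gradient of the straight line through p1 and p2, each an (x, y) pair."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # Equal x-values would mean dividing by zero: the line is vertical.
        raise ValueError("points have the same x-value; the line is vertical")
    return (y2 - y1) / (x2 - x1)  # change in y divided by change in x


print(slope((0, 1), (1, 3)))  # 2.0
```

Swapping the order of the two points gives the same gradient, because both differences change sign.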

The gradient in this case is 2. So far, the equation of the straight line is:

y = 2x + c

The value of c can be found by substituting the coordinates of a known point on the line. In this example, one of the known points is (1, 3). Substitute it into the equation and rearrange for c:

3 = (2 * 1) + c

c = 3 - 2 = 1

The final equation in this case is:

y = 2x + 1
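The intercept step amounts to rearranging y = mx + c into c = y - mx. A minimal Python sketch, with a hypothetical helper name `intercept`, applied to the example point (1, 3) and gradient 2:

```python
def intercept(m, point):
    """y-intercept c of the line y = m*x + c that passes through the (x, y) point."""
    x, y = point
    return y - m * x  # rearranged from y = m*x + c


m = 2
c = intercept(m, (1, 3))
print(f"y = {m}x + {c}")  # y = 2x + 1
```

Using the other point, (0, 1), gives the same value of c, which is a quick check that the gradient was calculated correctly.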

Linear Regression

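Linear regression is a mathematical method that can be used to obtain the straight-line equation that best fits a scatter plot. The formulas below are those of the least-squares method, which chooses the line that minimizes the sum of the squared vertical distances between the points and the line. Start by writing your data in a table or as a list of (x, y) pairs. For this example, let us assume that we have the following three data points: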

(4.1, 2.2) (6.5, 4.4) (12.6, 10.4)

Calculate the sum of the x-values:

x_sum = 4.1 + 6.5 + 12.6 = 23.2

Next, calculate the sum of the y-values:

y_sum = 2.2 + 4.4 + 10.4 = 17

Now multiply the x- and y-values of each data point and add the products together:

xy_sum = (4.1 * 2.2) + (6.5 * 4.4) + (12.6 * 10.4) = 168.66

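Next, calculate the sum of the squared x-values and the sum of the squared y-values:

x_square_sum = (4.1^2) + (6.5^2) + (12.6^2) = 217.82

y_square_sum = (2.2^2) + (4.4^2) + (10.4^2) = 132.36

The sum of the squared y-values is not needed for the slope or intercept below; it is only used if you go on to calculate the correlation coefficient.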
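The five sums can be computed in a few lines of Python. This sketch takes the second data point's y-value as 4.4, which is the value the sums, the gradient and the intercept in this example are based on; the variable names mirror those used in the text.

```python
# Data points as (x, y) pairs; the second point's y-value is taken as 4.4
data = [(4.1, 2.2), (6.5, 4.4), (12.6, 10.4)]

N = len(data)                                    # number of data points
x_sum = sum(x for x, _ in data)                  # sum of the x-values
y_sum = sum(y for _, y in data)                  # sum of the y-values
xy_sum = sum(x * y for x, y in data)             # sum of the products x*y
x_square_sum = sum(x * x for x, _ in data)       # sum of the squared x-values
y_square_sum = sum(y * y for _, y in data)       # sum of the squared y-values

for name, value in [("x_sum", x_sum), ("y_sum", y_sum), ("xy_sum", xy_sum),
                    ("x_square_sum", x_square_sum), ("y_square_sum", y_square_sum)]:
    print(f"{name} = {value:.2f}")
# x_sum = 23.20, y_sum = 17.00, xy_sum = 168.66,
# x_square_sum = 217.82, y_square_sum = 132.36
```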

Finally, count the number of data points, N. In this case there are three data points, so N = 3. The gradient of the best-fit line is then given by the formula below; work out each bracketed expression in full before dividing:

m = [(N * xy_sum) - (x_sum * y_sum)] / [(N * x_square_sum) - (x_sum * x_sum)]

= [(3 * 168.66) - (23.2 * 17)] / [(3 * 217.82) - (23.2 * 23.2)]

= 111.58 / 115.22 = 0.968
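The main pitfall in this formula is the order of operations: the whole numerator and the whole denominator must be evaluated before dividing. A minimal Python sketch (the function name `least_squares_slope` is hypothetical) makes that grouping explicit, using the sums from this example:

```python
def least_squares_slope(N, x_sum, y_sum, xy_sum, x_square_sum):
    """Slope m of the least-squares line, computed from the summary sums."""
    numerator = N * xy_sum - x_sum * y_sum
    denominator = N * x_square_sum - x_sum * x_sum
    if denominator == 0:
        # Happens only when every x-value is the same.
        raise ValueError("all x-values are equal; the slope is undefined")
    return numerator / denominator


m = least_squares_slope(3, 23.2, 17, 168.66, 217.82)
print(round(m, 3))  # 0.968
```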

The intercept of the best-fit line is obtained from:

c = [(x_square_sum * y_sum) - (x_sum * xy_sum)] / [(N * x_square_sum) - (x_sum * x_sum)]

= [(217.82 * 17) - (23.2 * 168.66)] / [(3 * 217.82) - (23.2 * 23.2)]

= -209.97 / 115.22 = -1.82
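The intercept formula has the same denominator as the gradient formula. A minimal Python sketch, again with a hypothetical function name and the sums from this example:

```python
def least_squares_intercept(N, x_sum, y_sum, xy_sum, x_square_sum):
    """Intercept c of the least-squares line, computed from the summary sums."""
    numerator = x_square_sum * y_sum - x_sum * xy_sum
    denominator = N * x_square_sum - x_sum * x_sum
    if denominator == 0:
        raise ValueError("all x-values are equal; the line is undefined")
    return numerator / denominator


c = least_squares_intercept(3, 23.2, 17, 168.66, 217.82)
print(round(c, 2))  # -1.82
```

Once m is known, c can equivalently be found as c = (y_sum - m * x_sum) / N, which expresses the fact that the best-fit line passes through the point (mean of the x-values, mean of the y-values).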

The final equation is therefore:

y = 0.968x - 1.82
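As a cross-check, the same line can be computed directly from the raw points using the mean-centred form of the least-squares formulas, which is algebraically equivalent to the sum-based formulas above. This is a sketch with the hypothetical function name `fit_line`; as in the sums, the second point's y-value is taken as 4.4.

```python
def fit_line(points):
    """Least-squares line y = m*x + c through a list of (x, y) points.

    Uses the mean-centred form of the formulas, which gives the same
    result as the sum-based formulas.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    m = sxy / sxx
    c = y_mean - m * x_mean  # the best-fit line passes through (x_mean, y_mean)
    return m, c


m, c = fit_line([(4.1, 2.2), (6.5, 4.4), (12.6, 10.4)])
print(f"y = {m:.3f}x {'-' if c < 0 else '+'} {abs(c):.2f}")  # y = 0.968x - 1.82
```

In Python 3.10 and later, the standard library function `statistics.linear_regression(x, y)` returns the same slope and intercept.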

About the Author

Samuel Markings has been writing for scientific publications for more than 10 years, and has published articles in journals such as "Nature." He is an expert in solid-state physics, and during the day is a researcher at a Russell Group U.K. university.