Part A: Latitude vs. Daylight


In this part, you will examine data collected in the Global Sun - Temperature project to see whether there is any relationship between proximity to the equator (latitude) and the amount of daylight at a location. The data that you will be using was collected during a specific week in November 2000. During that week, students recorded the sunrise and sunset times and determined the number of minutes of daylight (or sunlight) each day. They found the average minutes of daylight for the week and submitted it to the project along with their latitude and longitude.
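The daylight calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `daylight_minutes`, the 24-hour "HH:MM" time format, and the sample times are assumptions, not values from the project.

```python
from datetime import datetime

def daylight_minutes(sunrise: str, sunset: str) -> int:
    """Minutes between sunrise and sunset, given as 24-hour 'HH:MM' strings."""
    fmt = "%H:%M"
    delta = datetime.strptime(sunset, fmt) - datetime.strptime(sunrise, fmt)
    return int(delta.total_seconds() // 60)

# Hypothetical sunrise/sunset times for one week (invented, not project data).
week = [
    ("06:45", "16:50"),
    ("06:46", "16:49"),
    ("06:47", "16:48"),
    ("06:48", "16:47"),
    ("06:49", "16:46"),
    ("06:50", "16:45"),
    ("06:51", "16:44"),
]

daily = [daylight_minutes(rise, set_) for rise, set_ in week]
average = sum(daily) / len(daily)

print("Daily minutes of daylight:", daily)
print("Weekly average:", round(average, 1))
```

The weekly average is the single value each location would submit, together with its latitude and longitude.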

1. Review the data that was submitted to the project: Fall 2000 Data. Note that it is sorted by latitude. What do negative latitudes represent? Positive latitudes?


2. Examine a scatter plot of the data. Describe, in your own words, the trend you see in the data. Does it make sense? Explain why or why not.

[Figure: scatter plot of average minutes of daylight vs. latitude]


3. What are the appropriate domain and range for this data? Explain your reasoning.


4. Below are three functions that have been fit to the data: a linear function, a 2nd degree polynomial (quadratic), and a 3rd degree polynomial (cubic). The equation of each function is given along with its coefficient of determination (r²). In each case, the function has been extrapolated to latitudes of ±90°.

Linear Model


2nd Degree Polynomial Model


3rd Degree Polynomial Model



QUESTION: Which one is the most appropriate model for this data?

Things to consider, with an explanation for each of the models above:

Coefficient of Determination (r²)

The coefficient of determination indicates the proportion of the variation in the data that is explained by the model. In other words, it tells how well the model fits the data. The closer r² is to 1, the better the fitted model explains the data. However, it is not the only measure of a good model.


The coefficient of determination is very high for all three of the models above and varies little among them. In this case, it may not be the best way to select the most appropriate model.
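As an illustration of how the coefficient of determination is obtained, the sketch below fits 1st, 2nd, and 3rd degree polynomials with NumPy and computes r² = 1 - SS_res / SS_tot for each. The latitude and daylight values are invented for the example and are not the project's Fall 2000 data; `r_squared` is a hypothetical helper name.

```python
import numpy as np

# Invented (latitude, average daylight minutes) pairs, for illustration only.
latitude = np.array([-35.0, -20.0, -5.0, 10.0, 25.0, 40.0, 55.0])
daylight = np.array([795.0, 760.0, 722.0, 652.0, 578.0, 474.0, 360.0])

def r_squared(x, y, degree):
    """Fit a polynomial of the given degree by least squares and return r^2."""
    coeffs = np.polyfit(x, y, degree)          # highest power first
    predicted = np.polyval(coeffs, x)
    ss_res = np.sum((y - predicted) ** 2)      # variation left unexplained
    ss_tot = np.sum((y - np.mean(y)) ** 2)     # total variation in the data
    return 1.0 - ss_res / ss_tot

r2_by_degree = {d: r_squared(latitude, daylight, d) for d in (1, 2, 3)}

for degree, r2 in r2_by_degree.items():
    print(f"degree {degree}: r^2 = {r2:.4f}")
```

Because each higher degree polynomial contains the lower degree one as a special case, a least-squares fit of higher degree never has a lower r² on the same data. That is one reason r² alone cannot settle the choice of model.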

Residuals

Residuals (the differences between observed and predicted values) help to determine whether the model is a good fit to the data. A scatter plot of residuals vs. the independent variable in which the residuals lie uniformly close to the x-axis indicates a good fit. However, if the residuals form a systematic pattern, such as a steady increase, a steady decrease, or a curve, the model may not be a good fit to the data.

Residual = observed value - predicted value
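The residual formula can be applied directly to a fitted model. The sketch below does this for a linear fit. The data values are invented for illustration and will not reproduce the project's residual plots.

```python
import numpy as np

# Invented (latitude, average daylight minutes) pairs, for illustration only.
latitude = np.array([-35.0, -20.0, -5.0, 10.0, 25.0, 40.0, 55.0])
daylight = np.array([795.0, 760.0, 722.0, 652.0, 578.0, 474.0, 360.0])

# Fit the linear model and evaluate it at each observed latitude.
coeffs = np.polyfit(latitude, daylight, 1)
predicted = np.polyval(coeffs, latitude)

# Residual = observed value - predicted value
residuals = daylight - predicted

for lat, res in zip(latitude, residuals):
    print(f"latitude {lat:6.1f}: residual {res:8.2f} minutes")
```

Plotting `residuals` against `latitude` gives the kind of residual plot described above. For a least-squares fit that includes a constant term, the residuals always sum to zero, so it is their pattern, not their total, that matters.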

The residuals for each of the models are shown below.

Residuals for Linear Model


Residuals for 2nd Degree Polynomial


Residuals for 3rd Degree Polynomial


The residuals for the linear model look slightly more uniform about the x-axis than those of the other two models.

Which Model Supports the Theory the Best?

Examine each of the models. Do any of the models seem inappropriate for the domain and range of the data? Do any seem inappropriate when the data is extrapolated? Do any data points seem to be outliers? Might there be a reasonable explanation for the outliers? Which model provides a reasonable indication of what you would expect to see?

The 2nd degree polynomial model is a parabola. Although the curve fits the data over the domain and range of the data given, it also indicates that the amount of daylight reaches a maximum at a southern latitude and then decreases at latitudes farther south. This does not fit what we know to be true. Both the linear and 3rd degree polynomial models indicate that, in November, the amount of daylight keeps increasing toward more southern latitudes (and decreasing toward more northern latitudes), which is what is expected.
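One way to check a model's behavior outside the data is to locate the parabola's turning point and to evaluate each fitted model at ±90°. The sketch below is a minimal illustration under stated assumptions: the data values are invented, and the plausibility test uses only the fact that a day cannot have fewer than 0 or more than 1440 minutes of daylight.

```python
import numpy as np

# Invented (latitude, average daylight minutes) pairs, for illustration only.
latitude = np.array([-35.0, -20.0, -5.0, 10.0, 25.0, 40.0, 55.0])
daylight = np.array([795.0, 760.0, 722.0, 652.0, 578.0, 474.0, 360.0])

# Quadratic fit: daylight ~ a*lat^2 + b*lat + c
a, b, c = np.polyfit(latitude, daylight, 2)

# A parabola turns around at lat = -b / (2a); it is a maximum when a < 0.
vertex_latitude = -b / (2 * a)
kind = "maximum" if a < 0 else "minimum"
print(f"Quadratic model has a {kind} at latitude {vertex_latitude:.1f}")

# Extrapolate each model to the poles and flag impossible values.
poles = np.array([-90.0, 90.0])
for degree in (1, 2, 3):
    coeffs = np.polyfit(latitude, daylight, degree)
    south, north = np.polyval(coeffs, poles)
    plausible = all(0 <= value <= 1440 for value in (south, north))
    print(f"degree {degree}: -90 -> {south:.0f} min, "
          f"+90 -> {north:.0f} min, plausible: {plausible}")
```

With these invented values the quadratic fit turns around at a southern latitude inside the ±90° range, which is the kind of behavior the paragraph above argues against.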

All things being equal, which is the simplest model that explains the trend in data?

Ockham's Razor is a maxim that suggests choosing a simple model over a more sophisticated model if all else appears equal. You can find out more about Ockham's Razor in the Supplementary Resources section.

The linear model in this example is the simplest of the models given.


5. Which model do YOU think is best for the given data? Justify your decision.


6. What would you expect the model to look like in May of any given year? Explain. Explore archived data for this project and pick 5-10 data points that either support or refute your predicted model. Pick points at a variety of latitudes to ensure that the points represent the model in its entirety.




Copyright 2005 Stevens Institute of Technology
Center for Innovation in Engineering and Science Education, All Rights Reserved