West Nile Virus in the US

This original research project began while I was an undergraduate at BYU, and the resulting paper was published in the May 2012 issue of Applied Geography. With Dr. Ryan Jensen's assistance, I obtained West Nile virus (WNV) case data from the CDC and performed a spatial autocorrelation analysis of WNV cases in the United States.

Young, S.G. & Jensen, R.R. (2012) Statistical and visual analysis of human West Nile virus infection in the United States, 1999-2008. Applied Geography, 34, 425-431. doi:10.1016/j.apgeog.2012.01.008
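Spatial autocorrelation asks whether neighboring areas have more similar case rates than chance alone would produce. Global Moran's I is one widely used statistic for this. The statistic, spatial weights, and software used in the paper are not described on this page, so the following is only a minimal Python sketch. It assumes one value per area (for example, a per-state incidence rate) and a binary adjacency matrix. The function name `morans_i` and the four-area example data are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of area values and an n x n spatial weights matrix.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights. Values near +1 suggest clustering of
    similar values, and values near -1 suggest a checkerboard-like pattern.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Cross-products of deviations for every pair of areas, weighted by adjacency.
    numerator = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    denominator = sum(d * d for d in deviations)
    if denominator == 0 or total_weight == 0:
        raise ValueError("Moran's I is undefined for constant values or empty weights")
    return (n / total_weight) * (numerator / denominator)


# Hypothetical example: four areas in a row, each adjacent to its immediate neighbors.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, 0.0, 0.0], adjacency)    # high rates sit together
alternating = morans_i([1.0, 0.0, 1.0, 0.0], adjacency)  # high and low rates alternate
```

In the example, the clustered pattern yields a positive value and the alternating pattern a negative one. A real analysis would also test significance, for instance with a permutation test.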

This project also served as a jumping-off point for my master's thesis research, which expanded to include remotely sensed environmental data and machine learning tools.


My master's thesis, "Landscape Epidemiology and Machine Learning: A Geospatial Approach to Modeling West Nile Virus Risk in the United States," won the 2014 Jacques May Thesis Prize from the AAG's Health and Medical Geography Specialty Group. It also resulted in a publication in Applied Geography in September 2013.

Young, S.G., Tullis, J.A., & Cothren, J. (2013) A remote sensing and GIS-assisted landscape epidemiology approach to West Nile virus. Applied Geography, 45, 241-249. doi:10.1016/j.apgeog.2013.09.022

The abstract for my thesis:

The complex interactions between human health and the physical landscape and environment have been recognized, if not fully understood, since the ancient Greeks. Landscape epidemiology, sometimes called spatial epidemiology, is a sub-discipline of medical geography that uses environmental conditions as explanatory variables in the study of disease or other health phenomena. This theory suggests that pathogenic organisms (whether germs or larger vector and host species) are subject to environmental conditions that can be observed on the landscape, and that by identifying where such organisms are likely to exist, the areas at greatest risk of the disease can be derived. Machine learning is a sub-discipline of artificial intelligence that can be used to create predictive models from large and complex datasets. West Nile virus (WNV) is a relatively new infectious disease in the United States and has a fairly well-understood transmission cycle that is believed to be highly dependent on environmental conditions. This study takes a geospatial approach to the study of WNV risk, using both landscape epidemiology and machine learning techniques. A combination of remotely sensed and in situ variables is used to predict WNV incidence with a correlation coefficient as high as 0.86. A novel method of mitigating the small numbers problem is also tested and ultimately discarded. Finally, a consistent spatial pattern of model errors is identified, indicating that the chosen variables are capable of predicting WNV disease risk across most of the United States but are inadequate in the northern Great Plains region.