The objectives of the data analysis project are:
- To compare air pollution data from the XVPCA server and the meteocat XEMA server (e.g. January 2019 vs January 2020) in Barcelona to discover whether there is any statistically significant difference. Air pollution is expected to have improved recently because of the restrictions on old cars introduced by Barcelona's local regulations.
- To show data from 1999 to 2020 with powerful graphs using the openair R library, which has an excellent manual, and general-purpose exploratory data analysis libraries (e.g. ExPanDaR, ggplot2).
- To combine and organize air pollution data from meteocat and XVPCA using different R libraries (e.g. dplyr)
- To analyse normality using R normality tests and, depending on the results, to choose the appropriate test to show possible statistical differences between cities, seasons, weekdays, weekends, years, etc.
- To analyse all the values above the limits approved by the European Union, the guideline values of the World Health Organization, etc. As an example, download the PDF report of a PM10 episode in January 2020 at the very end of this webpage.
- To forecast air pollution using deep learning and artificial intelligence algorithms in R in order to find out which one has the best predictive performance.
- To create a real-time app using the Socrata Open Data API to access air pollution data from the Government of Catalonia. Examples: one by student Zakaria Sadiki, one using weather.io by student David Contreras, and one using OpenWeatherMap data by student David Díaz.
Tools to be used:
To analyze data you need a pendrive on which to download the following software: R CRAN and the RStudio zip file. First run the R CRAN installer and install it on your pendrive, then unzip RStudio onto the same pendrive. To finish the installation, run RStudio by clicking on rstudio.exe in the bin folder (create a shortcut on your pendrive to make this easier). The first time, you need to link it to R CRAN as shown in the next images:
AIR POLLUTION ANALYSIS USING R SOFTWARE
It is very important to use correct comma-separated values (csv) files. Remember to save as csv in LibreOffice Calc or Microsoft Office Excel, then replace all decimal commas with points, then all semicolons with commas, and finally all "Sense dades" entries ("no data" in Catalan) with NA. After that you can use your data in RStudio following the instructions in the table below.
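As an alternative to the manual search-and-replace, the same clean-up can be sketched directly in R at import time. This is a minimal sketch under assumptions: the sample rows, the column names and the temporary file are hypothetical stand-ins for a real export, which is assumed to use semicolons as separators and decimal commas.

```r
# Hypothetical raw export: semicolon-separated, decimal commas,
# and "Sense dades" wherever a measurement is missing.
raw_lines <- c(
  "date;NO;NO2",
  "2015-01-01;12,5;30,1",
  "2015-01-02;Sense dades;28,4",
  "2015-01-03;9,8;Sense dades"
)
tmp <- tempfile(fileext = ".csv")
writeLines(raw_lines, tmp)

# read.csv can handle the separator, the decimal mark and the
# missing-value label in one step, so the file itself stays untouched.
martorell <- read.csv(
  tmp,
  sep = ";",
  dec = ",",
  na.strings = c("Sense dades", ""),
  stringsAsFactors = FALSE
)
martorell$date <- as.Date(martorell$date)

str(martorell)        # NO and NO2 should be numeric, date a Date
summary(martorell$NO) # the NA count shows how many gaps were found
```

Reading the raw file this way keeps the original download unchanged, which makes it easier to repeat the import when new data arrive.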
To tidy messy data, use the tidyr library to reshape the table contents. Use e.g. the gather and unite functions (see the tidyr cheatsheet), and the lubridate cheatsheet to work with dates. The objective of these data manipulations is to obtain something similar to the next csv (save a copy with the write.csv function).
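The reshaping described above could look roughly like the following sketch. The wide layout (one column per hour, named h01 and h02 here) and the reading of h01 as the hour starting at 00:00 are assumptions made for illustration; check them against the real download before reusing the code.

```r
library(tidyr)
library(lubridate)

# Hypothetical wide table: one row per day, one column per hour.
wide <- data.frame(
  year  = c(2015, 2015),
  month = c(1, 1),
  day   = c(1, 2),
  h01   = c(12, 9),
  h02   = c(15, NA)
)

# unite(): merge year, month and day into one text column "date".
long <- unite(wide, "date", year, month, day, sep = "-")

# gather(): stack the hourly columns into key/value pairs.
long <- gather(long, key = "hour", value = "NO", h01, h02)

# lubridate: build a proper date-time from the date text and the hour.
long$hour <- as.integer(sub("h", "", long$hour))
long$datetime <- ymd(long$date, tz = "UTC") + hours(long$hour - 1)

# Sort chronologically and save a copy of the tidy table.
long <- long[order(long$datetime), c("datetime", "NO")]
out_file <- tempfile(fileext = ".csv")
write.csv(long, out_file, row.names = FALSE)
print(long)
```

In recent tidyr versions gather is superseded by pivot_longer, but it still works and matches the cheatsheet mentioned above.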
|Calculate quartiles, median, min, max, outliers||summary (martorell)|
|Create histogram||hist(martorell$NO, breaks=20, xlab="NO (micrograms/m3)", main="Martorell air pollution", col="pink")|
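|Install and load normality test||install.packages("nortest"); library(nortest)|
|Shapiro-Wilk normality test||shapiro.test(x) (stats library, 3 to 5000 values)|
|Anderson-Darling normality test||ad.test(x)|
|Cramer-von Mises normality test||cvm.test(x)|
|Kolmogorov-Smirnov (Lilliefors) normality test||lillie.test(x)|
|Pearson chi-square normality test||pearson.test(x)|
|Shapiro-Francia normality test||sf.test(x)|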
|library (e1071)||skewness(x), kurtosis(x)|
|skewness (negative:left tail, positive: right tail, normal:0)||skewness(martorell$NO)|
|kurtosis (negative: platykurtic, positive: leptokurtic, normal: 0)||kurtosis(martorell$NO)|
|Student t test (compare 2 normal data)||t.test(martorell$NO, santandreu$NO)|
|U Mann Whitney (compare 2 non normal data)||wilcox.test(martorell$NO, santandreu$NO)|
|Compare > 2 normal groups||ANOVA test|
|Compare > 2 non-normal groups||Kruskal-Wallis test|
|Homoscedasticity (equal variance)||leveneTest(y, group) in car library|
|View possible correlation||pairs (martorell)|
|Object type : class (x)||numeric, date, time series, dataframe, etc|
|Convert numeric to date||martorell$date<-as.Date(martorell$date)|
|Create time series (daily data, start and end given as year and day of year)||martorellNO.ts <- ts(martorell$NO, start=c(2015, 1), end=c(2015, 365), frequency=365)|
|Bind column data||NOcompare<-cbind(martorellNO.ts,santandreuNO.ts)|
|Plotting multiple time series||plot(NOcompare, plot.type="multiple", col=c("blue", "red"))|
|Subset time series (21 June is day 172 of 2015, 22 September is day 265)||mytssummer2015 <- window(myts, start=c(2015, 172), end=c(2015, 265))|
|Correlation plot||library (corrplot)
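The time series rows of the table can be tried out end to end with the sketch below. The data are simulated (the real station files are not reproduced here), and the helper doy is a hypothetical convenience function. Note that ts expects start as a year plus a position within the year, so with frequency=365 a calendar date has to be given as its day of year.

```r
set.seed(1)

# Simulated daily NO values for 2015 (hypothetical, for illustration only).
days <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
martorell  <- data.frame(date = days, NO = rgamma(365, shape = 2, scale = 10))
santandreu <- data.frame(date = days, NO = rgamma(365, shape = 2, scale = 8))

# Daily series: start = c(year, day of year), one cycle = 365 observations.
martorellNO.ts  <- ts(martorell$NO,  start = c(2015, 1), frequency = 365)
santandreuNO.ts <- ts(santandreu$NO, start = c(2015, 1), frequency = 365)

# Bind both series and draw them in a single panel.
NOcompare <- cbind(martorellNO.ts, santandreuNO.ts)
plot(NOcompare, plot.type = "single", col = c("blue", "red"),
     ylab = "NO (micrograms/m3)", main = "Simulated NO, 2015")

# Hypothetical helper: convert a calendar date into its day of year.
doy <- function(d) as.integer(format(as.Date(d), "%j"))

# Summer subset: 21 June to 22 September 2015.
summer2015 <- window(martorellNO.ts,
                     start = c(2015, doy("2015-06-21")),
                     end   = c(2015, doy("2015-09-22")))
length(summer2015)
```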
Regarding NO2 levels found in previous image:
Is Martorell following the NO2 air pollution standards of the EU?
Analyse all EU air standards and compare to WHO air standards
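One possible way to start answering these questions is to count exceedances in R. The sketch below uses simulated hourly NO2 values, so its output says nothing about the real Martorell station. The thresholds are the NO2 values from EU Directive 2008/50/EC (hourly limit of 200 micrograms/m3, not to be exceeded more than 18 times per calendar year, and annual mean limit of 40 micrograms/m3) and the WHO 2021 annual guideline of 10 micrograms/m3; verify them against the current official texts before drawing conclusions.

```r
set.seed(42)

# Simulated hourly NO2 for 2015 (hypothetical data, 8760 observations).
hours <- seq(as.POSIXct("2015-01-01 00:00", tz = "UTC"),
             by = "hour", length.out = 8760)
martorell <- data.frame(date = hours,
                        NO2 = rgamma(8760, shape = 2, scale = 15))

# Thresholds in micrograms/m3 (check the official documents).
eu_hourly_limit   <- 200  # EU hourly limit value
eu_hourly_allowed <- 18   # permitted exceedances per calendar year
eu_annual_limit   <- 40   # EU annual mean limit value
who_annual_guide  <- 10   # WHO 2021 annual guideline

n_exceed    <- sum(martorell$NO2 > eu_hourly_limit, na.rm = TRUE)
annual_mean <- mean(martorell$NO2, na.rm = TRUE)

compliance <- data.frame(
  criterion = c("EU hourly limit", "EU annual mean", "WHO annual guideline"),
  observed  = c(n_exceed, annual_mean, annual_mean),
  threshold = c(eu_hourly_allowed, eu_annual_limit, who_annual_guide),
  met       = c(n_exceed <= eu_hourly_allowed,
                annual_mean <= eu_annual_limit,
                annual_mean <= who_annual_guide)
)
print(compliance)
```

The same pattern extends to other pollutants by swapping the column and the thresholds.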
> plot(martorell$date, martorell$NO, type = "l", xlab = "year 2015",ylab = "Nitric oxide (microg/m3)")
> plot(martorell$date, martorell$NO, type = "l", xlab = "year 2015", ylab = "Nitric oxide (microg/m3)", main = "Air pollution in Martorell (8760 observations)")
> plot(martorell$date[1:168], martorell$NO[1:168], type = "l", xlab = "year 2015", ylab = "Nitric oxide (microg/m3)", main = "Air pollution in Martorell")
> plot(martorell$date[144:168], martorell$NO[144:168], type = "l", xlab = "year 2015", ylab = "Nitric oxide (microg/m3)", main = "Air pollution in Martorell (7th January)")
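The plots above select rows by position (1:168 for the first week, 144:168 for around 7th January). A variant that selects by date instead may be easier to adapt to other days. This is only a sketch on simulated hourly data, and it assumes the date column holds date-times in UTC.

```r
set.seed(3)

# Simulated hourly NO for 2015 (hypothetical data, 8760 observations).
hours <- seq(as.POSIXct("2015-01-01 00:00", tz = "UTC"),
             by = "hour", length.out = 8760)
martorell <- data.frame(date = hours,
                        NO = rgamma(8760, shape = 2, scale = 10))

# First week of January: everything before 8th January 00:00.
week1 <- martorell[martorell$date < as.POSIXct("2015-01-08", tz = "UTC"), ]

# A single day: compare the calendar date of each observation.
jan7 <- martorell[as.Date(martorell$date) == as.Date("2015-01-07"), ]

plot(jan7$date, jan7$NO, type = "l",
     xlab = "7th January 2015",
     ylab = "Nitric oxide (microg/m3)",
     main = "Air pollution in Martorell (simulated)")
```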
Time series air pollution comparison Martorell vs Sant Andreu de la Barca (2015)
T-test (normal data) or Mann-Whitney U test (non-normal data)
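The choice between the two tests can be sketched as a small decision step. The data here are simulated, the helper is_normal and the 0.05 threshold are illustrative assumptions, and shapiro.test only accepts between 3 and 5000 values, so a full year of hourly data (8760 observations) would need a sample or one of the nortest tests instead.

```r
set.seed(123)

# Simulated NO samples for two stations (hypothetical, right-skewed).
martorell_NO  <- rlnorm(200, meanlog = 3.0, sdlog = 0.6)
santandreu_NO <- rlnorm(200, meanlog = 2.8, sdlog = 0.6)

# Hypothetical helper: TRUE when Shapiro-Wilk does not reject normality.
is_normal <- function(x, alpha = 0.05) {
  shapiro.test(x)$p.value > alpha
}

if (is_normal(martorell_NO) && is_normal(santandreu_NO)) {
  # Both samples look normal: Student/Welch t test.
  result <- t.test(martorell_NO, santandreu_NO)
} else {
  # At least one sample is not normal: Mann-Whitney U (Wilcoxon rank sum).
  result <- wilcox.test(martorell_NO, santandreu_NO)
}

print(result$method)
print(result$p.value)
```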
Air pollutant in Martorell (2015)
One week forecast of air pollution in Martorell
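The method behind the forecast in this figure is not stated here. As a simple baseline against which deep learning models could later be compared, a one-week forecast can be sketched with a seasonal ARIMA model from base R. Everything below is an assumption for illustration: the data are simulated with a daily cycle, and the model orders were chosen by hand rather than tuned.

```r
set.seed(7)

# Simulated hourly NO for four weeks with a daily cycle (hypothetical).
n <- 24 * 28
hour_of_day <- rep(0:23, length.out = n)
no_hourly <- 20 + 10 * sin(2 * pi * hour_of_day / 24) +
  as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 3))

# Hourly series with a seasonal period of 24 observations (one day).
no_ts <- ts(no_hourly, frequency = 24)

# Baseline model: AR(1) plus a seasonal difference and seasonal MA(1).
fit <- arima(no_ts, order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 1), period = 24))

# Forecast one week ahead (7 days x 24 hours).
fc <- predict(fit, n.ahead = 24 * 7)

# Observed series in black, forecast in red.
ts.plot(no_ts, fc$pred, col = c("black", "red"),
        ylab = "NO (micrograms/m3)",
        main = "One-week baseline forecast (simulated data)")
```

Holding out the last week of real data and comparing forecast errors across models would show which one predicts best.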
Download the manual of the openair library for R in order to analyse air pollution data