
Slides & Tutorials
Lecture Slides (in Portuguese)
- Introdução ao Aprendizado de Máquina (Introduction to Machine Learning)
- Regressões Lineares e Não-Lineares (Linear and Nonlinear Regression)
- Florestas Aleatórias (Random Forests)
- Redes Neurais Artificiais (Artificial Neural Networks)
- Máquinas de Vetores de Suporte (Support Vector Machines)
- Validação de Modelos (Model Validation)
R Tutorials (in Portuguese)
The ZIP file includes the following tutorials:
- Tutorial 1. Introdução ao R (Introduction to R)
- Tutorial 2. Regressões (Regressions)
- Tutorial 3. Florestas Aleatórias (Random Forests)
- Tutorial 4.A Redes Neurais com Torch (Neural Networks with Torch)
- Tutorial 4.B Redes Neurais com Keras (Neural Networks with Keras)
- Tutorial 5. Support Vector Machines (SVM)
- Tutorial 6. Validação de Modelos (Model Validation)
Deforestation Data
This dataset was built for the machine learning tutorials. It focuses on deforestation in the Brazilian Amazon in 2004, measured at the municipality level, together with related socioeconomic and biophysical factors. It covers 808 municipalities and 31 variables, described below:
- codigo : unique municipality ID
- state : acronym of the state where the municipality is located
- def_annMB : deforestation in 2004 based on MapBiomas data (km2)
- area_km2 : area of the municipality (km2)
- PAs : municipal area covered by protected areas (1000 km2)
- env_fine_cancel : sum of environmental fines cancelled (R$)
- env_fine : sum of environmental fines issued (R$)
- env_fine_paid : sum of environmental fines paid (R$)
- dist_ports : Euclidean distance to major ports
- dist_manaus : Euclidean distance to the port of Manaus
- dist_parana : Euclidean distance to the Paraná port
- dist_arc : Euclidean distance to the Arc of Deforestation region
- dist_capital : Euclidean distance to state capitals
- dist_seat : Euclidean distance to municipal seats
- incra_ha : area covered by INCRA settlements (ha)
- incra_family : number of families in INCRA settlements
- incra_cap : capacity of INCRA settlements (number of families)
- gdp : gross domestic product (R$)
- gdp_agr : gross domestic product of the agricultural sector (R$)
- pop : population count
- suitability_soy : suitability index for soybean production (0 to 1)
- suitability_pas : suitability index for pasture (0 to 1)
- soil : soil quality (from 1 = poor to 5 = excellent)
- flo2000 : forest cover in 2000 (km2)
- road_density : road density (km/km2)
- road_hway_density : highway density (km/km2)
- road_hway_km : highway length (km)
- road_km : road length (km)
- mayor_party : political alignment of the mayor's party (left/center/right)
- gov_party : political alignment of the governor's party (left/center/right)
- def_category : dummy equal to 1 if the municipality is among the top 25% with the highest deforestation in 2004, and 0 otherwise
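The description of def_category suggests a cutoff at the 75th percentile of def_annMB. The sketch below rebuilds such a dummy in R under that assumption. The toy values are hypothetical, and the exact quantile definition and tie handling used for the real dataset are not stated, so treat this as an illustration rather than the procedure used to build the file.

```r
# Hypothetical toy values standing in for def_annMB (km2); not real data
d <- data.frame(
  codigo    = 1:8,
  def_annMB = c(0.5, 12.0, 3.2, 45.1, 0.0, 7.8, 150.3, 22.4)
)

# Assumed rule: flag municipalities at or above the 75th percentile of def_annMB
# (quantile() uses its default type = 7 interpolation here)
cutoff <- quantile(d$def_annMB, probs = 0.75, names = FALSE)
d$def_category <- as.integer(d$def_annMB >= cutoff)

print(cutoff)
print(d)
```

With ties at the cutoff, a ">=" rule can flag slightly more than 25% of municipalities.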
Some of my most used pieces of R code
To match rows by ID and copy a variable from data frame d2 into data frame d1:
# For each ID in d1, copy the value of `variable` from the matching row of d2 (NA if the ID is not in d2)
d1$variable <- d2$variable[match(d1$ID, d2$ID)]
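A self-contained illustration of the matching idiom on two hypothetical data frames. `match()` returns, for each element of its first argument, the position of its first match in the second argument (or NA), so IDs missing from d2 end up as NA and, if d2 has duplicate IDs, only the first occurrence is used.

```r
# Hypothetical toy data frames sharing an ID column
d1 <- data.frame(ID = c(3, 1, 4, 9))
d2 <- data.frame(ID = c(1, 2, 3, 4), variable = c(10, 20, 30, 40))

# Row positions in d2 for each ID in d1 (ID 9 has no match, so NA)
idx <- match(d1$ID, d2$ID)

# Copy the matched values; unmatched IDs receive NA
d1$variable <- d2$variable[idx]
print(d1)
```

Unlike merge(), this keeps the row order and row count of d1 unchanged.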
To compute a specific statistic per category (sum per year, in this example):
library(plyr)
# Split d1 by year and sum `variable` within each year; the result has columns year and total
annual_sum <- ddply(.data = d1, .(year), .fun = summarise, total = sum(variable))
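If you would rather avoid the plyr dependency, base R's aggregate() can produce the same per-year sums. This is a swapped-in alternative, not the snippet above, shown on a hypothetical toy data frame.

```r
# Hypothetical toy panel: one row per municipality-year
d1 <- data.frame(
  year     = c(2003, 2003, 2004, 2004, 2004),
  variable = c(1.5, 2.5, 4.0, 1.0, 0.5)
)

# Base-R alternative to ddply(): sum `variable` within each year
annual_sum <- aggregate(variable ~ year, data = d1, FUN = sum)

# Rename the summed column to match the ddply() example
names(annual_sum)[2] <- "total"
print(annual_sum)
```

One difference to keep in mind: the formula interface of aggregate() drops rows with NA by default (na.action = na.omit), whereas sum(variable) inside the ddply() call returns NA for any year that contains a missing value.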