Into the palaeoverse
A community-driven R package
The long and the short of it 📏…
Palaeoverse is a project that aims to bring the palaeobiology community together.
palaeoverse provides auxiliary functions to support data preparation and exploration.
Improve code readability, reusability and reproducibility.
A whistle-stop tour of palaeoverse 🚋…
axis_geo
bin_lat
bin_time
data
group_apply
lat_bins
look_up
palaeorotate
phylo_check
tax_check
tax_expand_lat
tax_expand_time
tax_range_space
tax_range_time
tax_unique
time_bins
A lot of data, a lot of sources, and a lot of unique features.
Data structure, not source.
occdf → function(x) → df
Occurrence dataframe*
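A minimal sketch of the "data structure, not source" idea, using base R only. The helper `mid_age()` and the toy dataframe are hypothetical and not part of palaeoverse. They only illustrate a function that accepts any occurrence dataframe, plus the names of the columns it needs, and returns a dataframe.

```r
# Hypothetical occurrence dataframe; it could come from any database
toy_occdf <- data.frame(taxon = c("A", "B"),
                        oldest = c(150, 60),
                        youngest = c(110, 20))

# Hypothetical helper (not a palaeoverse function): the user supplies
# column names, so the function does not depend on the data source
mid_age <- function(occdf, max_ma = "max_ma", min_ma = "min_ma") {
  stopifnot(is.data.frame(occdf),
            all(c(max_ma, min_ma) %in% colnames(occdf)))
  occdf$mid_ma <- (occdf[[max_ma]] + occdf[[min_ma]]) / 2
  occdf
}

out <- mid_age(toy_occdf, max_ma = "oldest", min_ma = "youngest")
```

The design choice shown here is that column names are arguments with sensible defaults, so the same function works on differently labelled datasets.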
Let’s dive in 🤿…
The development version can be installed using devtools:
Two example occurrence datasets are available.
Carboniferous–Early Triassic tetrapods (n = 5270, Paleobiology Database).
# Get details on dataset
?tetrapods
# Load dataset
data("tetrapods")
# Available variables
colnames(tetrapods)
## [1] "occurrence_no" "collection_no" "identified_name"
## [4] "identified_rank" "accepted_name" "accepted_rank"
## [7] "early_interval" "late_interval" "max_ma"
## [10] "min_ma" "phylum" "class"
## [13] "order" "family" "genus"
## [16] "abund_value" "abund_unit" "lng"
## [19] "lat" "collection_name" "cc"
## [22] "formation" "stratgroup" "member"
## [25] "zone" "lithology1" "environment"
## [28] "pres_mode" "taxon_environment" "motility"
## [31] "life_habit" "diet"
Phanerozoic reef occurrences (n = 4363, PaleoReefs Database).
Two reference datasets are available.
Geological Time Scale 2012 & 2020 (Gradstein et al. 2012; 2020).
# Get details on dataset
?GTS2012
?GTS2020
# Load dataset
data("GTS2012")
data("GTS2020")
# Increase output width
options(width = 120)
# Print first few rows
head(GTS2012, n = 3)
## interval_number interval_name rank max_ma mid_ma min_ma duration_myr font colour abbr
## 1 1 Holocene stage 0.0117 0.0059 0.0000 0.0117 black #FDEDEC <NA>
## 2 2 Upper Pleistocene stage 0.1260 0.0688 0.0117 0.1143 black #FFF2D3 <NA>
## 3 3 Middle Pleistocene stage 0.7810 0.4535 0.1260 0.6550 black #FFF2C7 <NA>
head(GTS2020, n = 3)
## interval_number interval_name rank max_ma mid_ma min_ma duration_myr font colour abbr
## 1 1 Meghalayan stage 0.0042 0.00210 0.0000 0.0042 black #FDEDEC <NA>
## 2 2 Northgrippian stage 0.0082 0.00620 0.0042 0.0040 black #FDECE4 <NA>
## 3 3 Greenlandian stage 0.0117 0.00995 0.0082 0.0035 black #FEECDB <NA>
# Get first few rows
head(bins, n = 3)
## bin interval_name rank max_ma mid_ma min_ma duration_myr abbr colour font
## 1 1 Puercan North American Land Mammal Ages 66.00 65.375 64.75 1.25 P #FDB469 black
## 2 2 Torrejonian North American Land Mammal Ages 64.75 63.500 62.25 2.50 To #FEBA64 black
## 3 3 Tiffanian North American Land Mammal Ages 62.25 59.875 57.50 4.75 Ti #FEBF6A black
# Get first few rows
head(bins, n = 3)
## bin max_ma mid_ma min_ma duration_myr grouping_rank intervals colour font
## 1 1 541 535.00 529.0 12.0 stage Fortunian #80cdc1 black
## 2 2 529 521.50 514.0 15.0 stage Stage 3, Stage 2 #80cdc1 black
## 3 3 514 507.25 500.5 13.5 stage Drumian, Wuliuan, Stage 4 #80cdc1 black
Five temporal binning methods for age range data:
# Use tetrapod example data
occdf <- tetrapods
# Get stage-level time bins
bins <- time_bins(interval = "Phanerozoic", rank = "stage")
# Assign via midpoint age of fossil occurrence data
ex1 <- bin_time(occdf = occdf, bins = bins, method = "mid")
# Assign to all bins that age range covers
ex2 <- bin_time(occdf = occdf, bins = bins, method = "all")
# Assign via majority overlap based on fossil occurrence age range
ex3 <- bin_time(occdf = occdf, bins = bins, method = "majority")
# Randomly assign to overlapping bins based on fossil occurrence age range
ex4 <- bin_time(occdf = occdf, bins = bins, method = "random", reps = 10)
# Randomly assign point estimates (e.g. uniform distribution) based on fossil occurrence age range
ex5 <- bin_time(occdf = occdf, bins = bins, method = "point", reps = 10)
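To make the difference between the midpoint and "all" approaches concrete, here is a base R sketch on toy data. The bins, occurrences and boundary handling are assumptions chosen for illustration. This is not the implementation used by bin_time.

```r
# Hypothetical toy bins (not the GTS): three 10-Myr intervals
toy_bins <- data.frame(bin = 1:3,
                       max_ma = c(30, 20, 10),
                       min_ma = c(20, 10, 0))

# Hypothetical occurrences with age ranges
toy_occ <- data.frame(name = c("A", "B"),
                      max_ma = c(28, 18),
                      min_ma = c(22, 4))

# Midpoint-style assignment: the bin that contains the midpoint age
mid_ma <- (toy_occ$max_ma + toy_occ$min_ma) / 2
mid_bin <- sapply(mid_ma, function(m) {
  toy_bins$bin[m <= toy_bins$max_ma & m > toy_bins$min_ma][1]
})

# "All"-style assignment: every bin that the age range overlaps
all_bins <- lapply(seq_len(nrow(toy_occ)), function(i) {
  toy_bins$bin[toy_occ$min_ma[i] < toy_bins$max_ma &
                 toy_occ$max_ma[i] > toy_bins$min_ma]
})
```

Occurrence B spans two bins, so the midpoint approach places it in a single bin while the "all" approach duplicates it across both.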
Generate and bin latitudinal data:
Generate and bin spatial data:
# Get reef data
occdf <- reefs[1:500, ]
# Bin data using a hexagonal equal-area grid
occdf <- bin_space(occdf = occdf, spacing = 250, return = TRUE)
# Plot world and grid using ggplot2
library(ggplot2)
library(rnaturalearth)
world <- ne_countries(scale = "small", returnclass = "sf")
ggplot() +
  geom_sf(data = world, colour = "black", fill = "lightgrey") +
  geom_sf(data = occdf$grid, fill = "orange", colour = "black") +
  theme_void()
Palaeorotate fossil occurrences (multiple models available):
# Example with a few occurrences
occdf <- data.frame(lng = c(2, -103, -66),
                    lat = c(46, 35, -7),
                    age = c(88, 125, 200))
# Estimate palaeocoordinates using the GPlates API
ex1 <- palaeorotate(occdf = occdf, method = "point")
# Estimate palaeocoordinates using reconstruction files
ex2 <- palaeorotate(occdf = occdf, method = "grid")
# Estimate palaeocoordinates and uncertainty using reconstruction files
ex3 <- palaeorotate(occdf = occdf, method = "grid", uncertainty = TRUE)
# Increase output width
options(width = 400)
# Get first few rows
head(ex3)
## lng lat age rot_model rot_age rot_lng rot_lat p_lng p_lat
## 1 2 46 88 MERDITH2021 88 1.80 46.42 13.0134 37.6406
## 2 -103 35 125 MERDITH2021 127 -102.61 34.63 -41.8928 35.0437
## 3 -66 -7 200 MERDITH2021 200 -65.52 -6.95 -22.5209 -16.7714
Identify and count potential spelling variations of the same taxon:
# Load occurrence data
data("tetrapods")
# Check taxon names alphabetically
ex1 <- tax_check(taxdf = tetrapods, name = "genus", dis = 0.05, verbose = FALSE)
# Get first few rows
head(ex1)
## group greater lesser count_greater count_lesser
## 1 D Dvinosaurus Dinosaurus 23 2
## 2 V Varanopus Varanops 5 3
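The idea of flagging near-identical names can be sketched in base R. This example swaps in Levenshtein edit distance from `utils::adist()` as a stand-in. The source does not state which dissimilarity measure tax_check uses, so treat the metric and the threshold of 1 as assumptions. The name list is a toy vector.

```r
# Toy vector of genus names; the last entry is an unrelated control
genera <- c("Dvinosaurus", "Dinosaurus", "Varanopus", "Varanops",
            "Lystrosaurus")

# Pairwise Levenshtein edit distances (base R, utils package)
d <- adist(genera)

# Keep each pair once (upper triangle) where names differ by one edit
hits <- which(d == 1 & upper.tri(d), arr.ind = TRUE)
pairs <- data.frame(name_1 = genera[hits[, "row"]],
                    name_2 = genera[hits[, "col"]])
```

Flagged pairs are only candidates. As the Dvinosaurus and Dinosaurus pair shows, similar spellings can be distinct valid names, so each pair needs manual checking.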
In this example dataset:
Identifying unique taxa:
# Create dataframe
occdf <- data.frame(species = c("rex", "aegyptiacus", NA),
                    genus = c("Tyrannosaurus", "Spinosaurus", NA),
                    family = c("Tyrannosauridae", "Spinosauridae", "Diplodocidae"))
# Retain unique taxa
dinosaur_species <- tax_unique(occdf = occdf,
                               species = "species",
                               genus = "genus",
                               family = "family",
                               resolution = "species")
head(dinosaur_species)
## family genus genus_species unique_name
## 1 Spinosauridae Spinosaurus Spinosaurus aegyptiacus Spinosaurus aegyptiacus
## 2 Tyrannosauridae Tyrannosaurus Tyrannosaurus rex Tyrannosaurus rex
## 3 Diplodocidae <NA> <NA> Diplodocidae indet.
Calculate and plot temporal range of taxa:
Four approaches to calculate geographic range of taxa:
# Grab internal data
occdf <- tetrapods
# Remove NAs
occdf <- subset(occdf, !is.na(genus))
# Convex hull
ex1 <- tax_range_space(occdf = occdf, name = "genus", method = "con")
# Latitudinal range
ex2 <- tax_range_space(occdf = occdf, name = "genus", method = "lat")
# Great Circle Distance
ex3 <- tax_range_space(occdf = occdf, name = "genus", method = "gcd")
# Occupied grid cells
ex4 <- tax_range_space(occdf = occdf, name = "genus", method = "occ", spacing = 250)
# See first few rows
head(ex2)
## taxon taxon_id max_lat min_lat range_lat
## 1 Abajudon 1 -10.624 -16.524 5.9
## 2 Abdalodon 2 -31.925 -31.925 0.0
## 3 Abyssomedon 3 34.776 34.776 0.0
## 4 Acanthostomatops 4 51.000 51.000 0.0
## 5 Acerastea 5 -24.833 -24.833 0.0
## 6 Acerosodontosaurus 6 -24.000 -24.000 0.0
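The latitudinal range shown above is the simplest of the four measures. A base R sketch on hypothetical toy data shows the underlying calculation. It illustrates the concept only and is not the code used by tax_range_space.

```r
# Hypothetical occurrences: genus name and present-day latitude
toy_occ <- data.frame(genus = c("A", "A", "B", "B", "B"),
                      lat = c(-10, -16, 34, 40, 51))

# Maximum and minimum latitude per genus
max_lat <- tapply(toy_occ$lat, toy_occ$genus, max)
min_lat <- tapply(toy_occ$lat, toy_occ$genus, min)

# Assemble a summary table with the latitudinal range in degrees
lat_range <- data.frame(taxon = names(max_lat),
                        max_lat = as.vector(max_lat),
                        min_lat = as.vector(min_lat),
                        range_lat = as.vector(max_lat - min_lat))
```

A taxon known from a single locality has a range of zero, which explains the many 0.0 values in the output above.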
Convert temporal range data to bin-level pseudo-occurrences:
# Generate example df
taxdf <- data.frame(name = c("A", "B", "C"),
                    max_age = c(150, 60, 30),
                    min_age = c(110, 20, 0))
# Generate pseudo-occurrences
ex1 <- tax_expand_time(taxdf = taxdf, max_ma = "max_age", min_ma = "min_age")
# Increase output width
options(width = 200)
# See first few rows
head(ex1)
## name max_age min_age ext orig interval_number interval_name rank max_ma mid_ma min_ma duration_myr font colour abbr
## 1 C 30 0 FALSE FALSE 1 Meghalayan stage 0.0042 0.00210 0.0000 0.0042 black #FDEDEC <NA>
## 2 C 30 0 FALSE FALSE 2 Northgrippian stage 0.0082 0.00620 0.0042 0.0040 black #FDECE4 <NA>
## 3 C 30 0 FALSE FALSE 3 Greenlandian stage 0.0117 0.00995 0.0082 0.0035 black #FEECDB <NA>
## 4 C 30 0 FALSE FALSE 4 Upper Pleistocene stage 0.1290 0.07035 0.0117 0.1173 black #FFF2D3 <NA>
## 5 C 30 0 FALSE FALSE 5 Chibanian stage 0.7740 0.45150 0.1290 0.6450 black #FFF2C7 <NA>
## 6 C 30 0 FALSE FALSE 6 Calabrian stage 1.8000 1.28700 0.7740 1.0260 black #FFF2BA <NA>
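The expansion from one row per taxon to one row per taxon per bin can be sketched in base R. The bins and ranges below are hypothetical, and the overlap rule (strict inequalities at bin boundaries) is an assumption rather than the documented behaviour of tax_expand_time.

```r
# Hypothetical toy bins: four 10-Myr intervals
toy_bins <- data.frame(bin = 1:4,
                       max_ma = c(40, 30, 20, 10),
                       min_ma = c(30, 20, 10, 0))

# Hypothetical taxon ranges
toy_tax <- data.frame(name = c("A", "B"),
                      max_age = c(35, 15),
                      min_age = c(12, 5))

# For each taxon, keep every bin its range overlaps and repeat the
# taxon row once per overlapping bin
pseudo <- do.call(rbind, lapply(seq_len(nrow(toy_tax)), function(i) {
  hit <- toy_bins[toy_tax$min_age[i] < toy_bins$max_ma &
                    toy_tax$max_age[i] > toy_bins$min_ma, ]
  cbind(toy_tax[rep(i, nrow(hit)), ], hit, row.names = NULL)
}))
```

Taxon A spans three bins and taxon B spans two, so the result has five pseudo-occurrence rows.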
Convert latitudinal range data to bin-level pseudo-occurrences:
# Generate latitudinal bins
bins <- lat_bins()
# Generate example df
taxdf <- data.frame(name = c("A", "B", "C"),
                    max_lat = c(60, 20, -10),
                    min_lat = c(20, -40, -60))
# Generate pseudo-occurrences
ex1 <- tax_expand_lat(taxdf = taxdf, bins = bins)
# See first few rows
head(ex1)
## name max_lat min_lat bin max mid min
## 1 A 60 20 4 60 55 50
## 2 A 60 20 5 50 45 40
## 3 A 60 20 6 40 35 30
## 4 A 60 20 7 30 25 20
## 5 B 20 -40 8 20 15 10
## 6 B 20 -40 9 10 5 0
Compare a list of taxonomic names to tip names in a user-provided phylogeny:
# Specify list of names
dinosaurs <- c("Nasutoceratops_titusi",
               "Diabloceratops_eatoni",
               "Zuniceratops_christopheri",
               "Psittacosaurus_major")
# Table of taxon names in list, tree or both
ex1 <- phylo_check(tree = ceratopsianTreeRaia,
                   list = dinosaurs)
# Get first few rows
head(ex1)
## taxon_name present_in_tree present_in_list
## 8 Diabloceratops_eatoni TRUE TRUE
## 33 Psittacosaurus_major TRUE TRUE
## 38 Nasutoceratops_titusi FALSE TRUE
## 39 Zuniceratops_christopheri FALSE TRUE
## 1 Centrosaurus_apertus TRUE FALSE
## 2 Styracosaurus_albertensis TRUE FALSE
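The comparison behind a table like this can be sketched with base R set operations. The tip labels below are a hypothetical stand-in for a tree's tip label vector. This is an illustration of the logic, not the phylo_check source.

```r
# Hypothetical tip labels, standing in for the tip labels of a phylogeny
tip_labels <- c("Diabloceratops_eatoni",
                "Psittacosaurus_major",
                "Centrosaurus_apertus")

# Names of interest
taxa <- c("Nasutoceratops_titusi",
          "Diabloceratops_eatoni",
          "Psittacosaurus_major")

# Every name found in either source, flagged by where it occurs
all_names <- union(tip_labels, taxa)
check <- data.frame(taxon_name = all_names,
                    present_in_tree = all_names %in% tip_labels,
                    present_in_list = all_names %in% taxa)

# Names in the list that are missing from the tree
missing_from_tree <- check$taxon_name[!check$present_in_tree]
```

Matching is exact, so differences in underscores, spaces or capitalisation between the list and the tip labels count as mismatches.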
Link and match interval names to the Geological Time Scale:
## Link numeric age values
# Create example df
occdf <- data.frame(name = c("A", "B", "C"),
                    early_interval = c("Maastrichtian",
                                       "Campanian",
                                       "Sinemurian"),
                    late_interval = c("Maastrichtian",
                                      "Campanian",
                                      "Bartonian"))
# Assign stages and numerical ages
occdf <- look_up(occdf)
## Use example int_key
# Get internal reef data
occdf <- reefs
# Get internal interval key
int_key <- interval_key
# Assign stages and numerical ages
occdf <- look_up(occdf,
                 early_interval = "interval",
                 late_interval = "interval",
                 int_key = int_key)
Add Geological Time Scale to plots:
Run functions over groups of data:
# Get tetrapod data
occdf <- tetrapods
# Count number of occurrences from each country
ex1 <- group_apply(occdf = occdf, group = "cc", fun = nrow)
# Remove NA data
occdf <- subset(occdf, !is.na(genus))
# Unique genera per collection with group_apply and input arguments
ex2 <- group_apply(occdf = occdf,
                   group = c("collection_no"),
                   fun = tax_unique,
                   genus = "genus",
                   family = "family",
                   order = "order",
                   class = "class",
                   resolution = "genus")
# Use multiple variables (number of occurrences per collection & formation)
ex3 <- group_apply(occdf = occdf,
                   group = c("collection_no", "formation"),
                   fun = nrow)
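The split-apply pattern that group_apply wraps can be sketched in base R with `split()` and `sapply()`. The data below are hypothetical, and this shows the general pattern rather than the function's internals.

```r
# Hypothetical occurrences with a country code and a genus name
toy_occ <- data.frame(cc = c("US", "US", "ZA", "ZA", "ZA", "RU"),
                      genus = c("a", "b", "c", "c", "d", "e"))

# Split the dataframe into one dataframe per country code
groups <- split(toy_occ, toy_occ$cc)

# Apply a function to each group: number of occurrences
n_occ <- sapply(groups, nrow)

# Apply a custom function to each group: number of unique genera
n_genera <- sapply(groups, function(g) length(unique(g$genus)))
```

Groups are returned in alphabetical order of the grouping variable (RU, US, ZA), not in order of first appearance.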
Onwards and upwards 🏔️…