The statistical method undertaken in analysing the effect of temperature on P. xanthii germination and germ-tube production.
Load R libraries required for analysis
library(data.table) # data structure, cleaning and formatting
library(mgcv) # gam statistical models
library(rstanarm) # Bayesian statistical models
library(ggplot2) # graphing
library(MASS) # logistical ordinal regression functions, among other things
library(flextable) # formatting data.frames as word tables
library(marginaleffects) # produce predicted probabilities for polr()
source("R/p_star.R") # function to show visually the level of significance
Read in data
g_tm <- fread("cache/germination_temperature.csv")
Peek at the data
head(g_tm)
Tm i_period leaf germ_conidia non_germ_conidia
1: 8 12 1 0 100
2: 8 12 1 0 100
3: 8 12 2 0 100
4: 8 12 2 0 100
5: 8 24 1 0 100
6: 8 24 1 0 100
str(g_tm)
Classes 'data.table' and 'data.frame': 128 obs. of 5 variables:
$ Tm : int 8 8 8 8 8 8 8 8 8 8 ...
$ i_period : int 12 12 12 12 24 24 24 24 36 36 ...
$ leaf : int 1 1 2 2 1 1 2 2 1 1 ...
$ germ_conidia : int 0 0 0 0 0 0 0 0 0 0 ...
$ non_germ_conidia: int 100 100 100 100 100 100 100 100 100 100 ...
- attr(*, ".internal.selfref")=<externalptr>
Summary table of the data.
g_tm_table <-
g_tm[, list(Mean = mean(germ_conidia),
Std_dev = round(sd(germ_conidia),3),
Median = median(germ_conidia),
Range = paste0(range(germ_conidia), collapse = " - " ),
Replicates = .N),
by = .(Tm, i_period)] |>
flextable() |>
align(align = "center") |>
set_header_labels(Tm = "Temperature",
i_period = "Incubation (hours)",
Std_dev = "Standard deviation") |>
fontsize(size = 8, part = "body") |>
fontsize(size = 10, part = "header") |>
bg(i = seq(2,32,2),bg = "grey80",part = "body")|>
set_caption("Podosheria xanthii germination percentage over a range of temperatures and incubation periods")
g_tm_table
Temperature | Incubation (hours) | Mean | Standard deviation | Median | Range | Replicates |
8 | 12 | 0.00 | 0.000 | 0.0 | 0 - 0 | 4 |
8 | 24 | 0.00 | 0.000 | 0.0 | 0 - 0 | 4 |
8 | 36 | 0.00 | 0.000 | 0.0 | 0 - 0 | 4 |
8 | 48 | 0.00 | 0.000 | 0.0 | 0 - 0 | 4 |
17 | 12 | 7.75 | 2.217 | 8.0 | 5 - 10 | 4 |
17 | 24 | 15.00 | 1.826 | 15.0 | 13 - 17 | 4 |
17 | 36 | 21.50 | 3.416 | 22.0 | 17 - 25 | 4 |
17 | 48 | 24.75 | 3.304 | 25.0 | 21 - 28 | 4 |
19 | 12 | 10.50 | 1.291 | 10.5 | 9 - 12 | 4 |
19 | 24 | 22.25 | 2.217 | 22.0 | 20 - 25 | 4 |
19 | 36 | 31.50 | 2.887 | 31.5 | 28 - 35 | 4 |
19 | 48 | 32.25 | 2.363 | 33.0 | 29 - 34 | 4 |
22 | 12 | 25.50 | 1.732 | 25.0 | 24 - 28 | 4 |
22 | 24 | 48.00 | 2.160 | 48.5 | 45 - 50 | 4 |
22 | 36 | 70.50 | 1.291 | 70.5 | 69 - 72 | 4 |
22 | 48 | 74.75 | 4.113 | 74.5 | 70 - 80 | 4 |
25 | 12 | 30.50 | 1.291 | 30.5 | 29 - 32 | 4 |
25 | 24 | 51.75 | 1.708 | 51.5 | 50 - 54 | 4 |
25 | 36 | 73.50 | 2.646 | 74.0 | 70 - 76 | 4 |
25 | 48 | 79.75 | 1.708 | 79.5 | 78 - 82 | 4 |
28 | 12 | 53.00 | 2.582 | 53.0 | 50 - 56 | 4 |
28 | 24 | 63.00 | 2.160 | 63.5 | 60 - 65 | 4 |
28 | 36 | 79.75 | 2.500 | 79.5 | 77 - 83 | 4 |
28 | 48 | 85.25 | 2.500 | 85.5 | 82 - 88 | 4 |
31 | 12 | 10.50 | 1.291 | 10.5 | 9 - 12 | 4 |
31 | 24 | 16.00 | 2.160 | 15.5 | 14 - 19 | 4 |
31 | 36 | 18.25 | 1.258 | 18.0 | 17 - 20 | 4 |
31 | 48 | 20.00 | 2.160 | 20.5 | 17 - 22 | 4 |
35 | 12 | 0.00 | 0.000 | 0.0 | 0 - 0 | 4 |
35 | 24 | 0.00 | 0.000 | 0.0 | 0 - 0 | 4 |
35 | 36 | 0.00 | 0.000 | 0.0 | 0 - 0 | 4 |
35 | 48 | 0.00 | 0.000 | 0.0 | 0 - 0 | 4 |
How many leaves were inoculated?
length(unique(g_tm$Tm)) * # Temperature incubators
length(unique(g_tm$leaf)) * # individual leaves
length(unique(g_tm$i_period)) # Sampling times
[1] 64
Let’s plot the data over time, we will need to omit temperatures 8 and 35 where no germination was observed.
g_tm[Tm != 8 &
Tm != 35,] |>
ggplot(aes(x = i_period, y = germ_conidia, colour = as.factor(Tm))) +
geom_point()+
geom_smooth(method = "gam",formula = y ~ s(x, k = 3))+
scale_color_viridis_d()+
xlab("Time in hours")+
ylab("Germinated conidia (%)")+
labs(colour = "Temperature")+
theme_classic()+
scale_x_continuous(limits = c(12,48), breaks = c(12,18,24,30,36,42,48))+
scale_y_continuous(limits = c(0,100), n.breaks = 5)
Now lets plot germination over temperature separating by infection
period.
g_tm |>
ggplot(aes(x = Tm, y = germ_conidia, colour = as.factor(i_period))) +
geom_point()+
geom_smooth(method = "gam",formula = y ~ s(x, k = 8), alpha = 0.2)+
scale_color_viridis_d(direction = -1)+
xlab(expression("Temperature " ( degree*C)))+
ylab("Germinated conidia (%)")+
labs(colour = "Time (h)")+
theme_classic()+
scale_x_continuous(breaks = seq(5,35,5))+
scale_y_continuous(limits = c(0,100), n.breaks = 5)+
annotate("text", x = 8, y = 100, label = "(a)")
This plot shows that the optimum temperature for rapid P. xanthii germination is around 28\(^\circ\)C. Also, in the first plot, if we assume at time zero there was no germinated conidia, 50% of the conidia germinated within the first 12 hours. In all temperature treatments, between 12 and 36 hours after infection conidial germination increased at a linear rate. After 36 hours the rate of conidia germinating declined to almost zero. Curves were fit using a generalised additive model with a smooth spline containing 6 knots.
Lets further evaluate this fit. However instead of fitting a temperature as a separate random effect, as depicted in the above plot, we will define temperature as a separate continuous predictor that may or may not interact with infection period.
Germination is a binomial response so we need to provide the number of germinated to the number of non-germinated. In the first model we will fit a smooth spline to each predictor and specify that each predictor influences germination independently of the other.
I have used only 2 and 4 basis spline ‘knots’ for infection period and temperature respectively.
m1 <- gam(cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k=3) + s(Tm, k =4),
data = g_tm,
family = "binomial")
summary(m1)
Family: binomial
Link function: logit
Formula:
cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) +
s(Tm, k = 4)
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.73642 0.04417 -39.31 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(i_period) 1.974 1.999 633.4 <2e-16 ***
s(Tm) 2.996 3.000 2114.0 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.93 Deviance explained = 93.2%
UBRE = 2.0049 Scale est. = 1 n = 128
plot(m1)
vis.gam(m1,
theta = -50, # horizontal rotation
phi = 30, # vertical rotation
n.grid = max(g_tm$Tm) - min(g_tm$Tm) # grid cells
)
Model two we will investigate whether there is an interaction between
temperature and infection period using a tensor.
m2 <- gam(cbind(germ_conidia, non_germ_conidia) ~ te(i_period, Tm, k = 4),
data = g_tm,
family = "binomial")
summary(m2)
Family: binomial
Link function: logit
Formula:
cbind(germ_conidia, non_germ_conidia) ~ te(i_period, Tm, k = 4)
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.70973 0.04499 -38 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
te(i_period,Tm) 10.51 11.61 2424 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.954 Deviance explained = 95.4%
UBRE = 1.1466 Scale est. = 1 n = 128
plot(m2)
vis.gam(m2,
theta = -50, # horizontal rotation
phi = 30, # vertical rotation
n.grid = max(g_tm$Tm) - min(g_tm$Tm) # grid cells
)
This model shows slightly higher R squared and ‘explained deviance’
values, with a lower UBRE value indicating that model two
(m2
) is the better model.
Lets compare the models against each other.
anova(m1,m2, test = "Chisq")
Analysis of Deviance Table
Model 1: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) +
s(Tm, k = 4)
Model 2: cbind(germ_conidia, non_germ_conidia) ~ te(i_period, Tm, k = 4)
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 122.00 372.68
2 115.39 251.74 6.6071 120.94 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This confirms that model two is the better model and explains more of the deviance.
Yet lets see if we can fit a better model to represent the raw data does not use an interaction to be sure this is the best fit. The plots of the fitted model indicate that 28\(^\circ\)C showed the highest conidial germination. This might be due to the default type of basis splines, and number of knots. We have exhausted our options for improving the interaction, so let us investigate improving the additive model by fitting a better spline to the temperature predictor.
P-splines use a penalised least-square method to fit the model and are better in non-normal distributions. Lets try p-splines to see if we can improve the fit.
m3 <- gam(cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k=3) + s(Tm, k =4, bs = "ps"),
data = g_tm,
family = "binomial")
summary(m3)
Family: binomial
Link function: logit
Formula:
cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) +
s(Tm, k = 4, bs = "ps")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.94630 0.05067 -38.41 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(i_period) 1.973 1.999 630.6 <2e-16 ***
s(Tm) 2.997 3.000 1896.7 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.933 Deviance explained = 93.7%
UBRE = 1.7889 Scale est. = 1 n = 128
plot(m3)
vis.gam(m3,
theta = -50, # horizontal rotation
phi = 30, # vertical rotation
n.grid = max(g_tm$Tm) - min(g_tm$Tm) # grid cells
)
The P-splines have not improved the predictive ability of the model with a lower UBRE compared with model 2, improved the R squared value and explained deviance. m3 however is a better model compared to m1 , We can also see that the curve has been shifted slightly up towards 28\(^\circ\)C.
Lets compare the models two and three to against each other.
anova(m2,m3, test = "Chisq")
Analysis of Deviance Table
Model 1: cbind(germ_conidia, non_germ_conidia) ~ te(i_period, Tm, k = 4)
Model 2: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) +
s(Tm, k = 4, bs = "ps")
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 115.39 251.74
2 122.00 345.03 -6.6071 -93.294 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The models show no significant difference therefore we should use the
simpler model, model three (m3
). Model three also shows a
better fit to the data.
Next lets see if cyclic versions of the p-splines "cp"
improve this fit further.
m4 <- gam(cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k=3) + s(Tm, k =4, bs = "cp"),
data = g_tm,
family = "binomial")
summary(m4)
Family: binomial
Link function: logit
Formula:
cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) +
s(Tm, k = 4, bs = "cp")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.7148 0.0412 -41.62 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(i_period) 1.975 1.999 639 <2e-16 ***
s(Tm) 2.997 3.000 2204 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.945 Deviance explained = 94.6%
UBRE = 1.4262 Scale est. = 1 n = 128
plot(m4)
vis.gam(m4,
theta = -50, # horizontal rotation
phi = 30, # vertical rotation
n.grid = max(g_tm$Tm) - min(g_tm$Tm) # grid cells
)
While not an improvement on m2
, the cyclic P-splines
seem to have a very good predictive ability with a very low UBRE, also a
high R squared value and explaining a large amount of deviance. We can
also see that the curve has been shifted further towards 28\(^\circ\)C.
Lets compare models three and four.
anova(m2,m4, test = "Chisq")
Analysis of Deviance Table
Model 1: cbind(germ_conidia, non_germ_conidia) ~ te(i_period, Tm, k = 4)
Model 2: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) +
s(Tm, k = 4, bs = "cp")
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 115.39 251.74
2 122.00 298.62 -6.6071 -46.875 3.886e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m3,m4, test = "Chisq")
Analysis of Deviance Table
Model 1: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) +
s(Tm, k = 4, bs = "ps")
Model 2: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) +
s(Tm, k = 4, bs = "cp")
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 122 345.03
2 122 298.61 8.0648e-05 46.419 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Lets use the p splines again however increase the number of knots for temperature in attempt to get a better fit to the raw data.
m5 <- gam(cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k=3) + s(Tm, k =6, bs = "cs"),
data = g_tm,
family = "binomial")
summary(m5)
Family: binomial
Link function: logit
Formula:
cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) +
s(Tm, k = 6, bs = "cs")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.7266 0.1014 -26.88 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(i_period) 1.974 1.999 637.3 <2e-16 ***
s(Tm) 4.992 5.000 1794.3 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.973 Deviance explained = 97.3%
UBRE = 0.27067 Scale est. = 1 n = 128
plot(m5)
vis.gam(m5,
theta = -50, # horizontal rotation
phi = 30, # vertical rotation
n.grid = max(g_tm$Tm) - min(g_tm$Tm) # grid cells
)
This model plot looks contrived, however if we focus the plot and ignore large negative standardised values created by zero observations at 8 and 35\(^\circ\), the curve seems to reflect the raw data very well.
This model also shows the highest R-squared, explained more deviance and has the lowest UBRE score, compared to all other models.
Lets compare the models two and five, then four and five to against each other.
anova(m2,m5, test = "Chisq")
Analysis of Deviance Table
Model 1: cbind(germ_conidia, non_germ_conidia) ~ te(i_period, Tm, k = 4)
Model 2: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) +
s(Tm, k = 6, bs = "cs")
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 115.39 251.74
2 120.00 146.71 -4.6071 105.03
anova(m4,m5, test = "Chisq")
Analysis of Deviance Table
Model 1: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) +
s(Tm, k = 4, bs = "cp")
Model 2: cbind(germ_conidia, non_germ_conidia) ~ s(i_period, k = 3) +
s(Tm, k = 6, bs = "cs")
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 122 298.62
2 120 146.71 1.9999 151.9 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The cyclic p-spline model with more knots explained the most deviance and provides a significantly better fit.
And look at the AIC values of all the models.
AIC(m1,m2,m3,m4,m5)
df AIC
m1 5.970342 832.0036
m2 11.510000 722.1389
m3 5.969895 804.3527
m4 5.971520 757.9365
m5 7.965804 610.0241
This shows that model five is the best fit, with the lowest AIC and an R squared of 0.9732. Temperature and infection period are significant predictors and the model produces a low UBRE score or 0.271, indicating good predictive ability and less affected by drop one, tests.
To simulate data from the model using predict
we need to
provide some values which to estimate over.
# Make a data.frame of test values to feed in the model
test_dat <- data.table(Tm = rep(seq(from = 0, to = 40, by = 0.1), 4), # between 5 and 8 degrees for each incubation period
i_period = rep(c(12,24,36,48), each = length(seq(from = 0, to = 40, by = 0.1))))
# create a data.frame with predicted values and reference them to the respective temperature
m5_pred_list <-
predict(m5,
test_dat,
type = "response",
se.fit = TRUE)
m5_pred <- data.table(Tm = test_dat$Tm,
i_period = as.factor(test_dat$i_period),
germinated = m5_pred_list$fit,
germ_se = m5_pred_list$se.fit,
CI_low = m5_pred_list$fit + (m5_pred_list$se.fit * -1.94),
CI_high = m5_pred_list$fit + (m5_pred_list$se.fit * 1.94))
Figure_2a <-
m5_pred |>
ggplot(aes(x = Tm, y = germinated, colour = i_period))+
geom_line(size = 1)+
scale_color_grey(start = 0.75,end = 0)+
geom_point(data = g_tm, aes(x = Tm, y = (germ_conidia/100), colour = as.factor(i_period)))+
theme_bw()+
geom_line(aes(x = Tm, y = CI_low), linetype = "dashed", size = 0.6)+
geom_line(aes(x = Tm, y = CI_high), linetype = "dashed", size = 0.6)+
xlab(expression("Temperature " ( degree*C)))+
ylab("Germinated conidia (%)")+
labs(colour = "Time (h)")
Figure_2a
Import cleaned data.
br_tm <- fread("cache/branching_temperature.csv")
Peek at the data
head(br_tm)
Tm germtubes rep conidia
1: 8 0 1 100
2: 8 0 2 100
3: 8 1 1 0
4: 8 1 2 0
5: 8 2 1 0
6: 8 2 2 0
str(br_tm)
Classes 'data.table' and 'data.frame': 64 obs. of 4 variables:
$ Tm : int 8 8 8 8 8 8 8 8 17 17 ...
$ germtubes: int 0 0 1 1 2 2 3 3 0 0 ...
$ rep : int 1 2 1 2 1 2 1 2 1 2 ...
$ conidia : int 100 100 0 0 0 0 0 0 70 70 ...
- attr(*, ".internal.selfref")=<externalptr>
summary(br_tm)
Tm germtubes rep conidia
Min. : 8.00 Min. :0.00 Min. :1.0 Min. : 0.0
1st Qu.:18.50 1st Qu.:0.75 1st Qu.:1.0 1st Qu.: 0.0
Median :23.50 Median :1.50 Median :1.5 Median : 15.0
Mean :23.12 Mean :1.50 Mean :1.5 Mean : 25.0
3rd Qu.:28.75 3rd Qu.:2.25 3rd Qu.:2.0 3rd Qu.: 37.5
Max. :35.00 Max. :3.00 Max. :2.0 Max. :100.0
This experiment used the same incubation conditions as was used in the germination experiment. Following 36 hours of incubation at the respective temperatures, four leaf discs were removed from the growth chamber stained and fixed for microscopy using the leaf clearing method. Then one hundred conidia were selected at random to record the state of germination or germ tube production.
Summary table of the data.
br_tm[, list(Mean = mean(conidia),
Range = paste0(range(conidia), collapse = " - " ),
Std_dev = round(sd(conidia),3),
Replicates = .N),
by = .(Tm, germtubes)] |>
flextable() |>
align(align = "center") |>
set_header_labels(Tm = "Temperature",
Std_dev = "Standard deviation") |>
fontsize(size = 8, part = "body") |>
fontsize(size = 10, part = "header") |>
bg(i = seq(2,32,2),bg = "grey80",part = "body")|>
set_caption("Podosheria xanthii germination percentage over a range of temperatures and incubation periods")
Temperature | germtubes | Mean | Range | Standard deviation | Replicates |
8 | 0 | 100.0 | 100 - 100 | 0.000 | 2 |
8 | 1 | 0.0 | 0 - 0 | 0.000 | 2 |
8 | 2 | 0.0 | 0 - 0 | 0.000 | 2 |
8 | 3 | 0.0 | 0 - 0 | 0.000 | 2 |
17 | 0 | 70.0 | 70 - 70 | 0.000 | 2 |
17 | 1 | 13.5 | 12 - 15 | 2.121 | 2 |
17 | 2 | 16.5 | 15 - 18 | 2.121 | 2 |
17 | 3 | 0.0 | 0 - 0 | 0.000 | 2 |
19 | 0 | 62.0 | 62 - 62 | 0.000 | 2 |
19 | 1 | 12.5 | 10 - 15 | 3.536 | 2 |
19 | 2 | 21.0 | 19 - 23 | 2.828 | 2 |
19 | 3 | 4.5 | 4 - 5 | 0.707 | 2 |
22 | 0 | 30.5 | 27 - 34 | 4.950 | 2 |
22 | 1 | 6.5 | 5 - 8 | 2.121 | 2 |
22 | 2 | 14.0 | 13 - 15 | 1.414 | 2 |
22 | 3 | 49.0 | 48 - 50 | 1.414 | 2 |
25 | 0 | 28.0 | 26 - 30 | 2.828 | 2 |
25 | 1 | 9.0 | 8 - 10 | 1.414 | 2 |
25 | 2 | 12.0 | 10 - 14 | 2.828 | 2 |
25 | 3 | 51.0 | 50 - 52 | 1.414 | 2 |
28 | 0 | 21.0 | 19 - 23 | 2.828 | 2 |
28 | 1 | 16.0 | 15 - 17 | 1.414 | 2 |
28 | 2 | 10.0 | 9 - 11 | 1.414 | 2 |
28 | 3 | 53.0 | 51 - 55 | 2.828 | 2 |
31 | 0 | 83.0 | 81 - 85 | 2.828 | 2 |
31 | 1 | 17.0 | 15 - 19 | 2.828 | 2 |
31 | 2 | 0.0 | 0 - 0 | 0.000 | 2 |
31 | 3 | 0.0 | 0 - 0 | 0.000 | 2 |
35 | 0 | 100.0 | 100 - 100 | 0.000 | 2 |
35 | 1 | 0.0 | 0 - 0 | 0.000 | 2 |
35 | 2 | 0.0 | 0 - 0 | 0.000 | 2 |
35 | 3 | 0.0 | 0 - 0 | 0.000 | 2 |
Let’s plot the number of germ tubes for each temperature treatment. We can use stacked bars to show not only the number of germ tubes, but the fact that a germinated conidia with three germ tubes, also has two germ tubes ect.
Figure_3a <-
br_tm |>
ggplot(aes(x = as.factor(Tm), y = conidia/2 ,
fill = factor(germtubes,levels = c(1,2,3,0)))) +
geom_col(position = position_stack(reverse = TRUE))+
scale_fill_manual(values = c("0" = "grey90",
"1" = "grey70",
"2" = "grey55",
"3" = "grey40"),
name = "Germ tubes")+
xlab(expression("Temperature " ( degree*C)))+
ylab("Germ tube production (%)")+
labs(colour = "Germ tubes")+
theme_classic()+
annotate("text", x = 0.8, y = 99, label = "(a)", size = 5)
Figure_3a
This plot shows that the optimum temperature for rapid P. xanthii germination is around 28\(^\circ\)C with 53 % of conidia presenting three or more germ tubes. 25\(^\circ\)C and 22\(^\circ\)C presented 51 and 49 % respectively. Although there were only fewer conidia with tertiary germtubes in the 22°C treatment, combined means of conidia with two or more germtubes were 63, 63, and 63 in temperature treatments 22\(^\circ\)C, 25\(^\circ\)C, and 28\(^\circ\)C. Only 4.5% of germinated conidia had produced tertiary germtubes in the 19\(^\circ\)C treatment, with 0 tertiary germ tubes observed in either 17\(^\circ\)C or 31\(^\circ\)C treatments.
Also, no germination was observed at temperatures 8\(^\circ\)C , or at 35\(^\circ\)C.
Lets build a model to help us predict germ tube production after 36 hours when incubated at each respective temperature.
Trecate (2019) claims if there are two or more germ tubes, successful
infection is likely to have occurred. Here we have a response variable,
germtubes
which describes the progress of a potential
infection by a single conidia.
- 0
which represents a non-germinated conidia,
- 1
a germinated conidia with one germ tube,
- 2
a germinated conidia with two germ tubes,
- 3
a germinated conidia with three or more germ tubes.
The response variable might be thought of as continuous. However,
given our aim is to estimate successful infection by the pathogen by
proxy of germ tube production, estimating the number of germ tubes and
hyphae beyond three is irrelevant and provide inaccurate results, given
the difference between the number of germtubes may not be equal or
consistent between levels. germtubes
would be better
characterised as ordinal and therefore leads us to using ordinal
logistic regression methods.
The aim of this analysis is to predict the likelihood of a conidia
germinating and branching under different temperature conditions.
The data is count data and our response is not binomial but ordered
multinomial, which leads us to use an ordinal logistic regression.
First we need to convert the count data into a longer format, so each row represents an observation.
Now we have prepared our data lets try it in our first model assuming a linear and additive relationship between germ tube number and temperature and leaf rep.
Call:
polr(formula = germtubes ~ Tm + rep, data = br_tm_lng, Hess = TRUE)
Coefficients:
Value Std. Error t value
Tm 0.02208 0.006384 3.4580
rep2 0.01755 0.100010 0.1755
Intercepts:
Value Std. Error t value
0|1 1.0022 0.1657 6.0473
1|2 1.4227 0.1670 8.5171
2|3 1.9288 0.1701 11.3379
Residual Deviance: 3372.54
AIC: 3382.54
The output is in proportional log odds and presents Coefficients which represent the change in log odds for each unit of the predictor. For example for each degree temperature increases the log odds increases 0.022.
The coefficient rep
is not significant and can probably
be dropped from the model.
Lets plot the model predictions for temperatures between 8\(^\circ\) C and 35\(^\circ\) C to determine if the result is reasonable compared to the raw data.
# Make a data.frame of test values to feed in the model
test_dat <- data.frame(Tm = seq(from = 5, to = 36, by = 0.1), # between 5 and 8 degrees
rep = as.factor(rbinom(length(seq(from = 5, to = 36, by = 0.1)),1,0.5)+1)) # randomly use either the first or second rep
# create a data.frame with predicted values and reference them to the respective temperature
m1_pred <-
as.data.table(cbind(Tm = test_dat$Tm,
round(predict(m1,
test_dat,
type = "p"),
3)))
# format to long
m1_pred <-
melt(
m1_pred,
id.vars = "Tm",
measure.vars = c("0", "1", "2", "3"),
variable.name = "germtubes",
value.name = "prob"
)
m1_pred |>
ggplot(aes(Tm, prob, colour = germtubes))+
geom_line(size = 1)+
theme_minimal()
The result is rather linear and predicts more germ tubes at 40\(^\circ\)C when the raw data does not see any from 35C or presumably above.
Lets try applying some splines to temperature and dropping rep.
We need to read in the splines
package to fit splines as
they are not supported in the polr()
function.
library(splines)
m2 <- polr(germtubes ~ bs(Tm, Boundary.knots = c(8,35)),
data = br_tm_lng,
Hess = TRUE)
summary(m2)
Call:
polr(formula = germtubes ~ bs(Tm, Boundary.knots = c(8, 35)),
data = br_tm_lng, Hess = TRUE)
Coefficients:
Value Std. Error t value
bs(Tm, Boundary.knots = c(8, 35))1 0.1975 1.6404 0.1204
bs(Tm, Boundary.knots = c(8, 35))2 16.6471 0.8822 18.8701
bs(Tm, Boundary.knots = c(8, 35))3 -2.8168 1.1639 -2.4202
Intercepts:
Value Std. Error t value
0|1 5.0417 0.8499 5.9319
1|2 5.6921 0.8513 6.6867
2|3 6.4421 0.8529 7.5529
Residual Deviance: 2539.384
AIC: 2551.384
The residual deviance and AIC on this model is much lower, indicating this might be a better fit. Also the t value is higher hinting this model shows clearer differences between the variables, especially in the middle of the studied temperature range.
Now lets try predicting from this model.
m2_pred <- as.data.table(round(predict(m2,
test_dat,
type = "p"),
3))[, Tm := test_dat$Tm]
We get a warning indicating the the predicted values are outside the range of the spline boundary knots. This is ok as we know there should be a zero probability of spore germination outside this temperature range of 8\(^\circ\)C and 35\(^\circ\)C
# format to long
m2_pred <-
melt(
m2_pred,
id.vars = "Tm",
measure.vars = c("0", "1", "2", "3"),
variable.name = "germtubes",
value.name = "prob"
)
m2_pred |>
ggplot(aes(Tm, prob, colour = germtubes))+
geom_line(size = 1)+
theme_minimal()
This plot looks like it would fit the raw data much better than
m1
. However, the model estimates 25\(^\circ\)C as the temperature likely to
produce a greater proportion of germinated conidia with three
germ-tubes. This does not reflect the raw data we plotted in figure
2b.
Lets see if we can improve the model by changing the number and placement of the spline knots.
Call:
polr(formula = germtubes ~ bs(Tm, knots = c(19, 23, 27, 30)),
data = br_tm_lng, Hess = TRUE)
Coefficients:
Value Std. Error t value
bs(Tm, knots = c(19, 23, 27, 30))1 15.009 40.5865 0.36979
bs(Tm, knots = c(19, 23, 27, 30))2 8.925 38.3956 0.23246
bs(Tm, knots = c(19, 23, 27, 30))3 15.293 38.8312 0.39383
bs(Tm, knots = c(19, 23, 27, 30))4 12.718 38.6486 0.32907
bs(Tm, knots = c(19, 23, 27, 30))5 16.338 38.8583 0.42044
bs(Tm, knots = c(19, 23, 27, 30))6 1.130 39.5542 0.02856
bs(Tm, knots = c(19, 23, 27, 30))7 -7.676 0.9026 -8.50526
Intercepts:
Value Std. Error t value
0|1 12.6119 38.7288 0.3256
1|2 13.2755 38.7289 0.3428
2|3 14.0390 38.7289 0.3625
Residual Deviance: 2501.665
AIC: 2521.665
The residual deviance and AIC on this model is lower, indicating this might be a better fit. Also the t value is higher hinting this model shows clearer differences between the temperature variables.
Now lets try predicting from this model.
m3_pred <- as.data.table(round(predict(m3,
test_dat,
type = "p"),
3))[, Tm := test_dat$Tm]
We get a warning indicating the the predicted values are outside the range of the spline boundary knots. This is ok as we know there should be a zero probability of spore germination outside this temperature range of 8\(^\circ\)C and 35\(^\circ\)C
# format to long
m3_pred <-
melt(
m3_pred,
id.vars = "Tm",
measure.vars = c("0", "1", "2", "3"),
variable.name = "germtubes",
value.name = "prob"
)
m3_pred |>
ggplot(aes(Tm, prob, colour = germtubes))+
geom_line(size = 1)+
theme_minimal()
The plot shows that this model might have too many spline knots. Lets
compare the models using the anova()
function.
anova(m2,m3)
Likelihood ratio tests of ordinal regression models
Response: germtubes
Model Resid. df Resid. Dev Test Df
1 bs(Tm, Boundary.knots = c(8, 35)) 1594 2539.384
2 bs(Tm, knots = c(19, 23, 27, 30)) 1590 2501.665 1 vs 2 4
LR stat. Pr(Chi)
1
2 37.7193 1.280351e-07
While the anova says m3
is a significantly better model,
the residual deviance informs us it is not by much.
Based on the plot of model 3, we will try another model with fewer knots to reduce the wiggliness.
Call:
polr(formula = germtubes ~ bs(Tm, knots = c(23, 28)), data = br_tm_lng,
Hess = TRUE)
Coefficients:
Value Std. Error t value
bs(Tm, knots = c(23, 28))1 2.455 2.366 1.037
bs(Tm, knots = c(23, 28))2 8.255 1.780 4.638
bs(Tm, knots = c(23, 28))3 7.719 2.017 3.827
bs(Tm, knots = c(23, 28))4 8.419 2.539 3.317
bs(Tm, knots = c(23, 28))5 -33.997 11.786 -2.884
Intercepts:
Value Std. Error t value
0|1 6.5111 1.8145 3.5883
1|2 7.1687 1.8151 3.9496
2|3 7.9200 1.8154 4.3626
Residual Deviance: 2517.106
AIC: 2533.106
The residual deviance and AIC on this model is higher than the
overfit model m3
, however lower than the under fit model
m2
indicating this model would better reflect the raw data.
Also the t value is higher hinting this model shows clearer differences
between the variables.
Now lets try predicting from this model, we will use the
marginaleffects
package so we can extract standard
errors.
test_dat <- data.table(Tm = seq(from = 10,
to = 35,
by = 0.1))
m4_pred <- marginaleffects::predictions(m4,
newdata = test_dat,
type = "probs")
# set data.table format
setDT(m4_pred)
# calculate approximate confidence intervals
m4_pred[, c("CI_low","CI_high") := .(predicted - 1.96*std.error,
predicted + 1.96*std.error)]
# change the column names
setnames(m4_pred,
old = c("group","Tm.x"),
new = c("germtubes", "Tm"))
We get a warning indicating the the predicted values are outside the range of the spline boundary knots. This is ok as we know there should be a zero probability of spore germination outside this temperature range of 8\(^\circ\)C and 35\(^\circ\)C
# raw data overlay
Figure_3b <-
m4_pred |>
ggplot(aes(Tm, predicted *100, colour = germtubes))+
geom_line(size = 1)+
geom_line(data = m4_pred, aes(x = Tm, y = CI_low*100, #colour = group,
), linetype = "dotted", size = 0.7)+
geom_line(data = m4_pred, aes(x = Tm, y = CI_high*100, #colour = germtubes,
), linetype = "dotted", size = 0.7)+
scale_color_grey(start = 0.75,end = 0)+
theme_bw()+
xlab(expression("Temperature " ( degree*C)))+
ylab("Gserm tube production (%)")+
labs(colour = "Germ tubes")+
annotate(geom = "text",
x = 6,
y = 96,
label = "(b)",size = 5)
Figure_3b
The plot shows a smoother fit with the knots we have tried. Lets
compare the model using the anova()
function.
anova(m3,m4)
Likelihood ratio tests of ordinal regression models
Response: germtubes
Model Resid. df Resid. Dev Test Df
1 bs(Tm, knots = c(23, 28)) 1592 2517.106
2 bs(Tm, knots = c(19, 23, 27, 30)) 1590 2501.665 1 vs 2 2
LR stat. Pr(Chi)
1
2 15.44083 0.000443676
While the anova says m3
is a significantly better model,
the residual deviance informs us it is not by much. Because the
m3
plot looks over-fit and m4
plot seems a
more reasonable fit to the raw data, we will use this.
Lets examine the coefficients.
coef(m4)
bs(Tm, knots = c(23, 28))1 bs(Tm, knots = c(23, 28))2
2.454674 8.255239
bs(Tm, knots = c(23, 28))3 bs(Tm, knots = c(23, 28))4
7.718719 8.419468
bs(Tm, knots = c(23, 28))5
-33.996879
There are five coefficients for the Temperature variable because the
b-spline breaks the data into different ranges depending on
knot
placement. Here we see there are four knots at the
following temperatures
tm_bs <- bs(br_tm_lng$Tm, knots = c(23,28))
sort(c(attr(tm_bs, which = "knots"),
attr(tm_bs, which = "Boundary.knots")))
[1] 8 23 28 35
As can be seen in the summary above it is a little hard to make sense of these values. The spline breaks up the temperature predictor into five predictors, each explaining the difference between temperatures for a particular range. The value gives us a scaled representation of the mean magnitude of each temperature range on a log odds scale, and therefore the only inference we can make from these values is some are greater some are less and some have a large negative influence on spore germination state.
The significance of each predictor can be observed by the standard error or the ‘t value’ \(t value = \frac{coef}{se}\). We can also obtain P-values from this output, however these methods are not advised given they assume a normal distribution and infinite degrees of freedom.
# get table of coefficients
coef_table <- data.table(coef(summary(m4)), keep.rownames = TRUE)
# calculate p-values
coef_table[, p_value := pnorm(abs(`t value`),lower.tail = FALSE) *2][, sig := p_star(p_value)]
coef_table
rn Value Std. Error t value
1: bs(Tm, knots = c(23, 28))1 2.454674 2.366050 1.037456
2: bs(Tm, knots = c(23, 28))2 8.255239 1.779860 4.638138
3: bs(Tm, knots = c(23, 28))3 7.718719 2.016666 3.827465
4: bs(Tm, knots = c(23, 28))4 8.419468 2.538598 3.316582
5: bs(Tm, knots = c(23, 28))5 -33.996879 11.786364 -2.884425
6: 0|1 6.511094 1.814542 3.588284
7: 1|2 7.168750 1.815077 3.949557
8: 2|3 7.919989 1.815447 4.362556
p_value sig
1: 2.995232e-01
2: 3.515620e-06 ***
3: 1.294697e-04 ***
4: 9.112572e-04 ***
5: 3.921296e-03 **
6: 3.328613e-04 ***
7: 7.829585e-05 ***
8: 1.285514e-05 ***
Perhaps a better method for inferring statistical significance is to consider the confidence intervals and if the CI includes zero.
m4_ci <- confint(m4)
m4_ci
2.5 % 97.5 %
bs(Tm, knots = c(23, 28))1 -0.9261495 10.34610
bs(Tm, knots = c(23, 28))2 5.6446504 14.10850
bs(Tm, knots = c(23, 28))3 4.9189731 14.53761
bs(Tm, knots = c(23, 28))4 3.6682267 14.58982
bs(Tm, knots = c(23, 28))5 -56.5696386 -10.20560
So for temperatures < 8\(^\circ\)C the coefficient was 2.4546742, with CIs that include zero indicating little difference between the number of conidial germ-tubes at temperature treatments in this range.
For temperatures 8\(^\circ\)C <
Tm
< 23\(^\circ\)C the
coefficient was 8.2552385, with CIs that don’t include zero
indicating a likely significant difference between the proportions of
conidia with varying numbers of germ-tubes due to temperatures in this
range.
For temperatures 23\(^\circ\)C <
Tm
< 28\(^\circ\)C the
coefficient was 7.7187186, with CIs that don’t include zero
indicating a likely significant difference between the proportions of
conidia with varying numbers of germ-tubes due to temperatures in this
range.
For temperatures 28\(^\circ\)C <
Tm
< 35\(^\circ\)C the
coefficient was 8.4194678, with CIs that don’t include zero
indicating a likely significant difference between the proportions of
conidia with varying numbers of germ-tubes due to temperatures in this
range.
For temperatures 35\(^\circ\)C <
Tm
the coefficient was -33.9968787, with CIs that
don’t include zero indicating a likely significant difference
and negative association between the proportions of conidia with varying
numbers of germ-tubes due to temperatures in this range.
\(\frac{t}{13.3}I(0<=t<13.3)+\frac{38.83333-t}{38.83333-13.3}I(13.3<=t<38.83333)\)
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=English_Australia.utf8
[2] LC_CTYPE=English_Australia.utf8
[3] LC_MONETARY=English_Australia.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_Australia.utf8
attached base packages:
[1] splines stats graphics grDevices utils datasets
[7] methods base
other attached packages:
[1] marginaleffects_0.6.0 flextable_0.7.1 MASS_7.3-56
[4] ggplot2_3.3.6 rstanarm_2.21.3 Rcpp_1.0.8.3
[7] mgcv_1.8-40 nlme_3.1-157 data.table_1.14.2
[10] devtools_2.4.3 usethis_2.1.5
loaded via a namespace (and not attached):
[1] minqa_1.2.4 colorspace_2.0-3 ellipsis_0.3.2
[4] ggridges_0.5.3 rprojroot_2.0.3 markdown_1.1
[7] base64enc_0.1-3 fs_1.5.2 rstudioapi_0.13
[10] farver_2.1.0 rstan_2.21.5 remotes_2.4.2
[13] DT_0.23 fansi_1.0.3 xml2_1.3.3
[16] codetools_0.2-18 downlit_0.4.0 cachem_1.0.6
[19] knitr_1.39 shinythemes_1.2.0 pkgload_1.2.4
[22] bayesplot_1.9.0 jsonlite_1.8.0 nloptr_2.0.1
[25] shiny_1.7.1 compiler_4.2.0 backports_1.4.1
[28] Matrix_1.4-1 fastmap_1.1.0 cli_3.3.0
[31] later_1.3.0 htmltools_0.5.2 prettyunits_1.1.1
[34] tools_4.2.0 igraph_1.3.1 gtable_0.3.0
[37] glue_1.6.2 reshape2_1.4.4 dplyr_1.0.9
[40] jquerylib_0.1.4 vctrs_0.4.1 crosstalk_1.2.0
[43] insight_0.18.0 xfun_0.31 stringr_1.4.0
[46] ps_1.7.0 brio_1.1.3 testthat_3.1.4
[49] lme4_1.1-29 mime_0.12 miniUI_0.1.1.1
[52] lifecycle_1.0.1 gtools_3.9.2 zoo_1.8-11
[55] scales_1.2.0 colourpicker_1.1.1 promises_1.2.0.1
[58] parallel_4.2.0 inline_0.3.19 shinystan_2.6.0
[61] yaml_2.3.5 memoise_2.0.1 gridExtra_2.3
[64] gdtools_0.2.4 loo_2.5.1 StanHeaders_2.21.0-7
[67] sass_0.4.1 distill_1.4 stringi_1.7.6
[70] highr_0.9 dygraphs_1.1.1.6 desc_1.4.1
[73] checkmate_2.1.0 zip_2.2.0 boot_1.3-28
[76] pkgbuild_1.3.1 systemfonts_1.0.4 rlang_1.0.2
[79] pkgconfig_2.0.3 matrixStats_0.62.0 evaluate_0.15
[82] lattice_0.20-45 purrr_0.3.4 labeling_0.4.2
[85] rstantools_2.2.0 htmlwidgets_1.5.4 processx_3.5.3
[88] tidyselect_1.1.2 plyr_1.8.7 magrittr_2.0.3
[91] R6_2.5.1 generics_0.1.2 pillar_1.7.0
[94] withr_2.5.0 xts_0.12.1 survival_3.3-1
[97] tibble_3.1.7 crayon_1.5.1 uuid_1.1-0
[100] utf8_1.2.2 officer_0.4.2 rmarkdown_2.14
[103] grid_4.2.0 callr_3.7.0 threejs_0.3.3
[106] digest_0.6.29 xtable_1.8-4 httpuv_1.6.5
[109] RcppParallel_5.1.5 stats4_4.2.0 munsell_0.5.0
[112] viridisLite_0.4.0 bslib_0.3.1 sessioninfo_1.2.2
[115] shinyjs_2.1.0