【迁移】有序分类Logistic回归

Last updated on December 10, 2024 pm

Logistic 回归的假设

  • 假设1:因变量唯一,且为有序多分类变量
  • 假设2:存在一个或多个自变量,可为连续、有序多分类或无序分类变量。
  • 假设3:自变量之间无多重共线性。
  • 假设4:模型满足“比例优势”假设。

选择自变量

1
2
3
4
5
paste0('"',paste(colnames(data$training_set), collapse = '", "'),'"')
str(data$training_set$INFARCT_RELATED_VESSEL) # 确定标记为了无序多分类 factor(ordered = F)
independent_variable = c(
"SEX", "AGE", "HEIGHT", "WEIGHT", "BMI", "BSA", "HR", "SBP", "DBP",
)

单因素回归

读取模型参数的常用函数

数据归一化

  • 方式1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
normalize <- function(x, mean_, sd_) {
return ((x - mean_) / sd_)
}
res = data.frame()
for (iv in independent_variable){
res[iv, 'VarName'] = iv
if(is.factor(data$training_set[[iv]])){
res[iv, 'mean'] = 0
res[iv, 'sd'] = 0
}else{
res[iv, 'mean'] = mean(data$training_set[[iv]], na.rm = T)
res[iv, 'sd'] = sd(data$training_set[[iv]], na.rm = T)
data$training_set[[iv]] = normalize(data$training_set[[iv]], res[iv, 'mean'], res[iv, 'sd'])
}
}
res
readr::write_csv(x = res, file = '../../data/normalize_train.csv')
  • 方式2
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
normalize <- function(x, min_, max_) {
return ((x - min_) / (max_ - min_))
}
res = data.frame()
for (iv in independent_variable){
res[iv, 'VarName'] = iv
if(is.factor(data$training_set[[iv]])){
res[iv, 'min'] = 0
res[iv, 'max'] = 0
}else{
res[iv, 'min'] = min(data$training_set[iv], na.rm = T)
res[iv, 'max'] = max(data$training_set[iv], na.rm = T)
data$training_set[[iv]] = normalize(data$training_set[[iv]], res[iv, 'min'], res[iv, 'max'])
}
}
res
readr::write_csv(x = res, file = '../../data/normalize_train.csv')

Logistic 回归

概率分布(family)和相应默认的连接函数(function)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
res = data.frame()
for (iv in independent_variable){
lmf <- formula(paste0("IMH~", iv))
m <- glm(lmf, family=binomial(link="logit"), data=data$training_set)
vns = names(coef(m))
vns = vns[2:length(vns)] # 删去常数项
for (vn in vns){
res[vn, 'VarName'] = iv
res[vn, 'VarValue'] = vn
res[vn, 'mean'] = exp(coef(m)[vn]) # to OR
res[vn, 'lower'] = exp(confint(m)[vn,1])
res[vn, 'upper'] = exp(confint(m)[vn,2])
res[vn, 'Pvalue'] = summary(m)$coefficients[vn, "Pr(>|z|)"]
}
}
res
readr::write_csv(x = res, file = '../../data/test_glm_train.csv')

附加 线性回归

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
res = data.frame()
for (iv in independent_variable){
lmf <- formula(paste0("HEMO_VOLUME~", iv))
m <- lm(lmf, data$training_set)
vns = names(coef(m))
vns = vns[2:length(vns)] # 删去常数项
for (vn in vns){
res[vn, 'VarName'] = iv
res[vn, 'VarValue'] = vn
res[vn, 'mean'] = coef(m)[vn]
res[vn, 'lower'] = confint(m)[vn,1]
res[vn, 'upper'] = confint(m)[vn,2]
res[vn, 'Pvalue'] = summary(m)$coefficients[vn, "Pr(>|t|)"]
}
}
res
readr::write_csv(x = res, file = '../../data/test_lm_train.csv')

多重共线性检测

1
2
qr(as.matrix(data$training_set[independent_variable]))$rank == length(independent_variable)
kappa(cor(as.matrix(data$training_set[independent_variable])), exact= TRUE) < 100

缺失值插补

多因素回归

  • 用到再写

绘制森林图

1
2
3
df <- readRDS('Logistic.rds')
options(repr.plot.width=8, repr.plot.height=6)
f_forestplot(df, zero = 1)

【迁移】有序分类Logistic回归
https://hexo.limour.top/ordered-classification-logistic-regression
Author
Limour
Posted on
July 15, 2022
Updated on
December 10, 2024
Licensed under