Functional Clustering for Longitudinal Associations between Social Determinants of Health and Stroke Mortality in the US (2024)


Fangzhi Luo
Department of Epidemiology and Biostatistics,
College of Public Health, University of Georgia, Athens, GA, USA
Jianbin Tan
Department of Biostatistics and Bioinformatics,
School of Medicine, Duke University, Durham, NC, USA
Donglan Zhang
Department of Foundations of Medicine,
NYU Grossman Long Island School of Medicine, Mineola, NY, USA
Hui Huang
Center for Applied Statistics and School of Statistics,
Renmin University of China, Beijing, China
Ye Shen
Department of Epidemiology and Biostatistics,
College of Public Health, University of Georgia, Athens, GA, USA
Jianbin Tan is the co-first author. The authors gratefully acknowledge the National Natural Science Foundation of China (grant nos. 12292980, 12292984 and 12231017) and the MOE project of key research institute of humanities and social sciences (grant no. 22JJD910001).

Abstract

Understanding longitudinally changing associations between social determinants of health (SDOH) and stroke mortality is crucial for timely stroke management. Previous studies have revealed significant regional disparities in SDOH–stroke mortality associations. However, they do not develop data-driven methods based on these longitudinal associations for regional division in stroke control. To fill this gap, we propose a novel clustering method for SDOH–stroke mortality associations of US counties. To enhance the interpretability and statistical efficiency of the clustering outcomes, we introduce a new class of smoothness-sparsity pursued penalties for simultaneous clustering and variable selection in the longitudinal associations. As a result, we can identify important SDOH that contribute to longitudinal changes in stroke mortality, facilitating the clustering of US counties into several regions based on how these SDOH relate to stroke mortality. The effectiveness of our proposed method is demonstrated through extensive numerical studies. By applying our method to county-level longitudinal data of SDOH and stroke mortality, we identify 18 important SDOH for stroke mortality and divide the US counties into two clusters based on these selected SDOH. Our findings unveil complex regional heterogeneity in the longitudinal associations between SDOH and stroke mortality, providing valuable insights into region-specific SDOH adjustments for mitigating stroke mortality.

Keywords: Functional clustering, Mixture model, Variable selection, Regularized expectation-maximization algorithm, Stroke

1 Introduction

The burden of stroke in the United States is enormous. As one of the most prevalent cardiovascular diseases, stroke remains consistently among the top five causes of death in the country (Koton et al., 2014), leading to over 130,000 fatalities annually (Holloway et al., 2014). In the effort to prevent stroke deaths, social determinants of health (SDOH), encompassing the economic, social, and environmental conditions where people live, learn, work, and play (Havranek et al., 2015), have garnered significant attention due to their strong associations with stroke mortality (Powell-Wiley et al., 2022). Notably, in 2015, the impact of SDOH on stroke mortality was highlighted in an important scientific statement by the American Heart Association (AHA) (Mozaffarian et al., 2015). Furthermore, since 2019, the AHA has consistently emphasized the importance of SDOH in its Annual Heart Disease and Stroke Statistics Report, spanning across all chapters (Benjamin et al., 2019). These declarations underscore the urgency of understanding the associations between SDOH and stroke mortality. This understanding is crucial for informing targeted interventions aimed at controlling stroke mortality by addressing underlying social, economic, and environmental factors.

Recent findings have revealed a clear regional disparity in the associations between SDOH and stroke mortality. For instance, Zelko et al. (2023) identified state-wise differences in these associations. Additionally, Villablanca et al. (2016) and Son et al. (2023) observed significant disparities in the associations between rural and urban areas. To address such regional disparities, various state-wise (Gebreab et al., 2015) and rural-urban strategies (Labarthe et al., 2014; Record et al., 2015; Kapral et al., 2019) have been developed for region-specific stroke management. However, the existing literature does not guarantee that areas within a divided region share similar associations between SDOH and stroke mortality. Hence, employing a uniform SDOH adjustment within a divided region may not be an effective strategy for preventing stroke. Moreover, many SDOH have been found to exhibit longitudinal changes in their associations with stroke mortality (He et al., 2021), a factor crucial for timely policymaking in stroke management. Targeting these longitudinal associations between SDOH and stroke mortality, clustering becomes a vital tool for determining reasonable regional divisions for stroke prevention. Nonetheless, this remains an unsolved issue requiring further investigation.

In this work, we propose a novel method for clustering longitudinal associations between SDOH and stroke mortality. In general, the clustering is implemented based on the similarity among associations from different counties in the US, and the resulting clustering outcomes can then be used to inform region-specific prevention measures for controlling stroke mortality.

To achieve this, we utilize county-level longitudinal data of SDOH and stroke mortality in the US, and treat the longitudinally observed data in each county as functional data. As a result, the concurrent associations between SDOH and stroke mortality are inherently functional objects, which can be effectively modeled using functional regression models, where the coefficients are also functional, capturing the relationship between SDOH and stroke mortality over time. For the clustering process, we model the functional coefficients using a finite mixture model, leading to a finite mixture of functional regression models. This approach enables the clustering of US counties into different regions based on the relationship between their SDOH and stroke mortality.

Our task essentially differs from common clustering procedures for longitudinal data (Jacques and Preda, 2013, 2014; Liang et al., 2021), which primarily apply to observable longitudinal outcomes. In contrast, the county-level longitudinal associations between SDOH and stroke mortality in our case are unobservable. For clustering these associations, one might consider applying finite mixtures of regression models (Jacobs et al., 1991; Jiang and Tanner, 1999) to the SDOH and stroke mortality longitudinal data. However, this method cannot capture longitudinal changes in the associations within the clustering procedures, potentially resulting in unreliable clustering outcomes due to the disregard of longitudinal signals. In light of functional data analysis (Ramsay and Silverman, 2002, 2005), some studies proposed finite mixtures of functional regression models for clustering longitudinal associations (Yao et al., 2011; Lu and Song, 2012). Nonetheless, they do not address the issue of collinearity among the covariates in their functional regression models. In our study, the collection of SDOH serves as functional covariates and exhibits significant collinearity, stemming from their derivation from multiple domains (e.g., social, economic, and environmental domains). In this case, ignoring collinearity among the SDOH data may lead to misspecification of the functional regression model. This oversight could compromise the accuracy of the resultant clustering outcome.

Furthermore, to enhance interpretability, existing studies analyzing associations between SDOH and stroke mortality often adopt a pre-selection step on the SDOH covariates (Tsao et al., 2022, 2023), as their number is usually large, containing hundreds of variables. In our case, the pre-selection of longitudinal SDOH data can be achieved through variable selection in functional regression models (Wang et al., 2008; Kong et al., 2016; Goldsmith and Schwartz, 2017). However, these methods generally assume that associations between covariates and responses are invariant across different samples. This assumption limits their direct applicability to our case, where SDOH covariates may exhibit distinct associations with stroke mortality among different counties. Moreover, performing selection of SDOH prior to clustering may bias the subsequent clustering outcome, as the selection of SDOH would be unrelated to the clustering process. On the other hand, without proper selection of SDOH, the clustering process may be statistically inefficient due to the complex structure of longitudinal SDOH data, which are simultaneously high-dimensional and functional in nature. To accommodate such a complicated structure, it would be beneficial to connect variable selection of SDOH to the clustering of longitudinal associations, yet this topic is rarely discussed in the literature.

In this article, we introduce a novel method for simultaneous clustering and variable selection of longitudinal associations between SDOH and stroke mortality in US counties. Our method, based on a finite mixture of functional linear concurrent models (FMFLCM), incorporates a new class of smoothness-sparsity pursued penalties to address the functional nature and high-dimensionality of the SDOH data. These penalties are designed to borrow information from distinct clusters of the FMFLCM, thereby enhancing statistical efficiency for both clustering and variable selection, and simultaneously addressing the collinearity issue among the SDOH data. For the estimation, we develop a novel regularized expectation-maximization (REM) algorithm by incorporating the proposed penalties. This approach allows for the clustering of longitudinal associations while embedding a variable selection step for the SDOH covariates. As a result, the cluster memberships and the selected SDOH can be iteratively updated within the REM algorithm. This iterative updating process helps mitigate potential biases stemming from the selected covariates during the clustering process. Through these procedures, we provide a novel data-driven method for county-level regional division, aiming to offer insights into region-specific stroke prevention measures for US counties.

The remainder of this article is organized as follows. In Section 2, we begin by introducing the SDOH and stroke mortality dataset, and demonstrate some of its data features in Section 2.1. Following this, we present the FMFLCM in Section 2.2, followed by the sparsity and smoothness pursued penalties in Section 2.3 and the REM algorithm in Section 2.4. In Section 3, we conduct simulation studies to compare the proposed methods with competing approaches in terms of clustering performance, variable selection, and parameter estimation. In Section 4, we apply the proposed clustering method to our dataset and present the clustering result, along with the estimates relating to the selected SDOH. Finally, we provide conclusions and discussion in Section 5.

2 Methodologies

2.1 Data Source

In 2020, the Agency for Healthcare Research and Quality (AHRQ) compiled and released an SDOH database for a better understanding of community-level factors, healthcare quality and delivery, and individual health. The SDOH database contains yearly records of 345 SDOH collected from 3226 counties from 2009 to 2018. These SDOH are classified into 5 domains: (1) social context, such as age, race and ethnicity, and veteran status; (2) economic context, such as income and unemployment; (3) education; (4) physical infrastructure, such as housing, food insecurity, and transportation; and (5) health care context, such as health insurance coverage and health care access. To study the associations between the SDOH and stroke mortality, we link the SDOH database with county-level stroke mortality data provided by the Interactive Atlas of Heart Disease and Stroke at the Centers for Disease Control and Prevention (CDC). The stroke mortality database was originally compiled from 2 data sources: (1) the National Vital Statistics System at the National Center for Health Statistics, and (2) the hospital discharge data from the Centers for Medicare & Medicaid Services’ Medicare Provider Analysis and Review (MEDPAR) file. All data and materials used in this analysis are publicly available at the AHRQ website (https://www.ahrq.gov/sdoh/index.html) and the CDC website (https://www.cdc.gov/dhdsp/maps/atlas/index.htm). Since the AHRQ database is HIPAA (Health Insurance Portability and Accountability Act) compliant, our data do not require review by an institutional review board.

It’s worth noting that the SDOH dataset contains missing values. Following the approach of a previous study on the dataset (Son et al., 2023), we exclude the SDOH variables that have a missing proportion of more than 60%. The average missing proportion of the remaining variables is 3.6%. For these missing values, we employ a k-nearest neighbors (KNN) method (Kowarik and Templ, 2016) to impute the SDOH data, ensuring the SDOH and stroke mortality data are aligned in time for each county. The detailed implementation of the KNN method is provided in Part A of the Supplementary Material.
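The idea of KNN imputation can be illustrated with a minimal sketch: each missing entry of a county's record is filled with the average of that variable over the nearest complete records, where distances are computed on the mutually observed variables. This is a simplified stand-in for the method of Kowarik and Templ (2016) (implemented in the VIM R package), not the exact procedure used in the paper; the function name and distance choice are ours.

```python
import numpy as np

def knn_impute(X, k=5):
    """Impute NaN entries of each row using its k nearest rows.

    Distance between two rows is the root-mean-square difference over
    the columns observed in both rows. A simplified illustration of
    KNN imputation; not the paper's exact implementation.
    """
    X = np.asarray(X, dtype=float).copy()
    n = X.shape[0]
    for i in range(n):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        obs = ~miss
        d = np.full(n, np.inf)           # distance of row i to every row
        for j in range(n):
            if j == i:
                continue
            shared = obs & ~np.isnan(X[j])
            if shared.any():
                d[j] = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
        neighbors = np.argsort(d)[:k]    # k closest rows
        for col in np.where(miss)[0]:
            vals = X[neighbors, col]
            vals = vals[~np.isnan(vals)]
            if vals.size:
                X[i, col] = vals.mean()  # average over observed neighbors
    return X
```

In practice one would standardize the columns first, since SDOH variables are on very different scales.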

[Figure 1: correlations among the longitudinal SDOH data.]

As mentioned previously, significant collinearity may exist among the SDOH data. This phenomenon is observed in our dataset, where the longitudinal data of some SDOH exhibit significant correlations, as illustrated in Figure 1. Besides, coinciding with previous studies (Villablanca et al., 2016; He et al., 2021; Son et al., 2023), evident regional disparities and longitudinal variations can also be observed in the SDOH–stroke mortality associations in our dataset. To demonstrate this, we calculate the yearly Pearson correlation between stroke mortality and the percentage of Asian and Pacific Island language speakers (PAPL), one principal sociocultural factor in the SDOH. To calculate the yearly correlations, we first divide the counties in the US into several regions, and presume that the counties in a divided region exhibit the same SDOH–stroke mortality correlation. We consider two types of divisions: state-wise division (Zelko et al., 2023) and rural-urban division (Son et al., 2023). The yearly correlations from 2009 to 2018 for each state, or for rural and urban areas, are presented in panels (B) and (D) of Figure 2, respectively. Besides, we also present geographical maps of the correlations for the years 2009, 2013, and 2018 in panels (A) and (C) of Figure 2, respectively, for the two division strategies.

[Figure 2: yearly correlations between stroke mortality and PAPL under the state-wise and rural-urban divisions. Panels (A) and (C): geographical maps of the correlations for 2009, 2013, and 2018; panels (B) and (D): yearly correlations from 2009 to 2018.]

From the maps in Figure 2, we observe significant regional disparities in the correlations among different states, as well as between rural and urban areas. These correlations all exhibit changes over time. In addition, we find that the longitudinal correlations in rural and urban areas (shown in panel (D) of Figure 2) are consistently negative over time, while the correlations in some states (such as Nevada, as seen in panel (B) of Figure 2) are mostly positive. These results suggest that the outcomes of longitudinal associations are highly sensitive to the region division strategy, and different region divisions may lead to inconsistent conclusions for stroke management. As such, we require a reasonable data-driven method for region division in exploring SDOH–stroke mortality associations.
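The region-wise yearly correlations described above admit a direct computation: within each region, pool the counties and compute one Pearson correlation per year. The sketch below assumes mortality and one SDOH variable aligned on a common yearly grid; the array layout and function name are our own illustrative choices.

```python
import numpy as np

def yearly_region_correlations(mortality, sdoh, regions):
    """Yearly Pearson correlation between stroke mortality and one SDOH
    variable, pooling all counties within each region (mirroring the
    assumption that counties in a region share one correlation).

    mortality, sdoh: (n_counties, n_years) arrays; regions: length-n labels.
    Returns {region: array of n_years correlations}.
    """
    mortality = np.asarray(mortality, dtype=float)
    sdoh = np.asarray(sdoh, dtype=float)
    regions = np.asarray(regions)
    out = {}
    for g in np.unique(regions):
        idx = regions == g
        # one correlation per year, across the counties of region g
        out[g] = np.array([
            np.corrcoef(mortality[idx, y], sdoh[idx, y])[0, 1]
            for y in range(mortality.shape[1])
        ])
    return out
```

With state labels this reproduces a state-wise division; with rural/urban labels, the rural-urban division.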

2.2 Finite Mixture of Functional Linear Concurrent Models

Let $Y_i(t)\in\mathbb{R}$ and $\bm{X}_{i\cdot}(t):=(X_{i1}(t),\dots,X_{ip}(t))^{T}\in\mathbb{R}^{p}$ represent the stroke mortality and SDOH data in the $i$th county at time $t$, respectively, where $i=1,\ldots,n$ and $t\in\mathcal{T}$, with $n$ being the number of counties and $\mathcal{T}$ being the observed time period. Here, $Y_i(\cdot)$ and $\bm{X}_{i\cdot}(\cdot)$ are considered the functional response and covariate samples, respectively, with $n$ and $p$ being the sample size and the dimension of the covariates. Without loss of generality, $\mathcal{T}$ is $[0,1]$ in this article.

We focus on the association between $Y_i(t)$ and $\bm{X}_{i\cdot}(t)$ for $t\in[0,1]$, referred to as longitudinal associations in what follows. To cluster the longitudinal associations of different counties $i$, we assume the samples $\{Y_i(\cdot),\bm{X}_{i\cdot}(\cdot);\,i=1,\ldots,n\}$ can be divided into $K$ clusters and follow a finite mixture of functional linear concurrent models

\[
Y_i(t)=\sum_{j=1}^{p}X_{ij}(t)\,\beta_{jk}(t)+\varepsilon_{ik}(t)\quad\text{if }Z_i=k,\tag{1}
\]

where $Z_i$ is the cluster membership for the $i$th subject, $\beta_{jk}(\cdot)$ is the functional coefficient for the $k$th group, capturing the longitudinal association between the response and the $j$th covariate, and $\varepsilon_{ik}(t)$ is a Gaussian white noise with variance $\sigma_k^{2}$ for each $k$. We assume that the white noise processes $\{\varepsilon_{ik}(\cdot);\,i=1,\ldots,n,\,k=1,\ldots,K\}$ are independent across different $i$ and $k$, and that $Z_1,\ldots,Z_n$ are i.i.d. samples from a multinomial distribution on $\{1,\ldots,K\}$ with $\mathbb{P}(Z_i=k)=\pi_k$.
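The generative mechanism of model (1) can be sketched directly: draw each county's cluster label, then build its response from the cluster-specific coefficient functions plus white noise. The particular $\beta_{jk}$ functions below are illustrative choices of ours, not those estimated in the paper.

```python
import numpy as np

def simulate_fmflcm(n=200, p=3, S=10, weights=(0.5, 0.5),
                    sigma=(0.3, 0.3), seed=0):
    """Simulate from model (1) on a common grid t_1,...,t_S in [0,1]:
    Z_i ~ Multinomial(weights), and given Z_i = k,
    Y_i(t) = sum_j X_ij(t) * beta_jk(t) + eps_ik(t), eps ~ N(0, sigma_k^2).
    """
    rng = np.random.default_rng(seed)
    K = len(weights)
    t = np.linspace(0.0, 1.0, S)
    # beta[j, k, s] = beta_{jk}(t_s); sign flips across clusters (toy choice)
    beta = np.array([[np.sin(2 * np.pi * t) * (j + 1) * (-1) ** k
                      for k in range(K)] for j in range(p)])
    Z = rng.choice(K, size=n, p=weights)   # cluster memberships
    X = rng.normal(size=(n, p, S))         # functional covariates on the grid
    Y = np.empty((n, S))
    for i in range(n):
        noise = rng.normal(0.0, sigma[Z[i]], size=S)
        Y[i] = (X[i] * beta[:, Z[i], :]).sum(axis=0) + noise
    return t, X, Y, Z, beta
```

Such simulated data are the natural test bed for any estimator of the mixture, since the true memberships $Z_i$ are known.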

Let $\bm{\beta}(t)=\{\beta_{jk}(t)\}_{j=1,\ldots,p;\,k=1,\ldots,K}$, $\bm{\sigma}^{2}:=(\sigma_1^{2},\ldots,\sigma_K^{2})^{T}$, and $\bm{\pi}:=(\pi_1,\ldots,\pi_K)^{T}$. We use $\bm{\beta}_{\cdot k}(t)$ and $\bm{\beta}_{j\cdot}(t)$ to denote the $k$th column and the $j$th row of $\bm{\beta}(t)$, respectively. For ease of notation, we may use $\bm{\beta}$, $\bm{\beta}_{\cdot k}$, $\bm{\beta}_{j\cdot}$, and $\beta_{jk}$ to represent $\bm{\beta}(\cdot)$, $\bm{\beta}_{\cdot k}(\cdot)$, $\bm{\beta}_{j\cdot}(\cdot)$, and $\beta_{jk}(\cdot)$ in what follows.

Since our dataset is only observed on a finite time grid, we assume that $\{(Y_i(t_{is}),\bm{X}_{i\cdot}(t_{is}));\,i=1,\ldots,n,\,s=1,\ldots,S_i\}$ is the observed data for the FMFLCM, where $\{t_{is};\,i=1,\ldots,n,\,s=1,\ldots,S_i\}\subset\mathcal{T}$. The log-likelihood of the parameters $\bm{\Phi}:=(\bm{\beta},\bm{\sigma}^{2},\bm{\pi})$ is then given as

\[
l(\bm{\Phi})=\sum_{i=1}^{n}\sum_{s=1}^{S_i}\log\left\{\sum_{k=1}^{K}\pi_k\,f\!\left(Y_i(t_{is});\,\bm{X}_{i\cdot}(t_{is})^{T}\bm{\beta}_{\cdot k}(t_{is}),\,\sigma_k^{2}\right)\right\},\tag{2}
\]

where $f(\cdot;\mu,\sigma^{2})$ is a Gaussian density function with mean $\mu$ and variance $\sigma^{2}$. One may adopt the maximum likelihood estimator (MLE) of (2) to estimate $\bm{\Phi}$. However, given that the functional coefficient $\bm{\beta}$ is an infinite-dimensional parameter, the optimization of $\bm{\beta}$ in (2) is generally an ill-posed problem. Furthermore, the MLE of $\bm{\beta}$ may also suffer from the curse of dimensionality when $p$ is large. In this case, the MLE of (2) may be far from its true value, or may even not exist (Wang et al., 2008; Yi and Caramanis, 2015).
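For intuition, the log-likelihood (2) can be evaluated numerically on a common grid: each observed time point contributes the log of a $K$-component Gaussian mixture density. A minimal sketch, assuming $S_i=S$ for all counties (array names are ours):

```python
import numpy as np

def mixture_loglik(Y, X, beta, sigma2, weights):
    """Observed-data log-likelihood (2) of the FMFLCM on a common grid.

    Y: (n, S); X: (n, p, S); beta: (p, K, S);
    sigma2 and weights (the pi_k's): length-K sequences.
    """
    sigma2 = np.asarray(sigma2, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n, S = Y.shape
    ll = 0.0
    for i in range(n):
        for s in range(S):
            mu = X[i, :, s] @ beta[:, :, s]   # (K,) cluster-wise means
            dens = (np.exp(-(Y[i, s] - mu) ** 2 / (2 * sigma2))
                    / np.sqrt(2 * np.pi * sigma2))
            ll += np.log(weights @ dens)      # log of the K-mixture density
    return ll
```

This direct evaluation is what the E-step of an EM-type algorithm implicitly works with, via the componentwise responsibilities.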

To solve the above issues, we propose a class of penalties in Section 2.3 to regularize the smoothness and the sparsity of $\bm{\beta}$. Based on the proposed penalties, we develop a regularized expectation-maximization algorithm in Section 2.4 to implement the clustering of subjects, the estimation of $\bm{\beta}$, and the variable selection for the covariates simultaneously.

2.3 Sparsity and Smoothness Pursued Penalties

In this subsection, we propose a class of smoothly clipped absolute deviation (SCAD) type penalties (Fan and Li, 2001) to address the aforementioned ill-posed problems. The SCAD penalty takes the form

\[
\bm{P}_{\text{SCAD}}(u;\lambda)=
\begin{cases}
\lambda u & \text{if } 0\leq u\leq\lambda,\\[4pt]
-\dfrac{u^{2}-2\gamma\lambda u+\lambda^{2}}{2(\gamma-1)} & \text{if } \lambda<u<\gamma\lambda,\\[4pt]
\dfrac{(\gamma+1)\lambda^{2}}{2} & \text{if } u\geq\gamma\lambda,
\end{cases}
\]

where $u$ is a scalar parameter to be penalized, $\lambda$ is a tuning parameter, and $\gamma$ is a hyperparameter chosen to be 3.7, as suggested in Fan and Li (2001). We extend the SCAD penalty to the functional object $\bm{\beta}$, yielding the functional SCAD (FSCAD) penalty

\[
\bm{P}_{\text{FSCAD}}(\bm{\beta};\lambda,r):=\sum_{j=1}^{p}\sum_{k=1}^{K}\bm{P}_{\text{SCAD}}(\|\beta_{jk}\|_{r};\lambda),\tag{3}
\]

where $\|\beta_{jk}\|_{r}=\sqrt{\|\beta_{jk}\|^{2}+r\|\beta_{jk}''\|}$ with $\|f\|=\sqrt{\int f^{2}(t)\,\mathrm{d}t}$ being the $L_2$ norm of a function $f$. Here, $\|\beta_{jk}\|_{r}$ measures the magnitude and smoothness of $\beta_{jk}$ simultaneously (Meier et al., 2009), in which $r$ balances the function's magnitude against its smoothness in the norm. By penalizing $\bm{\beta}$ using (3), we can penalize the smoothness of each $\beta_{jk}$ to deal with its functional nature, and shrink the $\beta_{jk}$ with small norms $\|\beta_{jk}\|_{r}$ to zero (Huang et al., 2009) for the purpose of variable selection.
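As a concrete sketch, the FSCAD penalty (3) can be evaluated numerically from coefficient functions sampled on a grid, following the displayed definition of $\|\cdot\|_{r}$. The trapezoidal discretization of the norms and the finite-difference second derivative are our own illustrative choices.

```python
import numpy as np

def scad(u, lam, gamma=3.7):
    """SCAD penalty of Fan and Li (2001): linear up to lam, a quadratic
    blend on (lam, gamma*lam), constant beyond gamma*lam (so large
    coefficients are not over-shrunk)."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u <= lam, lam * u,
                    np.where(u < gamma * lam,
                             -(u ** 2 - 2 * gamma * lam * u + lam ** 2)
                             / (2 * (gamma - 1)),
                             (gamma + 1) * lam ** 2 / 2))

def _trapz(y, t):
    # trapezoidal rule for the integral of y over the grid t
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(t))

def r_norm(f_vals, t, r):
    """||f||_r = sqrt(||f||^2 + r * ||f''||), with L2 norms approximated
    by the trapezoidal rule and f'' by repeated finite differences."""
    f_vals = np.asarray(f_vals, dtype=float)
    d2 = np.gradient(np.gradient(f_vals, t), t)   # approximate f''
    return np.sqrt(_trapz(f_vals ** 2, t) + r * np.sqrt(_trapz(d2 ** 2, t)))

def fscad(beta_vals, t, lam, r, gamma=3.7):
    """FSCAD penalty (3): SCAD applied to the r-norm of each beta_jk,
    with beta_vals[j, k, :] the values of beta_jk on the grid t."""
    return float(sum(scad(r_norm(beta_vals[j, k], t, r), lam, gamma)
                     for j in range(beta_vals.shape[0])
                     for k in range(beta_vals.shape[1])))
```

A coefficient function that is identically zero contributes nothing, which is exactly how the penalty drives variable selection.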

It’s worth noting that our SDOH functional covariates exhibit significant collinearity, as illustrated in Figure 1, and the presence of collinearity may enlarge the variance and lead to misspecification in variable selection (Zou and Hastie, 2005). To adjust for this issue, we adopt a strategy similar to the elastic net (Zou and Hastie, 2005) and add an $L_2$ term to the FSCAD penalty (3). As such, we obtain the functional SCAD-Net (FSCAD-Net) penalty

\[
\bm{P}_{\text{FSCAD-Net}}(\bm{\beta};\lambda,\rho,r):=\sum_{j=1}^{p}\sum_{k=1}^{K}\Big\{\rho\,\bm{P}_{\text{SCAD}}(\|\beta_{jk}\|_{r};\lambda)+(1-\rho)\,\lambda\,\|\beta_{jk}\|_{r}^{2}\Big\},\tag{4}
\]

where $\rho\in[0,1]$ is a tuning parameter controlling the balance between the FSCAD and $L_{2}$ terms. The FSCAD-Net penalty retains the properties of the FSCAD penalty; in addition, it reduces variance and mitigates misspecification in variable selection by penalizing the magnitude of $\sum_{j=1}^{p}\sum_{k=1}^{K}\|\beta_{jk}\|_{r}^{2}$.

It is worth noting that variable selection with FSCAD or FSCAD-Net does not guarantee that the same covariates are selected across clusters. This may be undesirable in applications where the important variables must be common to all clusters. In our case, the purpose of variable selection for SDOH is to facilitate clustering on the selected SDOH covariates; we therefore require the selected SDOH to be identical across clusters within the estimation procedure. To achieve this, we shrink $\{\beta_{jk};k=1,\ldots,K\}$ to the zero function simultaneously during variable selection, which we refer to as cluster-invariant sparsity. Accordingly, we modify the FSCAD-Net penalty into the functional group SCAD-Net (FGSCAD-Net) penalty

\[
\bm{P}_{\text{FGSCAD-Net}}(\bm{\beta};\lambda,\rho,r):=\sum_{j=1}^{p}\Big\{\rho\,\bm{P}_{\text{SCAD}}\big(\|\bm{\beta}_{j\cdot}\|_{r};\lambda\big)+(1-\rho)\lambda\|\bm{\beta}_{j\cdot}\|_{r}^{2}\Big\},\tag{5}
\]

where $\|\bm{\beta}_{j\cdot}\|_{r}=\sqrt{\sum_{k=1}^{K}\|\beta_{jk}\|_{r}^{2}}$. By grouping $\{\beta_{jk};k=1,\ldots,K\}$ for each $j$ in the norm, the FGSCAD-Net penalty not only yields cluster-invariant sparsity but also improves the statistical efficiency of variable selection by borrowing strength across all clusters when estimating $\bm{\beta}$.
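For concreteness, the penalty in (5) can be evaluated numerically once each $\beta_{jk}$ is represented by basis coefficients, so that $\|\beta_{jk}\|_{r}^{2}$ reduces to a quadratic form $\bm{b}_{jk}^{T}\bm{G}\bm{b}_{jk}$ (this representation is made precise in Section 2.4.2). The Python sketch below is our own illustration, not the authors' implementation; the SCAD penalty uses the conventional concavity parameter $a=3.7$ (Fan and Li, 2001).

```python
import numpy as np

def scad_penalty(u, lam, a=3.7):
    """SCAD penalty P_SCAD(u; lam) evaluated at a scalar u >= 0."""
    u = abs(u)
    if u <= lam:
        return lam * u
    if u <= a * lam:
        return (2 * a * lam * u - u**2 - lam**2) / (2 * (a - 1))
    return lam**2 * (a + 1) / 2

def fgscad_net_penalty(B, G, lam, rho):
    """FGSCAD-Net penalty (5) for a coefficient array B of shape (p, K, L).

    B[j, k] holds the basis coefficients b_jk of beta_jk, and G is the
    L x L matrix  int Psi Psi^T + r Psi'' Psi''^T dt,  so that
    ||beta_jk||_r^2 = b_jk^T G b_jk.
    """
    total = 0.0
    for j in range(B.shape[0]):
        # grouped norm ||beta_j.||_r pooling all K clusters
        norm_jr = np.sqrt(sum(B[j, k] @ G @ B[j, k] for k in range(B.shape[1])))
        total += rho * scad_penalty(norm_jr, lam) + (1 - rho) * lam * norm_jr**2
    return total
```

Note how the grouping enforces cluster-invariant sparsity: the SCAD term acts on a single pooled norm per covariate $j$, so either all $K$ coefficient functions of covariate $j$ survive or none do.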

2.4 Regularized EM Algorithm

In this subsection, we apply an EM algorithm to estimate the parameters of the FMFLCM. We first introduce a latent variable $z_{ik}:=I(Z_{i}=k)$ to represent the cluster membership of subject $i$, where $I(\cdot)$ is the indicator function. Denote $\bm{Z}:=\{z_{ik};i=1,\ldots,n,\ k=1,\ldots,K\}$. The complete-data log-likelihood of the FMFLCM is then given by

\[
l_{C}(\bm{\Phi},\bm{Z})=\sum_{i=1}^{n}\sum_{k=1}^{K}z_{ik}\Big[\log\pi_{k}+\sum_{s=1}^{S_{i}}\log\big\{f\big(Y_{i}(t_{is});\{\bm{X}_{i}(t_{is})\}^{T}\bm{\beta}_{\cdot k}(t_{is}),\sigma_{k}^{2}\big)\big\}\Big].\tag{6}
\]

To address the ill-posedness of estimating $\bm{\beta}$, we employ a regularized EM (REM) algorithm using the FGSCAD-Net penalty (5) proposed in Section 2.3. In the following, we describe the E-step and the M-step of the REM algorithm.

2.4.1 General Procedures of the REM Algorithm

Let $\bm{\Phi}^{(m-1)}:=\big(\bm{\beta}^{(m-1)},(\bm{\sigma}^{2})^{(m-1)},\bm{\pi}^{(m-1)}\big)$ denote the parameters updated at the $(m-1)$th iteration.

\textbf{E-step}: Given $\bm{\Phi}^{(m-1)}$, we first compute the conditional expectation of $z_{ik}$ given the samples $\bm{Y}=\{Y_{i}(t_{is});i=1,\ldots,n,\ s=1,\ldots,S_{i}\}$ and $\bm{X}=\{\bm{X}_{i}(t_{is});i=1,\ldots,n,\ s=1,\ldots,S_{i}\}$:

\[
\omega_{ik}^{(m)}:=\mathbb{E}_{\bm{\Phi}^{(m-1)}}(z_{ik}\mid\bm{Y},\bm{X})=\frac{\pi_{k}^{(m-1)}\prod_{s=1}^{S_{i}}f\big(Y_{i}(t_{is});\{\bm{X}_{i}(t_{is})\}^{T}\bm{\beta}_{\cdot k}^{(m-1)}(t_{is}),(\sigma_{k}^{2})^{(m-1)}\big)}{\sum_{k'=1}^{K}\pi_{k'}^{(m-1)}\prod_{s=1}^{S_{i}}f\big(Y_{i}(t_{is});\{\bm{X}_{i}(t_{is})\}^{T}\bm{\beta}_{\cdot k'}^{(m-1)}(t_{is}),(\sigma_{k'}^{2})^{(m-1)}\big)}.\tag{7}
\]

After that, we calculate the expectation of the complete-data log-likelihood (6) conditional on $\bm{Y}$ and $\bm{X}$, given the parameter $\bm{\Phi}^{(m-1)}$. This leads to the Q function

\[
Q(\bm{\Phi}\mid\bm{\Phi}^{(m-1)}):=\sum_{i=1}^{n}\sum_{k=1}^{K}\omega_{ik}^{(m)}\Big[\log\pi_{k}+\sum_{s=1}^{S_{i}}\log\big\{f\big(Y_{i}(t_{is});\{\bm{X}_{i}(t_{is})\}^{T}\bm{\beta}_{\cdot k}(t_{is}),\sigma_{k}^{2}\big)\big\}\Big].\tag{8}
\]

\textbf{M-step}: To facilitate the update of $\bm{\beta}$, we incorporate the FGSCAD-Net penalty (5) into the Q function (8):

\[
Q^{\text{pen}}\big(\bm{\Phi}\mid\bm{\Phi}^{(m-1)};\lambda,\rho,r\big):=Q\big(\bm{\Phi}\mid\bm{\Phi}^{(m-1)}\big)-\bm{P}_{\text{FGSCAD-Net}}\left(\bm{\beta};\lambda,\rho,r\right).\tag{9}
\]

Based on (9), we update the parameters $\bm{\beta}$, $\bm{\sigma}^{2}$, and $\bm{\pi}$ separately, given the current tuning parameters $(\lambda,\rho,r)$. Specifically, holding $\bm{\sigma}^{2}$ and $\bm{\pi}$ fixed at their values from the $(m-1)$th iteration, we update $\bm{\beta}$ as

\[
\bm{\beta}^{(m)}=\operatorname*{argmax}_{\bm{\beta}}\,Q^{\text{pen}}\Big(\big(\bm{\beta},(\bm{\sigma}^{2})^{(m-1)},\bm{\pi}^{(m-1)}\big)\mid\bm{\Phi}^{(m-1)};\lambda,\rho,r\Big).\tag{10}
\]

Once $\bm{\beta}^{(m)}$ is obtained, we update $\bm{\pi}^{(m)}$ by

\[
\pi_{k}^{(m)}=\frac{\sum_{i=1}^{n}\omega_{ik}^{(m)}}{n},\quad k=1,\ldots,K,\tag{11}
\]

and update $(\bm{\sigma}^{2})^{(m)}$ by

\[
(\sigma_{k}^{2})^{(m)}=\frac{\sum_{i=1}^{n}\omega_{ik}^{(m)}\sum_{s=1}^{S_{i}}\big\{Y_{i}(t_{is})-\{\bm{X}_{i}(t_{is})\}^{T}\bm{\beta}_{\cdot k}^{(m)}(t_{is})\big\}^{2}}{\sum_{i=1}^{n}S_{i}\,\omega_{ik}^{(m)}},\quad k=1,\ldots,K,\tag{12}
\]

where $\omega_{ik}^{(m)}$ is defined in (7). Note that the main computational effort in the above procedure lies in the optimization (10), which we describe in detail in Section 2.4.2.
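Under a Gaussian density $f$, the E-step and the closed-form updates for $\bm{\pi}$ and $\bm{\sigma}^{2}$ can be sketched as follows. This is a minimal Python illustration of ours, not the authors' code: the penalized update (10) for $\bm{\beta}$ is omitted (it requires the group coordinate descent of Section 2.4.2), and the fitted means $\bm{X}_{i}^{T}\bm{\beta}_{\cdot k}$ are passed in precomputed.

```python
import numpy as np
from scipy.stats import norm

def e_step(Y, mu, sigma2, pi):
    """Posterior cluster weights omega_ik for each subject.

    Y[i]     : array of the S_i responses of subject i;
    mu[i][k] : fitted means X_i^T beta_k at subject i's time points;
    sigma2, pi : length-K arrays of variances and mixing proportions.
    """
    n, K = len(Y), len(pi)
    W = np.zeros((n, K))
    for i in range(n):
        # log pi_k + sum_s log f(...), stabilized via log-sum-exp
        logp = np.array([
            np.log(pi[k]) + norm.logpdf(Y[i], mu[i][k], np.sqrt(sigma2[k])).sum()
            for k in range(K)
        ])
        W[i] = np.exp(logp - logp.max())
        W[i] /= W[i].sum()
    return W

def m_step_pi_sigma(Y, mu, W):
    """Closed-form updates of pi and sigma^2 given the weights W."""
    n, K = W.shape
    pi = W.sum(axis=0) / n
    sigma2 = np.array([
        sum(W[i, k] * ((Y[i] - mu[i][k]) ** 2).sum() for i in range(n))
        / sum(W[i, k] * len(Y[i]) for i in range(n))
        for k in range(K)
    ])
    return pi, sigma2
```

The log-sum-exp stabilization matters in practice: with many repeated measures per county, the per-subject log-likelihoods are large in magnitude and direct exponentiation underflows.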

2.4.2 Optimization of FGSCAD-Net Regularization

For the functional parameter $\bm{\beta}(t)$, we parameterize its $(j,k)$th element $\beta_{jk}(t)$ by the cubic spline basis functions $\psi_{1}(t),\ldots,\psi_{L}(t)$ with equally spaced knots on $\mathcal{T}$. We assume that

\[
\beta_{jk}(t)=\bm{b}_{jk}^{T}\bm{\Psi}(t),\tag{13}
\]

where $\bm{b}_{jk}=(b_{jk1},\ldots,b_{jkL})^{T}$ and $\bm{\Psi}(t)=(\psi_{1}(t),\ldots,\psi_{L}(t))^{T}$. We then maximize (10) by substituting (13) into (9). We write $\bm{Q}^{T}\bm{Q}$ for the Cholesky decomposition of the non-negative definite matrix

\[
\int_{\mathcal{T}}\Big[\bm{\Psi}(t)\{\bm{\Psi}(t)\}^{T}+r\,\bm{\Psi}''(t)\{\bm{\Psi}''(t)\}^{T}\Big]\,\mathrm{d}t,
\]

where $\bm{Q}$ is an upper triangular matrix. With this, we define $\bm{\alpha}_{j\cdot}=\big(\bm{\alpha}_{j1}^{T},\ldots,\bm{\alpha}_{jK}^{T}\big)^{T}$ with $\bm{\alpha}_{jk}=\bm{Q}\bm{b}_{jk}$, and rewrite $\|\bm{\beta}_{j\cdot}\|_{r}$ as

\begin{align*}
\|\bm{\beta}_{j\cdot}\|_{r}&=\sqrt{\|\bm{\beta}_{j\cdot}\|^{2}+r\|\bm{\beta}_{j\cdot}''\|^{2}}\\
&=\sqrt{\sum_{k=1}^{K}\int\Big[\bm{b}_{jk}^{T}\bm{\Psi}(t)\{\bm{\Psi}(t)\}^{T}\bm{b}_{jk}+r\,\bm{b}_{jk}^{T}\bm{\Psi}''(t)\{\bm{\Psi}''(t)\}^{T}\bm{b}_{jk}\Big]\,\mathrm{d}t}\\
&=\sqrt{\sum_{k=1}^{K}(\bm{Q}\bm{b}_{jk})^{T}\bm{Q}\bm{b}_{jk}}=\|\bm{\alpha}_{j\cdot}\|,
\end{align*}

where we abuse the notation $\|\cdot\|$ to also denote the Euclidean norm of a vector. We further denote $\bm{\alpha}_{\cdot k}=\big(\bm{\alpha}_{1k}^{T},\ldots,\bm{\alpha}_{pk}^{T}\big)^{T}$, $h_{ij}(t)=\bm{Q}^{-T}\big(X_{ij}(t)\psi_{1}(t),\ldots,X_{ij}(t)\psi_{L}(t)\big)^{T}$, and $\bm{H}_{i}(t)=\big(h_{i1}(t)^{T},\ldots,h_{ip}(t)^{T}\big)^{T}$. We can then transform the optimization (10) into the standard group SCAD-$L_{2}$ optimization problem (Zeng and Xie, 2014)

\[
\big(\widetilde{\bm{\alpha}}_{\cdot 1}^{(m)},\ldots,\widetilde{\bm{\alpha}}_{\cdot K}^{(m)}\big)=\operatorname*{argmax}_{\bm{\alpha}_{\cdot 1},\ldots,\bm{\alpha}_{\cdot K}}\,Q(\bm{\Phi}\mid\bm{\Phi}^{(m-1)})-\sum_{j=1}^{p}\Big\{\rho\,\bm{P}_{\text{SCAD}}\big(\|\bm{\alpha}_{j\cdot}\|;\lambda\big)+(1-\rho)\lambda\|\bm{\alpha}_{j\cdot}\|^{2}\Big\},\tag{14}
\]

where $Q(\bm{\Phi}\mid\bm{\Phi}^{(m-1)})$ can be expressed in terms of $\{\bm{\alpha}_{\cdot k};k=1,\ldots,K\}$ as

\[
Q(\bm{\Phi}\mid\bm{\Phi}^{(m-1)})=\sum_{i=1}^{n}\sum_{k=1}^{K}\omega_{ik}^{(m)}\Big[\log\pi_{k}^{(m-1)}+\sum_{s=1}^{S_{i}}\log\big\{f\big(Y_{i}(t_{is});\{\bm{H}_{i}(t_{is})\}^{T}\bm{\alpha}_{\cdot k},(\sigma_{k}^{2})^{(m-1)}\big)\big\}\Big].
\]

This optimization problem can be solved efficiently by the group coordinate descent algorithm (Breheny and Huang, 2015).
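The transformation above can be illustrated numerically: build the Gram matrix $\int\bm{\Psi}\bm{\Psi}^{T}+r\,\bm{\Psi}''\bm{\Psi}''^{T}\,\mathrm{d}t$ for a cubic B-spline basis by quadrature, take its Cholesky factor $\bm{Q}$, and verify that $\|\beta_{jk}\|_{r}$ equals the Euclidean norm of $\bm{\alpha}_{jk}=\bm{Q}\bm{b}_{jk}$. The basis size, evaluation grid, and $r$ below are arbitrary illustrative choices of ours.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.integrate import trapezoid

# Cubic B-spline basis on T = [0, 1] with equally spaced (clamped) knots
L, k = 8, 3
knots = np.concatenate([np.zeros(k), np.linspace(0.0, 1.0, L - k + 1), np.ones(k)])
t = np.linspace(0.0, 1.0, 2001)
I = np.eye(L)
Psi = np.stack([BSpline(knots, I[l], k)(t) for l in range(L)])                  # psi_l(t)
Psi2 = np.stack([BSpline(knots, I[l], k).derivative(2)(t) for l in range(L)])   # psi_l''(t)

# G = int Psi Psi^T + r Psi'' Psi''^T dt, via trapezoidal quadrature
r = 0.1
G = trapezoid(Psi[:, None, :] * Psi[None, :, :], t, axis=-1) \
    + r * trapezoid(Psi2[:, None, :] * Psi2[None, :, :], t, axis=-1)

# Cholesky decomposition G = Q^T Q with Q upper triangular
Q = np.linalg.cholesky(G).T

# For any coefficient vector b_jk: ||beta_jk||_r^2 = b^T G b = ||Q b||^2
b = np.random.default_rng(0).normal(size=L)
alpha = Q @ b
assert np.isclose(b @ G @ b, alpha @ alpha)
```

The practical payoff of the reparameterization is that the functional norm $\|\cdot\|_{r}$ becomes a plain Euclidean norm on $\bm{\alpha}$, so off-the-shelf group-penalized solvers apply without modification.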

Since estimators involving $L_{2}$ terms are generally biased (Zou and Zhang, 2009; Zeng and Xie, 2014), we apply a bias correction to $\widetilde{\bm{\alpha}}_{\cdot k}^{(m)}$. Specifically, we use the following formula to obtain the final estimate of $\bm{\alpha}_{\cdot k}^{(m)}$ (Zeng and Xie, 2014):

\[
\bm{\alpha}_{\cdot k}^{(m)}=\left\{1+2(1-\rho)\lambda\right\}\widetilde{\bm{\alpha}}_{\cdot k}^{(m)}. \tag{15}
\]

Once $\bm{\alpha}_{\cdot k}^{(m)}$ is obtained, we compute $\bm{\beta}_{\cdot k}^{(m)}(t)$ by

\[
\bm{\beta}_{\cdot k}^{(m)}(t)=\bm{Q}^{-1}\bm{\alpha}_{\cdot k}^{(m)}\bm{\Psi}(t). \tag{16}
\]
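As a numerical illustration of the two steps above, the bias correction (15) is a scalar rescaling of the shrunken estimator, and (16) is a linear map back to the functional coefficients. The shapes below (a $p\times L$ basis-coefficient matrix for one cluster, a $p\times p$ matrix $\bm{Q}$, and a length-$L$ basis vector) are illustrative assumptions, not fixed by the text:

```python
import numpy as np

def debias_and_reconstruct(alpha_tilde, Q, Psi_t, rho, lam):
    """Sketch of the M-step post-processing: elastic-net-type bias
    correction (15) followed by coefficient reconstruction (16).

    Shapes are illustrative assumptions: alpha_tilde is p x L (basis
    coefficients for one cluster), Q is p x p, and Psi_t is the
    length-L basis vector evaluated at a time point t.
    """
    # Bias correction (15): rescale the shrunken estimator.
    alpha = (1.0 + 2.0 * (1.0 - rho) * lam) * alpha_tilde
    # Reconstruction (16): beta_{.k}(t) = Q^{-1} alpha Psi(t).
    beta_t = np.linalg.solve(Q, alpha @ Psi_t)
    return alpha, beta_t
```

The rescaling factor exceeds one whenever $\rho<1$ and $\lambda>0$, inflating the estimate to offset ridge-type shrinkage.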

2.4.3 Tuning Strategy

The REM algorithm requires tuning three hyperparameters: $\lambda$, $\rho$, and $r$. The traditional strategy needs to run the entire algorithm for every candidate combination of $(\lambda,\rho,r)$, which is computationally inefficient given the large number of combinations to be examined. Inspired by the path-fitting algorithm (Breheny and Huang, 2015) for fast tuning of $\lambda$, we modify the traditional tuning strategy for the REM algorithm. Instead of running a complete REM for each candidate value of $\lambda$, we propose to tune $\lambda$ within each M-step of the REM algorithm; see Part B.2 of the Supplementary Material for implementation details. This allows us to employ a path-fitting algorithm to select $\lambda$, accelerating the tuning process (Huang et al., 2009; Li et al., 2016; Cai et al., 2019).

For the hyperparameters $\rho$ and $r$, we run an entire REM for each of their candidate combinations, nested with the aforementioned strategy for tuning $\lambda$. We select $\rho$ and $r$ from the candidate values based on the AIC

\[
\text{AIC}(\rho,r)=-2l(\widehat{\bm{\Phi}})+2\,\text{df}, \tag{17}
\]

where $\widehat{\bm{\Phi}}$ denotes the converged parameters of the REM algorithm given the current choices of $\rho$ and $r$, and df is the degrees of freedom determined as in Breheny and Huang (2015).

In addition, we need to specify the number of clusters $K$ for the implementation of the REM algorithm. To determine $K$, we similarly treat it as a tuning parameter and select it by minimizing the BIC

\[
\text{BIC}(K)=-2l(\bm{\Phi}_{K})+\text{df}(\lambda)\cdot\log\Big(\sum_{i=1}^{n}S_{i}\Big), \tag{18}
\]

where $\bm{\Phi}_{K}$ denotes the converged parameters when the number of clusters is $K$, with $\rho$ and $r$ selected based on (17). The complete REM algorithm with the tuning processes is summarized in Algorithm 1 in Part B.1 of the Supplementary Material.
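As a sketch of the selection criteria (17) and (18), the following uses made-up log-likelihoods and degrees of freedom purely for illustration; the converged fits would come from the REM algorithm itself:

```python
import math

def aic(loglik, df):
    """AIC (17): -2 l(Phi_hat) + 2 df."""
    return -2.0 * loglik + 2.0 * df

def bic(loglik, df, total_obs):
    """BIC (18): -2 l(Phi_K) + df(lambda) * log(sum_i S_i)."""
    return -2.0 * loglik + df * math.log(total_obs)

# Hypothetical converged fits for candidate numbers of clusters K;
# the (loglik, df) pairs are invented numbers for illustration only.
fits = {2: (-1600.0, 40), 3: (-1450.0, 60), 4: (-1445.0, 85)}
total_obs = 1800  # sum of S_i over all subjects (illustrative)
best_K = min(fits, key=lambda K: bic(*fits[K], total_obs))
```

Here the BIC's $\log(\sum_i S_i)$ multiplier penalizes model complexity more heavily than the AIC's constant 2 once the total number of observations exceeds $e^2$.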

3 Simulation

In this section, we conduct numerical simulations to assess the performance of the proposed method in Section 2, in comparison with competing methods, across three aspects: clustering, variable selection, and parameter estimation. To begin with, we generate the functional covariates $\bm{X}_{i}(\cdot)$, $i=1,\ldots,n$, as

\[
\bm{X}_{i}(t)=\sum_{l=1}^{4}\bm{\theta}_{il}\psi_{l}(t),\ \forall t\in[0,1],
\]

where $\psi_{1}(\cdot),\ldots,\psi_{4}(\cdot)$ are the first four nonconstant Fourier basis functions, and $\bm{\theta}_{il}\in\mathbb{R}^{p}$, for each $i$ and $l$, is a random vector sampled from a mean-zero Gaussian distribution with covariance matrix $\{l^{-2}\alpha^{|j-k|}\}_{1\leq j,k\leq p}\in\mathbb{R}^{p\times p}$. Here, the parameter $\alpha$ controls the dependence between covariates, with a higher value indicating stronger dependence.
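A minimal sketch of this covariate-generating process is given below. The sine/cosine ordering of the Fourier basis is an assumption, as the text only specifies "the first four nonconstant Fourier basis functions":

```python
import numpy as np

def fourier_basis(t):
    """First four nonconstant Fourier basis functions on [0, 1]
    (ordering assumed for illustration)."""
    return np.array([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
                     np.sin(4 * np.pi * t), np.cos(4 * np.pi * t)])

def simulate_X(n, p, alpha, tgrid, rng):
    """Draw X_i(t) = sum_l theta_il psi_l(t), where theta_il has the
    mean-zero Gaussian law with covariance l^{-2} * {alpha^{|j-k|}}."""
    j = np.arange(p)
    ar_cov = alpha ** np.abs(j[:, None] - j[None, :])  # {alpha^{|j-k|}}
    X = np.zeros((n, p, len(tgrid)))
    for i in range(n):
        for l in range(1, 5):
            theta_il = rng.multivariate_normal(np.zeros(p), ar_cov / l ** 2)
            psi_l = fourier_basis(tgrid)[l - 1]
            X[i] += theta_il[:, None] * psi_l[None, :]
    return X

rng = np.random.default_rng(0)
X = simulate_X(n=5, p=4, alpha=0.4, tgrid=np.linspace(0, 1, 10), rng=rng)
```

The $l^{-2}$ factor makes higher-frequency basis components contribute progressively less variance, so each $\bm{X}_i(\cdot)$ is dominated by its low-frequency behavior.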

To generate functional coefficients attaining the cluster-invariant sparsity, we set $\bm{\beta}_{\cdot k}(t)\in\mathbb{R}^{p}$ as

\[
\bm{\beta}_{\cdot k}(t)=(f_{1k}(t),f_{1k}(t),f_{2k}(t),f_{2k}(t),f_{3k}(t),f_{3k}(t),0,\ldots,0)^{T},\quad k=1,\ldots,K,\ t\in[0,1],
\]

where $f_{jk}(t)=f_{jk}^{*}(t)/\|f_{jk}^{*}\|_{2}$, and the $f_{jk}^{*}(\cdot)$'s are given by

\[
\begin{aligned}
f_{11}^{*}(t)&=\sin\Big(\frac{\pi t}{2}+\frac{3}{2}\pi\Big)-t-\frac{1}{2}, & f_{12}^{*}(t)&=\big\{\cos(2\pi t)-1\big\}^{2}, & f_{13}^{*}(t)&=-f_{11}^{*}(t)+1,\\
f_{21}^{*}(t)&=\sin(2\pi t)-t+0.5, & f_{22}^{*}(t)&=\sin\Big(\frac{\pi t}{2}+\pi\Big), & f_{23}^{*}(t)&=-f_{21}^{*}(t)-0.5,\\
f_{31}^{*}(t)&=-\sin\Big(\frac{\pi t}{2}+\frac{3\pi}{2}\Big)-t-0.5, & f_{32}^{*}(t)&=-f_{12}^{*}(t), & f_{33}^{*}(t)&=f_{11}^{*}(t)+t+0.5.
\end{aligned}
\]

These functions are presented in Figure 2 of the Supplementary Material. Note that the relevant covariates for all clusters are $X_{i1},\ldots,X_{i6}$, each of which makes an equal contribution to $Y_{i}$ since $\|f_{jk}\|_{2}=1$ for $j=1,\ldots,6$ and $k=1,\ldots,K$. We set $K=3$.

Next, we generate the cluster membership $Z_{i}$ from a multinomial distribution as described in Section 2, with $\pi_{k}=1/K$, $k=1,\ldots,K$. Given $\bm{X}_{i}(t)$, $Z_{i}$, and $\bm{\beta}(t)$, $Y_{i}(t)$ is generated from model (1) for $t$ on $10$ equally spaced knots on $[0,1]$, where the variance of the error term, $\sigma_{k}^{2}$, is set according to the signal-to-noise ratio (SNR)

\[
\sigma_{k}^{2}=\left\{\frac{\sum_{i=1}^{n}\int_{\mathcal{T}}\sum_{k=1}^{K}I(Z_{i}=k)\left\{\bm{X}_{i}(t)^{T}\bm{\beta}_{\cdot k}(t)\right\}^{2}\,\mathrm{d}t}{n}\right\}\ \bigg/\ \text{SNR}.
\]
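The error variance implied by this SNR definition can be sketched as follows, approximating the integral over $[0,1]$ by the trapezoid rule on an equally spaced grid (function and variable names are illustrative):

```python
import numpy as np

def noise_variance(signal, tgrid, snr=12.0):
    """Error variance implied by the SNR definition above.

    `signal` holds X_i(t)^T beta_{. Z_i}(t) on an equally spaced grid
    (shape n x T); the time integral is approximated by the trapezoid rule.
    """
    sq = signal ** 2
    dt = tgrid[1] - tgrid[0]
    # Trapezoid rule for the per-subject signal energy over [0, 1].
    energy = (sq[:, :-1] + sq[:, 1:]).sum(axis=1) * dt / 2.0
    return energy.mean() / snr
```

With a stronger signal (larger average energy) the same SNR implies a proportionally larger noise variance, keeping the difficulty of the problem comparable across settings.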

We set the SNR to 12. Based on this setting, we evaluate the method proposed in Section 2, abbreviated as FGS-Net due to its use of the FGSCAD-Net penalty (5). We compare FGS-Net with other REM methods penalized with the FSCAD-Net penalty (4) and the FSCAD penalty (3), abbreviated as FS-Net and FS, respectively. Apart from these three methods, we examine other competing methods for clustering longitudinal associations.

  • RP: This method incorporates a roughness penalty (RP) into the REM algorithm, given as $\sum_{j=1}^{p}\sum_{k=1}^{K}\lambda\|\beta_{jk}^{\prime\prime}\|_{2}^{2}$.

  • VS-RP: This method first uses FGS-Net with $K=1$ to conduct variable selection (VS), without performing clustering at this stage. After that, the selected variables are used to implement the RP method for clustering.

  • LI-MIX: This method fits the data by a finite mixture of linear regression models (LI-MIX); i.e., the functional coefficient $\beta_{jk}(t)$ in (1) is treated as a constant over $t$. To perform this method, we first conduct variable selection using ordinary linear regression models shrunk with an elastic-net penalty (Zou and Zhang, 2009). After that, we adopt the selected covariates to fit a finite mixture of linear regression models for the clustering (Khalili and Chen, 2007).

RP is a simplification of FS-Net with $\rho=0$: it only regularizes the roughness of each $\beta_{jk}$ and does not yield sparsity. VS-RP and LI-MIX are two-step approaches that implement variable selection and clustering sequentially. These two-step methods are simpler than FGS-Net, FS-Net, FS, and RP. However, selecting relevant covariates prior to the clustering procedure may raise additional problems, as the clustering performance may be sensitive to the outcome of the variable selection. It is worth noting that LI-MIX further ignores the time-varying nature of $\beta_{jk}$.

For each scenario with different combinations of $n$, $p$, and $\alpha$, the simulations are repeated 100 times. We adopt random initialization for the FGS-Net, FS-Net, and FS methods, i.e., we set $\omega_{ik}^{(0)}=I(Z_{i}^{(0)}=k)$, where $Z_{i}^{(0)}$ is sampled from a multinomial distribution as described in Section 2, with $\pi_{k}=1/K$, $k=1,\ldots,K$. In contrast, RP, VS-RP, and LI-MIX are initialized with the actual cluster membership, $\omega_{ik}^{(0)}=I(Z_{i}=k)$. To alleviate the computational burden, we use only one initialization for each simulation. Moreover, $K$ in RP, VS-RP, and LI-MIX is fixed to 3, the true number of clusters, while $K$ for FGS-Net, FS-Net, and FS is selected based on (18) in Section 2.4.3.
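The random initialization described above amounts to drawing one-hot membership vectors from a uniform multinomial; a minimal sketch:

```python
import numpy as np

def random_init(n, K, rng):
    """Random initialization omega_ik^(0) = I(Z_i^(0) = k), with
    Z_i^(0) drawn uniformly from {0, ..., K-1} (pi_k = 1/K)."""
    Z0 = rng.integers(0, K, size=n)
    return np.eye(K)[Z0]  # one-hot n x K membership matrix

omega0 = random_init(n=180, K=3, rng=np.random.default_rng(1))
```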

The performance of clustering, variable selection, and parameter estimation is evaluated based on the following criteria.

  • Clustering accuracy is evaluated using the adjusted Rand index (ARI, Rand, 1971). The ARI is bounded by $\pm 1$ and measures the similarity between the true and estimated cluster memberships; a higher ARI represents a better clustering result.

  • Variable selection performance is evaluated using C and IC, where C is the number of zero coefficients correctly estimated as zero,

\[
\text{C}=\sum_{j=7}^{p}\sum_{k=1}^{K}I(\|\hat{\beta}_{jk}\|_{2}=0),
\]

    where $\hat{\beta}_{jk}$ is the estimate of $\beta_{jk}$. Similarly, IC is the number of nonzero coefficients incorrectly estimated as zero,

\[
\text{IC}=\sum_{j=1}^{6}\sum_{k=1}^{K}I(\|\hat{\beta}_{jk}\|_{2}=0).
\]

  • Parameter estimation accuracy is measured using the standardized mean squared error (MSE) of the functional coefficients, defined as

\[
\text{MSE}=\frac{\sum_{k=1}^{K}\sum_{j=1}^{p}\|\beta_{jk}-\hat{\beta}_{jk}\|_{2}^{2}}{\sum_{k=1}^{K}\sum_{j=1}^{p}\|\beta_{jk}\|_{2}^{2}}.
\]
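A sketch of the C, IC, and MSE criteria is given below, with the time-domain $L_2$ norm approximated by a grid average (an illustrative shortcut); the ARI itself is available in standard libraries, e.g. `sklearn.metrics.adjusted_rand_score`:

```python
import numpy as np

def selection_and_mse(beta_true, beta_hat, n_relevant=6):
    """Compute the C, IC, and standardized MSE criteria above.

    beta_true, beta_hat: arrays of shape (p, K, T) holding each
    functional coefficient on a time grid of length T; the L2 norm
    over t is approximated by a grid average.
    """
    norm_hat = np.sqrt((beta_hat ** 2).mean(axis=2))   # (p, K) coefficient norms
    C = int((norm_hat[n_relevant:] == 0).sum())        # true zeros kept at zero
    IC = int((norm_hat[:n_relevant] == 0).sum())       # nonzeros wrongly zeroed
    mse = ((beta_true - beta_hat) ** 2).sum() / (beta_true ** 2).sum()
    return C, IC, mse
```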
Table 1: Averaged ARI, C, IC, and MSE over 100 replications ("Truth" gives the oracle values).

n = 180, α = 0.4
           p = 10                  p = 30                  p = 90                   p = 240
Model      ARI   C      IC    MSE  ARI   C      IC    MSE  ARI   C       IC    MSE  ARI   C       IC     MSE
Truth      1     12     0     0    1     72     0     0    1     252     0     0    1     702     0      0
RP         1.00  0.00   0.00  0.04 0.01  0.00   0.00  1.00 0.00  0.00    0.00  1.00 0.01  0.00    0.00   1.00
VS-RP      0.45  11.94  8.82  0.70 0.43  71.70  8.85  0.72 0.39  251.34  9.06  0.76 0.33  700.23  9.09   0.79
LI-MIX     0.28  4.89   2.31  0.72 0.23  37.05  2.52  0.79 0.06  125.88  2.97  1.35 0.01  370.83  3.12   2.81
FS         1.00  12.00  0.06  0.03 1.00  72.00  0.00  0.03 0.97  251.93  0.45  0.09 0.93  701.86  1.09   0.13
FS-Net     1.00  12.00  0.07  0.03 1.00  72.00  0.00  0.02 0.97  251.96  0.50  0.09 0.94  701.93  0.96   0.12
FGS-Net    1.00  12.00  0.00  0.03 1.00  72.00  0.00  0.02 0.99  252.00  0.00  0.05 0.98  702.00  0.18   0.08

n = 180, α = 0.8
Truth      1     12     0     0    1     72     0     0    1     252     0     0    1     702     0      0
RP         1.00  0.00   0.00  0.08 0.01  0.00   0.00  0.99 0.01  0.00    0.00  1.00 0.01  0.00    0.00   1.00
VS-RP      0.86  11.73  9.60  0.82 0.85  71.31  9.54  0.82 0.84  251.22  9.87  0.85 0.81  699.75  11.22  0.97
LI-MIX     0.22  6.15   4.05  0.87 0.18  42.60  4.29  1.05 0.08  150.27  4.35  1.85 0.02  437.73  4.38   5.15
FS         1.00  11.92  1.42  0.19 1.00  70.60  1.82  0.22 0.98  248.59  3.35  0.36 0.97  701.91  8.51   0.77
FS-Net     1.00  11.91  0.34  0.11 1.00  70.88  0.48  0.12 0.98  249.85  1.36  0.21 0.98  701.92  6.41   0.55
FGS-Net    1.00  12.00  0.03  0.09 1.00  72.00  0.03  0.09 0.98  251.97  0.21  0.15 0.99  701.91  0.72   0.16

n = 300, α = 0.4
           p = 10                  p = 50                  p = 150                  p = 400
Model      ARI   C      IC    MSE  ARI   C      IC    MSE  ARI   C       IC    MSE  ARI   C       IC     MSE
Truth      1     12     0     0    1     132    0     0    1     432     0     0    1     1182    0      0
RP         1.00  0.00   0.00  0.04 0.01  0.00   0.00  1.00 0.01  0.00    0.00  1.00 0.01  0.00    0.00   1.00
VS-RP      0.83  11.97  6.21  0.42 0.82  131.73 6.36  0.43 0.82  431.34  6.45  0.43 0.85  1179.78 5.82   0.39
LI-MIX     0.33  4.56   0.96  1.12 0.27  63.45  1.11  1.18 0.10  229.26  1.29  1.44 0.01  648.66  1.50   2.90
FS         0.99  12.00  0.06  0.03 1.00  132.00 0.00  0.01 1.00  432.00  0.00  0.01 0.99  1181.97 0.06   0.04
FS-Net     0.99  12.00  0.06  0.03 1.00  132.00 0.00  0.01 1.00  432.00  0.00  0.01 0.99  1181.97 0.06   0.03
FGS-Net    0.99  12.00  0.00  0.05 1.00  132.00 0.00  0.01 1.00  432.00  0.00  0.04 0.99  1182.00 0.00   0.02

n = 300, α = 0.8
Truth      1     12     0     0    1     132    0     0    1     432     0     0    1     1182    0      0
RP         1.00  0.00   0.00  0.07 0.01  0.00   0.00  0.99 0.01  0.00    0.00  1.00 0.01  0.00    0.00   1.00
VS-RP      0.91  11.76  9.12  0.76 0.91  131.10 9.18  0.78 0.89  429.78  9.06  0.77 0.87  1176.21 9.18   0.79
LI-MIX     0.24  6.87   3.36  1.14 0.19  77.67  3.27  1.28 0.09  261.63  3.15  1.79 0.02  768.21  3.66   4.08
FS         0.99  11.99  0.25  0.09 0.98  131.69 0.33  0.09 1.00  430.90  0.05  0.06 0.96  1181.96 8.05   0.75
FS-Net     0.99  12.00  0.09  0.06 0.99  131.74 0.20  0.08 1.00  431.13  0.02  0.05 0.97  1181.95 5.90   0.50
FGS-Net    0.99  12.00  0.06  0.09 0.99  132.00 0.09  0.07 0.99  432.00  0.06  0.09 0.99  1182.00 0.66   0.12

We investigate the performance of the above methods under various scenarios of $n$, $p$, and $\alpha$. Here, we set $n$ to 180 or 300, and take $p$ as $10$, $n/6$, $n/2$, and $4n/3$, covering settings from a small to a large number of covariates. Additionally, we set $\alpha$ to 0.4 and 0.8 to reflect mild or strong dependence among the covariates. The averaged ARI, C, IC, and MSE are presented in Table 1. In the analysis below, we focus on the results for $n=180$; similar conclusions can be drawn from the results for $n=300$.

Overall, FGS-Net, FS-Net, and FS show superior performance across the different scenarios of $n$, $p$, and $\alpha$, highlighting the advantages of implementing variable selection and clustering simultaneously under the REM framework. In contrast, the RP method uses only a roughness penalty and does not consider variable selection within the clustering; as a result, its performance quickly deteriorates in both clustering and parameter estimation as $p$ increases. Furthermore, among the two-step methods, VS-RP performs poorly compared with the three REM-type methods. For instance, in all scenarios with $n=180$, the average ICs of VS-RP are mostly larger than 9, indicating that about half of the nonzero functional coefficients are incorrectly identified as zero. This leads to significant errors in both clustering and parameter estimation by VS-RP, and suggests that selecting variables before clustering is ineffective. The results of LI-MIX are even worse, as this method further ignores the time-varying nature of $\bm{\beta}(\cdot)$ in the estimation procedure.

In Table 1, we also observe that the ICs and MSEs of FS become significantly larger than those of FS-Net as $\alpha$ increases. This is expected, since dependencies between functional covariates may impede the performance of the FS procedure, which calls for the additional ridge-type penalty in FS-Net to stabilize the estimation. Moreover, as $p$ increases, the ICs and MSEs of FS-Net in turn exceed those of FGS-Net. In these cases, high dimensionality undermines the statistical efficiency of both variable selection and clustering in FS-Net, so it is beneficial to impose cluster-invariant sparsity through FGS-Net to borrow strength across all clusters.

[Figure 3 about here.]

For the case of $n=180$, $p=240$, and $\alpha=0.8$, we further illustrate the estimation of the functional coefficients in Figure 3. We observe that the estimates of FS-Net exhibit smaller biases than those of FS, highlighting the role of FS-Net in mitigating biases of the functional coefficients. Furthermore, compared with FS-Net and FS, the estimated curves of FGS-Net show even smaller biases and narrower confidence bands, owing to the pursuit of cluster-invariant sparsity. Overall, FGS-Net is the most suitable choice among the six methods for clustering longitudinal associations and selecting important variables with high-dimensional functional covariates.

4 Real data

In this section, we apply our method to the SDOH and stroke mortality dataset to cluster their longitudinal associations. Given that the stroke mortality data are right-skewed and take positive values, we apply a log transformation to stabilize the variance. Using our approach, we identify two clusters of longitudinal associations and 18 relevant SDOH covariates for stroke mortality.

The two clusters of county-level longitudinal associations are presented in Figure 4. The proportions of the two clusters, determined by the number of counties, are 68% and 32%, respectively. Notably, both clusters are prevalent across the majority of US states, encompassing both rural and urban areas (urban: 76% and 24%; rural: 65% and 35%, for clusters 1 and 2, respectively). It is worth noting that the southeastern US contains a region called the Stroke Belt, known for its persistently high relative excess of stroke mortality. Although counties in the Stroke Belt have similar stroke severity, this area is also a mixture of the two clusters, with proportions of 70% and 30%, respectively. These results suggest that regions sharing similar geographic and stroke characteristics may have very different SDOH–stroke mortality associations, and separate types of policies for SDOH adjustments in stroke management may be needed based on our clustering results.

[Figure 4 about here.]

In addition, we present the 18 selected SDOH covariates in Table 4, ordered by their relative importance (Grömping, 2007), defined as

\[
\text{RI}(\bm{\beta}_{j\cdot})=\|\bm{\beta}_{j\cdot}\|_{2}\left\{\frac{1}{n}\sum_{i=1}^{n}\Big\|X_{ij}-\frac{1}{n}\sum_{i=1}^{n}X_{ij}\Big\|^{2}\right\}^{1/2}.
\]
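A sketch of this relative-importance measure is given below. The grid-average approximation of $\|\bm{\beta}_{j\cdot}\|_2$ and the treatment of each $X_{ij}$ as a scalar summary (rather than a full trajectory) are illustrative simplifications of the formula above:

```python
import numpy as np

def relative_importance(beta_j, x_j):
    """RI(beta_j.) = ||beta_j.||_2 times the empirical SD of covariate j.

    beta_j: (K, T) array of the j-th functional coefficients on a time
    grid; x_j: length-n array of scalar covariate summaries. Grid-average
    norms and scalar X_ij are simplifying assumptions.
    """
    coef_norm = np.sqrt((beta_j ** 2).mean())           # grid-based L2 norm
    sd = np.sqrt(((x_j - x_j.mean()) ** 2).mean())      # population-style SD
    return coef_norm * sd
```

Weighting the coefficient norm by the covariate's spread means a covariate with large coefficients but almost no variation across counties is ranked as less important than one whose values actually vary.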

We find that the influence of SDOH on stroke mortality is mainly driven by four aspects: social environment, built environment, the health care system, and biology. Beyond the well-studied determinants from economic, cultural, and racial domains (Tsao et al., 2023), we find that stroke mortality is significantly associated with living and working environments, education level, and overuse of opioids. Notably, most of the selected SDOH variables are related to economic development. For example, in Table 4, MEDIAN_HOME_VALUE may reflect overall economic development and infrastructure building in a community. Additionally, a higher value of ELDERLY_RENTER may suggest a larger elderly population with lower income, facing issues such as housing instability. These economic-related factors may potentially be addressed through more equitable economic policies. For example, MEDIAN_HOME_VALUE can be adjusted by facilitating economic development, and ELDERLY_RENTER, as an indicator of elderly living conditions, can be adjusted by improving elderly welfare.

Table 4: The 18 selected SDOH covariates, ordered by relative importance.

Variable           | Explanation                                                                   | SDOH Domain
MEDIAN_HOME_VALUE  | Median home value of owner-occupied housing units                             | Sociocultural Environment
NO_ENGLISH         | Population that does not speak English at all, %                              | Sociocultural Environment
ASIAN_LANG         | Population that speaks Asian and Pacific Island languages, %                  | Sociocultural Environment
ELDERLY_RENTER     | Rental units occupied by householders aged 65 and older, %                    | Built Environment
GINI_INDEX         | Gini index of income inequality                                               | Sociocultural Environment
HOME_WITH_KIDS     | Owner-occupied housing units with children, %                                 | Built Environment
AGRI_JOB           | Employed working in agriculture, forestry, fishing and hunting, and mining, % | Sociocultural Environment
NO_VEHICLE         | Housing units with no vehicle available, %                                    | Built Environment
OPIOID             | Number of opioid prescriptions per 100 persons                                | Health Care System
WEAK_ENGLISH       | Population that does not speak English well, %                                | Sociocultural Environment
BLACK              | Population reporting Black race, %                                            | Biology
BACHELOR           | Population with a bachelor's degree, %                                        | Sociocultural Environment
NO_FUEL_HOME       | Occupied housing units without fuel, %                                        | Built Environment
TIME               | Time effect                                                                   | Sociocultural Environment
ONLY_ENGLISH       | Population that only speaks English, %                                        | Sociocultural Environment
MOBILE_HOME        | Housing units that are mobile homes, %                                        | Built Environment
BELOW_HIGH_SCH     | Population with less than high school education, %                            | Sociocultural Environment
DRIVE_2WORK        | Workers taking a car, truck, or van to work, %                                | Built Environment

[Figure 5: Longitudinal associations between the four leading SDOH and stroke mortality in the two clusters, with 95% bootstrap confidence bands.]

To further quantify the longitudinal associations between SDOH and stroke mortality, we investigate the functional coefficients $\beta_{jk}$ estimated from the dataset.Here, we focus only on the coefficients of the leading four important SDOH: MEDIAN_HOME_VALUE, NO_ENGLISH, ASIAN_LANG, and ELDERLY_RENTER.Their longitudinal associations in the two clusters are shown in Figure 5, along with 95% confidence bands obtained by the bootstrap.We notice that the leading four SDOH possess time-varying associations with stroke mortality, and these associations differ notably between the two clusters.For example, the longitudinal associations in cluster 2 fluctuate more strongly than those in cluster 1.Additionally, we observe that in cluster 2, all four associations exhibit a common inflection point in 2013 (the dotted vertical lines in Figure 5).These findings help us better understand the dynamics of the associations between SDOH and stroke mortality in US counties.
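Pointwise bands like those in Figure 5 can be built with a simple percentile bootstrap over resampled coefficient curves. The sketch below is a generic illustration under assumed inputs (a hypothetical array of bootstrap replicates), not the authors' exact procedure:

```python
import numpy as np

def percentile_band(boot_curves, level=0.95):
    """Pointwise percentile band from bootstrap replicates.

    boot_curves : (B, T) array, one estimated coefficient curve per
                  bootstrap resample (hypothetical input).
    Returns (lower, upper), each of length T.
    """
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(boot_curves, alpha, axis=0)       # 2.5% pointwise quantile
    upper = np.quantile(boot_curves, 1.0 - alpha, axis=0)  # 97.5% pointwise quantile
    return lower, upper

# Toy example: 1000 noisy replicates of a flat curve at level 2
rng = np.random.default_rng(1)
boot = 2.0 + 0.1 * rng.normal(size=(1000, 9))
lo, hi = percentile_band(boot)
```

The band is computed separately at each time point, which is why a common inflection point across curves, as seen in cluster 2, is a substantive finding rather than an artifact of the band construction.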

In Figure 5, MEDIAN_HOME_VALUE shows a stable and weak association with stroke mortality for the counties in cluster 1, while the counties in cluster 2 exhibit a stronger negative association. The situation for ELDERLY_RENTER differs: the association in cluster 2 fluctuates significantly, while that in cluster 1 presents an increasingly negative association with stroke mortality over time.Notably, economic factors have been found to affect stroke outcomes causally (Bann et al., 2023).As two economic-related factors, MEDIAN_HOME_VALUE and ELDERLY_RENTER have also been identified as having significant associations with stroke mortality (Rodgers et al., 2019; Mawhorter et al., 2023).Our findings not only align with the existing literature but also provide further insights into more tailored stroke mortality prevention strategies related to SDOH variables.For instance, the notable negative association between MEDIAN_HOME_VALUE and stroke mortality in cluster 2 highlights the importance of focusing on infrastructure development and maintenance and enhancing overall economic equity in these regions. Given the increasingly negative association between ELDERLY_RENTER and stroke mortality in cluster 1, it may be beneficial to prioritize viable housing programs for lower-income elderly populations in these regions.Initiatives such as housing vouchers or subsidies may also be helpful, particularly in recent years as inflation rises.

In addition to the above two covariates, NO_ENGLISH and ASIAN_LANG are two other SDOH variables measuring the sociocultural environment of a county.NO_ENGLISH, the percentage of the population that does not speak English at all, is found to be associated with stroke mortality in both clusters, with the direction of the association shifting from positive to negative over time.A similar decreasing trend is observed in the association between ASIAN_LANG (the percentage of the population that speaks Asian and Pacific Island languages) and stroke mortality in cluster 2.Meanwhile, the association between ASIAN_LANG and stroke mortality in cluster 1 remains stable and weakly positive over time.

Our results show that the association between the density of immigrants or Asian and Pacific Islanders and stroke mortality may not be uniform across regions and time periods. It is worth considering that immigrants or Asian and Pacific Islanders who experience a stroke may encounter challenges in accessing timely stroke care if they do not speak English. This language barrier may potentially contribute to an increased risk of stroke-related death in specific regions and during certain time periods (Shah et al., 2015).However, as suggested by our results, this risk of stroke-related death may have diminished from 2010 to 2018.This implies a gradual improvement in stroke care for immigrants and Asian and Pacific Islanders in the US, particularly in the counties of cluster 2.

5 Discussion

In this article, we introduce a novel clustering method for regional division of US counties based on their longitudinal associations between SDOH and stroke mortality.The challenges of this task arise from the latent and cluster-specific nature of the associations, compounded by their simultaneously functional and high-dimensional characteristics. To tackle these complex structures, we propose an REM algorithm that utilizes a finite mixture of functional linear concurrent models.Our method explores clustering-invariant sparsity via a FGSCAD-Net penalty within the REM algorithm, allowing for efficient variable selection in the functional covariates to identify the most significant associations across all clusters.The effectiveness of our method has been demonstrated via extensive simulations.Finally, we apply the proposed method to our SDOH – stroke mortality longitudinal data, facilitating the regional division of US counties. This enables the identification of regions for informing SDOH-targeted prevention of stroke mortality.
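To convey the regularized-EM idea at a glance, the toy sketch below fits a plain two-cluster mixture of (scalar, non-functional) linear regressions, with a lasso-style soft-threshold standing in for the FGSCAD-Net penalty. All settings here are hypothetical simplifications, not the paper's actual model or tuning:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two latent clusters with different regression coefficients
n, p, K = 200, 3, 2
X = rng.normal(size=(n, p))
true_beta = np.array([[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0]])
z = rng.integers(0, K, size=n)
y = np.einsum("ij,ij->i", X, true_beta[z]) + 0.3 * rng.normal(size=n)

# Deliberately crude initial values
beta = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
pi = np.full(K, 1.0 / K)
sigma2, lam = 0.5, 0.05

def soft_threshold(b, t):
    # Proximal step of an l1 penalty, a stand-in for FGSCAD-Net
    return np.sign(b) * np.maximum(np.abs(b) - t, 0.0)

for _ in range(50):
    # E-step: posterior cluster responsibilities under Gaussian errors
    resid = y[:, None] - X @ beta.T                     # (n, K)
    logw = np.log(pi) - 0.5 * resid**2 / sigma2
    logw -= logw.max(axis=1, keepdims=True)
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)

    # M-step: cluster-weighted least squares followed by shrinkage
    for k in range(K):
        XtW = X.T * w[:, k]
        beta[k] = soft_threshold(
            np.linalg.solve(XtW @ X + 1e-8 * np.eye(p), XtW @ y), lam)
    pi = w.mean(axis=0)
```

Because the shrinkage is applied identically in every cluster's M-step, the null covariates are driven to zero in both clusters at once, which is the simple analogue of the clustering-invariant sparsity pursued by the proposed penalty.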

In the analysis of the SDOH and stroke mortality dataset, our clustering map represents a novel result for regional division in stroke management, taking into account the similarity among the longitudinal associations between SDOH and stroke mortality.These findings suggest that heterogeneity in the associations occurs even within areas sharing similar geographical conditions, stroke mortality, or economic status.This implies that regional divisions based solely on these factors may not be effective for SDOH-based stroke death control.Moreover, we uncover various patterns of longitudinal associations between the two identified clusters of US counties.The dynamics within these associations, including their scale, trends, and inflection points, not only provide heuristic information for understanding complex SDOH – stroke mortality associations but are also useful for establishing timely SDOH adjustments in stroke death control.Finally, the proposed clustering method can be generalized to other high-dimensional longitudinal datasets, enabling clustering and variable selection in their longitudinal associations.

SUPPLEMENTARY MATERIAL

“SuppMaterial.pdf”:

This file describes details related to the proposed method, including the dataset, the selection scheme for tuning parameters, and other supporting results of the simulation study.

References

  • Bann, D., L. Wright, A. Hughes, and N. Chaturvedi (2023). Socioeconomic inequalities in cardiovascular disease: a causal perspective. Nature Reviews Cardiology, 1–12.
  • Benjamin, E. J., P. Muntner, A. Alonso, M. S. Bittencourt, C. W. Callaway, A. P. Carson, A. M. Chamberlain, A. R. Chang, S. Cheng, S. R. Das, et al. (2019). Heart disease and stroke statistics—2019 update: a report from the American Heart Association. Circulation 139(10), e56–e528.
  • Breheny, P. and J. Huang (2015). Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing 25(2), 173–187.
  • Cai, T. T., J. Ma, and L. Zhang (2019). CHIME: Clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality. The Annals of Statistics 47(3), 1234–1267.
  • Fan, J. and R. Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456), 1348–1360.
  • Gebreab, S. Y., S. K. Davis, J. Symanzik, G. A. Mensah, G. H. Gibbons, and A. V. Diez-Roux (2015). Geographic variations in cardiovascular health in the United States: contributions of state- and individual-level factors. Journal of the American Heart Association 4(6), e001673.
  • Goldsmith, J. and J. E. Schwartz (2017). Variable selection in the functional linear concurrent model. Statistics in Medicine 36(14), 2237–2250.
  • Grömping, U. (2007). Relative importance for linear regression in R: the package relaimpo. Journal of Statistical Software 17, 1–27.
  • Havranek, E. P., M. S. Mujahid, D. A. Barr, I. V. Blair, M. S. Cohen, S. Cruz-Flores, G. Davey-Smith, C. R. Dennison-Himmelfarb, M. S. Lauer, D. W. Lockwood, et al. (2015). Social determinants of risk and outcomes for cardiovascular disease: a scientific statement from the American Heart Association. Circulation 132(9), 873–898.
  • He, J., Z. Zhu, J. D. Bundy, K. S. Dorans, J. Chen, and L. L. Hamm (2021). Trends in cardiovascular risk factors in US adults by race and ethnicity and socioeconomic status, 1999–2018. JAMA 326(13), 1286–1298.
  • Holloway, R. G., R. M. Arnold, C. J. Creutzfeldt, E. F. Lewis, B. J. Lutz, R. M. McCann, A. A. Rabinstein, G. Saposnik, K. N. Sheth, D. B. Zahuranec, et al. (2014). Palliative and end-of-life care in stroke: a statement for healthcare professionals from the American Heart Association/American Stroke Association. Stroke 45(6), 1887–1916.
  • Huang, J. Z., H. Shen, and A. Buja (2009). The analysis of two-way functional data using two-way regularized singular value decompositions. Journal of the American Statistical Association 104(488), 1609–1620.
  • Jacobs, R. A., M. I. Jordan, S. J. Nowlan, and G. E. Hinton (1991). Adaptive mixtures of local experts. Neural Computation 3(1), 79–87.
  • Jacques, J. and C. Preda (2013). Funclust: A curves clustering method using functional random variables density approximation. Neurocomputing 112, 164–171.
  • Jacques, J. and C. Preda (2014). Model-based clustering for multivariate functional data. Computational Statistics & Data Analysis 71, 92–106.
  • Jiang, W. and M. A. Tanner (1999). Hierarchical mixtures-of-experts for exponential family regression models: approximation and maximum likelihood estimation. The Annals of Statistics 27(3), 987–1011.
  • Kapral, M. K., P. C. Austin, G. Jeyakumar, R. Hall, A. Chu, A. M. Khan, A. Y. Jin, C. Martin, D. Manuel, F. L. Silver, et al. (2019). Rural–urban differences in stroke risk factors, incidence, and mortality in people with and without prior stroke: The CANHEART stroke study. Circulation: Cardiovascular Quality and Outcomes 12(2), e004973.
  • Khalili, A. and J. Chen (2007). Variable selection in finite mixture of regression models. Journal of the American Statistical Association 102(479), 1025–1038.
  • Kong, D., K. Xue, F. Yao, and H. H. Zhang (2016). Partially functional linear regression in high dimensions. Biometrika 103(1), 147–159.
  • Koton, S., A. L. Schneider, W. D. Rosamond, E. Shahar, Y. Sang, R. F. Gottesman, and J. Coresh (2014). Stroke incidence and mortality trends in US communities, 1987 to 2011. JAMA 312(3), 259–268.
  • Kowarik, A. and M. Templ (2016). Imputation with the R package VIM. Journal of Statistical Software 74, 1–16.
  • Labarthe, D., B. Grover, J. Galloway, L. Gordon, S. Moffatt, T. Pearson, M. Schoeberl, and S. Sidney (2014). The public health action plan to prevent heart disease and stroke: Ten-year update. Washington, DC: National Forum for Heart Disease and Stroke Prevention.
  • Li, G., H. Shen, and J. Z. Huang (2016). Supervised sparse and functional principal component analysis. Journal of Computational and Graphical Statistics 25(3), 859–878.
  • Liang, D., H. Zhang, X. Chang, and H. Huang (2021). Modeling and regionalization of China’s PM2.5 using spatial-functional mixture models. Journal of the American Statistical Association 116(533), 116–132.
  • Lu, Z. and X. Song (2012). Finite mixture varying coefficient models for analyzing longitudinal heterogenous data. Statistics in Medicine 31(6), 544–560.
  • Mawhorter, S., E. M. Crimmins, and J. A. Ailshire (2023). Housing and cardiometabolic risk among older renters and homeowners. Housing Studies 38(7), 1342–1364.
  • Meier, L., S. van de Geer, and P. Bühlmann (2009). High-dimensional additive modeling. The Annals of Statistics 37(6B), 3779–3821.
  • Mozaffarian, D., E. J. Benjamin, A. S. Go, D. K. Arnett, M. J. Blaha, M. Cushman, S. de Ferranti, J.-P. Després, H. J. Fullerton, V. J. Howard, et al. (2015). Heart disease and stroke statistics—2015 update: a report from the American Heart Association. Circulation 131(4), e29–e322.
  • Powell-Wiley, T. M., Y. Baumer, F. O. Baah, A. S. Baez, N. Farmer, C. T. Mahlobo, M. A. Pita, K. A. Potharaju, K. Tamura, and G. R. Wallen (2022). Social determinants of cardiovascular disease. Circulation Research 130(5), 782–799.
  • Ramsay, J. and B. Silverman (2005). Functional Data Analysis (2nd ed.). Springer.
  • Ramsay, J. O. and B. W. Silverman (2002). Applied Functional Data Analysis: Methods and Case Studies. Springer.
  • Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66(336), 846–850.
  • Record, N. B., D. K. Onion, R. E. Prior, D. C. Dixon, S. S. Record, F. L. Fowler, G. R. Cayer, C. I. Amos, and T. A. Pearson (2015). Community-wide cardiovascular disease prevention programs and health outcomes in a rural county, 1970–2010. JAMA 313(2), 147–155.
  • Rodgers, J., B. A. Briesacher, R. B. Wallace, I. Kawachi, C. F. Baum, and D. Kim (2019). County-level housing affordability in relation to risk factors for cardiovascular disease among middle-aged adults: The National Longitudinal Survey of Youths 1979. Health & Place 59, 102194.
  • Shah, B. R., N. A. Khan, M. J. O’Donnell, and M. K. Kapral (2015). Impact of language barriers on stroke care and outcomes. Stroke 46(3), 813–818.
  • Son, H., D. Zhang, Y. Shen, A. Jaysing, J. Zhang, Z. Chen, L. Mu, J. Liu, J. Rajbhandari-Thapa, Y. Li, et al. (2023). Social determinants of cardiovascular health: a longitudinal analysis of cardiovascular disease mortality in US counties from 2009 to 2018. Journal of the American Heart Association 12(2), e026940.
  • Tsao, C. W., A. W. Aday, Z. I. Almarzooq, A. Alonso, A. Z. Beaton, M. S. Bittencourt, A. K. Boehme, A. E. Buxton, A. P. Carson, Y. Commodore-Mensah, et al. (2022). Heart disease and stroke statistics—2022 update: a report from the American Heart Association. Circulation 145(8), e153–e639.
  • Tsao, C. W., A. W. Aday, Z. I. Almarzooq, C. A. Anderson, P. Arora, C. L. Avery, C. M. Baker-Smith, A. Z. Beaton, A. K. Boehme, A. E. Buxton, et al. (2023). Heart disease and stroke statistics—2023 update: a report from the American Heart Association. Circulation 147(8), e93–e621.
  • Villablanca, A. C., C. Slee, L. Lianov, and D. Tancredi (2016). Outcomes of a clinic-based educational intervention for cardiovascular disease prevention by race, ethnicity, and urban/rural status. Journal of Women’s Health 25(11), 1174–1186.
  • Wang, L., H. Li, and J. Z. Huang (2008). Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. Journal of the American Statistical Association 103(484), 1556–1569.
  • Yao, F., Y. Fu, and T. C. Lee (2011). Functional mixture regression. Biostatistics 12(2), 341–353.
  • Yi, X. and C. Caramanis (2015). Regularized EM algorithms: A unified framework and statistical guarantees. Advances in Neural Information Processing Systems 28.
  • Zelko, A., P. R. Salerno, S. Al-Kindi, F. Ho, F. P. Rocha, K. Nasir, S. Rajagopalan, S. Deo, and N. Sattar (2023). Geographically weighted modeling to explore social and environmental factors affecting county-level cardiovascular mortality in people with diabetes in the United States: A cross-sectional analysis. The American Journal of Cardiology 209, 193–198.
  • Zeng, L. and J. Xie (2014). Group variable selection via SCAD-L2. Statistics 48(1), 49–66.
  • Zou, H. and T. Hastie (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(2), 301–320.
  • Zou, H. and H. H. Zhang (2009). On the adaptive elastic-net with a diverging number of parameters. Annals of Statistics 37(4), 1733.