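Research Results: Developing a novel technique that uses artificial intelligence to screen microbes capable of degrading emerging environmental organic contaminants such as flame retardants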

March 3

Develop a novel methodology to screen microbial degraders for emerging contaminants such as flame retardants by artificial intelligence

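Principal investigators: Yang-hsin Shih (施養信), Department of Agricultural Chemistry, National Taiwan University; Chung-Yen Lin (林仲彥), Institute of Information Science, Academia Sinica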

Graphical Abstract: Developing an AI model to identify novel bacteria with the potential to degrade HBCD.

Sequencing technology has advanced significantly in recent years, driven by rapid progress in molecular biotechnology. As a result, we can now study the microbiome in greater depth, moving from screening cultures to deciphering functional genes and DNA sequences. As high-throughput sequencing produces ever larger volumes of microbial data, traditional techniques such as microscopy and microbial cultivation have become comparatively time-consuming and labor-intensive. This study therefore used machine learning to screen microbial genome and proteome databases for degraders of emerging environmental organic contaminants, such as flame retardants.

First, a candidate degrader of the flame retardant hexabromocyclododecane (HBCD) was identified by protein sequence alignment. Ralstonia solanacearum was selected because it harbors 2-haloacid dehalogenase (HADs, EC:3.8.1.2) and haloacetate dehalogenase (EC:3.8.1.3). R. solanacearum KMRS showed a remarkable effect on HBCD removal: the culture degraded 41% of the HBCD within 12 days of incubation and removed 69% after 36 days.
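The selection criterion described above (a strain carries both dehalogenase EC numbers) can be sketched as a simple filter over an annotation table. This is a minimal illustration only: the table `example_annotations`, the strain names, and the function `screen_candidates` are hypothetical, and the study itself worked from protein sequence alignment rather than a ready-made EC table.

```python
# EC numbers named in the text as markers for candidate HBCD degraders.
TARGET_ECS = {"3.8.1.2", "3.8.1.3"}


def screen_candidates(annotations, required=TARGET_ECS):
    """Return, sorted by name, the strains whose EC set contains every required EC."""
    return sorted(
        strain
        for strain, ecs in annotations.items()
        if set(required) <= set(ecs)
    )


# Hypothetical annotation table: strain name -> EC numbers found in its proteome.
example_annotations = {
    "strain_A": ["3.8.1.2", "3.8.1.3", "1.1.1.1"],
    "strain_B": ["3.8.1.2"],
    "strain_C": ["2.7.1.1"],
}

candidates = screen_candidates(example_annotations)
print(candidates)
```

Requiring both EC numbers mirrors the reason given for choosing R. solanacearum; a looser screen could pass a smaller `required` set.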

Second, we used Nanopore sequencing and metagenomic analysis to observe how the composition of the soil microbial community changed in response to HBCD treatment. By comparing relative abundances and performing network analyses, we identified several novel bacterial taxa that contribute to HBCD biotransformation.
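The relative-abundance comparison can be illustrated with a short sketch. The read counts, taxon names, pseudocount, and the 0.5 threshold below are all assumptions chosen for illustration, not values from the study, and the network analysis is not covered here.

```python
import math


def relative_abundance(counts):
    """Convert raw read counts per taxon into fractions of the sample total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def log2_fold_change(control, treated, pseudo=1e-6):
    """Log2 ratio of treated to control relative abundance for every taxon.

    A small pseudocount avoids division by zero for taxa missing in one sample.
    """
    ra_control = relative_abundance(control)
    ra_treated = relative_abundance(treated)
    taxa = set(ra_control) | set(ra_treated)
    return {
        t: math.log2((ra_treated.get(t, 0.0) + pseudo) / (ra_control.get(t, 0.0) + pseudo))
        for t in taxa
    }


# Hypothetical read counts for an untreated and an HBCD-treated soil sample.
control_counts = {"taxon_1": 500, "taxon_2": 300, "taxon_3": 200}
treated_counts = {"taxon_1": 250, "taxon_2": 600, "taxon_3": 150}

changes = log2_fold_change(control_counts, treated_counts)
# Taxa enriched after treatment (threshold is an arbitrary illustrative choice).
enriched = sorted(t for t, lfc in changes.items() if lfc > 0.5)
print(enriched)
```

Taxa that become more abundant under HBCD treatment are candidates for a role in biotransformation; in practice replicate samples and a statistical test would be needed before drawing that conclusion.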

Finally, based on these preliminary results, we collected bacterial genomes with and without the ability to degrade HBCD to train an AI prediction model. One hundred species served as positives and 90 as negatives. For species with complete genomes, we encoded each genome with DeepEC, which takes the full set of protein sequences of a species as input and predicts Enzyme Commission (EC) numbers as output. Because the training dataset was limited, we first evaluated the XGBoost and SVM algorithms. Validating the Boolean-based SVM classifier on an independent test dataset gave a convincing result: the accuracy (ACC), specificity (SP), Matthews correlation coefficient (MCC), and F1 score were 0.8, 0.8, 0.6, and 0.8, respectively, indicating that the Boolean SVM classifier was a reliable predictor.
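For readers less familiar with these four metrics, the sketch below computes them from a binary confusion matrix. The counts used (8 true positives, 2 false positives, 8 true negatives, 2 false negatives) are hypothetical; they were chosen only because they happen to reproduce the reported values and are not the study's actual test-set counts.

```python
import math


def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, specificity, Matthews correlation coefficient, and F1 score."""
    acc = _ratio(tp + tn, tp + fp + tn + fn)
    sp = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    f1 = _ratio(2 * precision * recall, precision + recall)
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {"ACC": acc, "SP": sp, "MCC": mcc, "F1": f1}


# Hypothetical confusion matrix, not taken from the study.
metrics = binary_metrics(tp=8, fp=2, tn=8, fn=2)
print(metrics)
```

MCC is the most conservative of the four because it uses all cells of the confusion matrix, which is why it can sit at 0.6 while the other scores are 0.8.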

By combining a metagenomic approach with machine learning, we helped fill the knowledge gap on HBCD biotransformation in natural soil environments and provided a time-saving screening procedure. This is the first study to use a metagenomic dataset to develop a predictive model for biodegraders of flame retardants. We believe the model built in this study may have extensive applications to other emerging flame retardants, such as phosphorus-based flame retardants, and to other persistent organic pollutants harmful to the environment.

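In recent years, major advances in molecular biotechnology have accelerated the development of microbial sequencing technology. The way we observe microbial communities has shifted from screening on culture plates to analyzing functional genes or DNA sequences. The term microbiome describes a microbial community from a genomic perspective, and the microbiome has therefore drawn growing attention in many studies. The volume of microbial data produced by high-throughput sequencing technology keeps growing, which makes traditional methods such as microscopy and microbial cultivation look ever more labor-intensive and time-consuming. This project therefore draws on microbial genome and proteome databases and uses machine learning (ML) to screen for strains or microbial communities that can degrade emerging environmental organic contaminants such as flame retardants (FRs), with the aim of speeding up conventional microbial screening and supporting future research in environmental microbiology.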

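In building a model to screen for degraders of the flame retardant hexabromocyclododecane (HBCD), we first tried protein sequence alignment. Ralstonia solanacearum was selected for degradation testing because it carries two proteins associated with HBCD degradation: 2-haloacid dehalogenase (HADs, EC:3.8.1.2) and haloacetate dehalogenase (EC:3.8.1.3). The degradation tests showed that R. solanacearum KMRS has a marked ability to degrade HBCD: it degraded 41% of the HBCD within 12 days of incubation and removed 69% in total within 36 days.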

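Second, through metagenomic analysis of whole-genome sequencing data obtained with Nanopore, we observed how the composition of the soil microbial community changed after HBCD treatment. By comparing relative abundances and performing network analysis, we identified specific microbial taxa that contribute to HBCD degradation in soil. As the control group, we used genomes of microbes from the same experiment whose population changes were unrelated to whether HBCD was degraded. We encoded each genome by its whole-genome enzyme composition, using encodings such as the counts of enzyme types and their presence or absence, and combined these with two machine learning methods, XGBoost and SVM, to build a model that predicts HBCD-degrading strains. The accuracy was about 70%, and it rose to about 80% when the model was validated with strains reported in the literature as known HBCD degraders. Data cleaning and model optimization are ongoing and are expected to improve overall predictive performance.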
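The two genome encodings mentioned above (how many enzymes of each type, and whether a type is present at all) can be sketched as follows. The genome names and EC lists are hypothetical, and the helper functions `build_vocabulary` and `encode` are illustrative rather than the project's own code.

```python
from collections import Counter


def build_vocabulary(genomes):
    """Sorted list of every EC number seen in any genome; defines the feature order."""
    return sorted({ec for ecs in genomes.values() for ec in ecs})


def encode(ecs, vocabulary, boolean=True):
    """Encode one genome's predicted EC numbers as a fixed-length vector.

    boolean=True  -> 1 if the EC number is present, else 0
    boolean=False -> number of proteins assigned to that EC number
    """
    counts = Counter(ecs)
    if boolean:
        return [1 if counts[ec] > 0 else 0 for ec in vocabulary]
    return [counts[ec] for ec in vocabulary]


# Hypothetical per-genome EC predictions (for example, the output of an EC predictor).
predicted_ecs = {
    "genome_pos": ["3.8.1.2", "3.8.1.2", "3.8.1.3"],
    "genome_neg": ["1.1.1.1", "3.8.1.2"],
}

vocab = build_vocabulary(predicted_ecs)
boolean_matrix = {g: encode(ecs, vocab, boolean=True) for g, ecs in predicted_ecs.items()}
count_matrix = {g: encode(ecs, vocab, boolean=False) for g, ecs in predicted_ecs.items()}
print(vocab)
print(boolean_matrix)
print(count_matrix)
```

Vectors of this shape could then be passed, together with degrader and non-degrader labels, to a classifier such as scikit-learn's `SVC` or XGBoost's `XGBClassifier`; the kernel and hyperparameters used in the project are not stated here, so none are assumed.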

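By combining metagenomic analysis with machine learning, this study sought to better understand the strains involved in HBCD biotransformation in soil and provides a more time-efficient procedure for screening degrading strains. It is the first study to combine metagenomic results with artificial intelligence methods to develop a predictive model for biodegraders of flame retardants. We believe the model framework established here can be applied broadly to decontamination research on other emerging flame retardants, such as phosphorus-based flame retardants. It can also speed up the discovery of specific microbial groups with degradation ability and help find solutions for removing other persistent organic pollutants harmful to the environment.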