Home > Techniques > AI > 【笔记整理|2024-07】高性能分子动力学模拟优化策略:GPU并行与多节点配置详解

【笔记整理|2024-07】高性能分子动力学模拟优化策略:GPU并行与多节点配置详解
gromacs gpu-performance molecular-dynamics high-performance-computing parallel-computing simulation-optimization technical-notes

【笔记整理|2024-07】高性能分子动力学模拟优化策略:GPU并行与多节点配置详解

引言

分子动力学模拟是计算化学和生物物理学中的重要工具,随着系统规模的扩大和计算精度的提高,对计算资源的需求也越来越大。本文整理了从QQ技术讨论中提取的关于GROMACS分子动力学模拟性能优化的关键技术和实践经验,重点关注GPU并行计算、多节点配置和性能调优策略。

GPU优化与并行计算

多GPU配置策略

在使用多个GPU进行分子动力学模拟时,性能优化需要考虑通信开销和计算效率的平衡:

As before, the scaling when going from one GPU to two is not linear. This is expected: GPUs now don’t have as much to compute and they have to communicate between each other. To add to that, the communications can not be easily hidden behind the computations. To make the best use of the resources, ensemble runs can be executed. Try to use multi-dir approach as we did before, to see what configuration will give you the best cumulative performance. Try to assign more than one rank to a single GPU. This will allow to overlap communications, CPU and GPU execution more efficiently. Try to leave bonded computation and/or update constraints to the CPU: you have 10 CPU core per single GPU and it would be a waste to keep them idle.

多GPU配置示例:

Run GROMACS using 4 GPUs (with IDs 0,1,2,3). Here we use 2 thread-MPI tasks per GPU (-ntmpi 8), which we find gives good performance. We set 16 OpenMP threads per thread-MPI task (assuming at least 128 CPU cores in the system). These can be adjusted to map to any specific hardware system, and experimented with for best performance…

动态负载平衡

动态负载平衡是GROMACS中的一个重要优化特性:

动态负载平衡默认开启(-dlb auto),可显式指定 -dlb yes,以在粒子分布不均或相互作用强度不同的情况下动态调整域大小。需要注意的是,在GPU常驻模式(使用-update gpu)时,动态负载平衡会被关闭

PME性能调优

PME(Particle Mesh Ewald)方法是计算长程静电相互作用的重要算法,GROMACS提供了自动调优功能:

The PME tuning is on by default whenever it is likely to be useful, can be forced on with gmx mdrun -tunepme, and forced off with gmx mdrun -notunepme. In practice, mdrun does such tuning in the first few thousand steps, and then uses the result of the optimization for the remaining time.

Given that GROMACS already had a fast CPU implementation, moving the biggest workload to the GPU provides the best parallelism.

温度控制与采样策略

高温增强采样

在分子动力学模拟中,提高温度可以增强构象采样效率:

High temperatures increase the kinetic energy but do not directly alter the nonbonded interaction parameters (e.g., van der Waals forces, electrostatics) defined by the force field. The force field parameters remain consistent, meaning the fundamental interactions governing molecular behavior are not artificially distorted by temperature alone.

High temperatures increase the kinetic energy of the system, allowing it to overcome energy barriers and explore a broader conformational space.

温度对构象采样的影响:

try a 1000K protein to make it denature

The simulations at 500 and 800 K both generated conformations that minimized to energies 200 kcal/mole lower than the crystal structure. However, the 1500 K simulation produced higher energy structures, even after minimization; in addition, this highest temperature run had many cis-trans peptide isomerizations. This suggests that 1500 K is too high a temperature for unconstrained conformational sampling.

退火策略

退火是一种通过逐渐改变系统温度来优化构象的技术:

The annealing is implemented by simply changing the current reference temperature for each group in the temperature coupling, so the actual relaxation and coupling properties depends on the type of thermostat you use and how hard you are coupling it.

距离计算与相互作用分析

距离计算工具

GROMACS提供了多种距离计算工具用于分析分子间相互作用:

gmx distance -s 2beg_pull.tpr -f 2beg_pull.xtc -n protein.ndx -oall 2beg_pull_dist.xvg -select ‘com of group “Chain_A” plus com of group “Chain_B”’

gmx mindist computes the distance between one group and a number of other groups. Both the minimum distance (between any pair of atoms from the respective groups) and the number of contacts within a given distance are written to two separate output files.

注意事项:

gmx distance expects the selections to have an even number of positions, meaning pairs of atoms to calculate the distances between.

-select ‘com of group “first” plus com of group “last”’: This command calculates the center of mass (COM) of the group first and last and the distance between these centers.

径向分布函数(RDF)计算

径向分布函数是研究液体结构和分子间相互作用的重要工具:

To compute the RDF around axes parallel to the z-axis, i.e., only in the x-y plane, use -xy.

软核相互作用与自由能计算

软核势能函数

在自由能计算中,软核相互作用用于避免粒子消失时的奇点问题:

Direction-periodic should only be used for cases where you want to pull over distances of more than half the box length. Such cases are very uncommon. Pulling a large polymer could be a valid use case. With an NVT simulation things should be fine. But you probably want to pull to a distance of slightly less than the full box size to avoid interactions between periodic images.

软核相互作用的详细信息: https://manual.gromacs.org/current/reference-manual/functions/free-energy-interactions.html#soft-core-interactions-beutler-et-al

构建辅助工具与拓扑处理

psfgen构建工具

VMD的psfgen是一个强大的分子拓扑构建工具,但也存在一些需要注意的问题:

vmd modeling is stupid: residue 5 is a normal residue that contains BOND C +N, while residue 6 does not include N (but NC) atom. so vmd creates a bond between residue 5 C and the last atom (PHE HE2B)??? how to fix?

it depends on the residue pair: it seems to try to use the coordinates of existing atoms (residue before mutation), and apply IC for the rest.

the most common error is a misreplacement (exchange) of C and H connected to the same Carbon (while the Hs on the C might be right or wrong…). sometimes only terminal Hs are wrong (centered on another atom?) I still don’ t know why

内坐标与拓扑生成

在内坐标(IC)生成过程中,需要注意键角和二面角的自动生成:

Both angles and dihedrals are generated automatically unless “auto none” is added

36 1 makes vmd output “psfgen) Created by CHARMM version 36 1”

资源管理与作业调度

SLURM作业管理

在使用SLURM作业调度系统时,合理配置资源请求和节点选择非常重要:

#SBATCH –exclude=node4,node5,node7,node8,node9

we can only specify one for –nodelist, but #SBATCH –exclude=node[1-16] works

作业提交与管理: https://bioinformaticsworkbook.org/Appendix/HPC/SLURM/submitting-dependency-jobs-using-slurm.html#gsc.tab=0

性能监控与调试

GPU利用率监控

监控GPU使用情况对于性能优化非常重要:

https://stackoverflow.com/questions/40937894/nvidia-smi-volatile-gpu-utilization-explanation

GROMACS性能调试

通过分析GROMACS的输出信息可以了解性能瓶颈:

Note the following line in the gmx mdrun output:

总结与建议

  1. 多GPU配置:合理配置GPU数量和CPU核心分配,平衡计算和通信开销
  2. 动态负载平衡:在非GPU常驻模式下启用动态负载平衡以优化性能
  3. 温度控制:根据具体研究需求选择合适的温度策略,高温有助于构象采样但可能破坏蛋白质结构
  4. 距离分析:熟练使用gmx distance和gmx mindist等工具进行分子间相互作用分析
  5. 拓扑构建:注意VMD建模中的常见问题,合理设置内坐标生成参数
  6. 资源管理:合理配置SLURM作业参数,优化计算资源使用

通过这些优化策略,可以显著提高分子动力学模拟的计算效率和结果的准确性。

参考资源