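Original paper: Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations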
doi:10.1016/j.jcp.2018.10.045

Data-driven discovery of partial differential equations

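In this part of our study, we shift our attention to the problem of data-driven discovery of partial differential equations^[5]^[9]^[14]. In Sections 4.1 and 4.2 we put forth two distinct types of algorithms, namely continuous time and discrete time models, and highlight their properties and performance through the lens of various canonical problems.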

4.1 Continuous time models

Let us start by recalling equation (1) and, as in Section 3.1, define $f(t,x)$ to be given by the left-hand side of equation (1); i.e.,

$f:=u_t+\mathcal{N}[u;\lambda]\tag{14}$

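We proceed by approximating $u(t,x)$ by a deep neural network. This assumption, along with equation (14), results in a physics-informed neural network $f(t,x)$. This network can be derived by applying the chain rule for differentiating compositions of functions using automatic differentiation^[12]. It is worth highlighting that the parameters $\lambda$ of the differential operator turn into parameters of the physics-informed neural network $f(t,x)$.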
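The following is a minimal sketch of this construction and is not the paper's code. A closed-form function stands in for the trained network, and central finite differences stand in for automatic differentiation, so the example needs only NumPy. The operator $\mathcal{N}[u;\lambda]=-\lambda u_{xx}$ (a heat equation) and the helper names are hypothetical choices made for illustration. The part that carries over to a real PINN is that $\lambda$ is simply an extra argument of $f$, so it can be optimized alongside the network weights.

```python
import numpy as np

def central_diff(fn, argnum, h=1e-4):
    """Partial derivative of fn w.r.t. argument `argnum` by central differences
    (a stand-in for the automatic differentiation used in the paper)."""
    def dfn(*args):
        plus, minus = list(args), list(args)
        plus[argnum] = plus[argnum] + h
        minus[argnum] = minus[argnum] - h
        return (fn(*plus) - fn(*minus)) / (2.0 * h)
    return dfn

def heat_operator(u, lam):
    """Hypothetical example operator N[u; lam] = -lam * u_xx."""
    u_xx = central_diff(central_diff(u, 1), 1)
    return lambda t, x: -lam * u_xx(t, x)

def residual(u, op, lam):
    """f(t, x) := u_t + N[u; lam], equation (14); lam is just another input."""
    u_t = central_diff(u, 0)
    n_u = op(u, lam)
    return lambda t, x: u_t(t, x) + n_u(t, x)

# Stand-in for a trained network: the exact solution of u_t = 0.5 * u_xx.
u_exact = lambda t, x: np.exp(-0.5 * t) * np.sin(x)

T, X = np.meshgrid(np.linspace(0.0, 1.0, 11), np.linspace(-np.pi, np.pi, 11))
f_true = residual(u_exact, heat_operator, lam=0.5)(T, X)
f_wrong = residual(u_exact, heat_operator, lam=1.0)(T, X)
print(np.max(np.abs(f_true)), np.max(np.abs(f_wrong)))
```

With the correct $\lambda=0.5$ the residual vanishes up to discretization error. With $\lambda=1$ it stays of order one, and this gap is the signal a PINN loss uses to identify $\lambda$.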

4.1.1 Example (Navier–Stokes equation)

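Our next example involves a realistic scenario of incompressible fluid flow as described by the ubiquitous Navier–Stokes equations. The Navier–Stokes equations describe the physics of many phenomena of scientific and engineering interest. They may be used to model the weather, ocean currents, water flow in a pipe and air flow around a wing. In their full and simplified forms, the Navier–Stokes equations help with the design of aircraft and cars, the study of blood flow, the design of power stations, the analysis of the dispersion of pollutants, and many other applications. Let us consider the Navier–Stokes equations in two dimensions (2D), given explicitly by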

$u_t+\lambda_1(uu_x+vu_y)=-p_x+\lambda_2(u_{xx}+u_{yy})$ $v_t+\lambda_1(uv_x+vv_y)=-p_y+\lambda_2(v_{xx}+v_{yy}) \tag{15}$

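where $u(t,x,y)$ denotes the $x$-component of the velocity field, $v(t,x,y)$ the $y$-component, and $p(t,x,y)$ the pressure. Here, $\lambda=(\lambda_1,\lambda_2)$ are the unknown parameters. Solutions to the Navier–Stokes equations are sought in the set of divergence-free functions; i.e.,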

$u_x+v_y=0\tag{16}$

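This extra equation is the continuity equation for incompressible fluids, which describes the conservation of mass of the fluid. We make the assumption that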

$u=\psi_y, v=-\psi_x$

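for some latent function $\psi(t,x,y)$. Under this assumption, the continuity equation (16) is automatically satisfied, since $u_x+v_y=\psi_{yx}-\psi_{xy}=0$. Given noisy measurements of the velocity field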

$\lbrace t^i, x^i, y^i, u^i, v^i\rbrace^N_{i=1}$

we are interested in learning the parameters $\lambda$ as well as the pressure $p(t,x,y)$.
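The stream-function construction can be checked numerically. In this sketch `psi` is an arbitrary, hypothetical smooth function, and central differences again replace automatic differentiation. Mixed partial derivatives commute, so $u_x+v_y$ should vanish whatever $\psi$ is chosen.

```python
import numpy as np

def d(fn, argnum, h=1e-4):
    """Central-difference partial derivative of fn w.r.t. argument `argnum`
    (a stand-in for the automatic differentiation used in the paper)."""
    def dfn(*args):
        plus, minus = list(args), list(args)
        plus[argnum] = plus[argnum] + h
        minus[argnum] = minus[argnum] - h
        return (fn(*plus) - fn(*minus)) / (2.0 * h)
    return dfn

AX_T, AX_X, AX_Y = 0, 1, 2  # positions of t, x, y in the argument list

def psi(t, x, y):
    """Hypothetical latent stream function; any smooth function would do."""
    return np.exp(-t) * np.sin(2.0 * x) * np.cos(y) + x**2 * y

u = d(psi, AX_Y)                             # u = psi_y
v = lambda t, x, y: -d(psi, AX_X)(t, x, y)   # v = -psi_x

rng = np.random.default_rng(0)
t, x, y = rng.uniform(-1.0, 1.0, size=(3, 100))
divergence = d(u, AX_X)(t, x, y) + d(v, AX_Y)(t, x, y)   # u_x + v_y
print(np.max(np.abs(divergence)))
```

The printed maximum is at round-off level. This is why the network only needs to output $\psi$ (and $p$) rather than $u$ and $v$ separately.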
We define $f(t,x,y)$ and $g(t,x,y)$ to be given by

$f:=u_t+\lambda_1(uu_x+vu_y)+p_x-\lambda_2(u_{xx}+u_{yy})\tag{17}$ $g:=v_t+\lambda_1(uv_x+vv_y)+p_y-\lambda_2(v_{xx}+v_{yy})\tag{18}$
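The following sketch shows how $f$ and $g$ can be assembled from the two network outputs $\psi$ and $p$. Central differences stand in for automatic differentiation, and all names are hypothetical. As a test case we use the Taylor–Green vortex, a known exact solution of equations (15) with $\lambda_1=1$ and $\lambda_2=\nu$. It has nothing to do with the cylinder-wake data of this example and is used only to check that the residuals vanish at the correct parameters.

```python
import numpy as np

def d(fn, argnum, h=1e-3):
    """Central-difference partial derivative (stand-in for automatic differentiation)."""
    def dfn(*args):
        plus, minus = list(args), list(args)
        plus[argnum] = plus[argnum] + h
        minus[argnum] = minus[argnum] - h
        return (fn(*plus) - fn(*minus)) / (2.0 * h)
    return dfn

AX_T, AX_X, AX_Y = 0, 1, 2

def ns_residuals(psi, p, lam1, lam2):
    """Return callables f, g of equations (17)-(18) built from psi and p."""
    u = d(psi, AX_Y)                             # u = psi_y
    v = lambda t, x, y: -d(psi, AX_X)(t, x, y)   # v = -psi_x

    def f(t, x, y):
        adv = u(t, x, y) * d(u, AX_X)(t, x, y) + v(t, x, y) * d(u, AX_Y)(t, x, y)
        lap = d(d(u, AX_X), AX_X)(t, x, y) + d(d(u, AX_Y), AX_Y)(t, x, y)
        return d(u, AX_T)(t, x, y) + lam1 * adv + d(p, AX_X)(t, x, y) - lam2 * lap

    def g(t, x, y):
        adv = u(t, x, y) * d(v, AX_X)(t, x, y) + v(t, x, y) * d(v, AX_Y)(t, x, y)
        lap = d(d(v, AX_X), AX_X)(t, x, y) + d(d(v, AX_Y), AX_Y)(t, x, y)
        return d(v, AX_T)(t, x, y) + lam1 * adv + d(p, AX_Y)(t, x, y) - lam2 * lap

    return f, g

# Taylor-Green vortex (exact for lam1 = 1, lam2 = nu), standing in for the
# two outputs [psi, p] of a trained network.
nu = 0.01
psi = lambda t, x, y: np.cos(x) * np.cos(y) * np.exp(-2.0 * nu * t)
p = lambda t, x, y: -0.25 * (np.cos(2 * x) + np.cos(2 * y)) * np.exp(-4.0 * nu * t)

rng = np.random.default_rng(1)
t, x, y = rng.uniform(0.0, 2.0, size=(3, 50))
f, g = ns_residuals(psi, p, lam1=1.0, lam2=nu)
print(np.max(np.abs(f(t, x, y))), np.max(np.abs(g(t, x, y))))
```

If `lam1=0.5` is used instead, a residual of order $0.1$ appears in $f$. The loss (19) penalizes exactly this kind of residual when the parameters are wrong.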

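We then jointly approximate $[\psi(t,x,y)\ \ p(t,x,y)]$ using a single neural network with two outputs. This prior assumption, along with equations (17) and (18), results in a physics-informed neural network $[f(t,x,y)\ \ g(t,x,y)]$. The parameters $\lambda$ of the Navier–Stokes operator, as well as the parameters of the neural networks $[\psi(t,x,y)\ \ p(t,x,y)]$ and $[f(t,x,y)\ \ g(t,x,y)]$, can be trained by minimizing the mean squared error loss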

$MSE:=\frac{1}{N} \sum_{i=1}^N(\mid u(t^i, x^i, y^i)-u^i \mid^2+\mid v(t^i, x^i, y^i)-v^i\mid^2) \\\\ +\frac{1}N \sum_{i=1}^N(\mid f(t^i, x^i, y^i)\mid^2+\mid g(t^i, x^i, y^i)\mid ^2) \tag{19}$
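Equation (19) itself consists of two averaged sums of squares. Below is a minimal NumPy sketch. It assumes the network predictions and residuals have already been evaluated at the $N$ training points, and the toy numbers are hypothetical.

```python
import numpy as np

def ns_mse(u_pred, v_pred, u_obs, v_obs, f_pred, g_pred):
    """Loss (19): data misfit on (u, v) plus mean squared PDE residuals (f, g).

    All arguments are 1-D arrays evaluated at the same N points (t^i, x^i, y^i).
    """
    data_term = np.mean((u_pred - u_obs) ** 2 + (v_pred - v_obs) ** 2)
    physics_term = np.mean(f_pred ** 2 + g_pred ** 2)
    return data_term + physics_term

# Toy values (hypothetical), N = 2 points.
u_obs, v_obs = np.array([1.0, 2.0]), np.array([0.0, 1.0])
u_pred, v_pred = np.array([1.5, 2.0]), np.array([0.0, 0.5])
f_pred, g_pred = np.array([0.1, -0.1]), np.array([0.0, 0.2])
loss = ns_mse(u_pred, v_pred, u_obs, v_obs, f_pred, g_pred)
print(loss)   # approximately 0.28 = 0.25 (data) + 0.03 (physics)
```

The physics term contains no measured pressure. The pressure enters only through $f$ and $g$.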

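Here we consider the prototype problem of incompressible flow past a circular cylinder, a problem known to exhibit rich dynamic behavior and transitions for different regimes of the Reynolds number $Re=u_\infty D/\nu$. We assume a non-dimensional free stream velocity $u_\infty=1$, cylinder diameter $D=1$, and kinematic viscosity $\nu=0.01$ (so that $Re=100$). Under these conditions the system exhibits a periodic steady state behavior characterized by an asymmetrical vortex shedding pattern in the cylinder wake, known as the Kármán vortex street^[46].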
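To generate a high-resolution data set for this problem we have employed the spectral/hp-element solver NekTar^[47]. Specifically, the solution domain is discretized in space by a tessellation consisting of 412 triangular elements, and within each element the solution is approximated as a linear combination of a tenth-order hierarchical, semi-orthogonal Jacobi polynomial expansion. We assume a uniform free stream velocity profile imposed at the left boundary, a zero-pressure outflow condition imposed at the right boundary located 25 diameters downstream of the cylinder, and periodicity for the top and bottom boundaries of the $[-15, 25]\times[-8, 8]$ domain. We integrate equation (15) using a third-order stiffly stable scheme until the system reaches a periodic steady state, as depicted in Fig. 3(a). In what follows, a small portion of the resulting data set corresponding to this steady state solution will be used for model training, while the remaining data will be used to validate our predictions. For simplicity, we have chosen to confine our sampling to a rectangular region downstream of the cylinder, as shown in Fig. 3(a).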

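We are given scattered and potentially noisy data on the stream-wise $u(t,x,y)$ and transverse $v(t,x,y)$ velocity components. Our goal is to identify the unknown parameters $\lambda_1$ and $\lambda_2$, and to obtain a qualitatively accurate reconstruction of the entire pressure field $p(t,x,y)$ in the cylinder wake, which by definition can only be identified up to a constant. To this end, we have created a training data set by randomly sub-sampling the full high-resolution data set. To highlight the ability of our method to learn from scattered and scarce training data, we have chosen $N=5{,}000$, corresponding to only 1% of the total available data, as illustrated in Fig. 3(b). Also plotted are representative snapshots of the predicted velocity components $u(t,x,y)$ and $v(t,x,y)$ after the model was trained. The neural network architecture used here consists of 9 layers with 20 neurons in each layer.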
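The following sketches the training-set construction and the network described above, under several assumptions:

- The full data set is replaced by random placeholder points.
- Its size (500,000) is chosen only so that $N=5{,}000$ is 1% of it.
- "9 layers with 20 neurons" is read as 8 hidden layers of width 20, i.e. 9 weight matrices.
- The tanh activation and Glorot-normal initialization are common choices, not details taken from this text.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Glorot-normal weights and zero biases for a fully connected network."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        std = np.sqrt(2.0 / (n_in + n_out))
        params.append((rng.normal(0.0, std, size=(n_in, n_out)), np.zeros(n_out)))
    return params

def mlp(params, inputs):
    """tanh hidden layers followed by a linear output layer."""
    h = inputs
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)
    w, b = params[-1]
    return h @ w + b

rng = np.random.default_rng(0)

# Hypothetical full data set of scattered (t, x, y) points.
N_total, N = 500_000, 5_000
full_txy = rng.uniform(size=(N_total, 3))
idx = rng.choice(N_total, size=N, replace=False)   # random sub-sampling
train_txy = full_txy[idx]

# Assumed reading of the architecture: (t, x, y) -> (psi, p).
layers = [3] + [20] * 8 + [2]
params = init_mlp(layers, rng)
psi_p = mlp(params, train_txy)
psi, p = psi_p[:, 0], psi_p[:, 1]
print(len(params), psi_p.shape)
```

The same sampled points serve both for the data misfit and as collocation points for the residuals $f$ and $g$ in the loss (19).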
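A summary of our results for this example is presented in Fig. 4. We observe that the physics-informed neural network is able to correctly identify the unknown parameters $\lambda_1$ and $\lambda_2$ with very high accuracy, even when the training data are corrupted with noise. Specifically, for noise-free training data, the errors in estimating $\lambda_1$ and $\lambda_2$ are $0.078\%$ and $4.67\%$, respectively. The predictions remain robust when the training data are corrupted with $1\%$ uncorrelated Gaussian noise, returning errors of $0.17\%$ and $5.70\%$ for $\lambda_1$ and $\lambda_2$, respectively.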
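The two quantities quoted above, noise level and parameter error, can be stated precisely as follows. This sketch uses an assumed convention. "1% noise" is taken to mean Gaussian noise whose standard deviation is 1% of the standard deviation of the clean data. The parameter error is taken to be the relative error in percent. The estimates passed to `relative_error_percent` are hypothetical numbers, not results from the paper.

```python
import numpy as np

def add_noise(data, level, rng):
    """Add uncorrelated Gaussian noise with std = level * std(clean data)
    (an assumed convention for "level * 100 % noise")."""
    return data + level * np.std(data) * rng.standard_normal(data.shape)

def relative_error_percent(estimate, exact):
    """|estimate - exact| / |exact|, expressed in percent."""
    return abs(estimate - exact) / abs(exact) * 100.0

rng = np.random.default_rng(0)
u_clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 1000))
u_noisy = add_noise(u_clean, 0.01, rng)   # "1% noise"

# Exact values here: lambda_1 = 1 and lambda_2 = nu = 0.01.
err1 = relative_error_percent(0.999, 1.0)    # approximately 0.1 (%)
err2 = relative_error_percent(0.0105, 0.01)  # approximately 5.0 (%)
print(err1, err2)
```

A relative error is needed here because the true values of $\lambda_1$ and $\lambda_2$ differ by two orders of magnitude. Measured that way, an absolute error of $5\times10^{-4}$ in $\lambda_2$ already amounts to 5%.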
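A more intriguing result is the network's ability to provide a qualitatively accurate prediction of the entire pressure field $p(t,x,y)$ without any training data on the pressure itself. A comparison against the exact pressure field is presented in Fig. 4. The difference in magnitude between the exact and the predicted pressure is explained by the very nature of the incompressible Navier–Stokes system. Because $p$ enters equations (15) only through its gradient $(p_x, p_y)$, the pressure field is identifiable only up to a constant. Inferring a continuous quantity of interest from auxiliary measurements by leveraging the underlying physics is a good example of the enhanced capabilities that physics-informed neural networks offer. It also highlights their potential for solving high-dimensional inverse problems.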
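Our approach so far assumes that scattered data are available throughout the entire spatio-temporal domain. However, in many cases of practical interest, one may only be able to observe the system at distinct time instants. In the next section, we introduce a different approach that tackles the data-driven discovery problem using only two data snapshots. We will see how, by leveraging classical Runge–Kutta time-stepping schemes, one can construct discrete time physics-informed neural networks that retain high predictive accuracy even when the temporal gap between the data snapshots is very large.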

4.2 Discrete time models

We begin by applying the general form of Runge–Kutta methods with $q$ stages to equation (1) and obtain

$u^{n+c_i}=u^n-\Delta t \sum_{j=1}^q a_{ij} \mathcal{N}[u^{n+c_j}; \lambda],\ i=1,\ldots,q,$ $u^{n+1}=u^{n}-\Delta t \sum_{j=1}^{q}b_j \mathcal{N}[u^{n+c_j};\lambda] \tag{20}$

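Here, $u^{n+c_j}(x)=u(t^n+c_j\Delta t, x)$ is the hidden state of the system at time $t^n+c_j\Delta t$ for $j=1,\ldots,q$. Depending on the choice of the parameters $\lbrace a_{ij}, b_j, c_j\rbrace$, this general form encapsulates both explicit time-stepping schemes (when $a_{ij}=0$ for $j\ge i$) and implicit ones. Equations (20) can be equivalently expressed as

$u^n=u_i^n,\ i=1,\ldots,q,\\ u^{n+1}=u_i^{n+1},\ i=1,\ldots,q \tag{21}$

where

$u_i^n:=u^{n+c_i}+\Delta t \sum_{j=1}^{q}a_{ij}\mathcal{N}[u^{n+c_j};\lambda],\ i=1,\ldots,q,$ $u_i^{n+1}:=u^{n+c_i}+\Delta t \sum_{j=1}^{q}(a_{ij}-b_j) \mathcal{N}[u^{n+c_j};\lambda],\ i=1,\ldots,q \tag{22}$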
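The following sketch shows why (22) is equivalent to (20). It uses a scalar linear operator $\mathcal{N}[u;\lambda]=\lambda u$, which is a hypothetical choice. It also uses the 2-stage Gauss–Legendre tableau as one concrete implicit choice of $\{a_{ij},b_j,c_j\}$. For this operator the implicit stage equations (20) reduce to a $q\times q$ linear system. Substituting the stage values into (22) then returns $u^n$ and $u^{n+1}$ for every $i$.

```python
import numpy as np

# 2-stage Gauss-Legendre Butcher tableau (an illustrative implicit scheme).
s3 = np.sqrt(3.0)
A = np.array([[0.25, 0.25 - s3 / 6.0],
              [0.25 + s3 / 6.0, 0.25]])
b = np.array([0.5, 0.5])
c = np.array([0.5 - s3 / 6.0, 0.5 + s3 / 6.0])   # stage times t^n + c_j * dt

# Hypothetical operator N[u; lam] = lam * u, i.e. u_t = -lam * u.
lam, dt, u_n = 1.0, 0.5, 2.0
q = len(b)

# Equation (20): U_i = u^n - dt * sum_j a_ij * lam * U_j  ->  linear solve.
U = np.linalg.solve(np.eye(q) + dt * lam * A, u_n * np.ones(q))
N_U = lam * U
u_np1 = u_n - dt * b @ N_U

# Equation (22): every row should reproduce u^n and u^{n+1}, respectively.
u_i_n = U + dt * A @ N_U
u_i_np1 = U + dt * (A - b[None, :]) @ N_U

print(u_i_n, u_i_np1, u_np1, u_n * np.exp(-lam * dt))
```

In the discrete time PINN the stage values come from the network (23) rather than from a linear solve. Training then pushes each of the $q$ reconstructions toward the observed snapshots.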

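We proceed by placing a multi-output neural network prior on

$[u^{n+c_1}(x),...,u^{n+c_q}(x)] \tag{23}$

This prior assumption, along with equations (22), results in two physics-informed neural networks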

$[u_1^n(x),...,u_q^n(x), u_{q+1}^n(x)] \tag{24}$

and

$[u_1^{n+1}(x),...,u_q^{n+1}(x),u_{q+1}^{n+1}(x)] \tag{25}$

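Given two snapshots $\lbrace x^n, u^n \rbrace$ and $\lbrace x^{n+1}, u^{n+1} \rbrace$ of the system at the distinct times $t^n$ and $t^{n+1}$, respectively, the shared parameters of the neural networks (23), (24), and (25), along with the parameters $\lambda$ of the differential operator, can be trained by minimizing the sum of squared errors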

$SSE=SSE_n+SSE_{n+1} \tag{26}$

where

$SSE_n:=\sum_{j=1}^q \sum_{i=1}^{N_n} \mid u_j^n(x^{n,i})-u^{n,i}\mid^2,$

$SSE_{n+1}:=\sum_{j=1}^q \sum_{i=1}^{N_{n+1}} \mid u_j^{n+1}(x^{n+1,i})-u^{n+1,i}\mid^2.$

Here, $x^n=\lbrace x^{n,i} \rbrace_{i=1}^{N_n}$ and $u^n=\lbrace u^{n,i} \rbrace_{i=1}^{N_n}$; $x^{n+1}$ and $u^{n+1}$ are defined analogously with $N_{n+1}$ points.
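Here is a small NumPy sketch of the loss (26). It assumes each network returns an $N\times q$ array whose column $j$ holds $u_j(x^i)$. Unlike the mean in (19), the terms here are plain sums. The toy arrays are hypothetical, and the two snapshots have different numbers of points, as with $N_n\neq N_{n+1}$.

```python
import numpy as np

def sse(stage_preds, snapshot):
    """Sum over stages j and points i of |u_j(x^i) - u^i|^2.

    stage_preds: shape (N, q); column j holds u_j evaluated at the points x^i.
    snapshot:    shape (N,), the measured values u^i.
    """
    return np.sum((stage_preds - snapshot[:, None]) ** 2)

# Toy example (hypothetical numbers): q = 2, N_n = 3, N_{n+1} = 2.
u_n = np.array([1.0, 0.0, -1.0])
u_n_preds = np.array([[1.0, 1.1],
                      [0.0, 0.0],
                      [-1.0, -0.8]])
u_np1 = np.array([0.5, 0.5])
u_np1_preds = np.array([[0.5, 0.5],
                        [0.4, 0.5]])

total = sse(u_n_preds, u_n) + sse(u_np1_preds, u_np1)   # equation (26)
print(total)   # approximately 0.06 = 0.05 + 0.01
```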

4.2.1 Example (Korteweg–de Vries equation)

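Our final example aims to highlight the ability of the proposed framework to handle governing partial differential equations involving higher-order derivatives. Here, we consider a mathematical model of waves on shallow water surfaces: the Korteweg–de Vries (KdV) equation. This equation can also be viewed as Burgers' equation with an added dispersive term. The KdV equation has several connections to physical problems. It describes the evolution of long one-dimensional waves in many physical settings, including shallow-water waves with weakly nonlinear restoring forces, long internal waves in a density-stratified ocean, ion acoustic waves in a plasma, and acoustic waves on a crystal lattice. Moreover, the KdV equation is the governing equation of the string in the Fermi–Pasta–Ulam problem^[48] in the continuum limit. The KdV equation reads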

$u_t+\lambda_1uu_x+\lambda_2u_{xxx} = 0 \tag{27}$

with $(\lambda_1, \lambda_2)$ being the unknown parameters. For the KdV equation, the nonlinear operator in equations (22) is given by

$\mathcal{N}[u^{n+c_j};\lambda]=\lambda_1 u^{n+c_j}u_x^{n+c_j}+\lambda_2u_{xxx}^{n+c_j}$
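On a periodic domain, the operator above can be evaluated with a Fourier spectral derivative. This sketch uses NumPy's FFT and is not Chebfun, which the data generation described below relies on. The grid, the profile $\cos(\pi x)$, and the values of $\lambda$ are illustrative assumptions. The final comparison against hand-computed derivatives checks the implementation.

```python
import numpy as np

def spectral_derivative(u, length, order):
    """Derivative of a periodic grid function via FFT (Fourier spectral method)."""
    n = u.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)   # angular wavenumbers
    if order % 2 == 1 and n % 2 == 0:
        k[n // 2] = 0.0   # drop the unpaired Nyquist mode for odd derivatives
    return np.real(np.fft.ifft((1j * k) ** order * np.fft.fft(u)))

def kdv_operator(u, length, lam1, lam2):
    """N[u; lambda] = lam1 * u * u_x + lam2 * u_xxx, so that u_t + N[u] = 0."""
    u_x = spectral_derivative(u, length, 1)
    u_xxx = spectral_derivative(u, length, 3)
    return lam1 * u * u_x + lam2 * u_xxx

# Periodic grid on [-1, 1) with 512 points and the profile cos(pi x).
n, length = 512, 2.0
x = -1.0 + length * np.arange(n) / n
u = np.cos(np.pi * x)
lam1, lam2 = 1.0, 0.0025   # illustrative values
N_u = kdv_operator(u, length, lam1, lam2)

# Hand-computed: u_x = -pi sin(pi x), u_xxx = pi^3 sin(pi x).
exact = lam1 * u * (-np.pi * np.sin(np.pi * x)) + lam2 * np.pi**3 * np.sin(np.pi * x)
print(np.max(np.abs(N_u - exact)))
```

In the PINN itself, $u_x$ and $u_{xxx}$ of each stage output are obtained by automatic differentiation of the network with respect to $x$, not by FFT. The spectral version is only a convenient reference on a grid.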

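The shared parameters of the neural networks (23), (24), and (25), along with the parameters $\lambda=(\lambda_1, \lambda_2)$ of the KdV equation, can be learned by minimizing the sum of squared errors (26).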
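To obtain a set of training and test data, we simulate equation (27) using conventional spectral methods. Specifically, we start from the initial condition $u(0,x)=\cos(\pi x)$ and assume periodic boundary conditions. We then integrate equation (27) up to a final time $t=1.0$ using the Chebfun package, with a spectral Fourier discretization with 512 modes and a fourth-order explicit Runge–Kutta temporal integrator with time step $\Delta t=10^{-6}$. From this data set, we extract two solution snapshots at times $t^n=0.2$ and $t^{n+1}=0.8$, and randomly sub-sample them using $N_n=199$ and $N_{n+1}=201$ points to generate a training data set. We then use these data to train a discrete time physics-informed neural network by minimizing the sum of squared errors loss (26) using L-BFGS^[35]. The network architecture consists of 4 hidden layers with 50 neurons each, and an output layer predicting the solution at the $q$ Runge–Kutta stages, i.e., $u^{n+c_j}(x)$, $j=1,\ldots,q$. Here $q$ is chosen empirically so that the accumulated temporal error is of the order of machine precision $\epsilon$, by setting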

$q=0.5\log\epsilon/\log(\Delta t)$

that is, $\Delta t^{2q}=\epsilon$, where the time step for this example is $\Delta t = 0.6$.
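Evaluating this rule takes a single line. The sketch below makes two assumptions that the text does not state: $\epsilon$ is double-precision machine epsilon, and $q$ is rounded up to an integer. The resulting $q$ then fixes the width of the output layer of the network with 4 hidden layers of 50 neurons. The tanh activation and the initialization are also assumptions.

```python
import math
import numpy as np

def num_stages(dt, eps=np.finfo(float).eps):
    """q = 0.5 * log(eps) / log(dt), rounded up (the rounding is an assumption)."""
    return int(math.ceil(0.5 * math.log(eps) / math.log(dt)))

q = num_stages(0.6)
print(q)   # 36 for float64 machine epsilon

# 1 input (x), 4 hidden layers of 50 neurons, q outputs u^{n+c_1..n+c_q}(x).
layers = [1] + [50] * 4 + [q]
rng = np.random.default_rng(0)
weights = [rng.normal(0.0, np.sqrt(2.0 / (m + n)), size=(m, n))
           for m, n in zip(layers[:-1], layers[1:])]
biases = [np.zeros(n) for n in layers[1:]]

def forward(x):
    """tanh MLP (activation assumed) mapping x to the q Runge-Kutta stage values."""
    h = x
    for w, bias in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ w + bias)
    return h @ weights[-1] + biases[-1]

x = np.linspace(-1.0, 1.0, 199)[:, None]   # e.g. N_n = 199 input locations
print(forward(x).shape)   # (199, 36)
```

A larger time step $\Delta t$ makes $\log(\Delta t)$ closer to zero and so requires more stages. The width of the output layer grows accordingly, while the hidden layers stay the same.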
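The results of this experiment are summarized in Fig. 5. The top panel shows the exact solution $u(t,x)$, along with the locations of the two data snapshots used for training. The middle panel gives a more detailed overview of the exact solution and the training data. Notice how the complex nonlinear dynamics of equation (27) cause dramatic differences in the form of the solution between the two snapshots. Despite these differences and the large temporal gap between the two training snapshots, our method correctly identifies the unknown parameters, regardless of whether the training data are corrupted with noise. Specifically, for noise-free training data, the errors in estimating $\lambda_1$ and $\lambda_2$ are $0.023\%$ and $0.006\%$, respectively. With $1\%$ noise in the training data, the errors are $0.057\%$ and $0.017\%$, respectively.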

References

1. A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
2. B.M. Lake, R. Salakhutdinov, J.B. Tenenbaum, Human-level concept learning through probabilistic program induction, Science 350 (2015) 1332–1338.
3. B. Alipanahi, A. Delong, M.T. Weirauch, B.J. Frey, Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning, Nat. Biotechnol. 33 (2015) 831–838.
4. M. Raissi, P. Perdikaris, G.E. Karniadakis, Inferring solutions of differential equations using noisy multi-fidelity data, J. Comput. Phys. 335 (2017) 736–746.
5. M. Raissi, P. Perdikaris, G.E. Karniadakis, Machine learning of linear differential equations using Gaussian processes, J. Comput. Phys. 348 (2017) 683–693.
6. H. Owhadi, Bayesian numerical homogenization, Multiscale Model. Simul. 13 (2015) 812–828.
7. C.E. Rasmussen, C.K. Williams, Gaussian Processes for Machine Learning, vol. 1, MIT Press, Cambridge, 2006.
8. M. Raissi, P. Perdikaris, G.E. Karniadakis, Numerical Gaussian processes for time-dependent and non-linear partial differential equations, 2017, arXiv:1703.10230.
9. M. Raissi, G.E. Karniadakis, Hidden physics models: machine learning of nonlinear partial differential equations, 2017, arXiv:1708.00588.
10. H. Owhadi, C. Scovel, T. Sullivan, et al., Brittleness of Bayesian inference under finite information in a continuous world, Electron. J. Stat. 9 (2015) 1–79.
11. K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Netw. 2 (1989) 359–366.
12. A.G. Baydin, B.A. Pearlmutter, A.A. Radul, J.M. Siskind, Automatic differentiation in machine learning: a survey, 2015, arXiv:1502.05767.
13. C. Basdevant, M. Deville, P. Haldenwang, J. Lacroix, J. Ouazzani, R. Peyret, P. Orlandi, A. Patera, Spectral and finite difference solutions of the Burgers equation, Comput. Fluids 14 (1986) 23–41.
14. S.H. Rudy, S.L. Brunton, J.L. Proctor, J.N. Kutz, Data-driven discovery of partial differential equations, Sci. Adv. 3 (2017).
15. I.E. Lagaris, A. Likas, D.I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations, IEEE Trans. Neural Netw. 9 (1998) 987–1000.
16. D.C. Psichogios, L.H. Ungar, A hybrid neural network-first principles approach to process modeling, AIChE J. 38 (1992) 1499–1511.
17. J.-X. Wang, J. Wu, J. Ling, G. Iaccarino, H. Xiao, A comprehensive physics-informed machine learning framework for predictive turbulence modeling, 2017, arXiv:1701.07102.
18. Y. Zhu, N. Zabaras, Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification, 2018, arXiv:1801.06879.
19. T. Hagge, P. Stinis, E. Yeung, A.M. Tartakovsky, Solving differential equations with unknown constitutive relations as recurrent neural networks, 2017, arXiv:1710.02242.
20. R. Tripathy, I. Bilionis, Deep UQ: learning deep neural network surrogate models for high dimensional uncertainty quantification, 2018, arXiv:1802.00850.
21. P.R. Vlachas, W. Byeon, Z.Y. Wan, T.P. Sapsis, P. Koumoutsakos, Data-driven forecasting of high-dimensional chaotic systems with long-short term memory networks, 2018, arXiv:1802.07486.
22. E.J. Parish, K. Duraisamy, A paradigm for data-driven predictive modeling using field inversion and machine learning, J. Comput. Phys. 305 (2016) 758–774.
23. K. Duraisamy, Z.J. Zhang, A.P. Singh, New approaches in turbulence and transition modeling using data-driven techniques, in: 53rd AIAA Aerospace Sciences Meeting, 2018, p. 1284.
24. J. Ling, A. Kurzawski, J. Templeton, Reynolds averaged turbulence modelling using deep neural networks with embedded invariance, J. Fluid Mech. 807 (2016) 155–166.
25. Z.J. Zhang, K. Duraisamy, Machine learning methods for data-driven turbulence modeling, in: 22nd AIAA Computational Fluid Dynamics Conference, 2015, p. 2460.
26. M. Milano, P. Koumoutsakos, Neural network modeling for near wall turbulent flow, J. Comput. Phys. 182 (2002) 1–26.
27. P. Perdikaris, D. Venturi, G.E. Karniadakis, Multifidelity information fusion algorithms for high-dimensional systems and massive data sets, SIAM J. Sci. Comput. 38 (2016) B521–B538.
28. R. Rico-Martinez, J. Anderson, I. Kevrekidis, Continuous-time nonlinear signal processing: a neural network based approach for gray box identification, in: Neural Networks for Signal Processing IV. Proceedings of the 1994 IEEE Workshop, IEEE, 1994, pp. 596–605.
29. J. Ling, J. Templeton, Evaluation of machine learning algorithms for prediction of regions of high Reynolds averaged Navier Stokes uncertainty, Phys. Fluids 27 (2015) 085103.
30. H.W. Lin, M. Tegmark, D. Rolnick, Why does deep and cheap learning work so well? J. Stat. Phys. 168 (2017) 1223–1247.
31. R. Kondor, N-body networks: a covariant hierarchical neural network architecture for learning atomic potentials, 2018, arXiv:1803.01588.
32. R. Kondor, S. Trivedi, On the generalization of equivariance and convolution in neural networks to the action of compact groups, 2018, arXiv:1802.03690.
33. M. Hirn, S. Mallat, N. Poilvert, Wavelet scattering regression of quantum chemical energies, Multiscale Model. Simul. 15 (2017) 827–863.
34. S. Mallat, Understanding deep convolutional networks, Philos. Trans. R. Soc. A 374 (2016) 20150203.
35. D.C. Liu, J. Nocedal, On the limited memory BFGS method for large scale optimization, Math. Program. 45 (1989) 503–528.
36. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
37. D. Kingma, J. Ba, Adam: a method for stochastic optimization, 2014, arXiv:1412.6980.
38. A. Choromanska, M. Henaff, M. Mathieu, G.B. Arous, Y. LeCun, The loss surfaces of multilayer networks, in: Artificial Intelligence and Statistics, pp. 192–204.
39. R. Shwartz-Ziv, N. Tishby, Opening the black box of deep neural networks via information, 2017, arXiv:1703.00810.
40. T.A. Driscoll, N. Hale, L.N. Trefethen, Chebfun Guide, 2014.
41. M. Stein, Large sample properties of simulations using Latin hypercube sampling, Technometrics 29 (1987) 143–151.
42. J. Snoek, H. Larochelle, R.P. Adams, Practical bayesian optimization of machine learning algorithms, in: Advances in Neural Information Processing Systems, 2012, pp. 2951–2959.
43. H.-J. Bungartz, M. Griebel, Sparse grids, Acta Numer. 13 (2004) 147–269.
44. I.H. Sloan, H. Woźniakowski, When are quasi-Monte Carlo algorithms efficient for high dimensional integrals? J. Complex. 14 (1998) 1–33.
45. A. Iserles, A First Course in the Numerical Analysis of Differential Equations, vol. 44, Cambridge University Press, 2009.
46. T. Von Kármán, Aerodynamics, vol. 9, McGraw-Hill, New York, 1963.
47. G. Karniadakis, S. Sherwin, Spectral/hp Element Methods for Computational Fluid Dynamics, Oxford University Press, 2013.
48. T. Dauxois, Fermi, Pasta, Ulam and a mysterious lady, 2008, arXiv:0801.1590.
49. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, et al., Tensorflow: large-scale machine learning on heterogeneous distributed systems, 2016, arXiv:1603.04467.
50. S.L. Brunton, J.L. Proctor, J.N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad. Sci. 113 (2016) 3932–3937.

GeophyAI

Physics-informed neural networks: Translation Part II
