TY - GEN
T1 - A Data-Driven Choice of Misfit Function for FWI Using Reinforcement Learning
AU - Sun, Bingbing
AU - Alkhalifah, Tariq Ali
N1 - KAUST Repository Item: Exported on 2021-03-30
PY - 2020
Y1 - 2020
N2 - In the workflow of Full-Waveform Inversion (FWI), we often tune the parameters of the inversion to help us avoid cycle skipping and obtain high-resolution models. For example, we typically start by using objective functions that avoid cycle skipping, and then later we utilize the least-squares misfit to admit high-resolution information. Such hierarchical approaches are common in FWI, and they often depend on our manual intervention based on many factors, and of course, the results depend on experience. However, with the large data size often involved in the inversion and the complexity of the process, making optimal choices is difficult even for an experienced practitioner. Thus, as an example, and within the framework of reinforcement learning, we utilize a deep Q-network (DQN) to learn an optimal policy to determine the proper timing to switch between different misfit functions. Specifically, we train the state-action value function (Q) to predict when to use the conventional L2-norm misfit function or the more advanced optimal-transport matching-filter (OTMF) misfit to mitigate cycle skipping, obtain high resolution, and improve convergence. We use simple yet demonstrative shifted-signal inversion examples to demonstrate the basic principles of the proposed method.
AB - In the workflow of Full-Waveform Inversion (FWI), we often tune the parameters of the inversion to help us avoid cycle skipping and obtain high-resolution models. For example, we typically start by using objective functions that avoid cycle skipping, and then later we utilize the least-squares misfit to admit high-resolution information. Such hierarchical approaches are common in FWI, and they often depend on our manual intervention based on many factors, and of course, the results depend on experience. However, with the large data size often involved in the inversion and the complexity of the process, making optimal choices is difficult even for an experienced practitioner. Thus, as an example, and within the framework of reinforcement learning, we utilize a deep Q-network (DQN) to learn an optimal policy to determine the proper timing to switch between different misfit functions. Specifically, we train the state-action value function (Q) to predict when to use the conventional L2-norm misfit function or the more advanced optimal-transport matching-filter (OTMF) misfit to mitigate cycle skipping, obtain high resolution, and improve convergence. We use simple yet demonstrative shifted-signal inversion examples to demonstrate the basic principles of the proposed method.
UR - http://hdl.handle.net/10754/661750
UR - https://www.earthdoc.org/content/papers/10.3997/2214-4609.202010203
U2 - 10.3997/2214-4609.202010203
DO - 10.3997/2214-4609.202010203
M3 - Conference contribution
BT - EAGE 2020 Annual Conference & Exhibition Online
PB - European Association of Geoscientists & Engineers
ER -