SPEC 7: Seeding pseudo-random number generation

Authors:
Stéfan van der Walt <stefanv@berkeley.edu>, Sebastian Berg <sebastianb@nvidia.com>, Pamphile Roy <roy.pamphile@gmail.com>, Matt Haberland <mhaberla@calpoly.edu>
Discussion:
https://github.com/scipy/scipy/issues/14322
History:
https://github.com/scientific-python/specs/commits/main/spec-0007
Endorsed by:
ipython, scikit-image, scipy

Description

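Libraries across the ecosystem currently provide a variety of APIs for seeding pseudo-random number generation. This SPEC suggests a single, pragmatic API that takes technical and historical factors into account. Adopting such a uniform API will simplify the user experience, especially for those who rely on multiple projects.

We recommend:

  • standardizing the usage and interpretation of an rng keyword for seeding, and
  • avoiding the use of global state and legacy bitstream generators.

We suggest implementing these principles by:

  • deprecating uses of an existing seed argument (commonly random_state or seed) in favor of a consistent rng argument,
  • using numpy.random.default_rng to validate the rng argument and instantiate a Generator [1], and
  • deprecating the use of numpy.random.seed to control the random state.

We are primarily concerned with API uniformity, but we also encourage libraries to move toward NumPy pseudo-random Generators because:

  1. Generators avoid the problems associated with naive seeding (e.g., using successive integers) through their SeedSequence mechanism;
  2. their use avoids reliance on global state, which can make code execution harder to follow and may cause problems in parallel processing scenarios.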
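Both points can be sketched with public NumPy APIs only (default_rng, SeedSequence, and SeedSequence.spawn). The seed values and the number of workers below are arbitrary choices made for illustration, not part of this SPEC:

```python
import numpy as np

# Consecutive integer seeds: SeedSequence hashes each seed into the initial
# state, so the two streams are not trivially related to each other.
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(1)
draws_a = rng_a.random(3)
draws_b = rng_b.random(3)

# Independent streams for parallel workers, derived from one root seed and
# held in local objects rather than in global state.
root = np.random.SeedSequence(12345)
worker_rngs = [np.random.default_rng(child) for child in root.spawn(4)]
worker_draws = [rng.standard_normal(2) for rng in worker_rngs]
```

Because each worker owns its Generator, no worker's draws depend on the order in which the others run.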

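Scope

This is intended as a recommendation to all libraries that allow users to control the state of a NumPy random number generator. It is specifically targeted at functions that currently accept RandomState instances through an argument other than rng, or that allow numpy.random.seed to control the random state, but the ideas apply more broadly. The rng keyword could also be adapted to random number generators provided outside NumPy, but that is beyond the scope of this SPEC.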

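Concepts

  • BitGenerator: generates a stream of pseudo-random bits. The default generator in NumPy (numpy.random.default_rng) uses PCG64.
  • Generator: derives pseudo-random numbers from the bits produced by a BitGenerator.
  • RandomState: a legacy object in NumPy, similar to Generator, that produces random numbers based on the Mersenne Twister algorithm.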
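As a rough illustration of how the three objects relate, the sketch below builds each of them explicitly. The seed 2024 is arbitrary, and the use of PCG64 by default_rng reflects current NumPy releases rather than a guarantee:

```python
import numpy as np

bit_gen = np.random.PCG64(2024)        # BitGenerator: source of raw random bits
rng = np.random.Generator(bit_gen)     # Generator: turns those bits into numbers
values = rng.uniform(size=3)

default = np.random.default_rng(2024)  # convenience constructor for a Generator
default_bits = type(default.bit_generator).__name__

legacy = np.random.RandomState(2024)   # legacy object based on Mersenne Twister
legacy_bits = legacy.get_state()[0]    # name of the underlying algorithm
```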

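Constraints

NumPy, SciPy, scikit-learn, scikit-image, and NetworkX all implement pseudo-random seeding in slightly different ways. Common keyword arguments include random_state and seed. In practice, the seed can also often be controlled with numpy.random.seed.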

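Core Project Endorsement

Endorsing this SPEC means that a project considers standardizing the usage and interpretation of the rng keyword, and avoiding global state and legacy bitstream generators, to be good ideas that are worth implementing widely.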

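Ecosystem Adoption

To adopt this SPEC, a project should:

  • deprecate the random_state/seed arguments in favor of an rng argument in all functions where users need to control pseudo-random number generation,
  • use numpy.random.default_rng to validate the rng argument and instantiate a Generator, and
  • deprecate the use of numpy.random.seed to control the random state.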

Badges

Projects can highlight their adoption of this SPEC by including a SPEC badge.

SPEC 7 — Seeding pseudo-random number generation
[![SPEC 7 — Seeding pseudo-random number generation](https://img.shields.io/badge/SPEC-7-green?labelColor=%23004811&color=%235CA038)](https://scientific-python.cn/specs/spec-0007/)
|SPEC 7 — Seeding pseudo-random number generation| 

.. |SPEC 7 — Seeding pseudo-random number generation| image:: https://img.shields.io/badge/SPEC-7-green?labelColor=%23004811&color=%235CA038
   :target: https://scientific-python.cn/specs/spec-0007/
To indicate adoption of multiple SPECs with a single badge, see here.

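Implementation

Legacy behavior in packages such as scikit-learn (sklearn.utils.check_random_state) typically handles None (use the global seed state), an integer (convert to a RandomState), or a RandomState object.

Our recommendation here is a deprecation strategy that does not adhere to the Hinsen principle [2] in all cases, although it could very nearly do so by enforcing the use of rng as a keyword argument.

The deprecation strategy is as follows.

Initially, accept both rng and the existing random_state/seed/... keyword arguments.

  • If the user specifies both, raise an error.
  • If rng is passed by keyword, validate it with np.random.default_rng() and use it to generate random numbers as needed.
  • If random_state/seed/... is specified (by keyword or by position, if allowed), preserve the existing behavior.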
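The initial phase can be sketched for a single function as follows. The function sample_mean and its error message are hypothetical, and the sketch simplifies one detail: it cannot distinguish an explicit rng=None from an omitted argument, so both fall back to the existing behavior.

```python
import numpy as np


def sample_mean(n, *, rng=None, random_state=None):
    """Hypothetical library function in the initial transition phase."""
    if rng is not None and random_state is not None:
        raise TypeError("Pass only one of `rng` and `random_state`.")

    if rng is not None:
        # New path: normalize with default_rng and draw from the Generator.
        return np.random.default_rng(rng).random(n).mean()

    # Existing behavior is preserved for `random_state` and for no argument.
    if isinstance(random_state, np.random.RandomState):
        legacy = random_state
    elif random_state is None:
        legacy = np.random  # module-level functions use the global RandomState
    else:
        legacy = np.random.RandomState(random_state)
    return legacy.random_sample(n).mean()
```

A call such as sample_mean(5, rng=1, random_state=1) raises TypeError, while each argument on its own selects the new or the legacy stream.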

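After rng is available in all releases within the support window suggested by SPEC 0, emit warnings as follows:

  • If neither rng nor random_state/seed/... is specified and np.random.seed has been used to set the seed, emit a FutureWarning about the upcoming change in runtime behavior.

  • If random_state/seed/... is passed by keyword or by position, treat it as before, but:

    • if passed by keyword, emit a DeprecationWarning stating that the random_state keyword is being deprecated in favor of rng;
    • if passed by position, emit a FutureWarning stating that the runtime behavior of the positional argument will change.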
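The keyword case of the warning phase might look like the sketch below. The function shuffled and the warning text are hypothetical; only the keyword path is covered, and an integer seed is assumed for random_state to keep the example short.

```python
import warnings

import numpy as np


def shuffled(x, *, rng=None, random_state=None):
    """Hypothetical function during the warning phase (keyword case only)."""
    if rng is not None and random_state is not None:
        raise TypeError("Pass only one of `rng` and `random_state`.")
    if rng is not None:
        return np.random.default_rng(rng).permutation(x)
    if random_state is not None:
        warnings.warn(
            "Keyword `random_state` is deprecated; use `rng` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumes an integer seed for brevity.
        return np.random.RandomState(random_state).permutation(x)
    return np.random.permutation(x)  # unchanged default: global state


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    shuffled([1, 2, 3], random_state=0)  # emits DeprecationWarning
    shuffled([1, 2, 3], rng=0)           # preferred use, no warning

categories = [w.category for w in caught]
```

Python's default warning filters hide DeprecationWarning unless it is triggered by code in __main__, whereas FutureWarning is shown by default. That matches the split above: the keyword rename mainly concerns developers, while a change in runtime behavior should also reach end users.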

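After the deprecation period, accept only rng, and raise an error if random_state/seed/... is provided.

By then, the function signature, with type annotations, could look like this: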

from collections.abc import Sequence
import numpy as np


SeedLike = int | np.integer | Sequence[int] | np.random.SeedSequence
RNGLike = np.random.Generator | np.random.BitGenerator


def my_func(*, rng: RNGLike | SeedLike | None = None):
    """My function summary.

    Parameters
    ----------
    rng : `numpy.random.Generator`, optional
        Pseudorandom number generator state. When `rng` is None, a new
        `numpy.random.Generator` is created using entropy from the
        operating system. Types other than `numpy.random.Generator` are
        passed to `numpy.random.default_rng` to instantiate a `Generator`.
    """
    rng = np.random.default_rng(rng)

    ...

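Also note the suggested language for the rng parameter docstring: it encourages users to pass a Generator or None, but allows the other types accepted by numpy.random.default_rng (captured by the type annotations).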
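To make the normalization step concrete, the sketch below passes several of the accepted types to numpy.random.default_rng. The seed 42 is arbitrary, and the agreement between the three seeded streams assumes that PCG64 is the default BitGenerator, as it is in current NumPy releases:

```python
import numpy as np

gen = np.random.default_rng(42)
same = np.random.default_rng(gen)  # a Generator is passed through unchanged

from_int = np.random.default_rng(42)
from_seq = np.random.default_rng(np.random.SeedSequence(42))
from_bits = np.random.default_rng(np.random.PCG64(42))  # BitGenerator is wrapped
from_none = np.random.default_rng(None)                 # fresh OS entropy

# The three seeded variants start from the same state.
firsts = [g.random() for g in (from_int, from_seq, from_bits)]
```

Since a Generator passes through unchanged, a caller can share one Generator across several functions and each call continues the same stream.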

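Impact

There are three classes of users, who will be affected to varying degrees.

  1. Those who do not attempt to control the random state. Their code will switch from using the unseeded global RandomState to using an unseeded Generator. Since the underlying distributions of pseudo-random numbers will not change, these users should be largely unaffected. While this change technically does not adhere to the Hinsen principle, its impact should be minimal.

  2. Users of the random_state/seed arguments. Support for these arguments will eventually be removed, but during the deprecation period, warnings and documentation can give clear guidance on how to transition to the new rng keyword.

  3. Those who use numpy.random.seed. The proposal does away with that global seeding mechanism, meaning that after the deprecation period, code that relies on it will go from being seeded to being unseeded. To ensure that this does not go unnoticed, libraries that allowed the random state to be controlled through numpy.random.seed should raise a FutureWarning if np.random.seed has been called. (See Code below for an example.) To fully adhere to the Hinsen principle, these warnings should instead be raised as errors. In response, users will have to switch from using numpy.random.seed to explicitly passing the rng argument to all functions that accept it.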
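For the third group, the migration might look like the sketch below. The function simulate is a hypothetical stand-in for any library function that has adopted rng:

```python
import numpy as np


def simulate(n, *, rng=None):
    """Hypothetical library function that has adopted `rng`."""
    rng = np.random.default_rng(rng)
    return rng.normal(size=n)


# Before (global state):
#     np.random.seed(0)
#     result = simulate(3)
# After (explicit, local state):
rng = np.random.default_rng(0)
first = simulate(3, rng=rng)
second = simulate(3, rng=rng)  # continues the same stream

# Re-creating the Generator from the same seed reproduces the first call.
replay = simulate(3, rng=np.random.default_rng(0))
```

Passing the same integer to several calls would restart the stream each time, so a script that previously relied on np.random.seed usually wants to create one Generator and share it.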

Code

As an example, consider how a SciPy function could transition from a random_state argument to an rng argument using a decorator.

import numpy as np
import functools
import warnings


def _transition_to_rng(old_name, *, position_num=None, end_version=None):
    """Example decorator to transition from old PRNG usage to new `rng` behavior

    Suppose the decorator is applied to a function that used to accept parameter
    `old_name='random_state'` either by keyword or as a positional argument at
    `position_num=1`. At the time of application, the name of the argument in the
    function signature is manually changed to the new name, `rng`. If positional
    use was allowed before, this is not changed.*

    - If the function is called with both `random_state` and `rng`, the decorator
      raises an error.
    - If `random_state` is provided as a keyword argument, the decorator passes
      `random_state` to the function's `rng` argument as a keyword. If `end_version`
      is specified, the decorator will emit a `DeprecationWarning` about the
      deprecation of keyword `random_state`.
    - If `random_state` is provided as a positional argument, the decorator passes
      `random_state` to the function's `rng` argument by position. If `end_version`
      is specified, the decorator will emit a `FutureWarning` about the changing
      interpretation of the argument.
    - If `rng` is provided as a keyword argument, the decorator validates `rng` using
      `numpy.random.default_rng` before passing it to the function.
    - If `end_version` is specified and neither `random_state` nor `rng` is provided
      by the user, the decorator checks whether `np.random.seed` has been used to set
      the global seed. If so, it emits a `FutureWarning`, noting that usage of
      `numpy.random.seed` will eventually have no effect. Either way, the decorator
      calls the function without explicitly passing the `rng` argument.

    If `end_version` is specified, a user must pass `rng` as a keyword to avoid warnings.

    After the deprecation period, the decorator can be removed, and the function
    can simply validate the `rng` argument by calling `np.random.default_rng(rng)`.

    * A `FutureWarning` is emitted when the PRNG argument is used by
      position. It indicates that the "Hinsen principle" (same
      code yielding different results in two versions of the software)
      will be violated, unless positional use is deprecated. Specifically:

      - If `None` is passed by position and `np.random.seed` has been used,
        the function will change from being seeded to being unseeded.
      - If an integer is passed by position, the random stream will change.
      - If `np.random` or an instance of `RandomState` is passed by position,
        an error will be raised.

      We suggest that projects consider deprecating positional use of
      `random_state`/`rng` (i.e., change their function signatures to
      ``def my_func(..., *, rng=None)``); that might not make sense
      for all projects, so this SPEC does not make that
      recommendation, neither does this decorator enforce it.

    Parameters
    ----------
    old_name : str
        The old name of the PRNG argument (e.g. `seed` or `random_state`).
    position_num : int, optional
        The (0-indexed) position of the old PRNG argument (if accepted by position).
        Maintainers are welcome to eliminate this argument and use, for example,
        `inspect`, if preferred.
    end_version : str, optional
        The full version number of the library when the behavior described in
        `DeprecationWarning`s and `FutureWarning`s will take effect. If left
        unspecified, no warnings will be emitted by the decorator.

    """
    NEW_NAME = "rng"

    cmn_msg = (
        "To silence this warning and ensure consistent behavior in SciPy "
        f"{end_version}, control the RNG using argument `{NEW_NAME}`. Arguments passed "
        f"to keyword `{NEW_NAME}` will be validated by `np.random.default_rng`, so the "
        "behavior corresponding with a given value may change compared to use of "
        f"`{old_name}`. For example, "
        "1) `None` will result in unpredictable random numbers, "
        "2) an integer will result in a different stream of random numbers (with the "
        "same distribution), and "
        "3) `np.random` or `RandomState` instances will result in an error. "
        "See the documentation of `default_rng` for more information."
    )

    def decorator(fun):
        @functools.wraps(fun)
        def wrapper(*args, **kwargs):
            # Determine how PRNG was passed
            as_old_kwarg = old_name in kwargs
            as_new_kwarg = NEW_NAME in kwargs
            as_pos_arg = position_num is not None and len(args) >= position_num + 1
            emit_warning = end_version is not None

            # Can only specify PRNG one of the three ways
            if int(as_old_kwarg) + int(as_new_kwarg) + int(as_pos_arg) > 1:
                message = (
                    f"{fun.__name__}() got multiple values for "
                    f"argument now known as `{NEW_NAME}`"
                )
                raise TypeError(message)

            # Check whether global random state has been set. This relies on
            # private NumPy attributes, which may change between releases.
            global_seed_set = np.random.mtrand._rand._bit_generator._seed_seq is None

            if as_old_kwarg:  # warn about deprecated use of old kwarg
                kwargs[NEW_NAME] = kwargs.pop(old_name)
                if emit_warning:
                    message = (
                        f"Use of keyword argument `{old_name}` is "
                        f"deprecated and replaced by `{NEW_NAME}`. "
                        f"Support for `{old_name}` will be removed "
                        f"in SciPy {end_version}. "
                    ) + cmn_msg
                    warnings.warn(message, DeprecationWarning, stacklevel=2)

            elif as_pos_arg:
                # Warn about changing meaning of positional arg

                # Note that this decorator does not deprecate positional use of the
                # argument; it only warns that the behavior will change in the future.
                # Simultaneously transitioning to keyword-only use is another option.

                arg = args[position_num]
                # If the argument is None and the global seed wasn't set, or if the
                # argument is one of a few new classes, the user will not notice change
                # in behavior.
                ok_classes = (
                    np.random.Generator,
                    np.random.SeedSequence,
                    np.random.BitGenerator,
                )
                if (arg is None and not global_seed_set) or isinstance(arg, ok_classes):
                    pass
                elif emit_warning:
                    message = (
                        f"Positional use of `{NEW_NAME}` (formerly known as "
                        f"`{old_name}`) is still allowed, but the behavior is "
                        "changing: the argument will be normalized using "
                        f"`np.random.default_rng` beginning in SciPy {end_version}, "
                        "and the resulting `Generator` will be used to generate "
                        "random numbers. "
                    ) + cmn_msg
                    warnings.warn(message, FutureWarning, stacklevel=2)

            elif as_new_kwarg:  # no warnings; this is the preferred use
                # After the removal of the decorator, normalization with
                # np.random.default_rng will be done inside the decorated function
                kwargs[NEW_NAME] = np.random.default_rng(kwargs[NEW_NAME])

            elif global_seed_set and emit_warning:
                # Emit FutureWarning if `np.random.seed` was used and no PRNG was passed
                message = (
                    "The NumPy global RNG was seeded by calling "
                    f"`np.random.seed`. Beginning in {end_version}, this "
                    "function will no longer use the global RNG. "
                ) + cmn_msg
                warnings.warn(message, FutureWarning, stacklevel=2)

            return fun(*args, **kwargs)

        return wrapper

    return decorator


# Example usage of the `_transition_to_rng` decorator.

# Suppose a library uses a custom random state normalisation function, such as
from scipy._lib._util import check_random_state

# https://github.com/scipy/scipy/blob/94532e74b902b569bfad504866cb53720c5f4f31/scipy/_lib/_util.py#L253


# Suppose a function `library_function` is defined as:
def library_function(arg1, random_state=None, arg2=0):
    random_state = check_random_state(random_state)
    return random_state.random() * arg1 + arg2


# We apply the decorator and change the function signature at the same time.
# The use of `random_state` throughout the function may be replaced with `rng`,
# or the variable may be defined as `random_state = rng`.
@_transition_to_rng("random_state", position_num=1)
def library_function(arg1, rng=None, arg2=0):
    rng = check_random_state(rng)
    return rng.random() * arg1 + arg2


# After `rng` is available in all releases within the support window suggested by
# SPEC 0, we pass the `end_version` param to the decorator to emit warnings.
@_transition_to_rng("random_state", position_num=1, end_version="1.17.0")
def library_function(arg1, rng=None, arg2=0):
    rng = check_random_state(rng)
    return rng.random() * arg1 + arg2


# At the end of the deprecation period, remove the decorator, and normalize
# `rng` with `np.random.default_rng`.
def library_function(arg1, rng=None, arg2=0):
    rng = np.random.default_rng(rng)
    return rng.random() * arg1 + arg2

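Notes


  1. Note that numpy.random.default_rng does not accept instances of RandomState, so using RandomState to control the seed is effectively deprecated as well. That said, neither np.random.seed nor np.random.RandomState is itself deprecated, so both may still be used in some contexts (e.g., by developers to generate unit test data).

  2. The Hinsen principle states, loosely, that code should return the same result or raise an error, whether it is executed now or in the future.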
