Refactor Datastore Managers

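Each datastore manager contains a large amount of boilerplate and/or copy-and-pasted code. As more managers are added, the time needed to maintain all of this code will keep growing. To mitigate this, and the bugs that result from it, a base manager class will be created to hold all of the common code.

Launchpad blueprint: https://blueprints.launchpad.net/trove/+spec/datastore-manager-refactor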

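Problem Description

When a bug is found and fixed in one datastore's manager code, the fix is rarely carried over to all the other datastores. Because there is currently no third-party CI, this causes the stability of each experimental datastore to "drift". In addition, when "new and improved" ways of solving problems are implemented, each manager does it differently, if it does it at all. It is also hard to carry features (and coding knowledge) from one datastore to another, and the problem is compounded by copy-and-paste whenever a new datastore manager is implemented.

Examples of current problems are an instance going from BUILD to ACTIVE and then back to BUILD [1], or going BUILD->SHUTDOWN->ACTIVE [2]. Fixing this means changing how prepare works, and at present that change would have to be made in every datastore manager. Under the current architecture it is nearly impossible to do this in a uniform way.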

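Proposed Change

A new "Manager" class will be created to serve as the base class for all datastore managers. To keep the scope reasonable, the initial implementation will pull only a minimal amount of functionality back into the base class. This will include the methods that are currently "boilerplate" code, for example the responses to the rpc_ping and get_filesystem_stats calls and the update_status periodic task.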
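A minimal sketch of the consolidation, in plain Python so that it runs without Trove installed. `BaseManager`, `RedisManager`, `CassandraManager` and the `updates` counter are hypothetical illustration names, not Trove code. The point is that `rpc_ping` and `update_status` are written once in the base class and inherited by every datastore manager.

```python
class BaseManager(object):
    """Hypothetical stand-in for the proposed base Manager class."""

    def __init__(self, manager_name):
        self.manager_name = manager_name
        self.updates = 0  # counts update_status calls, for illustration only

    def rpc_ping(self, context):
        # Boilerplate that previously had to be copied into every manager.
        return True

    def update_status(self, context):
        # In Trove this is a periodic task that calls self.status.update().
        self.updates += 1


class RedisManager(BaseManager):
    def __init__(self):
        super(RedisManager, self).__init__('redis')


class CassandraManager(BaseManager):
    def __init__(self):
        super(CassandraManager, self).__init__('cassandra')


managers = [RedisManager(), CassandraManager()]
for m in managers:
    m.rpc_ping(context=None)
    m.update_status(context=None)
print([(m.manager_name, m.updates) for m in managers])
# prints [('redis', 1), ('cassandra', 1)]
```

A bug fixed in `BaseManager.rpc_ping` is fixed for every subclass at once, which is the "drift" problem described above.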

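A mechanism is also needed to encapsulate functionality so that the base manager can carry out datastore-specific instructions. This will be done with properties that each subclass can override. Some properties, such as the "status" object, are required; they will be declared abstract and must be present. Others, such as the new configuration manager and possibly a future dictionary of strategies, will be optional.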
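The required/optional split can be expressed with the standard `abc` module, as in the sketch below (Python 3; the class names and the string standing in for a status object are hypothetical). One caveat: abstract members are only enforced when the class's metaclass is `abc.ABCMeta` (here via `abc.ABC`), and `abc.abstractproperty` has been deprecated since Python 3.3 in favour of stacking `@property` on `@abc.abstractmethod`.

```python
import abc


class Manager(abc.ABC):
    """Hypothetical, trimmed-down base manager used only for illustration."""

    @property
    @abc.abstractmethod
    def status(self):
        """Required: each datastore must supply its own status object."""

    @property
    def configuration_manager(self):
        """Optional: datastores without the new-style manager inherit None."""
        return None


class IncompleteManager(Manager):
    pass  # forgets to define 'status'


class RedisManager(Manager):
    def __init__(self):
        # Placeholder for an instance of a BaseDbStatus subclass.
        self._status = 'redis-status'

    @property
    def status(self):
        return self._status


try:
    IncompleteManager()
except TypeError as exc:
    print('rejected:', type(exc).__name__)  # prints "rejected: TypeError"

mgr = RedisManager()
print(mgr.status, mgr.configuration_manager)  # prints "redis-status None"
```

Missing required properties are caught when the manager is instantiated, rather than later when the base class first touches `self.status`.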

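The new Manager class will reside in the datastore module:

trove/guestagent/datastore/manager.py

It sits alongside the existing service.py file, which contains the existing BaseDbStatus class. The directory structure of the MySQL-derived classes will also be tidied up somewhat, ending up as follows: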

trove
+-- guestagent
    +-- datastore
        +-- __init__.py
        +-- manager.py          <-- new 'base' manager
        +-- service.py
        +-- experimental
            +-- __init__.py
            +-- cassandra
                +-- __init__.py
                +-- manager.py
                +-- service.py
                +-- system.py

            <other experimental datastore modules>

            +-- redis
                +-- __init__.py
                +-- manager.py
                +-- service.py
                +-- system.py
            +-- vertica
                +-- __init__.py
                +-- manager.py
                +-- service.py
                +-- system.py
        +-- mysql_common        <-- new module
            +-- __init__.py
            +-- manager.py      <-- renamed from mysql/manager_base.py
            +-- service.py      <-- renamed from mysql/service_base.py
        +-- mysql
            +-- __init__.py
            +-- manager.py
            +-- service.py
        +-- technical-preview
            +-- __init__.py

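As part of this refactoring (and as a proof of concept), the problem of instances cycling in and out of the BUILD state will be fixed properly. The prepare method will move into the base class, which will transparently implement the code needed to ensure that notifications are sent only after prepare has completed successfully. The existing prepare methods will be renamed to "do_prepare" and will be called from the base prepare method.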
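This wrapping is the template method pattern. The sketch below is a simplified, hypothetical model rather than the real Trove classes: `FakeStatus` only records the calls it receives, and the argument list is cut down to `context` and `cluster_config`. It shows that `end_install` runs exactly once whether `do_prepare` succeeds or raises, with the error flag set accordingly.

```python
class FakeStatus(object):
    """Hypothetical status object that records the calls it receives."""

    def __init__(self):
        self.events = []

    def begin_install(self):
        self.events.append('begin_install')

    def end_install(self, error_occurred=False, post_processing=False):
        self.events.append(('end_install', error_occurred, post_processing))


class BaseManager(object):
    def __init__(self):
        self.status = FakeStatus()
        self.prepare_error = False

    def prepare(self, context, cluster_config=None):
        """Common wrapper: the status transitions happen here, once."""
        self.status.begin_install()
        try:
            self.do_prepare(context, cluster_config)
        except Exception:
            self.prepare_error = True
            raise
        finally:
            self.status.end_install(error_occurred=self.prepare_error,
                                    post_processing=bool(cluster_config))

    def do_prepare(self, context, cluster_config):
        raise NotImplementedError()


class GoodManager(BaseManager):
    def do_prepare(self, context, cluster_config):
        pass  # datastore-specific install/configuration work goes here


class BrokenManager(BaseManager):
    def do_prepare(self, context, cluster_config):
        raise RuntimeError('install failed')


good = GoodManager()
good.prepare(context=None)
print(good.status.events)
# prints ['begin_install', ('end_install', False, False)]

broken = BrokenManager()
try:
    broken.prepare(context=None)
except RuntimeError:
    pass
print(broken.status.events)
# prints ['begin_install', ('end_install', True, False)]
```

Because datastores only implement `do_prepare`, none of them can forget to report the final status.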

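Whether prepare has completed will be determined by having the manager write one file when the prepare operation starts and another when it completes successfully. Any errors (exceptions) will be caught and logged, and the instance will be set to the FAILED state.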
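A minimal sketch of the marker-file check. It assumes a plain directory path in place of `GUESTAGENT_DIR` and uses only the standard library (`os`, `tempfile`) instead of Trove's `guestagent_utils` and `operating_system` helpers. The file names match the constants in the `BaseDbStatus` code later in this spec.

```python
import os
import tempfile

PREPARE_START_FILENAME = '.guestagent.prepare.start'
PREPARE_END_FILENAME = '.guestagent.prepare.end'


def begin_install(guestagent_dir):
    """Record that prepare has started by creating an empty marker file."""
    open(os.path.join(guestagent_dir, PREPARE_START_FILENAME), 'w').close()


def end_install(guestagent_dir, error_occurred=False):
    """Create the 'done' marker only when prepare succeeded."""
    if not error_occurred:
        open(os.path.join(guestagent_dir, PREPARE_END_FILENAME), 'w').close()


def prepare_completed(guestagent_dir):
    """Prepare counts as completed only if the end marker exists."""
    return os.path.isfile(os.path.join(guestagent_dir, PREPARE_END_FILENAME))


with tempfile.TemporaryDirectory() as agent_dir:
    begin_install(agent_dir)
    print(prepare_completed(agent_dir))  # prints False
    end_install(agent_dir)
    print(prepare_completed(agent_dir))  # prints True
```

Keeping the flag on disk rather than only in memory means that, assuming the directory persists, a restarted guest agent can still tell whether prepare finished.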

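The following is an example of what the base manager might contain, although other properties may be defined as needed (and more may be added in future cleanup work):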

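import abc

from oslo_log import log as logging
from oslo_service import periodic_task

from trove.common import cfg
from trove.common.i18n import _
from trove.guestagent import dbaas

CONF = cfg.CONF
LOG = logging.getLogger(__name__)


class Manager(periodic_task.PeriodicTasks):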
    """This is the base class for all datastore managers.  Over time, common
    functionality should be pulled back here from the existing managers.
    """

    def __init__(self, manager_name):

        super(Manager, self).__init__(CONF)

        # Manager properties
        self.__manager_name = manager_name
        self.__manager = None
        self.__prepare_error = False

    @property
    def manager_name(self):
        """This returns the passed-in name of the manager."""
        return self.__manager_name

    @property
    def manager(self):
        """This returns the configured datastore manager name
        (CONF.datastore_manager), falling back to the passed-in name.
        """
        if not self.__manager:
            self.__manager = CONF.datastore_manager or self.__manager_name
        return self.__manager

    @property
    def prepare_error(self):
        return self.__prepare_error

    @prepare_error.setter
    def prepare_error(self, prepare_error):
        self.__prepare_error = prepare_error

    @property
    def configuration_manager(self):
        """If the datastore supports the new-style configuration manager,
        it should override this to return it.
        """
        return None

    @abc.abstractproperty
    def status(self):
        """This should return an instance of a status class that has been
        inherited from datastore.service.BaseDbStatus.  Each datastore
        must implement this property.
        """
        return None

    ################
    # Status related
    ################
    @periodic_task.periodic_task
    def update_status(self, context):
        """Update the status of the trove instance. It is decorated with
        periodic_task so it is called automatically.
        """
        LOG.debug("Update status called.")
        self.status.update()

    def rpc_ping(self, context):
        LOG.debug("Responding to RPC ping.")
        return True

    #################
    # Prepare related
    #################
    def prepare(self, context, packages, databases, memory_mb, users,
                device_path=None, mount_point=None, backup_info=None,
                config_contents=None, root_password=None, overrides=None,
                cluster_config=None, snapshot=None):
        """Set up datastore on a Guest Instance."""
        LOG.info(_("Starting datastore prepare for '%s'.") % self.manager)
        self.status.begin_install()
        post_processing = bool(cluster_config)
        try:
            self.do_prepare(context, packages, databases, memory_mb,
                            users, device_path, mount_point, backup_info,
                            config_contents, root_password, overrides,
                            cluster_config, snapshot)
        except Exception as ex:
            self.prepare_error = True
            # Let the logger format the exception; BaseException.message
            # does not exist on Python 3.
            LOG.exception(_("An error occurred preparing datastore: %s"), ex)
            raise
        finally:
            LOG.info(_("Ending datastore prepare for '%s'.") % self.manager)
            self.status.end_install(error_occurred=self.prepare_error,
                                    post_processing=post_processing)
        # At this point critical 'prepare' work is done and the instance
        # is now in the correct 'ACTIVE', 'INSTANCE_READY' or 'ERROR' state.
        # Of course, if an error has occurred, none of the code that follows
        # will run.
        LOG.info(_('Completed setup of datastore successfully.'))

        # We only create databases and users automatically for non-cluster
        # instances.
        if not cluster_config:
            try:
                if databases:
                    LOG.debug('Calling add databases.')
                    self.create_database(context, databases)
                if users:
                    LOG.debug('Calling add users.')
                    self.create_user(context, users)
            except Exception as ex:
                LOG.exception(_("An error occurred creating databases/users: "
                                "%s"), ex)
                raise

        try:
            LOG.debug('Calling post_prepare.')
            self.post_prepare(context, packages, databases, memory_mb,
                              users, device_path, mount_point, backup_info,
                              config_contents, root_password, overrides,
                              cluster_config, snapshot)
        except Exception as ex:
            LOG.exception(_("An error occurred in post prepare: %s"), ex)
            raise

    @abc.abstractmethod
    def do_prepare(self, context, packages, databases, memory_mb, users,
                   device_path, mount_point, backup_info, config_contents,
                   root_password, overrides, cluster_config, snapshot):
        """This is called from prepare when the Trove instance first comes
        online.  'Prepare' is the first rpc message passed from the
        task manager.  do_prepare handles all the base configuration of
        the instance and is where the actual work is done.  Once this method
        completes, the datastore is considered either 'ready' for use (or
        for final connections to other datastores) or in an 'error' state,
        and the status is changed accordingly.  Each datastore must
        implement this method.
        """
        pass

    def post_prepare(self, context, packages, databases, memory_mb, users,
                     device_path, mount_point, backup_info, config_contents,
                     root_password, overrides, cluster_config, snapshot):
        """This is called after prepare has completed successfully.
        Processing done here should be limited to things that will not
        affect the actual 'running' status of the datastore (for example,
        creating databases and users, although these are now handled
        automatically).  Any exceptions are caught, logged and rethrown,
        however no status changes are made and the end-user will not be
        informed of the error.
        """
        LOG.debug('No post_prepare work has been defined.')
        pass

    #####################
    # File System related
    #####################
    def get_filesystem_stats(self, context, fs_path):
        """Gets the filesystem stats for the path given."""
        # TODO(peterstac) - note that fs_path is not used in this method.
        mount_point = CONF.get(self.manager).mount_point
        LOG.debug("Getting file system stats for '%s'" % mount_point)
        return dbaas.get_filesystem_volume_stats(mount_point)

    #################
    # Cluster related
    #################
    def cluster_complete(self, context):
        LOG.debug("Cluster creation complete, starting status checks.")
        self.status.end_install()

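The base service class will be extended with the methods needed to set a flag indicating whether prepare has completed. It will look like the following (only the new code is shown):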

class BaseDbStatus(object):

    GUESTAGENT_DIR = '~'
    PREPARE_START_FILENAME = '.guestagent.prepare.start'
    PREPARE_END_FILENAME = '.guestagent.prepare.end'

    def __init__(self):
        self._prepare_completed = None

    @property
    def prepare_completed(self):
        if self._prepare_completed is None:
            # Force the file check
            self.prepare_completed = None
        return self._prepare_completed

    @prepare_completed.setter
    def prepare_completed(self, value):
        # Set the value based on the existence of the file; 'value' is
        # ignored
        # This is required as the value of prepare_completed is cached,
        # so this must be referenced any time the existence of the
        # file changes
        self._prepare_completed = os.path.isfile(
            guestagent_utils.build_file_path(
                self.GUESTAGENT_DIR, self.PREPARE_END_FILENAME))

    def begin_install(self):
        """Called right before DB is prepared."""
        prepare_start_file = guestagent_utils.build_file_path(
            self.GUESTAGENT_DIR, self.PREPARE_START_FILENAME)
        operating_system.write_file(prepare_start_file, '')
        self.prepare_completed = False

        self.set_status(instance.ServiceStatuses.BUILDING, True)

    def end_install(self, error_occurred=False, post_processing=False):
        """Called after prepare completes."""

        # Set the "we're done" flag if there's no error and
        # no post_processing is necessary
        if not (error_occurred or post_processing):
            prepare_end_file = guestagent_utils.build_file_path(
                self.GUESTAGENT_DIR, self.PREPARE_END_FILENAME)
            operating_system.write_file(prepare_end_file, '')
            self.prepare_completed = True

        final_status = None
        if error_occurred:
            final_status = instance.ServiceStatuses.FAILED
        elif post_processing:
            final_status = instance.ServiceStatuses.INSTANCE_READY

        if final_status:
            LOG.info(_("Set final status to %s.") % final_status)
            self.set_status(final_status, force=True)
        else:
            self._end_install_or_restart(True)

    def end_restart(self):
        self.restart_mode = False
        LOG.info(_("Ending restart."))
        self._end_install_or_restart(False)

    def _end_install_or_restart(self, force):
        """Called after DB is installed or restarted.
        Updates the database with the actual DB server status.
        """
        real_status = self._get_actual_db_status()
        LOG.info(_("Current database status is '%s'.") % real_status)
        self.set_status(real_status, force=force)

    @property
    def is_installed(self):
        """This is for compatibility - it may be removed during further
        cleanup.
        """
        return self.prepare_completed

    def set_status(self, status, force=False):
        """Use conductor to update the DB app status."""

        if force or self.is_installed:
            LOG.debug("Casting set_status message to conductor "
                      "(status is '%s')." % status.description)
            context = trove_context.TroveContext()

            heartbeat = {'service_status': status.description}
            conductor_api.API(context).heartbeat(
                CONF.guest_id, heartbeat, sent=timeutils.float_utcnow())
            LOG.debug("Successfully cast set_status.")
            self.status = status
        else:
            LOG.debug("Prepare has not completed yet, skipping heartbeat.")
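The two decisions that stop the BUILD/ACTIVE flip-flop can be isolated as pure functions. This is a hypothetical restatement for illustration only: statuses are plain strings rather than `instance.ServiceStatuses` members, and `'RUNNING'` stands in for whatever `_get_actual_db_status()` returns. Assuming the periodic `update_status` path calls `set_status` without `force`, its heartbeats are dropped until the end-of-prepare marker exists, so the instance cannot report ACTIVE early.

```python
FAILED = 'FAILED'
INSTANCE_READY = 'INSTANCE_READY'


def final_status(error_occurred, post_processing, actual_db_status):
    """Restates the decision made in BaseDbStatus.end_install()."""
    if error_occurred:
        return FAILED
    if post_processing:
        # Cluster members stay here until cluster_complete() is called.
        return INSTANCE_READY
    return actual_db_status


def should_send_heartbeat(force, prepare_completed):
    """Restates the guard at the top of BaseDbStatus.set_status()."""
    return force or prepare_completed


print(final_status(True, False, 'RUNNING'))   # prints FAILED
print(final_status(False, True, 'RUNNING'))   # prints INSTANCE_READY
print(final_status(False, False, 'RUNNING'))  # prints RUNNING
# A non-forced heartbeat while prepare is still running is skipped:
print(should_send_heartbeat(False, False))    # prints False
```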

Configuration

No configuration changes are expected.

Database

Public API

Public API Security

Python API

CLI (python-troveclient)

Internal API

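The ServiceStatuses.BUILD_PENDING status has been renamed to ServiceStatuses.INSTANCE_READY to better reflect the actual state of the instance. The displayed value will remain "BUILD", so there should be no externally visible difference and backward compatibility is preserved.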
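A hypothetical, simplified model of the rename. Trove's own status class may carry more fields, and the description strings here are illustrative. The internal name changes while the API-facing value stays "BUILD".

```python
class ServiceStatus(object):
    """Hypothetical simplified status: internal description + API value."""

    def __init__(self, description, api_status):
        self.description = description  # what the guest reports internally
        self.api_status = api_status    # what users see through the API


# Renamed from BUILD_PENDING; the externally visible value is unchanged.
INSTANCE_READY = ServiceStatus('instance ready', 'BUILD')
BUILDING = ServiceStatus('building', 'BUILD')

print(INSTANCE_READY.api_status == BUILDING.api_status)  # prints True
```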

Guest Agent

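This change should not affect any behavior in the Guest Agent. The current tests should be sufficient to ensure that the change is fully compatible with the rest of the code base.

Alternatives

It has been suggested that the prepare problem be fixed using Nova metadata, which is available on the guest instance. If this is deemed useful, it could be added to the proposed approach, purely as a means of notification.

Dashboard Impact (UX)

TBD (section to be added after approval)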

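Implementation

Assignee(s)

Primary assignee:

<peterstac>

Milestones

Target Milestone for completion:

e.g. Mitaka-1

Work Items

All changes will be done in the context of a single task.

Upgrade Implications

None expected.

Dependencies

Testing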

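Unit tests will be modified as needed, but changes in this area will be kept to a minimum. To keep the change as small as possible, the test refactoring will also be done in stages: only the minimum work will be done first, and the rest will be left for later. Future work will include thoroughly testing the base class and then removing all of the corresponding tests from the derived classes.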
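As an illustration of testing shared behaviour once at the base-class level, here is a sketch using `unittest` and `unittest.mock` from the Python 3 standard library. `BaseManager` is a hypothetical, trimmed-down stand-in for the proposed class; real tests would target `trove/guestagent/datastore/manager.py` instead.

```python
import unittest
from unittest import mock


class BaseManager(object):
    """Hypothetical trimmed-down base manager; only the part under test."""

    def __init__(self, status):
        self.status = status
        self.prepare_error = False

    def prepare(self, context):
        self.status.begin_install()
        try:
            self.do_prepare(context)
        except Exception:
            self.prepare_error = True
            raise
        finally:
            self.status.end_install(error_occurred=self.prepare_error)

    def do_prepare(self, context):
        raise NotImplementedError()


class BaseManagerPrepareTest(unittest.TestCase):
    """Written once against the base class instead of per datastore."""

    def _manager(self, side_effect=None):
        mgr = BaseManager(status=mock.Mock())
        # Replace the datastore-specific step with a mock.
        mgr.do_prepare = mock.Mock(side_effect=side_effect)
        return mgr

    def test_prepare_success(self):
        mgr = self._manager()
        mgr.prepare(context=None)
        mgr.status.end_install.assert_called_once_with(error_occurred=False)

    def test_prepare_failure_marks_error(self):
        mgr = self._manager(side_effect=RuntimeError('boom'))
        self.assertRaises(RuntimeError, mgr.prepare, context=None)
        mgr.status.end_install.assert_called_once_with(error_occurred=True)
```

Run it with `python -m unittest` against the module that contains it. Once such tests exist for the base class, the equivalent per-datastore tests become redundant.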

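Integration tests should continue to run as usual and will be used to confirm that no fundamental changes have been made to the implementation other than the bug fixes, which should only make the test infrastructure more stable.

Documentation Impact

Since all changes are implementation-related, no documentation changes are expected.

Appendix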