Ansible的基本概念与架构：理解自动化配置管理的核心

老马啸西风2025/8/31大约 11 分钟

5.1 Ansible的基本概念与架构

要深入理解和有效使用Ansible，首先需要掌握其基本概念和架构设计。Ansible的简洁性和易用性很大程度上源于其精心设计的架构和清晰的概念模型。本节将详细介绍Ansible的核心概念、架构组件以及它们之间的关系。

Ansible的核心概念

1. 控制节点（Control Node）

控制节点是运行Ansible命令和Playbook的机器。它是Ansible自动化流程的起点和控制中心。

控制节点的要求

# 控制节点系统要求
# 操作系统: Linux/Unix/macOS (不支持Windows作为控制节点)
# Python版本: Python 2.7或Python 3.5+
# 网络连接: 能够通过SSH连接到被管节点

# 安装Ansible
# 使用pip安装
pip install ansible

# 使用包管理器安装 (Ubuntu/Debian)
sudo apt update
sudo apt install ansible

# 使用包管理器安装 (CentOS/RHEL)
sudo yum install epel-release
sudo yum install ansible

# 验证安装
ansible --version

控制节点的关键组件

# 控制节点核心组件
control_node_components = {
    "ansible_core": {
        "description": "Ansible核心引擎",
        "components": [
            "ansible-playbook",     # 执行Playbook
            "ansible",              # 执行临时命令
            "ansible-vault",        # 管理加密数据
            "ansible-galaxy",       # 管理Roles和Collections
            "ansible-doc"           # 查看文档
        ]
    },
    "configuration": {
        "description": "配置文件",
        "files": [
            "ansible.cfg",          # 主配置文件
            "inventory",            # 主机清单文件
            "group_vars/",          # 组变量目录
            "host_vars/"            # 主机变量目录
        ]
    },
    "content": {
        "description": "配置内容",
        "directories": [
            "playbooks/",           # Playbook目录
            "roles/",               # Roles目录
            "templates/",           # 模板目录
            "files/"                # 文件目录
        ]
    }
}

2. 被管节点（Managed Nodes）

被管节点是Ansible管理的目标机器，它们是配置管理的实际对象。

被管节点的要求

# 被管节点系统要求
# Python版本: Python 2.6+或Python 3.5+
# SSH服务: 必须运行SSH服务
# 网络连接: 能够从控制节点访问

# 在被管节点上安装Python (如果缺失)
# Ubuntu/Debian
sudo apt update
sudo apt install python3

# CentOS/RHEL
sudo yum install python3

被管节点的连接方式

# 不同的连接方式配置示例
all:
  hosts:
    # 标准SSH连接
    web01.example.com:
      ansible_connection: ssh
      ansible_user: ansible
      ansible_ssh_private_key_file: ~/.ssh/ansible_key
    
    # 本地连接 (用于控制节点自身)
    localhost:
      ansible_connection: local
    
    # Windows连接
    win01.example.com:
      ansible_connection: winrm
      ansible_user: Administrator
      ansible_password: "{{ vault_win_password }}"
    
    # 网络设备连接
    switch01.example.com:
      ansible_connection: network_cli
      ansible_network_os: ios
      ansible_user: network_admin

3. Inventory（主机清单）

Inventory定义了Ansible管理的主机和组，是Ansible工作的基础。

INI格式Inventory

# 简单的INI格式Inventory
# 文件: inventory/production

# 定义主机组
[webservers]
web01.example.com ansible_host=192.168.1.10
web02.example.com ansible_host=192.168.1.11

[databases]
db01.example.com ansible_host=192.168.1.20
db02.example.com ansible_host=192.168.1.21

# 定义子组
[production:children]
webservers
databases

# 定义组变量
[webservers:vars]
http_port=80
https_port=443

[databases:vars]
mysql_port=3306

# 定义全局变量
[all:vars]
ansible_user=ansible
ansible_ssh_private_key_file=~/.ssh/ansible_key

YAML格式Inventory

# YAML格式Inventory示例
# 文件: inventory/production.yml

all:
  vars:
    ansible_user: ansible
    ansible_ssh_private_key_file: ~/.ssh/ansible_key
  
  children:
    webservers:
      hosts:
        web01.example.com:
          ansible_host: 192.168.1.10
          http_port: 80
          https_port: 443
        web02.example.com:
          ansible_host: 192.168.1.11
          http_port: 80
          https_port: 443
      
      vars:
        app_name: webapp
        app_version: 1.2.3
    
    databases:
      hosts:
        db01.example.com:
          ansible_host: 192.168.1.20
          mysql_port: 3306
        db02.example.com:
          ansible_host: 192.168.1.21
          mysql_port: 3306
      
      vars:
        db_name: myapp
        db_user: myapp_user
    
    production:
      children:
        webservers:
        databases:

动态Inventory

# 动态Inventory脚本示例 (Python)
#!/usr/bin/env python3
import json
import subprocess

def get_ec2_instances():
    """从AWS EC2获取实例信息"""
    try:
        # 使用AWS CLI获取实例信息
        result = subprocess.run([
            'aws', 'ec2', 'describe-instances',
            '--filters', 'Name=instance-state-name,Values=running'
        ], capture_output=True, text=True)
        
        data = json.loads(result.stdout)
        inventory = {"_meta": {"hostvars": {}}}
        
        for reservation in data['Reservations']:
            for instance in reservation['Instances']:
                # 提取实例信息
                instance_id = instance['InstanceId']
                private_ip = instance.get('PrivateIpAddress', '')
                tags = {tag['Key']: tag['Value'] for tag in instance.get('Tags', [])}
                
                # 根据标签分组
                group = tags.get('Environment', 'ungrouped')
                if group not in inventory:
                    inventory[group] = {"hosts": []}
                inventory[group]["hosts"].append(instance_id)
                
                # 设置主机变量
                inventory["_meta"]["hostvars"][instance_id] = {
                    "ansible_host": private_ip,
                    "ec2_instance_id": instance_id,
                    "ec2_instance_type": instance['InstanceType'],
                    "ec2_tags": tags
                }
        
        return inventory
    except Exception as e:
        return {"_meta": {"hostvars": {}}, "error": str(e)}

if __name__ == "__main__":
    inventory = get_ec2_instances()
    print(json.dumps(inventory, indent=2))

4. Modules（模块）

Modules是Ansible执行具体任务的组件，每个模块负责特定的功能。

内置模块

# 常用内置模块示例
---
- name: Demonstrate common modules
  hosts: all
  tasks:
    # 包管理模块
    - name: Install packages
      yum:  # 或 apt, dnf, etc.
        name: 
          - nginx
          - python3
        state: present
    
    # 服务管理模块
    - name: Manage services
      service:
        name: nginx
        state: started
        enabled: yes
    
    # 文件管理模块
    - name: Manage files
      file:
        path: /etc/myapp
        state: directory
        owner: root
        group: root
        mode: '0755'
    
    # 模板模块
    - name: Deploy configuration from template
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        owner: root
        group: root
        mode: '0644'
      notify: restart nginx
    
    # 命令模块
    - name: Execute commands
      command: uptime
      register: uptime_result
    
    - name: Display uptime
      debug:
        msg: "System uptime: {{ uptime_result.stdout }}"

自定义模块

# 自定义模块示例
#!/usr/bin/python
# -*- coding: utf-8 -*-

# 文件: library/my_custom_module.py

from ansible.module_utils.basic import AnsibleModule
import requests

def main():
    # 定义模块参数
    module = AnsibleModule(
        argument_spec=dict(
            url=dict(required=True, type='str'),
            expected_status=dict(default=200, type='int'),
            timeout=dict(default=30, type='int'),
            method=dict(default='GET', choices=['GET', 'POST', 'PUT', 'DELETE']),
            headers=dict(type='dict', default={}),
            data=dict(type='str', default=None)
        ),
        supports_check_mode=True
    )
    
    # 获取参数
    url = module.params['url']
    expected_status = module.params['expected_status']
    timeout = module.params['timeout']
    method = module.params['method']
    headers = module.params['headers']
    data = module.params['data']
    
    # 检查模式处理
    if module.check_mode:
        module.exit_json(changed=False, msg="Check mode enabled")
    
    # 执行HTTP请求
    try:
        response = requests.request(
            method=method,
            url=url,
            headers=headers,
            data=data,
            timeout=timeout
        )
        
        # 检查响应状态
        if response.status_code == expected_status:
            module.exit_json(
                changed=False,
                msg=f"URL {url} returned expected status {expected_status}",
                status_code=response.status_code,
                response_body=response.text[:200]  # 限制输出长度
            )
        else:
            module.fail_json(
                msg=f"URL {url} returned status {response.status_code}, expected {expected_status}",
                status_code=response.status_code,
                response_body=response.text[:200]
            )
    
    except requests.exceptions.RequestException as e:
        module.fail_json(msg=f"Failed to check URL {url}: {str(e)}")

if __name__ == '__main__':
    main()

使用自定义模块：

# 使用自定义模块
---
- name: Use custom module
  hosts: localhost
  tasks:
    - name: Check website availability
      my_custom_module:
        url: https://example.com
        expected_status: 200
        timeout: 10
        headers:
          User-Agent: "Ansible/CustomModule"

5. Plugins（插件）

Plugins扩展Ansible的功能，包括连接插件、回调插件、过滤插件等。

连接插件

# 连接插件示例
class Connection:
    """自定义连接插件"""
    
    def __init__(self, play_context, new_stdin):
        self.play_context = play_context
        self.new_stdin = new_stdin
        self.connection = None
    
    def _connect(self):
        """建立连接"""
        if not self.connection:
            # 实现连接逻辑
            self.connection = self._create_connection()
    
    def exec_command(self, cmd, in_data=None, sudoable=True):
        """执行命令"""
        self._connect()
        # 执行命令逻辑
        return self.connection.exec_command(cmd, in_data, sudoable)
    
    def put_file(self, in_path, out_path):
        """传输文件"""
        self._connect()
        # 文件传输逻辑
        return self.connection.put_file(in_path, out_path)
    
    def fetch_file(self, in_path, out_path):
        """获取文件"""
        self._connect()
        # 文件获取逻辑
        return self.connection.fetch_file(in_path, out_path)
    
    def close(self):
        """关闭连接"""
        if self.connection:
            self.connection.close()
            self.connection = None

回调插件

# 回调插件示例
from ansible.plugins.callback import CallbackBase

class CallbackModule(CallbackBase):
    """自定义回调插件"""
    
    def __init__(self):
        super(CallbackModule, self).__init__()
        self.task_count = 0
        self.failed_tasks = []
    
    def v2_playbook_on_task_start(self, task, is_conditional):
        """任务开始时调用"""
        self.task_count += 1
        self._display.display(f"Task {self.task_count}: {task.get_name()}")
    
    def v2_runner_on_ok(self, result):
        """任务成功时调用"""
        host = result._host.get_name()
        self._display.display(f"✓ {host}: Task completed successfully")
    
    def v2_runner_on_failed(self, result, ignore_errors=False):
        """任务失败时调用"""
        host = result._host.get_name()
        task_name = result._task.get_name()
        self.failed_tasks.append({
            'host': host,
            'task': task_name,
            'result': result._result
        })
        self._display.display(f"✗ {host}: Task '{task_name}' failed")
    
    def v2_playbook_on_stats(self, stats):
        """Playbook结束时调用"""
        self._display.display(f"\nPlaybook completed. Total tasks: {self.task_count}")
        if self.failed_tasks:
            self._display.display(f"Failed tasks: {len(self.failed_tasks)}")

Ansible的架构设计

1. 无代理架构

Ansible采用无代理架构，这是其核心设计之一：

# 无代理架构的优势

## 传统有代理架构的缺点
1. 需要在每个被管节点安装代理软件
2. 代理软件需要定期更新和维护
3. 代理进程消耗系统资源
4. 增加了潜在的安全风险点

## Ansible无代理架构的优势
1. 零安装要求：无需在被管节点安装额外软件
2. 降低维护成本：无需管理代理软件生命周期
3. 减少资源消耗：被管节点无需运行额外进程
4. 提高安全性：减少攻击面
5. 简化部署：快速扩展到新节点

SSH连接机制

# SSH连接示例
import paramiko

def establish_ssh_connection(host, user, key_file):
    """建立SSH连接"""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    
    # 使用私钥连接
    private_key = paramiko.RSAKey.from_private_key_file(key_file)
    client.connect(
        hostname=host,
        username=user,
        pkey=private_key,
        timeout=30
    )
    
    return client

def execute_remote_command(ssh_client, command):
    """执行远程命令"""
    stdin, stdout, stderr = ssh_client.exec_command(command)
    return {
        'stdout': stdout.read().decode(),
        'stderr': stderr.read().decode(),
        'exit_status': stdout.channel.recv_exit_status()
    }

2. 推送式架构

Ansible采用推送式架构，控制节点主动向被管节点推送配置：

# 推送式架构 vs 拉取式架构

## 推送式架构 (Ansible)
[控制节点] ===推送配置===> [被管节点1]
[控制节点] ===推送配置===> [被管节点2]
[控制节点] ===推送配置===> [被管节点3]

优势:
- 实时控制：可以立即执行配置变更
- 集中管理：所有操作从控制节点发起
- 灵活性高：可以根据需要选择目标节点

## 拉取式架构 (Chef/Puppet)
[控制节点] <===拉取配置==== [被管节点1]
[控制节点] <===拉取配置==== [被管节点2]
[控制节点] <===拉取配置==== [被管节点3]

特点:
- 被管节点定期检查配置状态
- 被动更新：需要等待下次检查周期
- 自主性强：被管节点自主决定何时更新

3. 幂等性设计

Ansible的操作具有幂等性，确保同一配置可以多次执行而不会产生副作用：

# 幂等性实现示例
class IdempotentOperation:
    def __init__(self, target_state):
        self.target_state = target_state
        self.current_state = self.get_current_state()
    
    def get_current_state(self):
        """获取当前状态"""
        # 实现获取当前状态的逻辑
        pass
    
    def is_state_correct(self):
        """检查状态是否正确"""
        return self.current_state == self.target_state
    
    def apply_changes(self):
        """应用必要的变更"""
        if not self.is_state_correct():
            # 执行变更操作
            self.perform_action()
            return True
        return False
    
    def perform_action(self):
        """执行具体操作"""
        # 实现具体的操作逻辑
        pass

# Ansible模块中的幂等性实现
def ensure_package_installed(package_name, desired_state):
    """确保软件包处于期望状态"""
    current_state = get_package_state(package_name)
    
    if desired_state == 'present' and current_state != 'installed':
        install_package(package_name)
        return True
    elif desired_state == 'absent' and current_state == 'installed':
        remove_package(package_name)
        return True
    
    return False  # 状态已正确，无需变更

Ansible的工作流程

1. Playbook执行流程

# Playbook执行流程

1. 解析Playbook
   ├── 读取YAML文件
   ├── 验证语法
   └── 构建执行计划

2. 加载Inventory
   ├── 读取主机清单
   ├── 解析主机组
   └── 加载变量

3. 变量处理
   ├── 加载全局变量
   ├── 加载组变量
   ├── 加载主机变量
   └── 处理变量优先级

4. 任务执行
   ├── 按顺序执行任务
   ├── 处理条件判断
   ├── 执行模块调用
   └── 收集执行结果

5. 结果处理
   ├── 生成执行报告
   ├── 处理通知器
   └── 输出执行日志

详细执行流程

# Playbook执行流程示例
class PlaybookExecutor:
    def __init__(self, playbook_file, inventory_file):
        self.playbook_file = playbook_file
        self.inventory_file = inventory_file
        self.inventory = None
        self.playbook = None
    
    def execute(self):
        """执行Playbook"""
        # 1. 解析Playbook
        self.parse_playbook()
        
        # 2. 加载Inventory
        self.load_inventory()
        
        # 3. 处理变量
        self.process_variables()
        
        # 4. 执行任务
        self.execute_tasks()
        
        # 5. 生成报告
        self.generate_report()
    
    def parse_playbook(self):
        """解析Playbook"""
        import yaml
        with open(self.playbook_file, 'r') as f:
            self.playbook = yaml.safe_load(f)
        print(f"Parsed playbook: {self.playbook_file}")
    
    def load_inventory(self):
        """加载Inventory"""
        # 实现Inventory加载逻辑
        print(f"Loaded inventory: {self.inventory_file}")
    
    def process_variables(self):
        """处理变量"""
        # 实现变量处理逻辑
        print("Processed variables")
    
    def execute_tasks(self):
        """执行任务"""
        for play in self.playbook:
            print(f"Executing play: {play.get('name', 'unnamed')}")
            for task in play.get('tasks', []):
                self.execute_task(task)
    
    def execute_task(self, task):
        """执行单个任务"""
        task_name = task.get('name', 'unnamed task')
        print(f"  Executing task: {task_name}")
        # 实现任务执行逻辑
    
    def generate_report(self):
        """生成执行报告"""
        print("Playbook execution completed")
        # 实现报告生成逻辑

2. 模块执行机制

# 模块执行机制示例
class ModuleExecutor:
    def __init__(self, module_name, module_args, connection):
        self.module_name = module_name
        self.module_args = module_args
        self.connection = connection
    
    def execute(self):
        """执行模块"""
        # 1. 准备模块代码
        module_code = self.prepare_module_code()
        
        # 2. 传输模块到远程节点
        remote_module_path = self.transfer_module(module_code)
        
        # 3. 执行模块
        result = self.run_module(remote_module_path)
        
        # 4. 处理结果
        return self.process_result(result)
    
    def prepare_module_code(self):
        """准备模块代码"""
        # 获取模块代码并打包
        module_source = self.get_module_source()
        return self.package_module(module_source)
    
    def transfer_module(self, module_code):
        """传输模块到远程节点"""
        remote_path = f"/tmp/ansible_module_{self.module_name}"
        self.connection.put_file(module_code, remote_path)
        return remote_path
    
    def run_module(self, module_path):
        """运行模块"""
        command = f"python3 {module_path} '{json.dumps(self.module_args)}'"
        return self.connection.exec_command(command)
    
    def process_result(self, result):
        """处理执行结果"""
        try:
            return json.loads(result['stdout'])
        except json.JSONDecodeError:
            return {
                'failed': True,
                'msg': 'Module returned invalid JSON',
                'stdout': result['stdout'],
                'stderr': result['stderr']
            }

Ansible的扩展机制

1. 自定义模块开发

# 自定义模块开发框架
def create_ansible_module():
    """创建Ansible模块框架"""
    module_template = '''
#!/usr/bin/python
# -*- coding: utf-8 -*-

from ansible.module_utils.basic import AnsibleModule

def main():
    module = AnsibleModule(
        argument_spec=dict(
            # 定义模块参数
            name=dict(required=True, type='str'),
            state=dict(default='present', choices=['present', 'absent']),
        ),
        supports_check_mode=True
    )
    
    # 获取参数
    name = module.params['name']
    state = module.params['state']
    
    # 检查模式处理
    if module.check_mode:
        module.exit_json(changed=False)
    
    # 执行业务逻辑
    result = perform_action(name, state)
    
    # 返回结果
    if result['changed']:
        module.exit_json(changed=True, result=result)
    else:
        module.exit_json(changed=False, result=result)

def perform_action(name, state):
    """执行具体操作"""
    # 实现具体的业务逻辑
    return {'changed': False, 'message': 'No action performed'}

if __name__ == '__main__':
    main()
'''
    return module_template

2. 自定义插件开发

# 自定义插件开发示例
def create_custom_plugin(plugin_type):
    """创建自定义插件"""
    
    if plugin_type == 'callback':
        return '''
from ansible.plugins.callback import CallbackBase

class CallbackModule(CallbackBase):
    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'notification'
    CALLBACK_NAME = 'custom_callback'
    
    def __init__(self):
        super(CallbackModule, self).__init__()
    
    def v2_playbook_on_start(self, playbook):
        self._display.display("Playbook started: {}".format(playbook._file_name))
    
    def v2_playbook_on_stats(self, stats):
        self._display.display("Playbook completed")
'''
    
    elif plugin_type == 'filter':
        return '''
class FilterModule(object):
    def filters(self):
        return {
            'custom_filter': self.custom_filter
        }
    
    def custom_filter(self, value):
        # 实现自定义过滤器逻辑
        return value.upper()
'''
    
    return ""

Ansible的配置管理

1. 配置文件层次

# ansible.cfg配置文件层次结构
# 1. ANSIBLE_CONFIG环境变量指定的文件
# 2. 当前目录下的ansible.cfg
# 3. 用户主目录下的.ansible.cfg
# 4. /etc/ansible/ansible.cfg

# ansible.cfg示例
[defaults]
# Inventory设置
inventory = ./inventory
host_key_checking = False

# 连接设置
timeout = 30
remote_user = ansible
private_key_file = ~/.ssh/ansible_key

# 模块设置
module_name = command
forks = 50

# 回调设置
stdout_callback = yaml
bin_ansible_callbacks = True

# Vault设置
vault_password_file = ~/.ansible/vault_password

[ssh_connection]
# SSH连接优化
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
pipelining = True
scp_if_ssh = True

[privilege_escalation]
# 权限提升设置
become = True
become_method = sudo
become_user = root

2. 变量优先级

# Ansible变量优先级（从低到高）

1.  命令行变量 (-e)
2.  任务中的vars定义
3.  Block中的vars定义
4.  Play中的vars定义
5.  Include_vars文件
6.  Role中的vars/main.yml
7.  Playbook目录中的vars文件
8.  Inventory文件中的组变量
9.  Inventory目录中的group_vars/all
10. Inventory目录中的group_vars/*
11. Inventory文件中的主机变量
12. Inventory目录中的host_vars/*
13. Facts发现的变量
14. Registered变量
15. Role中的defaults/main.yml
16. Extra variables (-e)

总结

Ansible的基本概念和架构设计体现了其简洁、高效和易用的特点。通过理解控制节点、被管节点、Inventory、Modules、Plugins等核心概念，以及无代理架构、推送式执行、幂等性设计等工作机制，我们可以更好地利用Ansible进行自动化配置管理。

在下一节中，我们将深入探讨Playbook的编写与执行，学习如何使用Playbook来定义和执行复杂的配置管理任务。