Mortal(凡夫) is probably the best-performing open-source riichi mahjong AI, but its author does not provide very detailed documentation. What follows are my initial hands-on notes on Mortal.
Mortal is a Japanese (riichi) mahjong AI based on deep reinforcement learning. The project has two core components:
| Component | Language | Role |
|---|---|---|
| libriichi | Rust (Python bindings via PyO3) | High-performance mahjong engine: state machine, dataset loading/parsing, arena self-play, mjai protocol handling, shanten/win detection, etc. |
| mortal | Python (PyTorch) | Neural network model definitions, training scripts, inference engine, server/client for online training |
Training proceeds in three stages, each building on the last: training the GRP rank predictor, offline training from game records, and online training through self-play.
git clone https://github.com/Equim-chan/Mortal.git
cd Mortal
This directory is referred to as $MORTAL_ROOT below.
cargo build -p libriichi --lib --release
Then copy the build artifact into the mortal/ directory:
Linux:
cp target/release/libriichi.so mortal/libriichi.so
Windows (MSYS2 / Git Bash):
cp target/release/riichi.dll mortal/libriichi.pyd
cd mortal
python -c "import libriichi; help(libriichi)"
If everything works, this prints the help text for libriichi.
The training data consists of gzip-compressed JSON Lines files (extension .json.gz). Each file contains all events of one complete hanchan. Decompressed, each line is a JSON object following the mjai protocol.
Example file content (decompressed):
{"type":"start_game","names":["player1","player2","player3","player4"]}
{"type":"start_kyoku","bakaze":"E","dora_marker":"2p","kyoku":1,"honba":0,"kyotaku":0,"oya":0,"scores":[25000,25000,25000,25000],"tehais":[["1m","3m","5m","7m","9m","1p","3p","5p","7p","9p","1s","3s","5s"],["?","?","?","?","?","?","?","?","?","?","?","?","?"],["?","?","?","?","?","?","?","?","?","?","?","?","?"],["?","?","?","?","?","?","?","?","?","?","?","?","?"]]}
{"type":"tsumo","actor":0,"pai":"5m"}
{"type":"dahai","actor":0,"pai":"1m","tsumogiri":false}
{"type":"tsumo","actor":1,"pai":"?"}
{"type":"dahai","actor":1,"pai":"N","tsumogiri":true}
...
{"type":"reach","actor":0}
{"type":"dahai","actor":0,"pai":"9p","tsumogiri":false}
{"type":"reach_accepted","actor":0}
...
{"type":"hora","actor":0,"target":2,"pai":"3p","deltas":[8000,-4000,-4000,0]}
{"type":"end_kyoku"}
...
{"type":"end_game"}
For details, refer to the mjai protocol documentation.
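Since each .json.gz file is just gzip-compressed JSON Lines, it can be inspected with the Python standard library alone. A minimal sketch (the helper name is illustrative, not from the codebase) that tallies the event types in one log:

```python
import gzip
import json
from collections import Counter

def count_mjai_events(path):
    """Tally mjai event types in one gzip-compressed JSON Lines log."""
    counts = Counter()
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            counts[event['type']] += 1
    return counts

# Example: count_mjai_events('dataset/2019/01/some_game.json.gz')
```

A quick sanity check like this catches truncated downloads early: a well-formed log always starts with one start_game and ends with one end_game event.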
Training Mortal requires a large number of high-quality riichi mahjong game records. The main ways to obtain data:
Some sources distribute .mjson files (actually gzip-compressed JSON Lines), but data from 2025 onward appears to be raw, uncompressed mjai; either recompress those files to .json.gz or adjust the glob patterns in the config accordingly.
Step 1: Download raw Tenhou game logs (mjlog XML format)
Recommended tools:
Apricot-S/houou-logs (most recommended)
A Python tool built specifically for downloading Tenhou Houou (phoenix) table logs
Installation:
git clone https://github.com/Apricot-S/houou-logs.git
pip install .
Usage:
# 1. Import historical log IDs (from the annual archives)
houou-logs import <db-path> <archive-path>
# 2. Download the log contents
houou-logs download <db-path> [--players <PLAYERS>] [--length <LENGTH>] [--limit <LIMIT>]
Important restriction: Tenhou allows only one download session at a time. Do not run multiple instances, or you may get banned.
See the GitHub page for detailed usage.
Step 2: Convert mjlog XML to mjai JSON
fstqwq/mjlog2mjai (pure Python, recommended)
No external dependencies; converts mjlog XML directly to mjai JSON
Usage:
from mjlog2mjai import parse_mjlog_to_mjai, load_mjlog
result = parse_mjlog_to_mjai(load_mjlog("game.mjlog"))
h11r03/mjlog2mjai (Python + Ruby)
python batch_convert_mjlog.py input_dir output_dir
Step 3: Compress to .json.gz
# Compress a single file
gzip game.json
# Compress all .json files under a directory
find dataset/ -name "*.json" -exec gzip {} \;
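On systems without find/gzip on the PATH (e.g. plain Windows), the same batch compression can be sketched with the Python standard library; the function name is illustrative, and it mirrors gzip's default behavior of replacing each input file:

```python
import gzip
import shutil
from pathlib import Path

def gzip_directory(root):
    """Compress every .json file under root to .json.gz, removing the originals."""
    for src in Path(root).rglob('*.json'):
        dst = src.with_name(src.name + '.gz')  # game.json -> game.json.gz
        with open(src, 'rb') as f_in, gzip.open(dst, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
        src.unlink()  # drop the uncompressed original, as gzip does

# Example: gzip_directory('dataset/')
```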
It is recommended to organize the data directory by year/month, which makes batched training and management easier:
dataset/
├── 2019/
│ ├── 01/
│ │ ├── 2019010100gm-0009-xxxxx.json.gz
│ │ ├── 2019010100gm-0009-yyyyy.json.gz
│ │ └── ...
│ ├── 02/
│ │ └── ...
│ └── ...
├── 2020/
│ └── ...
├── 2021/
│ └── ...
├── 2022/
│ └── ...
├── 2023/
│ └── ...
└── 2024/
└── ...
The corresponding glob patterns in the config:
# Use all data
globs = ['dataset/**/*.json.gz']
# Or select specific years
globs = [
'dataset/2019/**/*.json.gz',
'dataset/2020/**/*.json.gz',
'dataset/2021/**/*.json.gz',
]
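Before starting a long training run it is worth checking that the patterns actually match files; an empty match usually means the data layout and the globs disagree. A small sketch using Python's glob module (note that ** requires recursive=True):

```python
import glob

def count_dataset_files(globs):
    """Return how many files each pattern matches, as in [dataset].globs."""
    return {pattern: len(glob.glob(pattern, recursive=True))
            for pattern in globs}

# Example: count_dataset_files(['dataset/**/*.json.gz'])
```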
Training is controlled by a TOML configuration file. Its path is taken from the environment variable MORTAL_CFG and defaults to config.toml in the working directory.
The loading logic is in mortal/config.py:
config_file = os.environ.get('MORTAL_CFG', 'config.toml')
with open(config_file, encoding='utf-8') as f:
config = toml.load(f)
The project ships a complete example configuration, mortal/config.example.toml; many of its values are placeholders and need to be adjusted for your setup.
All configuration options are described below:
[control] — training control

| Parameter | Type | Example | Description |
|---|---|---|---|
| version | int | 4 | Model version (1/2/3/4); affects the observation shape and network structure; 4 is recommended |
| online | bool | false | false = offline training (learn from game records), true = online training (self-play) |
| state_file | str | 'mortal.pth' | Model checkpoint path; training can resume from it after an interruption |
| best_state_file | str | 'best.pth' | Checkpoint of the model with the best test performance |
| tensorboard_dir | str | 'tb_log' | TensorBoard log directory |
| device | str | 'cuda:0' | Training device |
| enable_cudnn_benchmark | bool | false | cuDNN benchmark mode |
| enable_amp | bool | false | Mixed-precision training |
| enable_compile | bool | false | torch.compile acceleration |
| batch_size | int | 512 | Training batch size |
| opt_step_every | int | 1 | Run the optimizer every N steps (gradient accumulation) |
| save_every | int | 400 | Save the model every N steps |
| test_every | int | 20000 | Run test games every N steps |
| submit_every | int | 400 | In online mode, submit parameters to the server every N steps |
[dataset] — dataset settings

| Parameter | Type | Example | Description |
|---|---|---|---|
| globs | list[str] | ['dataset/**/*.json.gz'] | Glob patterns for data files |
| file_index | str | 'file_index.pth' | File index cache path (built once, then reused) |
| file_batch_size | int | 15 | Number of files loaded per batch (higher = more memory) |
| reserve_ratio | float | 0.0 | Reserved buffer ratio (for data replay) |
| num_workers | int | 1 | Number of DataLoader worker processes |
| player_names_files | list[str] | [] | Player-name filter files (one name per line) |
| num_epochs | int | 1 | Number of passes over the data |
| enable_augmentation | bool | false | Enable data augmentation (tile swapping) |
| augmented_first | bool | false | Whether augmented data is used before the original data |
[env] — reinforcement learning environment

| Parameter | Type | Example | Description |
|---|---|---|---|
| gamma | float | 1 | Discount factor (1 = no discounting, i.e. whole-round reward) |
| pts | list[float] | [6.0, 4.0, 2.0, 0.0] | Placement points for ranks 1-4 (used in reward computation) |
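How pts and gamma interact can be shown with a tiny scalar sketch: the final placement is mapped through pts, and with gamma below 1 the reward is discounted by the number of steps left until the round ends (function names here are illustrative, not from the codebase):

```python
def placement_reward(rank, pts=(6.0, 4.0, 2.0, 0.0)):
    """Map a final placement (1-4) to its reward points."""
    return pts[rank - 1]

def discounted_target(reward, steps_to_done, gamma=1.0):
    """Monte Carlo style target: gamma^k * reward, with k steps left in the round."""
    return gamma ** steps_to_done * reward
```

With the default gamma = 1 the discount is a no-op and every decision in the round sees the same reward.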
[resnet] — ResNet backbone

| Parameter | Type | Example | Description |
|---|---|---|---|
| conv_channels | int | 192 | Number of convolution channels (more = more capacity, slower training) |
| num_blocks | int | 40 | Number of residual blocks (depth) |
[cql] — Conservative Q-Learning

| Parameter | Type | Example | Description |
|---|---|---|---|
| min_q_weight | float | 5 | CQL regularization weight (offline training only; larger = more conservative) |
[aux] — auxiliary task

| Parameter | Type | Example | Description |
|---|---|---|---|
| next_rank_weight | float | 0.2 | Weight of the auxiliary rank-prediction loss |
[freeze_bn] — BatchNorm freezing

| Parameter | Type | Example | Description |
|---|---|---|---|
| mortal | bool | false | Set to false for offline training, true for online training |
[optim] — optimizer (AdamW)

| Parameter | Type | Example | Description |
|---|---|---|---|
| eps | float | 1e-8 | Adam epsilon |
| betas | list[float] | [0.9, 0.999] | Adam betas |
| weight_decay | float | 0.1 | Weight decay (applied only to Linear/Conv1d weight parameters) |
| max_grad_norm | float | 0 | Gradient clipping norm (0 = no clipping) |
[optim.scheduler] — learning rate scheduler

| Parameter | Type | Example | Description |
|---|---|---|---|
| peak | float | 1e-4 | Peak learning rate |
| final | float | 1e-4 | Final learning rate |
| warm_up_steps | int | 0 | Linear warm-up steps |
| max_steps | int | 0 | Total cosine annealing steps (0 = constant learning rate) |

Schedule: linear warm-up followed by cosine annealing, implemented in mortal/lr_scheduler.py.
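The shape of that schedule can be sketched as a plain function of the step count. This is an illustration of the curve under the parameters above, not the actual code in mortal/lr_scheduler.py:

```python
import math

def lr_at(step, peak, final, warm_up_steps, max_steps):
    """Linear warm-up to peak, then cosine annealing down to final.

    With max_steps = 0 the schedule degenerates to a constant peak rate
    (after any warm-up), matching the documented behavior.
    """
    if step < warm_up_steps:
        return peak * (step + 1) / warm_up_steps  # linear ramp
    if max_steps <= 0:
        return peak                                # constant learning rate
    if step >= max_steps:
        return final                               # annealing finished
    progress = (step - warm_up_steps) / max(1, max_steps - warm_up_steps)
    return final + 0.5 * (peak - final) * (1 + math.cos(math.pi * progress))
```

Setting peak == final (as in the example config) also yields a constant rate, regardless of max_steps.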
[baseline.train] / [baseline.test] — opponent model

| Parameter | Type | Description |
|---|---|---|
| device | str | Device the opponent model runs on |
| enable_compile | bool | Whether to compile the model |
| state_file | str | Path to the opponent model's weights |
[train_play.*] — self-play settings

| Parameter | Type | Example | Description |
|---|---|---|---|
| games | int | 800 | Number of self-play games |
| log_dir | str | 'train_play' | Log output directory |
| boltzmann_epsilon | float | 0.005 | Exploration probability (epsilon-greedy) |
| boltzmann_temp | float | 0.05 | Boltzmann sampling temperature |
| top_p | float | 1.0 | Top-p sampling cutoff |
| repeats | int | 1 | Number of repeats per seed |
[test_play] — test game settings

| Parameter | Type | Example | Description |
|---|---|---|---|
| games | int | 3000 | Number of test games (must be a multiple of 4) |
| log_dir | str | 'test_play' | Log output directory |
[online] / [online.remote] / [online.server] — online training

| Option | Type | Description |
|---|---|---|
| online.history_window | int | History window size |
| online.remote.host | str | Server address |
| online.remote.port | int | Server port |
| online.server.buffer_dir | str | Buffer directory |
| online.server.drain_dir | str | Drain directory |
| online.server.capacity | int | Buffer capacity |
| online.server.force_sequential | bool | Whether to force sequential processing |
[1v3] — 1v3 evaluation

| Parameter | Type | Description |
|---|---|---|
| seed_key | int | Random seed |
| games_per_iter | int | Games per iteration |
| iters | int | Number of iterations |
| log_dir | str | Log directory |
| challenger.* | — | Challenger model settings |
| champion.* | — | Champion model settings |
[grp] — GRP-specific training settings

| Option | Type | Description |
|---|---|---|
| grp.state_file | str | GRP model checkpoint path |
| grp.network.hidden_size | int | GRU hidden size (default 64) |
| grp.network.num_layers | int | Number of GRU layers (default 2) |
| grp.control.* | — | Training control (device, batch_size, save_every, etc.) |
| grp.dataset.train_globs | list | Training set globs |
| grp.dataset.val_globs | list | Validation set globs |
| grp.dataset.file_index | str | File index cache |
| grp.dataset.file_batch_size | int | File batch size |
| grp.optim.lr | float | Learning rate |
GRP is the foundation of the reward computation and must be trained before the main training.
Add the following to config.toml:
[grp]
state_file = 'grp.pth'
[grp.network]
hidden_size = 64
num_layers = 2
[grp.control]
device = 'cuda:0'
enable_cudnn_benchmark = false
tensorboard_dir = 'grp_log'
batch_size = 512
save_every = 2000
val_steps = 400
[grp.dataset]
train_globs = [
'dataset/2019/**/*.json.gz',
'dataset/2020/**/*.json.gz',
'dataset/2021/**/*.json.gz',
]
val_globs = [
'dataset/2022/01/**/*.json.gz',
'dataset/2022/02/**/*.json.gz',
]
file_index = 'grp_file_index.pth'
file_batch_size = 50
[grp.optim]
lr = 1e-5
Notes:
- file_batch_size controls how many files are loaded at a time; 50 is a reasonable value
- save_every = 2000 means a checkpoint is saved every 2000 steps

cd $MORTAL_ROOT/mortal
MORTAL_CFG=config.toml python train_grp.py
Windows PowerShell:
cd $MORTAL_ROOT\mortal
$env:MORTAL_CFG="config.toml"; python train_grp.py
Every save_every steps the model is saved and evaluated on the validation set; watch the loss and acc curves. The output is grp.pth, which the main training stage needs later.
tensorboard --logdir grp_log
Watch the loss/train, loss/val, acc/train, and acc/val metrics.
Offline reinforcement learning from existing Tenhou game records. This is the core stage of training Mortal.
[control]
version = 4
online = false
state_file = 'mortal.pth'
best_state_file = 'best.pth'
tensorboard_dir = 'tb_log'
device = 'cuda:0'
enable_cudnn_benchmark = false
enable_amp = false
enable_compile = false
batch_size = 512
opt_step_every = 1
save_every = 400
test_every = 20000
submit_every = 400
[dataset]
globs = ['dataset/**/*.json.gz']
file_index = 'file_index.pth'
file_batch_size = 15
reserve_ratio = 0.0
num_workers = 1
player_names_files = []
num_epochs = 1
enable_augmentation = false
augmented_first = false
[env]
gamma = 1
pts = [6.0, 4.0, 2.0, 0.0]
[resnet]
conv_channels = 192
num_blocks = 40
[cql]
min_q_weight = 5
[aux]
next_rank_weight = 0.2
[freeze_bn]
mortal = false
[optim]
eps = 1e-8
betas = [0.9, 0.999]
weight_decay = 0.1
max_grad_norm = 0
[optim.scheduler]
peak = 1e-4
final = 1e-4
warm_up_steps = 0
max_steps = 0
[train_play.default]
games = 800
log_dir = 'train_play'
boltzmann_epsilon = 0.005
boltzmann_temp = 0.05
top_p = 1.0
repeats = 1
[test_play]
games = 3000
log_dir = 'test_play'
[baseline.train]
device = 'cuda:0'
enable_compile = false
state_file = 'baseline.pth'
[baseline.test]
device = 'cuda:0'
enable_compile = false
state_file = 'baseline.pth'
# GRP settings (the main training loads the trained GRP)
[grp]
state_file = 'grp.pth'
[grp.network]
hidden_size = 64
num_layers = 2
For the first training run there is no baseline model for test_play, so a bootstrap step is needed; this is mentioned in this discussion.
Option 1: skip testing by setting a very large test_every value
test_every = 999999999
Train for a while (e.g. a few thousand steps), copy the resulting mortal.pth to baseline.pth, change test_every back to a normal value, and restart training.
Option 2: create a randomly initialized model by hand
# mortal/bootstrap.py
import os
import torch
from datetime import datetime
from torch.amp import GradScaler
from torch import optim
from model import Brain, DQN, AuxNet
from lr_scheduler import LinearWarmUpCosineAnnealingLR
from config import config
os.makedirs('checkpoints', exist_ok=True)
version = config['control']['version']
device = torch.device('cpu')
mortal = Brain(version=version, **config['resnet'])
dqn = DQN(version=version)
aux_net = AuxNet((4,))
# The full optimizer/scheduler/scaler state must be constructed here,
# because train.py restores them when it loads state_file
all_models = (mortal, dqn, aux_net)
from torch import nn
decay_params = []
no_decay_params = []
for model in all_models:
params_dict = {}
to_decay = set()
for mod_name, mod in model.named_modules():
for name, param in mod.named_parameters(prefix=mod_name, recurse=False):
params_dict[name] = param
if isinstance(mod, (nn.Linear, nn.Conv1d)) and name.endswith('weight'):
to_decay.add(name)
decay_params.extend(params_dict[name] for name in sorted(to_decay))
no_decay_params.extend(params_dict[name] for name in sorted(params_dict.keys() - to_decay))
param_groups = [
{'params': decay_params, 'weight_decay': config['optim']['weight_decay']},
{'params': no_decay_params},
]
optimizer = optim.AdamW(
param_groups, lr=1, weight_decay=0,
betas=config['optim']['betas'], eps=config['optim']['eps'],
)
scheduler = LinearWarmUpCosineAnnealingLR(optimizer, **config['optim']['scheduler'])
scaler = GradScaler(device.type, enabled=False)
state = {
'mortal': mortal.state_dict(),
'current_dqn': dqn.state_dict(),
'aux_net': aux_net.state_dict(),
'optimizer': optimizer.state_dict(),
'scheduler': scheduler.state_dict(),
'scaler': scaler.state_dict(),
'steps': 0,
'timestamp': datetime.now().timestamp(),
'best_perf': {'avg_rank': 4., 'avg_pt': -135.},
'config': config,
}
torch.save(state, 'checkpoints/baseline.pth')
print('baseline.pth created (random weights)')
Then run:
cd $MORTAL_ROOT/mortal
MORTAL_CFG=config.toml python bootstrap.py
Windows PowerShell:
cd $MORTAL_ROOT\mortal
$env:MORTAL_CFG="config.toml"; python bootstrap.py
This produces a model that discards at random, but checkpoints/baseline.pth now has the complete structure, so TestPlayer can load it. After training for a while, replace the baseline with the best trained model and repeat.
cd $MORTAL_ROOT/mortal
MORTAL_CFG=config.toml python train.py
Windows PowerShell:
cd $MORTAL_ROOT\mortal
$env:MORTAL_CFG="config.toml"; python train.py
The total training loss has three components:
total_loss = dqn_loss + cql_loss × min_q_weight + next_rank_loss × next_rank_weight
DQN Loss — value function learning
q_target_mc = gamma^steps_to_done × kyoku_reward
dqn_loss = 0.5 × MSE(q_predicted, q_target_mc)
A Monte Carlo Q target is used: the discounted reward from the current step to the end of the round.
CQL Loss — conservative regularization (offline training only)
cql_loss = logsumexp(q_all_actions) - mean(q_selected_action)
This prevents Q-value overestimation on actions not covered by the offline data.
Aux Loss — auxiliary rank prediction
aux_loss = CrossEntropy(predicted_rank, actual_rank)
This helps the feature encoder learn rank-related information.
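The three terms above can be sketched with plain scalar math for a single decision point. This is an illustration of the formulas, not the batched PyTorch implementation (in particular, the batch mean in the CQL term reduces here to a single selected Q value):

```python
import math

def dqn_loss(q_pred, kyoku_reward, steps_to_done, gamma=1.0):
    """0.5 * MSE against the Monte Carlo target gamma^k * reward."""
    q_target = gamma ** steps_to_done * kyoku_reward
    return 0.5 * (q_pred - q_target) ** 2

def cql_loss(q_all, q_selected):
    """logsumexp over all legal actions' Q values minus the chosen action's Q."""
    m = max(q_all)  # stabilize the exponentials
    lse = m + math.log(sum(math.exp(q - m) for q in q_all))
    return lse - q_selected

def total_loss(dqn, cql, aux, min_q_weight=5.0, next_rank_weight=0.2):
    """Weighted sum exactly as in the total_loss formula above."""
    return dqn + cql * min_q_weight + aux * next_rank_weight
```

Note that cql_loss is zero only when the chosen action holds essentially all the Q mass; otherwise it pushes the selected action's Q up relative to the alternatives, which is what makes the offline policy conservative.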
tensorboard --logdir tb_log
Key metrics to monitor:
- loss/dqn_loss: DQN loss; should decrease over time
- loss/cql_loss: CQL regularization loss
- loss/next_rank_loss: auxiliary task loss
- q_predicted / q_target: histograms of the Q-value distributions
- test_play/avg_ranking: average placement in test games (lower is better; 2.5 is random-level play)
- test_play/avg_pt: average pt in test games
- test_play/ranking: share of each placement
- test_play/behavior: win rate, deal-in rate, call rate, riichi rate

To train a stronger model, iterate over multiple rounds: train for a while, set the best checkpoint (best.pth) as the new baseline.pth, and train again.

Online training generates new data through self-play to improve the model further. This stage is optional, but recommended for producing the strongest model.
Online training uses three cooperating processes: a server (server.py) that buffers self-play data, a trainer (train.py) that consumes the buffer and periodically submits updated parameters, and one or more clients (client.py) that play games with the latest parameters.
Compared to offline training, the following configuration changes are required:
[control]
online = true
[freeze_bn]
mortal = true # BN must be frozen for online training
[online]
history_window = 50
enable_compile = false
[online.remote]
host = '127.0.0.1'
port = 5000
[online.server]
buffer_dir = 'buffer'
drain_dir = 'drain'
sample_reuse_rate = 0
sample_reuse_threshold = 0
capacity = 1600
force_sequential = false
Start the three processes in order, one per terminal:
# Terminal 1: start the server
cd $MORTAL_ROOT/mortal
MORTAL_CFG=config.toml python server.py
# Terminal 2: start the trainer
cd $MORTAL_ROOT/mortal
MORTAL_CFG=config.toml python train.py
# Terminal 3: start a client (run multiple clients to speed up data generation)
cd $MORTAL_ROOT/mortal
MORTAL_CFG=config.toml python client.py
Windows PowerShell:
# Terminal 1: start the server
cd $MORTAL_ROOT/mortal
$env:MORTAL_CFG="config.toml"; python server.py
# Terminal 2: start the trainer
cd $MORTAL_ROOT/mortal
$env:MORTAL_CFG="config.toml"; python train.py
# Terminal 3: start a client (run multiple clients to speed up data generation)
cd $MORTAL_ROOT/mortal
$env:MORTAL_CFG="config.toml"; python client.py
Use mortal/mortal.py for streaming inference over the mjai protocol.
cd $MORTAL_ROOT/mortal
MORTAL_CFG=config.toml python mortal.py <player_id>
<player_id>: the player's seat ID, an integer from 0 to 3
Review mode (for game record analysis):
MORTAL_REVIEW_MODE=1 MORTAL_CFG=config.toml python mortal.py <player_id>
In review mode, every step outputs the reaction together with metadata (Q values, masks, etc.), and the GRP rank matrix is printed at the end.
Using the engine directly from Python code:
import torch
from model import Brain, DQN
from engine import MortalEngine
# Load the model
state = torch.load('mortal.pth', weights_only=False, map_location='cpu')
cfg = state['config']
version = cfg['control']['version']
conv_channels = cfg['resnet']['conv_channels']
num_blocks = cfg['resnet']['num_blocks']
# Build the networks
brain = Brain(version=version, conv_channels=conv_channels, num_blocks=num_blocks).eval()
dqn = DQN(version=version).eval()
brain.load_state_dict(state['mortal'])
dqn.load_state_dict(state['current_dqn'])
# Create the inference engine
engine = MortalEngine(
brain,
dqn,
version=version,
is_oracle=False,
device=torch.device('cuda:0'),
enable_amp=True,
enable_rule_based_agari_guard=True,
name='mortal',
)
# Use the mjai protocol through the Bot interface
from libriichi.mjai import Bot
bot = Bot(engine, player_id=0)
# Call react on each mjai event line
testdata = '''
{"type":"start_game","names":["player1","player2","player3","player4"],"kyoku_first":0,"aka_flag":true}
{"type":"start_kyoku","bakaze":"E","dora_marker":"2m","kyoku":1,"honba":0,"kyotaku":0,"oya":0,"scores":[25000,25000,25000,25000],"tehais":[["1m","2m","5m","9m","3p","4p","6p","7p","7p","2s","6s","E","P"],["1m","3m","4m","7m","5p","5p","5p","4s","5sr","7s","9s","N","F"],["3m","5mr","6m","8m","4p","8p","1s","2s","4s","E","S","W","P"],["4m","2p","3p","8p","9p","2s","3s","6s","7s","9s","9s","9s","E"]]}
{"type":"tsumo","actor":0,"pai":"C"}
{"type":"dahai","actor":0,"pai":"9m","tsumogiri":false}
{"type":"tsumo","actor":1,"pai":"6s"}
{"type":"dahai","actor":1,"pai":"N","tsumogiri":false}
{"type":"tsumo","actor":2,"pai":"3p"}
{"type":"dahai","actor":2,"pai":"S","tsumogiri":false}
{"type":"tsumo","actor":3,"pai":"6p"}
{"type":"dahai","actor":3,"pai":"E","tsumogiri":false}
{"type":"tsumo","actor":0,"pai":"1s"}
{"type":"dahai","actor":0,"pai":"E","tsumogiri":false}
{"type":"tsumo","actor":1,"pai":"4p"}
{"type":"dahai","actor":1,"pai":"F","tsumogiri":false}
{"type":"tsumo","actor":2,"pai":"7p"}
{"type":"dahai","actor":2,"pai":"E","tsumogiri":false}
{"type":"tsumo","actor":3,"pai":"6p"}
{"type":"dahai","actor":3,"pai":"9p","tsumogiri":false}
{"type":"tsumo","actor":0,"pai":"9m"}
'''
testlines = testdata.split('\n')
print(testlines)
for line in testlines:
if len(line) == 0:
continue
print("->"+line)
reaction = bot.react(line)
if reaction:
print("<-"+reaction)