When feature names or class labels contain Chinese characters, tree diagrams exported from XGBoost or scikit-learn and rendered with Graphviz may show boxes or blank spaces instead of the characters. The usual fix is to tell Graphviz to use a font that supports Chinese before it renders the image.
The examples below use FangSong. You can replace it with any Chinese-capable font installed on your system, such as SimHei, Microsoft YaHei, Noto Sans CJK SC, or Source Han Sans SC.
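In DOT, the language Graphviz reads, "setting a font" means giving nodes and edges a `fontname` attribute. The minimal sketch below assumes you build the DOT text yourself. The function name `dot_with_font` and the sample labels are hypothetical. It declares a default font once with `node [...]` and `edge [...]` statements, so every label declared after them inherits that font:

```python
def dot_with_font(edges, font_name="FangSong"):
    """Build DOT text for a directed graph whose nodes and edges default to `font_name`.

    `edges` is a list of (parent_label, child_label) pairs. Labels may contain
    Chinese characters but must not contain double quotes in this simple sketch.
    """
    lines = [
        "digraph {",
        # Default attributes apply to every node/edge declared after them.
        f'    node [fontname="{font_name}"]',
        f'    edge [fontname="{font_name}"]',
    ]
    for parent, child in edges:
        lines.append(f'    "{parent}" -> "{child}"')
    lines.append("}")
    return "\n".join(lines)


print(dot_with_font([("年龄 < 30", "是"), ("年龄 < 30", "否")]))
```

Rendering this text with Graphviz should display the Chinese labels, provided the named font is installed.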
XGBoost
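xgb.to_graphviz() returns a graphviz.Source object. Before rendering it, add default node and edge font definitions to its DOT source. Recent releases of the graphviz Python package make Source.source read-only. For that reason, the helper below builds a new Source from the edited text instead of assigning to the attribute.

```python
import re

import graphviz
import xgboost as xgb


def set_graph_font(graph, font_name="FangSong"):
    """Return a copy of `graph` whose nodes and edges default to `font_name`."""
    font_attrs = f'node [fontname="{font_name}"]\n    edge [fontname="{font_name}"]'
    # Insert the default font attributes right after the opening "digraph {" line,
    # so they apply to every node and edge XGBoost declares afterwards.
    source = re.sub(
        r"^(\s*digraph[^{]*\{)",
        lambda m: f"{m.group(1)}\n    {font_attrs}",
        graph.source,
        count=1,
    )
    return graphviz.Source(source, format=graph.format)


# `model` is a trained XGBoost model; num_trees is the index of the tree to draw.
graph = xgb.to_graphviz(model, num_trees=9)
graph.format = "png"
graph = set_graph_font(graph)
graph  # In Jupyter, the last expression in a cell renders the tree inline.
```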
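A regex substitution that fails to match leaves the DOT text unchanged and raises no error. It is therefore worth confirming that the font actually made it into the source. The sketch below collects every `fontname` value in a DOT string; the helper name `dot_font_names` is hypothetical. In a notebook you would pass it the graph's `.source` text:

```python
import re


def dot_font_names(dot_source):
    """Return the set of fontname values declared anywhere in a DOT string."""
    # Matches fontname="X" (quoted) or fontname=X (bare ID), case-insensitively.
    pattern = re.compile(
        r'fontname\s*=\s*"([^"]*)"|fontname\s*=\s*([\w.-]+)', re.IGNORECASE
    )
    return {quoted or bare for quoted, bare in pattern.findall(dot_source)}


sample = 'digraph {\n    node [fontname="FangSong"]\n    0 [ label="年龄<30" ]\n}'
print(dot_font_names(sample))  # {'FangSong'}
```

If the result does not include the font you asked for, the substitution did not apply, and Graphviz falls back to its default font.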
If the font still does not render, check the fonts available to Graphviz on the local machine. On Linux, fc-list is often the quickest way to verify whether the font is installed:
```bash
fc-list | grep -i "fang\|noto\|source han"
```
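Instead of hard-coding FangSong, you can ask fontconfig for the installed family names with `fc-list : family` and take the first match from a preference list. This sketch assumes fontconfig's `fc-list` is on PATH, as it is on most Linux desktops. The names `installed_families` and `pick_cjk_font`, and the preference order, are hypothetical:

```python
import re
import shutil
import subprocess

# Hypothetical preference order; edit to taste.
PREFERRED = ["FangSong", "SimHei", "Microsoft YaHei", "Noto Sans CJK SC", "Source Han Sans SC"]


def installed_families(fc_list_output):
    """Parse `fc-list : family` output into a set of family names."""
    families = set()
    for line in fc_list_output.splitlines():
        # One font per line; its family names are separated by unescaped commas.
        for name in re.split(r"(?<!\\),", line):
            # fontconfig may backslash-escape special characters; drop the escapes.
            name = re.sub(r"\\(.)", r"\1", name).strip()
            if name:
                families.add(name)
    return families


def pick_cjk_font(preferred=PREFERRED):
    """Return the first preferred family that fontconfig reports, or None."""
    if shutil.which("fc-list") is None:
        return None  # fontconfig tools not on PATH (common on Windows)
    output = subprocess.run(
        ["fc-list", ":", "family"], capture_output=True, text=True, check=True
    ).stdout
    families = installed_families(output)
    return next((name for name in preferred if name in families), None)


print(pick_cjk_font())
```

A non-None result can be passed as the font_name argument of the helpers in this post.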
RandomForest
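For scikit-learn’s RandomForestClassifier or RandomForestRegressor, export the first ten trees to DOT files. Then replace the default Helvetica font with a Chinese-capable font and render each DOT file to PNG with Graphviz. Recent scikit-learn versions also accept a fontname argument in export_graphviz, which sets the font directly and makes the text replacement optional.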
```python
import re
import subprocess
from pathlib import Path

from sklearn.tree import export_graphviz


def plot_forest(model, column_names, class_names=None, output_dir="forest",
                font_name="FangSong", max_trees=10):
    """Export up to `max_trees` trees of a fitted random forest to PNG files."""
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    for i, estimator in enumerate(model.estimators_[:max_trees]):
        dot_path = output_path / f"tree-{i}.dot"
        cn_dot_path = output_path / f"tree-cn-{i}.dot"
        png_path = output_path / f"tree-cn-{i}.png"

        export_graphviz(
            estimator,
            out_file=str(dot_path),
            feature_names=column_names,
            class_names=class_names,  # e.g. ["类别一", "类别二"]; only used for classifiers
            rounded=True,
            proportion=False,
            precision=2,
            filled=True,
        )

        # Swap the default helvetica font for the Chinese-capable font.
        source = dot_path.read_text(encoding="utf-8")
        source = re.sub(r'fontname="?helvetica"?', f'fontname="{font_name}"',
                        source, flags=re.IGNORECASE)
        cn_dot_path.write_text(source, encoding="utf-8")

        # Convert to PNG with Graphviz; check=True raises if dot reports an error.
        subprocess.run(
            ["dot", "-Tpng", "-Gdpi=600", str(cn_dot_path), "-o", str(png_path)],
            check=True,
        )
```
In a Jupyter notebook, display one rendered tree with:
```python
from IPython.display import Image

Image(filename="forest/tree-cn-0.png")
```
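To browse several trees, collect the PNG files and sort them by tree index. A plain string sort would put tree-cn-10.png before tree-cn-2.png if the export is extended past ten trees. The sketch below assumes the forest directory and the tree-cn-<i>.png naming used by plot_forest above. The helper name `rendered_trees` is hypothetical:

```python
import re
from pathlib import Path


def rendered_trees(output_dir="forest"):
    """Return the tree-cn-<i>.png files in output_dir, ordered by tree index i."""
    pattern = re.compile(r"tree-cn-(\d+)\.png$")
    found = []
    for path in Path(output_dir).glob("tree-cn-*.png"):
        match = pattern.search(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


# In Jupyter, display each tree in order:
# from IPython.display import Image, display
# for path in rendered_trees():
#     display(Image(filename=str(path)))
```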
Before running the conversion step, make sure Graphviz is installed and available on the command line:
```bash
dot -V
```
If dot is missing, install Graphviz with the package manager for your operating system, then rerun the notebook or script.
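A script can run the same check before any conversion. A missing Graphviz install then produces a clear message instead of a FileNotFoundError from subprocess. The minimal sketch below uses only the standard library; the helper name `graphviz_version` is hypothetical. Note that `dot -V` writes its version banner to stderr, not stdout:

```python
import shutil
import subprocess


def graphviz_version():
    """Return the version banner printed by `dot -V`, or None if dot is not on PATH."""
    dot = shutil.which("dot")
    if dot is None:
        return None
    result = subprocess.run([dot, "-V"], capture_output=True, text=True)
    # dot prints its version to stderr; fall back to stdout just in case.
    return (result.stderr or result.stdout).strip()


print(graphviz_version() or "Graphviz 'dot' was not found on PATH.")
```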
