Project Background
Goal: Deploy NapCat (a QQ bot framework) using Docker on an Alibaba Cloud Linux server, and install the QQ Chat Exporter (QCE) plugin to export chat records.
Phase 1: Why Switch from Bare-Metal Deployment to Docker Deployment
Initial Problem: QR Code Spinning, Unable to Scan
Symptom: The NapCat page could be opened, but the QR code kept spinning and loading, preventing QQ login.
Root Cause: NapCat relies on QQ's Electron client to display the login QR code. Electron is a desktop application framework whose Chromium-based renderer expects a display and starts a GPU process. Cloud servers are "headless" environments with no display or GPU, so the logs repeatedly reported GPU process crashes, and the failure cascaded:
GPU process crash → QQ's internal login UI process stuck → QR code data never generated → WebUI keeps spinning.
Why does switching to Docker solve this? The official NapCat Framework Docker image (mlikiowa/napcat-framework-docker) comes with Xvfb (a virtual display server) and NoVNC (a web-based remote desktop) built-in, specifically designed for server environments without a GPU.
Phase 2: Docker Deployment Process
Bug 1: Docker Compose Template Format Error
Symptom: Baota Panel reported "Template content format error".
Cause: Baota Panel validates Docker Compose templates with its own additional restrictions. The invalid key add-hosts (the correct key is extra_hosts), comments, and quotes could each trigger a parsing error.
Solution: Abandon the Compose template and switch to using the pure command-line docker run for direct deployment, completely bypassing the panel's format validation.
Bug 2: Complete Network Disconnection Inside Container, Unable to Access Any External Network
Symptom: Running curl https://www.baidu.com inside the container failed with Could not resolve host. The container could not reach even Baidu, so QQ login failed with "Network connection failed".
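The curl exit status tells which stage of the connection failed, which helps separate a DNS problem like this one from a later TLS problem. The sketch below maps the exit codes documented in the curl man page to a short diagnosis. The function names and the container name napcat are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Map a curl exit status to the failure stage it points at.
# Codes 6, 7, 28, 60 and 77 are taken from the curl man page.
classify_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;
        6)  echo "dns: could not resolve host" ;;
        7)  echo "connect: could not reach the server" ;;
        28) echo "timeout: operation timed out" ;;
        60) echo "tls: server certificate could not be verified" ;;
        77) echo "tls: CA certificate bundle missing or unreadable" ;;
        *)  echo "other: curl exit code $1" ;;
    esac
}

# Run the probe inside a container and classify the result.
# "napcat" is a hypothetical container name; substitute your own.
probe_container() {
    container="${1:-napcat}"
    url="${2:-https://www.baidu.com}"
    docker exec "$container" curl -sS -o /dev/null --max-time 10 "$url"
    classify_curl_exit "$?"
}
```

Calling probe_container napcat on the host would print one line; a dns result matches the symptom described here, while a tls result means the network path itself already works.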
Root Cause: This is a classic conflict between Alibaba Cloud + Docker + Firewalld.
When Docker runs, it needs to write two types of iptables rules at the operating system level:
- IP Forwarding Rules: Allow the server to forward the container's network packets out.
- NAT MASQUERADE Rules: Rewrite the container's internal source IP to the host's own IP on outgoing packets.
However, Alibaba Cloud servers have firewalld (the system firewall) enabled by default. Every time firewall-cmd --reload is executed, it clears all iptables rules, including those injected by Docker itself. After clearing, the container becomes a "network-isolated island".
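Whether the two kinds of rules are currently in place can be checked from the host. The following is a minimal sketch, assuming the iptables backend and root access; the function names are hypothetical. It only looks for a MASQUERADE rule in the nat table's POSTROUTING chain, which is where Docker normally places the rule for its bridge network.

```shell
#!/bin/sh
# Read `iptables -t nat -S POSTROUTING` output on stdin and succeed
# if any MASQUERADE rule is present in it.
has_masquerade_rule() {
    grep -q -e '-j MASQUERADE'
}

# Succeed if the given net.ipv4.ip_forward value means forwarding is on.
forwarding_enabled() {
    [ "$1" = "1" ]
}

# Print a two-line report for the current host (needs root for iptables).
report_docker_nat() {
    if forwarding_enabled "$(sysctl -n net.ipv4.ip_forward)"; then
        echo "ip_forward: on"
    else
        echo "ip_forward: off"
    fi
    if iptables -t nat -S POSTROUTING | has_masquerade_rule; then
        echo "masquerade: present"
    else
        echo "masquerade: missing"
    fi
}
```

Running report_docker_nat before and after firewall-cmd --reload would show whether the reload removed the rule on a given machine.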
Fix Process:
- Enable system IP forwarding (allow packet routing).
- Make firewalld permanently allow NAT forwarding.
- Restart Docker so it re-injects all network rules.
- After restarting, SSL certificate files were found missing (error setting certificate file), indicating the network was actually working, but the container lacked the CA certificate bundle. Install it inside the container.
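The steps above might be sketched as follows. The exact commands used originally are not shown here, so this is a reconstruction under assumptions: the firewalld zone public, the container name napcat, and a Debian-style image with apt-get are all hypothetical. The run helper prints each command by default and only executes when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to execute (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Host side: forwarding, firewalld masquerading, then a Docker restart
# so that Docker re-creates its own iptables rules.
fix_docker_network() {
    zone="${1:-public}"                 # assumption: default firewalld zone
    run sysctl -w net.ipv4.ip_forward=1
    run firewall-cmd --permanent --zone="$zone" --add-masquerade
    run firewall-cmd --reload
    run systemctl restart docker
}

# Container side: install the CA bundle.
# Assumptions: container named "napcat", image provides apt-get.
install_ca_bundle() {
    container="${1:-napcat}"
    run docker exec -u root "$container" sh -c 'apt-get update && apt-get install -y ca-certificates'
}
```

Note that sysctl -w only lasts until the next reboot; writing net.ipv4.ip_forward = 1 to a file under /etc/sysctl.d/ would make the setting persistent.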
Bug 3: QR Code Disappears Again After Container Restart, VNC Also Disconnects
Symptom: After every docker restart, all manual operations performed inside the container (mkdir, chmod, starting x11vnc, starting websockify) disappeared.
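Root Cause: A Docker container's filesystem has a temporary writable layer that belongs to that one container. Strictly, files written there survive a plain docker restart and are discarded only when the container is removed and recreated (for example, by a fresh docker run); manually started processes such as x11vnc and websockify stop on every restart. Only directories mounted from the host via -v persist independently of the container.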
This meant the previous startup command didn't mount all necessary directories, causing:
- plugins directory loss → plugins not loaded
- qce-v4-tool frontend directory loss → webpage 404
- /.qq-chat-exporter data directory loss → QCE plugin reports permission errors
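A quick way to catch a missing mount is to compare what the container actually has mounted against the paths that must persist. The sketch below uses the four container paths from this write-up; the function names and the container name napcat are hypothetical.

```shell
#!/bin/sh
# Container paths that this write-up says must be mounted from the host.
REQUIRED_MOUNTS="/app/.config/QQ
/app/napcat/plugins
/app/napcat/static/qce-v4-tool
/.qq-chat-exporter"

# Read mounted destinations (one per line) on stdin and print every
# required path that is not among them.
missing_mounts() {
    mounted="$(cat)"
    printf '%s\n' "$REQUIRED_MOUNTS" | while IFS= read -r path; do
        printf '%s\n' "$mounted" | grep -q -x -F -e "$path" || echo "$path"
    done
}

# List the mount destinations of a container and report the gaps.
# "napcat" is a hypothetical container name.
check_container_mounts() {
    docker inspect -f '{{range .Mounts}}{{println .Destination}}{{end}}' "${1:-napcat}" | missing_mounts
}
```

An empty result from check_container_mounts means all four directories are backed by a mount.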
Phase 3: Final Correct Deployment Command
Summarizing all lessons learned, the final complete startup command is:
Meaning of each parameter:
| Parameter | Purpose | Why it's needed |
|---|---|---|
| --network host | Container directly uses the server's network | Completely solves NAT forwarding conflicts with Docker bridge network |
| --dns=8.8.8.8 | Specify DNS server | Prevents DNS resolution failure inside the container |
| -e VNC_PASSWD=... | Set VNC remote desktop password | Need to operate via VNC for the first-time QQ login scan |
| -v qqdata:/app/.config/QQ | Persist QQ login state | No need to re-scan QR code after container restart |
| -v plugins:/app/napcat/plugins | Persist plugin directory | Plugins don't disappear after restart |
| -v qce-v4-tool:/app/napcat/static/qce-v4-tool | Persist QCE frontend files | Webpage doesn't 404 after restart |
| -v qce-data:/.qq-chat-exporter | Persist QCE data directory | Solves permission issues + exported files aren't lost |
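Assembled from the parameters in the table, the command could look like the sketch below. Several pieces are assumptions rather than values from the original: the host base directory /opt/napcat, the container name napcat, the -d flag, and the use of the image without an explicit tag. The VNC password is left as a required argument. The function only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Print a `docker run` command built from the table's parameters.
#   $1  VNC password (required; not given in the write-up)
#   $2  host base directory for the bind mounts (hypothetical default)
napcat_run_command() {
    vnc_passwd="${1:?usage: napcat_run_command VNC_PASSWORD [BASE_DIR]}"
    base="${2:-/opt/napcat}"
    cat <<EOF
docker run -d \\
  --name napcat \\
  --network host \\
  --dns=8.8.8.8 \\
  -e VNC_PASSWD=$vnc_passwd \\
  -v $base/qqdata:/app/.config/QQ \\
  -v $base/plugins:/app/napcat/plugins \\
  -v $base/qce-v4-tool:/app/napcat/static/qce-v4-tool \\
  -v $base/qce-data:/.qq-chat-exporter \\
  mlikiowa/napcat-framework-docker
EOF
}
```

For example, napcat_run_command 'my-password' /srv/napcat prints the command with /srv/napcat as the base directory; piping that output to sh would execute it.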
Phase 4: QCE Plugin Error Fix
Bug 4: QCE Plugin Reports Permission Error on Startup
Cause: The QCE plugin needs to create a data folder under the container's root directory /, but the process inside the container doesn't have permission to write files to the root /.
Solution: Create this directory on the host machine in advance and grant permissions, then mount it into the container via -v. This way, the directory already exists with write permissions when the container starts, and the plugin doesn't need to create it itself:
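A minimal sketch of that preparation step, assuming a host path chosen by the reader (the function name and example path are hypothetical). It opens the directory to all users because the UID of the process inside the container is not stated here; if that UID is known, a chown to it would be a tighter choice.

```shell
#!/bin/sh
# Create a host directory (with parents) and make it writable by any
# user, so the container process can write to it once it is mounted.
prepare_data_dir() {
    dir="${1:?usage: prepare_data_dir DIR}"
    mkdir -p "$dir" && chmod a+rwx "$dir"
}
```

For example, prepare_data_dir /opt/napcat/qce-data would be run once on the host before starting the container with -v /opt/napcat/qce-data:/.qq-chat-exporter.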
Phase 5: First-Time Login Process
After the container starts, the VNC service is not automatically exposed externally; it needs to be started manually:
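The manual start involves x11vnc and websockify, the two tools named earlier in this write-up. The sketch below is only a guess at the shape of those commands: the X display :1, the RFB port 5900, and the noVNC directory /usr/share/novnc are assumptions that may not match the image, while port 6081 is the one used in the next step. The function prints the commands rather than running them.

```shell
#!/bin/sh
# Print the two commands that expose the virtual display over the web.
#   $1  X display served by Xvfb        (assumed default :1)
#   $2  RFB port for x11vnc             (assumed default 5900)
#   $3  web port for websockify/noVNC   (6081, as used in this write-up)
#   $4  directory holding noVNC files   (assumed default /usr/share/novnc)
vnc_start_commands() {
    display="${1:-:1}"
    rfb_port="${2:-5900}"
    web_port="${3:-6081}"
    novnc_dir="${4:-/usr/share/novnc}"
    # \$VNC_PASSWD is left unexpanded so it is read inside the container.
    echo "x11vnc -display $display -rfbport $rfb_port -passwd \"\$VNC_PASSWD\" -forever -shared -bg"
    echo "websockify -D --web $novnc_dir $web_port localhost:$rfb_port"
}
```

One way to use it would be vnc_start_commands | docker exec -i napcat sh, where napcat is a hypothetical container name; the password then comes from the VNC_PASSWD variable set on the container.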
Then, access http://ServerPublicIP:6081 in a browser, enter the VNC password, and you'll see the QQ login interface. Use your phone to scan the QR code to log in.
This login only needs to be done once. Because the QQ data is mounted from the host, the login state is kept across future container restarts.
Overall Root Cause Summary
Looking back at the entire process, all problems can be traced back to three fundamental contradictions:
Contradiction 1: Graphical Application vs. Headless Server
QQ is a desktop application that relies on GPU rendering, but cloud servers have no graphics card. The solution is to use Xvfb to simulate a virtual display.
Contradiction 2: Container Network vs. System Firewall
Docker needs to control iptables for NAT forwarding, but firewalld clears these rules whenever it reloads. The solution is to use --network host so the container uses the host's network directly, bypassing this conflict.
This is a fairly common problem, but not every server runs into it; it depends on the default configuration of the operating system and the cloud provider.
It appeared on this server because of the following combination of factors:
This Server's Situation (Alibaba Cloud + CentOS/Alibaba Cloud Linux)
The CentOS/RHEL family, including Alibaba Cloud Linux, enables firewalld by default. When firewalld and Docker both manage iptables, conflicts arise easily. This combination is very common on domestic cloud providers (Alibaba Cloud, Tencent Cloud, Huawei Cloud), so many people encounter this problem.
Situations Where This Problem Won't Occur
Ubuntu uses ufw by default instead of firewalld, and ufw coexists with Docker differently: it usually doesn't clear Docker's iptables rules. Deploying Docker on Ubuntu therefore generally doesn't require these extra steps. Most Docker tutorials assume readers use Ubuntu, which is why many of them don't mention this problem at all.
Conclusion
This problem is neither unique to this server nor present on every server; it's a common pitfall of the specific combination "Alibaba Cloud + CentOS family + Docker + firewalld". On a server started from an Ubuntu image, Docker deployment will be much smoother.
Contradiction 3: Container Ephemerality vs. Data Persistence Needs
Changes made inside a container are lost when the container is removed and recreated, and manually started processes stop on every restart. Every directory that must persist (QQ data, plugins, configuration, exported files) needs to be mapped to the host machine via -v mounts.