World Creation Performance Research (Personal Worlds)
TL;DR
This document investigates personal world initialization time under different concurrency scenarios and evaluates whether protocol/message-level optimizations are worthwhile.
Key takeaways:
- Baseline: 1 world init completes in ~36s on local host.
- Baseline concurrency issue: 5 worlds initialized “at the same time” show cumulative delays (48s → 187s).
- Optimization: batching (`SetChunkVerifiers`) dramatically reduces on-chain message count and gas overhead.
- After batching: 10 concurrent world inits complete consistently in ~29–37s (avg ~35s).
1) Background and Goal
Personal world initialization includes on-chain calls (via Gno RPC) plus bridge interactions. When multiple worlds initialize concurrently, the system appeared to serialize or bottleneck somewhere, causing later worlds to finish much later than earlier ones.
Goal
- Measure initialization duration for:
- 1 world (baseline)
- 5 worlds in parallel (baseline concurrency behavior)
- Identify an optimization that reduces bottlenecks (message count / gas / parsing overhead).
- Re-run concurrency benchmarks after applying the optimization.
2) Test Environment (Baseline Benchmarks)
- Date: 2026-02-05
- Execution: host (local machine)
- Gno RPC: `http://localhost:26657`
- Bridge: `http://localhost:3020`
- Biome: `verdant_hollow` (https://storage.googleapis.com/archainia-mvp.firebasestorage.app/development%2Fworld%2Fpersonal%2F45a199900102e8c445a199900102e8c4%2Fbiome.json)
Metric definition
- “Duration” = (completedAt - createdAt), i.e., the time until the world reaches `COMPLETED`.
3) Experiment A — Baseline Initialization Benchmarks
A-1. Single world initialization
| worldId | status | createdAt | completedAt | Duration |
|---|---|---|---|---|
| 8 | COMPLETED | 08:40:35.245 | 08:41:11.643 | 36s |
A-2. 5 worlds initialized concurrently
Procedure:
- Create 5 worlds individually.
- Send the init request for all 5 worlds at the same time.
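The driver for this procedure can be sketched in Go. Note that `initWorld` and `initAll` are hypothetical names: `initWorld` here merely simulates the init round-trip, standing in for the real request to the world service.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// initWorld is a hypothetical stand-in for the real init request
// (the HTTP call to the world service); here it only simulates work.
func initWorld(id int) {
	time.Sleep(10 * time.Millisecond)
}

// initAll fires the init request for every world at the same time and
// records each world's wall-clock duration, mirroring the procedure above.
func initAll(ids []int) map[int]time.Duration {
	durations := make(map[int]time.Duration)
	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, id := range ids {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			start := time.Now()
			initWorld(id)
			mu.Lock()
			durations[id] = time.Since(start)
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	return durations
}

func main() {
	durations := initAll([]int{9, 10, 11, 12, 13})
	fmt.Println(len(durations), "worlds completed")
}
```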
| worldId | status | createdAt | completedAt | Duration |
|---|---|---|---|---|
| 12 | COMPLETED | 08:44:50.689 | 08:45:39.563 | 48s |
| 11 | COMPLETED | 08:44:50.733 | 08:46:18.696 | 87s |
| 10 | COMPLETED | 08:44:50.711 | 08:46:51.834 | 121s |
| 9 | COMPLETED | 08:44:50.768 | 08:47:24.948 | 154s |
| 13 | COMPLETED | 08:44:50.797 | 08:47:58.130 | 187s |
Observation
- Even though init requests are sent concurrently, completion times drift significantly.
- This suggests a bottleneck (or partial serialization) in message processing, RPC throughput, bridge handling, or on-chain execution costs.
4) Experiment B — “Parameter Readline” / Parsing & Gas Research
B-1. Hypothesis
If we avoid `strings.Split(str, ",")` and instead scan the string directly (handling delimiters as we encounter them), we may reduce gas usage by avoiding extra slice allocations and intermediate strings.
B-2. Implementation idea
- Implement `SetChunkVerifiers` using a single-pass scan parser (no `strings.Split`).
- Prefer batch processing over repeated individual calls.
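A minimal sketch of the single-pass idea (`scanFields` is an illustrative name, not the actual implementation): each field is yielded as a slice of the original string, so no intermediate `[]string` is allocated the way `strings.Split` would.

```go
package main

import "fmt"

// scanFields walks s exactly once and invokes fn for each
// comma-delimited field. Slicing a string in Go does not copy,
// so this avoids the intermediate []string that strings.Split builds.
func scanFields(s string, fn func(field string)) {
	start := 0
	for i := 0; i < len(s); i++ {
		if s[i] == ',' {
			fn(s[start:i])
			start = i + 1
		}
	}
	fn(s[start:]) // trailing field
}

func main() {
	var fields []string
	scanFields("k1,k2,k3", func(f string) { fields = append(fields, f) })
	fmt.Println(fields)
}
```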
B-3. Measurements
SetChunkVerifiers (batch, single-pass scan parsing)
| Items | Keys length (chars) | Vals length (chars) | Gas used | Gas / item | Storage delta |
|---|---|---|---|---|---|
| 3 | 11 | 194 | 5,512,197 | 1,837,399 | 1,905 bytes |
| 10 | 39 | 649 | 6,912,890 | 691,289 | 3,263 bytes |
| 25 | 114 | 1,624 | 9,956,000 | 398,240 | 6,188 bytes |
| 50 | 239 | 3,249 | 15,027,860 | 300,557 | 11,063 bytes |
| 100 | 489 | 6,499 | 25,171,592 | 251,716 | 20,815 bytes |
SetChunkVerifier (single item per call)
| Items | Gas used | Gas / item |
|---|---|---|
| 1 | 4,914,970 | 4,914,970 |
B-4. Comparative analysis
For 100 items:
| Approach | Total gas | Relative |
|---|---|---|
| Batch (`SetChunkVerifiers`) | 25,171,592 | 1.0× |
| 100 individual calls (`SetChunkVerifier` × 100) | 491,497,000 (estimated) | 19.5× |
Result: batching reduces total gas by ~95% (for 100 items), mainly by removing per-message overhead.
B-5. Gas growth per additional item (batch mode)
| Range | Added items | Incremental gas per item |
|---|---|---|
| 3 → 10 | 7 | 200,099 |
| 10 → 25 | 15 | 202,874 |
| 25 → 50 | 25 | 202,874 |
| 50 → 100 | 50 | 202,875 |
Conclusion: gas increases by roughly 200,000 per additional item, showing a stable linear relationship.
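As a sanity check on the linearity claim, the table can be fit to gas(n) ≈ base + perItem·n (slope from the 50→100 range, base back-solved from the n=100 point; all numbers copied from the measurements above). The back-solved base of roughly 4.88M is close to the single-call gas of 4,914,970, consistent with a large per-message fixed overhead.

```go
package main

import "fmt"

// Measured batch gas from the table above (items → gas used).
var measured = map[int]float64{
	3: 5512197, 10: 6912890, 25: 9956000, 50: 15027860, 100: 25171592,
}

// linearModel derives gas(n) ≈ base + perItem·n, taking the slope
// from the 50→100 range and back-solving base from the n=100 point.
func linearModel() (base, perItem float64) {
	perItem = (measured[100] - measured[50]) / 50 // ≈ 202,875 gas/item
	base = measured[100] - 100*perItem            // ≈ 4.88M per-message overhead
	return base, perItem
}

func main() {
	base, perItem := linearModel()
	for _, n := range []int{3, 10, 25, 50, 100} {
		pred := base + float64(n)*perItem
		fmt.Printf("n=%d measured=%.0f predicted=%.0f\n", n, measured[n], pred)
	}
}
```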
B-6. Outcome
- Batching is the dominant win: ~95% gas reduction vs per-item calls.
- Single-pass scan parsing removes additional allocation overhead compared to `strings.Split`.
- Predictable scaling: per-item incremental gas is ~200k.
- Practical batch size: 50–100 items is efficient in terms of gas/item.
5) Experiment C — Benchmark After Applying Batch Messages
C-1. Change summary
- Replace individual `SetChunkVerifier` messages with batch `SetChunkVerifiers`.
- Batch size: 100
- Chunks: 249 total → 3 batch messages (100 + 100 + 49)
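The batching step itself is simple; a sketch (with a hypothetical helper name `batchChunks`) that reproduces the 249 → 100 + 100 + 49 split described above:

```go
package main

import "fmt"

// batchChunks splits a list of chunk IDs into batches of at most
// `size` entries, each of which becomes one batch message.
func batchChunks(ids []int, size int) [][]int {
	var batches [][]int
	for len(ids) > 0 {
		n := size
		if len(ids) < n {
			n = len(ids)
		}
		batches = append(batches, ids[:n])
		ids = ids[n:]
	}
	return batches
}

func main() {
	ids := make([]int, 249) // 249 chunks, as in the change summary
	for i := range ids {
		ids[i] = i
	}
	for _, b := range batchChunks(ids, 100) {
		fmt.Println(len(b))
	}
}
```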
C-2. Test environment
- Date: 2026-02-05
- Execution: host (local machine)
- Biome: `verdant_hollow` (249 chunks)
C-3. 10 worlds initialized concurrently (after batching)
| worldId | status | Duration |
|---|---|---|
| 14 | COMPLETED | 37s |
| 15 | COMPLETED | 36s |
| 16 | COMPLETED | 36s |
| 17 | COMPLETED | 36s |
| 18 | COMPLETED | 29s |
| 19 | COMPLETED | 37s |
| 20 | COMPLETED | 37s |
| 21 | COMPLETED | 36s |
| 22 | COMPLETED | 29s |
| 23 | COMPLETED | 36s |
Average: ~35s
6) Before/After Comparison
| Scenario | Before | After |
|---|---|---|
| Single world | 36s | (not re-measured) |
| 5 concurrent worlds | 48–187s (cumulative delay) | (not re-measured) |
| 10 concurrent worlds | (not measured) | 29–37s (stable) |
7) Conclusions
- Applying batch messages eliminates the “cumulative delay” pattern under concurrency (in this environment and biome).
- With batching enabled, 10 concurrent inits remain close to single-world baseline time (~35s).
- Message count reduction is substantial: 249 → 3 (≈ 83× fewer messages), which aligns with the observed stabilization.
8) Next Steps
- Re-run the 5 concurrent worlds benchmark after batching for an apples-to-apples before/after comparison.
- Add repeated runs and report median / p95 durations instead of single measurements.
- Validate limits and safety:
- max batch size vs message size / gas limit
- failure handling for partial batches
- Measure on a production-like environment (network latency, node load) to confirm the improvement holds outside localhost.
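The median/p95 reporting suggested above can be sketched as follows, using nearest-rank percentiles over the 10-world durations measured in section 5 (`percentile` is an illustrative helper, not existing code):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile of an
// ascending-sorted slice: the ceil(p/100 · N)-th value, 1-indexed.
func percentile(sorted []float64, p float64) float64 {
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	// Durations (seconds) from the 10-concurrent-worlds run above.
	durs := []float64{37, 36, 36, 36, 29, 37, 37, 36, 29, 36}
	sort.Float64s(durs)
	fmt.Println("median:", percentile(durs, 50), "p95:", percentile(durs, 95))
}
```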