World Creation Performance Research (Personal Worlds)
TL;DR
This document investigates personal world initialization time under different concurrency scenarios and evaluates whether protocol/message-level optimizations are worthwhile.
Key takeaways:
- Baseline: 1 world init completes in ~36s on local host.
- Baseline concurrency issue: 5 worlds initialized “at the same time” show cumulative delays (48s → 187s).
- Optimization: batching (`SetChunkVerifiers`) dramatically reduces on-chain message count and gas overhead.
- After batching: 10 concurrent world inits complete consistently in ~29–37s (avg ~35s).
1) Background and Goal
In Akkadia, “world initialization” is an end-to-end workflow triggered after CreateWorld.
It includes off-chain processing (biome/chunk pipeline) and on-chain state writes (chunk verifier registration).
The on-chain bottleneck candidate was verifier writes.
Before optimization, writes were effectively many message-level calls (SetChunkVerifier style).
After optimization, writes were grouped into batched calls (SetChunkVerifiers), with BATCH_SIZE=100 in bridge message construction.
```go
// contracts/personal_world/data.gno (excerpt)
var (
	verifierStore = avl.NewTree() // worldID(string) -> map[chunkKey]verifier
)

// SetChunkVerifier registers a single chunk verifier for a world.
func SetChunkVerifier(cur realm, worldID uint32, chunkKey string, verifier string) {
	caller := runtime.PreviousRealm().Address()
	assertIsAdminOrOperator(caller)

	worldIDStr := ufmt.Sprintf("%d", worldID)
	var chunkMap map[string]string // chunkKey -> verifier
	if value, exists := verifierStore.Get(worldIDStr); exists {
		chunkMap = value.(map[string]string)
	} else {
		chunkMap = make(map[string]string)
		verifierStore.Set(worldIDStr, chunkMap)
	}
	chunkMap[chunkKey] = verifier
}

// SetChunkVerifiers registers many verifiers in one message. chunkKeys and
// verifiers are parallel comma-separated lists with equal element counts.
func SetChunkVerifiers(cur realm, worldID uint32, chunkKeys string, verifiers string) {
	caller := runtime.PreviousRealm().Address()
	assertIsAdminOrOperator(caller)
	if chunkKeys == "" { panic("chunkKeys must not be empty") }
	if verifiers == "" { panic("verifiers must not be empty") }

	// Fetch (or create) the per-world map once, before the scan loop.
	worldIDStr := ufmt.Sprintf("%d", worldID)
	var chunkMap map[string]string // chunkKey -> verifier
	if value, exists := verifierStore.Get(worldIDStr); exists {
		chunkMap = value.(map[string]string)
	} else {
		chunkMap = make(map[string]string)
		verifierStore.Set(worldIDStr, chunkMap)
	}

	// Direct single-pass scan parsing (no strings.Split): walk both strings
	// in lockstep and slice out each key/value pair between commas.
	keyStart, valStart := 0, 0
	keyIdx, valIdx := 0, 0
	for {
		for keyIdx < len(chunkKeys) && chunkKeys[keyIdx] != ',' { keyIdx++ }
		for valIdx < len(verifiers) && verifiers[valIdx] != ',' { valIdx++ }
		key := chunkKeys[keyStart:keyIdx]
		val := verifiers[valStart:valIdx]
		if key == "" { panic("empty chunkKey not allowed") }
		if val == "" { panic("empty verifier not allowed") }
		chunkMap[key] = val

		keyEnd := keyIdx >= len(chunkKeys)
		valEnd := valIdx >= len(verifiers)
		if keyEnd != valEnd { panic("chunkKeys and verifiers count mismatch") }
		if keyEnd { break }
		keyIdx++
		valIdx++
		keyStart = keyIdx
		valStart = valIdx
	}
}
```

This report focuses on how that message strategy changes completion latency under concurrent initialization requests.
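To make the parsing strategy concrete, here is a standalone Go sketch of the same single-pass scan parser, extracted from the realm context so it can be run and tested in isolation. `parsePairs` is a hypothetical helper name; it returns errors instead of panicking, but the scan logic mirrors the contract code above.

```go
package main

import (
	"errors"
	"fmt"
)

// parsePairs walks two comma-separated strings in lockstep (no strings.Split),
// slicing each key/value pair out of the originals without building
// intermediate slices. It fails if element counts differ or an element is empty.
func parsePairs(chunkKeys, verifiers string) (map[string]string, error) {
	if chunkKeys == "" || verifiers == "" {
		return nil, errors.New("inputs must not be empty")
	}
	out := make(map[string]string)
	keyStart, valStart := 0, 0
	keyIdx, valIdx := 0, 0
	for {
		for keyIdx < len(chunkKeys) && chunkKeys[keyIdx] != ',' {
			keyIdx++
		}
		for valIdx < len(verifiers) && verifiers[valIdx] != ',' {
			valIdx++
		}
		key := chunkKeys[keyStart:keyIdx]
		val := verifiers[valStart:valIdx]
		if key == "" || val == "" {
			return nil, errors.New("empty element not allowed")
		}
		out[key] = val
		keyEnd := keyIdx >= len(chunkKeys)
		valEnd := valIdx >= len(verifiers)
		if keyEnd != valEnd {
			return nil, errors.New("chunkKeys and verifiers count mismatch")
		}
		if keyEnd {
			break
		}
		keyIdx++ // skip the comma in both strings
		valIdx++
		keyStart = keyIdx
		valStart = valIdx
	}
	return out, nil
}

func main() {
	m, err := parsePairs("c1,c2,c3", "v1,v2,v3")
	fmt.Println(len(m), err) // 3 <nil>
	_, err = parsePairs("c1,c2", "v1")
	fmt.Println(err != nil) // true (count mismatch)
}
```

Because each key/value is a slice of the original argument string, the only allocations are the result map entries, which is where the gas advantage over `strings.Split` comes from.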
Goal
- Measure initialization duration for:
- 1 world (baseline)
- 5 worlds in parallel (baseline concurrency behavior)
- Identify an optimization that reduces bottlenecks (message count / gas / parsing overhead).
- Re-run concurrency benchmarks after applying the optimization.
2) Test Environment (Baseline Benchmarks)
- Date: 2026-02-05
- Execution: host (local machine)
- Gno RPC: http://localhost:26657
- Bridge: http://localhost:3020
Metric definition
- “Duration” = (completedAt - createdAt): the time until the world reaches COMPLETED.
3) Experiment A — Baseline Initialization Benchmarks
A-1. Single world initialization
| worldId | status | createdAt | completedAt | Duration |
|---|---|---|---|---|
| 8 | COMPLETED | 08:40:35.245 | 08:41:11.643 | 36s |
A-2. 5 worlds initialized concurrently
Procedure:
- Create 5 worlds individually.
- Send the init request for all 5 worlds at the same time.
| worldId | status | createdAt | completedAt | Duration |
|---|---|---|---|---|
| 12 | COMPLETED | 08:44:50.689 | 08:45:39.563 | 48s |
| 11 | COMPLETED | 08:44:50.733 | 08:46:18.696 | 87s |
| 10 | COMPLETED | 08:44:50.711 | 08:46:51.834 | 121s |
| 9 | COMPLETED | 08:44:50.768 | 08:47:24.948 | 154s |
| 13 | COMPLETED | 08:44:50.797 | 08:47:58.130 | 187s |
Observation
- Even though init requests are sent concurrently, completion times drift significantly.
- This suggests a bottleneck (or partial serialization) in message processing, RPC throughput, bridge handling, or on-chain execution costs.
4) Experiment B — “Parameter Readline” / Parsing & Gas Research
B-1. Hypothesis
If we avoid strings.Split(str, ",") and instead scan the string directly (handling delimiters as we encounter them), we may reduce gas usage by avoiding extra slice allocations and intermediate strings.
B-2. Implementation idea
- Implement `SetChunkVerifiers` using a single-pass scan parser (no `strings.Split`).
- Prefer batch processing over repeated individual calls.
B-3. Measurements
SetChunkVerifiers (batch, single-pass scan parsing)
| Items | Keys length (chars) | Vals length (chars) | Gas used | Gas / item | Storage delta |
|---|---|---|---|---|---|
| 3 | 11 | 194 | 5,512,197 | 1,837,399 | 1,905 bytes |
| 10 | 39 | 649 | 6,912,890 | 691,289 | 3,263 bytes |
| 25 | 114 | 1,624 | 9,956,000 | 398,240 | 6,188 bytes |
| 50 | 239 | 3,249 | 15,027,860 | 300,557 | 11,063 bytes |
| 100 | 489 | 6,499 | 25,171,592 | 251,716 | 20,815 bytes |
SetChunkVerifier (single item per call)
| Items | Gas used | Gas / item |
|---|---|---|
| 1 | 4,914,970 | 4,914,970 |
B-4. Comparative analysis
For 100 items:
| Approach | Total gas | Relative |
|---|---|---|
| Batch (`SetChunkVerifiers`) | 25,171,592 | 1.0x |
| 100 individual calls (`SetChunkVerifier` × 100) | 491,497,000 (estimated) | 19.5x |
Result: batching reduces total gas by ~95% (for 100 items), mainly by removing per-message overhead.
B-5. Gas growth per additional item (batch mode)
| Range | Added items | Incremental gas per item |
|---|---|---|
| 3 → 10 | 7 | 200,099 |
| 10 → 25 | 15 | 202,874 |
| 25 → 50 | 25 | 202,874 |
| 50 → 100 | 50 | 202,875 |
Conclusion: gas increases by approximately ~200,000 gas per item, showing a stable linear relationship.
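Given that linearity, total batch gas can be approximated by a fixed overhead plus a per-item cost. The sketch below fits both constants from the table above (the slope from the 50 → 100 row, the intercept from the 100-item total); this is an illustrative extrapolation for capacity planning, not a chain guarantee.

```go
package main

import "fmt"

// Linear gas model fitted to the batch measurements:
// gas(n) ≈ overheadGas + n * perItemGas.
const (
	perItemGas  = 202_875                       // incremental gas per item (50 → 100 row)
	overheadGas = 25_171_592 - 100*perItemGas   // = 4,884,092 fixed per-message overhead
)

// estimateBatchGas predicts total gas for a batch of n items.
func estimateBatchGas(n int) int {
	return overheadGas + n*perItemGas
}

func main() {
	fmt.Println(estimateBatchGas(50)) // 15027842 — measured value was 15,027,860
}
```

The 50-item prediction lands within ~20 gas of the measured 15,027,860, which supports the linear-scaling conclusion. Note the fitted fixed overhead (~4.88M) is close to the single-call `SetChunkVerifier` cost (4,914,970), suggesting most of a message's gas is per-message machinery rather than payload.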
B-6. Outcome
- Batching is the dominant win: ~95% gas reduction vs per-item calls.
- Single-pass scan parsing removes additional allocation overhead compared to `strings.Split`.
- Predictable scaling: per-item incremental gas is ~200k.
- Practical batch size: 50–100 items is efficient in terms of gas/item.
5) Experiment C — Benchmark After Applying Batch Messages
C-1. Change summary
- Replace individual `SetChunkVerifier` messages with batched `SetChunkVerifiers`.
- Batch size: 100
- Chunks: 249 total → 3 batch messages (100 + 100 + 49)
C-2. Test environment
- Date: 2026-02-05
- Execution: host (local machine)
- Biome: `verdant_hollow` (249 chunks)
C-3. 10 worlds initialized concurrently (after batching)
| worldId | status | Duration |
|---|---|---|
| 14 | COMPLETED | 37s |
| 15 | COMPLETED | 36s |
| 16 | COMPLETED | 36s |
| 17 | COMPLETED | 36s |
| 18 | COMPLETED | 29s |
| 19 | COMPLETED | 37s |
| 20 | COMPLETED | 37s |
| 21 | COMPLETED | 36s |
| 22 | COMPLETED | 29s |
| 23 | COMPLETED | 36s |
Average: ~35s
6) Before/After Comparison
| Scenario | Before | After |
|---|---|---|
| Single world | 36s | (not re-measured) |
| 5 concurrent worlds | 48–187s (cumulative delay) | (not re-measured) |
| 10 concurrent worlds | (not measured) | 29–37s (stable) |
7) Conclusions
- Applying batch messages eliminates the “cumulative delay” pattern under concurrency (in this environment and biome).
- With batching enabled, 10 concurrent inits remain close to single-world baseline time (~35s).
- Message count reduction is substantial: 249 → 3 (≈ 83× fewer messages), which aligns with the observed stabilization.
8) Next Steps
- Re-run the 5 concurrent worlds benchmark after batching for an apples-to-apples before/after comparison.
- Add repeated runs and report median / p95 durations instead of single measurements.
- Validate limits and safety:
- max batch size vs message size / gas limit
- failure handling for partial batches
- Measure on a production-like environment (network latency, node load) to confirm the improvement holds outside localhost.
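For the proposed median/p95 reporting, a small helper like the sketch below (nearest-rank percentile, illustrative naming) would replace the single-measurement averages used in this report:

```go
package main

import (
	"fmt"
	"sort"
)

// medianP95 returns the median and the p95 (nearest-rank method)
// of a set of run durations in seconds.
func medianP95(durations []float64) (median, p95 float64) {
	if len(durations) == 0 {
		return 0, 0
	}
	d := append([]float64(nil), durations...) // sort a copy
	sort.Float64s(d)
	median = d[len(d)/2]
	if len(d)%2 == 0 {
		median = (d[len(d)/2-1] + d[len(d)/2]) / 2
	}
	rank := (len(d)*95+99)/100 - 1 // ceil(0.95·n) − 1, in integer math
	p95 = d[rank]
	return
}

func main() {
	// Durations from Experiment C (10 concurrent worlds, seconds).
	runs := []float64{37, 36, 36, 36, 29, 37, 37, 36, 29, 36}
	m, p := medianP95(runs)
	fmt.Println(m, p) // 36 37
}
```

On the Experiment C data this gives median 36s / p95 37s, a more robust summary than the ~35s mean once repeated runs are collected.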