The document says that the first level table (level 0) is omitted if the VA is restricted to 39 bits. My case sets T0SZ to 32 (0x20), which gives a 32 bit VA (64 - T0SZ). Apparently they mean the first level is omitted if the VA is 39 bits or less, as it clearly is omitted in my case.
With the H5 chip I will only be working with 32 bit physical addresses.
VA with 4K granule --
 _______________________________________________
|       |       |       |       |       |       |
|   0   |  Lv0  |  Lv1  |  Lv2  |  Lv3  |  off  |
|_______|_______|_______|_______|_______|_______|
  63-48   47-39   38-30   29-21   20-12   11-00
VA with 16K granule, note that only a single bit exists for level 0, which is thus a table with 2 entries.
 _______________________________________________
|       |       |       |       |       |       |
|   0   |  Lv0  |  Lv1  |  Lv2  |  Lv3  |  off  |
|_______|_______|_______|_______|_______|_______|
  63-48    47     46-36   35-25   24-14   13-00
VA with 64K granule, here we have only 6 bits for level 1, which is thus a table with 64 entries.
 _______________________________________
|       |       |       |       |       |
|   0   |  Lv1  |  Lv2  |  Lv3  |  off  |
|_______|_______|_______|_______|_______|
  63-48   47-42   41-29   28-16   15-00
In all cases upper levels may be eliminated by the choice of T0SZ.
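All three layouts follow one rule: with a granule of 2^n bytes the offset is n bits, and each table level consumes n-3 bits of the VA (a table is one granule in size and holds 8 byte entries). The sketch below shows that rule for a 48 bit VA. The function name va_index is my own invention for illustration and does not come from any manual or from U-Boot.

```c
#include <stdint.h>

/*
 * Return the table index used at a given level (0-3) for a 48 bit VA.
 * granule_bits is 12, 14 or 16 (4K, 16K or 64K granule).
 * Each level consumes (granule_bits - 3) bits, capped at bit 47.
 */
unsigned va_index(uint64_t va, unsigned granule_bits, unsigned level)
{
    unsigned bits_per_level = granule_bits - 3;
    unsigned lsb = granule_bits + (3 - level) * bits_per_level;
    unsigned msb = lsb + bits_per_level - 1;

    if (lsb > 47)
        return 0;       /* this level does not exist for this granule */
    if (msb > 47)
        msb = 47;       /* top level is truncated by the 48 bit VA */

    return (unsigned)((va >> lsb) & ((1ULL << (msb - lsb + 1)) - 1));
}
```

With a 4K granule, va_index(0x40000000, 12, 1) selects entry 1 of the level 1 table, which is where DRAM starts on the H5.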
Consider the general case with a 4K granule and a 48 bit VA.
The level 0 table has 512 entries.
Each L0 table entry controls 512G and points to a L1 table.
Each L1 table entry points to either a 1G block or an L2 table.
Each L2 table entry points to either a 2M block or an L3 table.
Each L3 table entry points to a 4K page.
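The sizes in this list drop out of the bit positions: each level down divides the span by 512. A small sketch for the 4K granule follows. The helper name entry_span_4k is hypothetical.

```c
#include <stdint.h>

/*
 * Bytes of address space covered by one entry at the given level (0-3)
 * with a 4K granule: 12 offset bits plus 9 bits for each lower level.
 */
uint64_t entry_span_4k(unsigned level)
{
    return 1ULL << (12 + 9 * (3 - level));
}
```

Level 0 gives 512G, level 1 gives 1G, level 2 gives 2M and level 3 gives 4K.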
Now consider my H5 set up with T0SZ = 32 and a 4K granule.
The level 0 table would be addressed by bits 47:39 and is out of the game.
The full level 1 table with 512 entries is used,
even though 32 bits will only be able to address the first 4 entries.
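The starting level and the number of reachable entries in the top table can be worked out from T0SZ alone. This is only a sketch for the 4K granule. It assumes T0SZ is in the range 16 to 39, and the names top_level and top_level_4k are made up for this example.

```c
/* Where the table walk starts and how many top table entries are reachable. */
struct top_level {
    unsigned level;    /* level of the first table consulted */
    unsigned entries;  /* entries the VA can actually index in that table */
};

/* 4K granule only; assumes 16 <= t0sz <= 39. */
struct top_level top_level_4k(unsigned t0sz)
{
    unsigned va_bits = 64 - t0sz;
    unsigned index_bits = va_bits - 12;        /* bits left after the page offset */
    unsigned levels = (index_bits + 8) / 9;    /* ceiling of index_bits / 9 */
    struct top_level t;

    t.level = 4 - levels;
    t.entries = 1u << (index_bits - 9 * (levels - 1));
    return t;
}
```

For T0SZ = 32 this gives level 1 with 4 entries. For T0SZ = 25 (a 39 bit VA) it gives level 1 with all 512 entries, and for T0SZ = 16 it gives level 0 with 512 entries.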
A level 0 entry can only be a descriptor giving the address of a level 1 table.
A level 3 entry can only be a page descriptor (it maps a single page); the tree goes no deeper.
Block entries have upper and lower attributes.
Table descriptors have only upper attributes.
Upper attributes have only 2 bits of interest, UXN and PXN. These are Unprivileged eXecute Never and Privileged eXecute Never. I just ignore these bits and leave them zero.
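The rules about which kind of descriptor can appear at which level can be written as a small classifier. This is a sketch for the 4K granule. The macro and function names are my own, and the bit positions (valid in bit 0, table/page in bit 1, PXN in bit 53, UXN in bit 54) are the ones I take from the ARMv8 descriptor format.

```c
#include <stdint.h>

#define PTE_VALID    (1ULL << 0)
#define PTE_TYPE_BIT (1ULL << 1)   /* levels 0-2: 1 = table, 0 = block */
#define PTE_PXN      (1ULL << 53)  /* privileged execute never */
#define PTE_UXN      (1ULL << 54)  /* unprivileged execute never */

enum pte_kind { KIND_INVALID, KIND_BLOCK, KIND_TABLE, KIND_PAGE };

/* Classify a descriptor found at the given level (0-3), 4K granule. */
enum pte_kind pte_kind(uint64_t pte, unsigned level)
{
    if (!(pte & PTE_VALID))
        return KIND_INVALID;
    if (level == 3)                 /* only page descriptors are legal here */
        return (pte & PTE_TYPE_BIT) ? KIND_PAGE : KIND_INVALID;
    if (pte & PTE_TYPE_BIT)
        return KIND_TABLE;
    /* a block is not allowed at level 0 with a 4K granule */
    return (level == 0) ? KIND_INVALID : KIND_BLOCK;
}
```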
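Lower attributes are:
nG (bit 11), not global
AF (bit 10), access flag
SH (bits 9:8), shareability
AP (bits 7:6), access permissions
NS (bit 5), non-secure
AttrIndx (bits 4:2), memory type index
The last of these looks to me like they ran out of space in the 64 bit PTE and used a 3 bit index to point to an 8 bit field. The 8 bit fields live in the MAIR register, which the following definitions set up.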
/*
* Memory types
*/
#define MT_DEVICE_NGNRNE 0
#define MT_DEVICE_NGNRE 1
#define MT_DEVICE_GRE 2
#define MT_NORMAL_NC 3
#define MT_NORMAL 4
#define MEMORY_ATTRIBUTES ((0x00 << (MT_DEVICE_NGNRNE * 8)) | \
                           (0x04 << (MT_DEVICE_NGNRE * 8)) | \
                           (0x0c << (MT_DEVICE_GRE * 8)) | \
                           (0x44 << (MT_NORMAL_NC * 8)) | \
                           (UL(0xff) << (MT_NORMAL * 8)))
So, apparently only 5 of the 8 possible "memory types" are used.
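To make the index scheme concrete, here is a sketch that pulls one 8 bit attribute out of a 64 bit MAIR value. The constant is what the MEMORY_ATTRIBUTES macro above works out to when I evaluate it by hand. The names MAIR_EXAMPLE and mair_attr are hypothetical.

```c
#include <stdint.h>

/* MEMORY_ATTRIBUTES evaluated by hand: bytes 0..4 are 00 04 0c 44 ff. */
#define MAIR_EXAMPLE 0x000000ff440c0400ULL

/* Return the 8 bit attribute selected by a 3 bit memory type index. */
uint8_t mair_attr(uint64_t mair, unsigned index)
{
    return (uint8_t)(mair >> ((index & 7) * 8));
}
```

Index 4 (MT_NORMAL) yields 0xff, index 0 (MT_DEVICE_NGNRNE) yields 0x00, and the unused indexes 5 through 7 yield 0.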
In my case, with the H5 chip a table with only 2 block entries (each of 1G) does the job. The first entry maps the 1G address area that starts at 0 and holds IO registers. The second entry maps the 1G address area that starts at 0x40000000 and holds DRAM.
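A 1G block entry like these can be composed from a handful of fields. The following is a minimal sketch and not the code that actually built my table. The macro and function names are invented for illustration, and it sets only the access flag, shareability, memory type index and block type bits.

```c
#include <stdint.h>

#define PTE_BLOCK        0x1ULL                    /* valid, block descriptor */
#define PTE_ATTRINDX(i)  ((uint64_t)((i) & 7) << 2) /* memory type index */
#define PTE_SH(s)        ((uint64_t)((s) & 3) << 8) /* shareability */
#define PTE_AF           (1ULL << 10)              /* access flag */

/* Build a level 1 block descriptor mapping the 1G region containing pa. */
uint64_t block_1g(uint64_t pa, unsigned attr_index, unsigned sh)
{
    uint64_t out_addr = pa & 0x0000FFFFC0000000ULL;  /* bits 47:30 */

    return out_addr | PTE_AF | PTE_SH(sh) | PTE_ATTRINDX(attr_index) | PTE_BLOCK;
}
```

block_1g(0, 0, 0) produces 0x401 for the IO region, and block_1g(0x40000000, 4, 3) produces 0x40000711 for DRAM, which are the two values in the hex dump.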
I dump these two entries as hex and see:
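PTE-7fff0000 0000000000000401
PTE-7fff0008 0000000040000711
I use the fancy pretty-print routines I found in U-Boot to dump these and I see:
[0x00000000000000 - 0x00000040000000] | Block | RWX | Device-nGnRnE | Non-shareable
[0x00000040000000 - 0x00000080000000] | Block | RWX | Normal | Inner-shareable
Apart from the output address, there are really only 2 differences between the two descriptors:
The memory type index (bits 4:2) is 0 (Device-nGnRnE) in the first and 4 (Normal) in the second.
The shareability field SH (bits 9:8) is 00 (non-shareable) in the first and 11 (inner shareable) in the second.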
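The hex values can be picked apart by hand with a few shifts and masks. The sketch below does that. The struct and function names are my own and this is not the U-Boot routine, just the same field positions as I understand them.

```c
#include <stdint.h>

struct pte_fields {
    uint64_t out_addr;    /* bits 47:30 for a 1G block */
    unsigned attr_index;  /* bits 4:2, index into MAIR */
    unsigned ap;          /* bits 7:6, access permissions */
    unsigned sh;          /* bits 9:8, 0 = non, 2 = outer, 3 = inner shareable */
    unsigned af;          /* bit 10, access flag */
};

/* Split a level 1 block descriptor into its fields. */
struct pte_fields decode_block_1g(uint64_t pte)
{
    struct pte_fields f;

    f.out_addr   = pte & 0x0000FFFFC0000000ULL;
    f.attr_index = (unsigned)((pte >> 2) & 7);
    f.ap         = (unsigned)((pte >> 6) & 3);
    f.sh         = (unsigned)((pte >> 8) & 3);
    f.af         = (unsigned)((pte >> 10) & 1);
    return f;
}
```

Decoding 0x401 gives index 0, SH 0 and AF 1. Decoding 0x40000711 gives output address 0x40000000, index 4, SH 3 and AF 1.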
The whole business of Inner/Outer shareability is murky and not explained clearly anywhere. At least not anywhere that I have found yet, and I have looked pretty hard. Setting SH to 00 ( non-shareable ) for IO registers certainly makes sense.
It is not clear what inner shareable might mean. The "big idea" involves the fact that the Cortex-A53 is a 4 core cluster and all 4 cores share the L2 cache. What happens when one core writes to memory? The D cache for that core would get updated (but perhaps not yet memory). Other cores might have values for that memory location cached that now need to be invalidated. The cluster has a "snoop unit" that handles this. The shareability bits indicate (among other things perhaps) whether the snoop unit should get involved.
Note that the translation tables themselves can be placed in cacheable memory. This will speed up address translation, and there are fields in the TCR that control cacheability.
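As a rough illustration of those TCR fields, here is a sketch that packs the low bits of a TCR value. It covers only T0SZ, IRGN0, ORGN0, SH0 and TG0, which as far as I can tell sit in the same positions in TCR_EL1, TCR_EL2 and TCR_EL3. The function name tcr_low_fields is hypothetical, and the remaining fields (physical address size and so on) are deliberately left out.

```c
#include <stdint.h>

/*
 * Pack the low TCR fields that control the table walk:
 *   t0sz   bits 5:0    64 minus the VA size
 *   irgn0  bits 9:8    inner cacheability of walks (0 = non-cacheable, 1 = write-back write-allocate)
 *   orgn0  bits 11:10  outer cacheability of walks (same encoding)
 *   sh0    bits 13:12  shareability of walks (3 = inner shareable)
 *   tg0    bits 15:14  granule (0 = 4K, 1 = 64K, 2 = 16K)
 */
uint64_t tcr_low_fields(unsigned t0sz, unsigned irgn0, unsigned orgn0,
                        unsigned sh0, unsigned tg0)
{
    return  (uint64_t)(t0sz  & 0x3f)       |
           ((uint64_t)(irgn0 & 3) << 8)    |
           ((uint64_t)(orgn0 & 3) << 10)   |
           ((uint64_t)(sh0   & 3) << 12)   |
           ((uint64_t)(tg0   & 3) << 14);
}
```

For example, tcr_low_fields(32, 1, 1, 3, 0) asks for a 32 bit VA, cacheable inner shareable table walks and a 4K granule, and evaluates to 0x3520.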
Tom's electronics pages / tom@mmto.org