In search of the least viewed article on Wikipedia

ray@lemmy.ml · 8 months ago

bool@lemm.ee · 8 months ago

Really enjoyed the read. Thanks for sharing. I’m surprised by the random page implementation.

Usually in a database each record has an integer primary key, assigned sequentially as pages are created. The “random page” function could then select a random integer between zero and the largest page key. If that key isn’t used (because the page was deleted), you could either try again with a new random number or march up to the next non-empty key.
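The two strategies in this comment can be sketched as follows. This is a minimal illustration, not how Wikipedia actually does it. The live ID list and the helper names `pick_with_retry` and `pick_with_march` are hypothetical, and it assumes the range of draws is 0 up to the largest live ID, as the comment describes.

```python
import bisect
import random

# Hypothetical live page IDs after some deletions (made-up numbers, not
# real Wikipedia IDs). Keys 0, 4-6 and 9-14 are "empty".
LIVE_IDS = [1, 2, 3, 7, 8, 15]


def pick_with_retry(live_ids, rng):
    """Draw integers in [0, max ID] until one hits a live page ID."""
    live = set(live_ids)
    max_id = max(live)
    while True:
        candidate = rng.randint(0, max_id)  # randint is inclusive on both ends
        if candidate in live:
            return candidate


def pick_with_march(sorted_live_ids, rng):
    """Draw once in [0, max ID], then march up to the next live page ID."""
    max_id = sorted_live_ids[-1]
    candidate = rng.randint(0, max_id)
    # bisect_left gives the index of the first live ID >= candidate; since
    # max_id is itself live, that index always exists.
    i = bisect.bisect_left(sorted_live_ids, candidate)
    return sorted_live_ids[i]


rng = random.Random(42)
print(pick_with_retry(LIVE_IDS, rng), pick_with_march(LIVE_IDS, rng))
```

The retry version may loop several times but always ends on a live ID; the march version needs only one draw plus a binary search over the sorted IDs.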

AbouBenAdhem@lemmy.world · 8 months ago

Marching up to the next non-empty key would skew the distribution: pages preceded by more empty keys would show up more often under “random”.
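The skew can be computed exactly by enumerating every equally likely draw. In this sketch the live IDs `[1, 2, 7]` are hypothetical and `march_up_distribution` is an illustrative helper. Page 1 collects draws 0 and 1 (2/8), page 2 only draw 2 (1/8), and page 7 collects draws 3 through 7 (5/8), whereas a uniform pick would give each page 1/3.

```python
import bisect
from collections import Counter
from fractions import Fraction


def march_up_distribution(sorted_live_ids):
    """Exact pick probability of each live ID under "march up" selection.

    Every integer draw in [0, max ID] is equally likely, so we enumerate
    all of them and record which live ID each draw marches up to.
    """
    max_id = sorted_live_ids[-1]
    counts = Counter()
    for draw in range(max_id + 1):
        i = bisect.bisect_left(sorted_live_ids, draw)  # first live ID >= draw
        counts[sorted_live_ids[i]] += 1
    total = max_id + 1
    return {page: Fraction(n, total) for page, n in sorted(counts.items())}


# Hypothetical live IDs: keys 0 and 3 through 6 are empty.
dist = march_up_distribution([1, 2, 7])
print(dist)
```

In general, each page's weight is one plus the length of the run of empty keys directly before it.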

SheeEttin@lemmy.world · 8 months ago

Fun fact: the same concept is used in computer security exploits. See https://en.wikipedia.org/wiki/NOP_slide

For choosing an article, it would be better to just pick a new random number.
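Why the retry approach is unbiased can be written out. This assumes, as described earlier in the thread, that draws are uniform over the keys 0 through M, where M is the largest page ID and n of those keys are live. Each draw succeeds with probability p, and page i is returned when k draws fail and the next one lands exactly on i:

```latex
p = \frac{n}{M+1}, \qquad
\Pr[\text{result} = i] = \sum_{k=0}^{\infty} (1-p)^k \cdot \frac{1}{M+1}
  = \frac{1}{M+1} \cdot \frac{1}{p} = \frac{1}{n}, \qquad
\mathbb{E}[\text{draws}] = \frac{1}{p} = \frac{M+1}{n}
```

So every live page is equally likely, at the cost of (M+1)/n draws on average. With hypothetical live IDs {1, 2, 7} (M = 7, n = 3) that is 8/3, about 2.67 draws per pick. The cost grows as deletions make the key space sparser.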

That said, there are probably more efficient ways to pick a random record from a database, for example periodically reindexing, or sorting the existing records randomly (if the database supports it).
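Both alternatives can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumptions: the in-memory `pages` table, its columns, and the `rebuild_index` helper are hypothetical stand-ins, not Wikipedia's actual schema or implementation.

```python
import random
import sqlite3

# In-memory SQLite table standing in for a page table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO pages (id, title) VALUES (?, ?)",
    [(1, "A"), (2, "B"), (7, "C")],  # gaps in the IDs mimic deleted pages
)

# Option 1: sort by random. SQLite's RANDOM() gives each row a random sort
# key; this typically evaluates every row, so it gets slow on large tables.
row = conn.execute(
    "SELECT id, title FROM pages ORDER BY RANDOM() LIMIT 1"
).fetchone()
print(row)


# Option 2: periodic reindexing. Rebuild a dense list of live IDs now and
# then, and pick a uniform position in it. Pages created or deleted since
# the last rebuild are not reflected until the next one.
def rebuild_index(connection):
    return [r[0] for r in connection.execute("SELECT id FROM pages ORDER BY id")]


dense_ids = rebuild_index(conn)
rng = random.Random()
print(rng.choice(dense_ids))
```

The trade-off: sorting by random is always current but scales with table size, while the dense index makes each pick cheap at the price of staleness between rebuilds.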