Skip to main content
💻🏗️📝📌communitySource: buttondown.comeveryone

Floyd's Sampling Algorithm • Buttondown

I love sampling algorithms. Here's the sampling algorithm that I find most magical. We want to generate a subset of {1, 2, ..., n} of size k. def floyd(n,...

4 min read665 words

April 6, 2026

Floyd's Sampling Algorithm #

I love sampling algorithms. Here's the sampling algorithm that I find most magical. We want to generate a subset of {1, 2, ..., n} of size k.

text
def floyd(n, k):
    s = set()
    for i in range(n - k + 1, n + 1):
        t = random.randint(1, i)
        if t in s:
            s.add(i)
        else:
            s.add(t)
    return s

I learned about this algorithm the canonical way all good algorithm lore is imparted; one of my old coworkers wrote this comment ↗ in some code I was looking at once:

text
// Use Robert Floyd's algorithm to generate numSplits distinct integers
// between 0 and len(splitKeys), just because it's so cool!
chosen := make(map[int]struct{})
for j := len(splitKeys) - numSplits; j < len(splitKeys); j++ {
    t := fit.rng.Intn(j + 1)
    if , alreadyChosen := chosen[t]; !alreadyChosen {
        // Insert T.
        chosen[t] = struct{}{}
    } else {
        // Insert J.
        chosen[j] = struct{}{}
    }
}

Unlike, say, Reservoir Sampling or something, this algorithm sort of resists an easy intuition to me. The branch makes it weird and makes a normal combinatorial, or visual argument tricky. In fact, I find even a casual intuition hard to grasp. Here are two possible ways to think about what makes it work.

First Perspective #

Here's one way to think about it: each iteration of the algorithm goes from a uniformly distributed k-subset of [1..n] and an element of [1..n+1] to a uniformly distributed k+1-subset of [1..n+1]. If we can show that the map from those k-subsets and that sample is k+1-to-1, then we've shown that the resulting set is also uniformly distributed.

So, we need to show that each possible k+1-subset S of [1..n+1] has exactly k+1 inputs that map to it:

  • sets containing n+1 come from the k-subset without n+1, and any of the elements of S, and
  • sets not containing n+1 come from starting with one of its k-subsets and adding the missing element.
And that's it; the result is uniformly distributed.

This is not so bad, I think it basically makes sense to me. But I also learned of another way of thinking about this algorithm while trying to write this post.

Second Perspective #

This one comes from Paul ↗, and it notes a connection between this algorithm and the Fisher-Yates shuffle for permuting an array uniformly at random. Specifically, the "upward" variation that looks like this:

text
def shuffle(a):
    for i in range(0, len(a)):
        j = random.randint(0, i)
        a[i], a[j] = a[j], a[i]

This builds up a permutation incrementally. After each step, we have a permutation on [0..i], then we augment that and turn it into a permutation on [0..i+1].

Imagine plucking off the last k elements of an array we just shuffled, then that's a uniformly random sample of k elements. The observation here is that Floyd's algorithm is just doing the last k swaps here, directly.

If you call the last k elements of the array the "tail," then for each element i in the tail originally, either we:

  • swap i with j that's in the tail, in which case i will stay in the tail forever and be part of the sample, or
  • swap it with j not in the tail, then j will live in the tail forever and be part of the sample.
These two cases correspond directly to the two branches of Floyd's algorithm and we wind up with the same sample following them both (up to some reordering).

Don't miss what's next. Subscribe to NULL BITMAP by Justin Jaffray:

Email address (required) Subscribe

#### Add a comment:

Comment and Subscribe

Source Attribution
OpenDocs keeps source fields explicit. Unknown values are labeled instead of invented.

Source domain

buttondown.com

Publisher

buttondown.com

License / usage

Unknown. Review the original source terms before republishing beyond public-safe excerpts.

Score
Version docs-phase3-2026-05-20
91

Overall quality score, confidence 81%

Source credibility90
Freshness96
Completeness98
Extraction quality85
Attribution confidence90
Readability
standard | grade 10.9 | format 78

21 sentences, 3 headings, 4 list items.

Keep source attribution visible in the rendered document.

Search Appearance
canonical document page
OpenDocs keeps inspected URLs, canonical URLs, snippets, and rich-result signals explicit; Search Console metrics are not treated as visits.
Social Card
Floyd&#x27;s Sampling Algorithm • Buttondown
I love sampling algorithms. Here&#x27;s the sampling algorithm that I find most magical. We want to generate a subset of {1, 2, ..., n} of size k. def floyd(n,...
Duplicate State
No duplicate is asserted on this page without a matching canonical URL or content hash cluster.
Trace
docs-score-ba3d322fac9a0735975e2e9b
Export
Use public export endpoints for Markdown/JSON. Protected publishing still requires PLATPHORM_API_KEY.

Related Documentation

👥

Chert | iMessage Infrastructure for Reaching People at Scale

Skip to main content https://docs.platphormnews.com/docs/chert imessage infrastructure for reaching people at scale main content Back to docs https://docs.platphormnews.com/docs Skip to content ↗ https://www.trychert.com

7 min read

👥

SEO Starter Guide: The Basics | Google Search Central | Documentation | Google for Developers

Skip to main content https://developers.google.com/search/docs/fundamentals/seo starter guide main content Google Search Central English Deutsch Español Español – América Latina Français Indonesia Italiano Polski Portugu

22 min read

👥

Chert | iMessage Infrastructure for Reaching People at Scale

Skip to content https://www.trychert.com/ main content New Chert is now live on Hacker News Check it out → https://www.trychert.com/agent Chert https://www.trychert.com/ Home https://www.trychert.com/ Blog https://www.tr

5 min read

🚀🧪📖🔍❌

Three Inverse Laws of AI - Susam Pal

9 min read

🔒🏗️📄✨🔄

GameStop Proposes to Acquire eBay at $125.00 Per Share | GameStop Corp.

GameStop Corp. (NYSE: GME) today submitted a non-binding proposal to acquire 100% of eBay Inc. (NASDAQ: EBAY) at $125.00 per share in cash and stock. The offer represents a 46% premium to eBay’s unaffected closing price on February 4, 2026, the day GameStop started accumulating its position in eBay. GameStop has built a 5% economic stake in eBay through derivatives and beneficial ownership of common stock. GameStop is filing a Schedule 13D and HSR notification tomorrow. The full proposal letter and accompanying materials are available at investor.gamestop.com/ebay . The proposed offer is $125.00 per share, comprising 50% cash and 50% GameStop common stock, with full shareholder election rights as to consideration type and pro-rata allocation. Aggregate undiluted equity value is approximately $55.5 billion, based on eBay’s most recently disclosed undiluted share count, representing a 27% premium to the 30-day VWAP and a 36% premium to the 90-day VWAP. The transaction is conditioned on

11 min read