Simulating a probability of 1 of 2^N with less than N random bitsIs the number of coin tosses of a probabilistic Turing machine a Blum complexity measure?Prove that inserting $n$ sorted values in to an AVL using AVL insertion is $Thetaleft (n log left ( n right ) right )$2SUM with a weightIs transitivity required for a sorting algorithmHow to compare conditional entropy and mutual information?How to state a recurrence that expresses the worst case for good pivots?Counting words that satisfy SAT-like constraints with BDDsHigher order empirical entropy is not the entropy of the empirical distribution?How do these alternative definitions of one-way functions compare?Average-case analysis of linear search given that the desired element appears $k$ times

Transformation of random variables and joint distributions

Why has "pence" been used in this sentence, not "pences"?

A social experiment. What is the worst that can happen?

Can the Supreme Court overturn an impeachment?

How do ground effect vehicles perform turns?

Folder comparison

Will adding a BY-SA image to a blog post make the entire post BY-SA?

Should I install hardwood flooring or cabinets first?

Why did the EU agree to delay the Brexit deadline?

Does having a TSA Pre-Check member in your flight reservation increase the chances that everyone gets Pre-Check?

Can someone explain how this makes sense electrically?

What does the Rambam mean when he says that the planets have souls?

If a character with the Alert feat rolls a crit fail on their Perception check, are they surprised?

Varistor? Purpose and principle

Why does the integral domain "being trapped between a finite field extension" implies that it is a field?

Divine apple island

THT: What is a squared annular “ring”?

Difference between -| and |- in TikZ

How to color a curve

Can a significant change in incentives void an employment contract?

Have I saved too much for retirement so far?

Proving a function is onto where f(x)=|x|.

How do you respond to a colleague from another team when they're wrongly expecting that you'll help them?

Structured binding on const



Simulating a probability of 1 of 2^N with less than N random bits


Is the number of coin tosses of a probabilistic Turing machine a Blum complexity measure?Prove that inserting $n$ sorted values in to an AVL using AVL insertion is $Thetaleft (n log left ( n right ) right )$2SUM with a weightIs transitivity required for a sorting algorithmHow to compare conditional entropy and mutual information?How to state a recurrence that expresses the worst case for good pivots?Counting words that satisfy SAT-like constraints with BDDsHigher order empirical entropy is not the entropy of the empirical distribution?How do these alternative definitions of one-way functions compare?Average-case analysis of linear search given that the desired element appears $k$ times













2












$begingroup$


Say I need to simulate the following discrete distribution:



$$
P(X = k) =
begincases
frac12^N, & textif $k = 1$ \
1 - frac12^N, & textif $k = 0$
endcases
$$



The most obvious way is to draw $N$ random bits and check if all of them equals to 0 (or 1). However, information theory says



$$
beginalign
S & = - Sigma_i P_i logP_i \
& = - frac12^N logfrac12^N - left(1 - frac12^Nright) logleft(1 - frac12^Nright) \
& = frac12^N log2^N + left(1 - frac12^Nright) logfrac2^N2^N - 1 \
& rightarrow 0
endalign
$$



So the minimum number of random bits required actually decreases as $N$ goes large. How is this possible?



Please assume that we are running on a computer where bits is your only source of randomness, so you can't just tose a biased coin.










share|cite|improve this question











$endgroup$
















    2












    $begingroup$


    Say I need to simulate the following discrete distribution:



    $$
    P(X = k) =
    begincases
    frac12^N, & textif $k = 1$ \
    1 - frac12^N, & textif $k = 0$
    endcases
    $$



    The most obvious way is to draw $N$ random bits and check if all of them equals to 0 (or 1). However, information theory says



    $$
    beginalign
    S & = - Sigma_i P_i logP_i \
    & = - frac12^N logfrac12^N - left(1 - frac12^Nright) logleft(1 - frac12^Nright) \
    & = frac12^N log2^N + left(1 - frac12^Nright) logfrac2^N2^N - 1 \
    & rightarrow 0
    endalign
    $$



    So the minimum number of random bits required actually decreases as $N$ goes large. How is this possible?



    Please assume that we are running on a computer where bits is your only source of randomness, so you can't just tose a biased coin.










    share|cite|improve this question











    $endgroup$














      2












      2








      2





      $begingroup$


      Say I need to simulate the following discrete distribution:



      $$
      P(X = k) =
      begincases
      frac12^N, & textif $k = 1$ \
      1 - frac12^N, & textif $k = 0$
      endcases
      $$



      The most obvious way is to draw $N$ random bits and check if all of them equals to 0 (or 1). However, information theory says



      $$
      beginalign
      S & = - Sigma_i P_i logP_i \
      & = - frac12^N logfrac12^N - left(1 - frac12^Nright) logleft(1 - frac12^Nright) \
      & = frac12^N log2^N + left(1 - frac12^Nright) logfrac2^N2^N - 1 \
      & rightarrow 0
      endalign
      $$



      So the minimum number of random bits required actually decreases as $N$ goes large. How is this possible?



      Please assume that we are running on a computer where bits is your only source of randomness, so you can't just tose a biased coin.










      share|cite|improve this question











      $endgroup$




      Say I need to simulate the following discrete distribution:



      $$
      P(X = k) =
      begincases
      frac12^N, & textif $k = 1$ \
      1 - frac12^N, & textif $k = 0$
      endcases
      $$



      The most obvious way is to draw $N$ random bits and check if all of them equals to 0 (or 1). However, information theory says



      $$
      beginalign
      S & = - Sigma_i P_i logP_i \
      & = - frac12^N logfrac12^N - left(1 - frac12^Nright) logleft(1 - frac12^Nright) \
      & = frac12^N log2^N + left(1 - frac12^Nright) logfrac2^N2^N - 1 \
      & rightarrow 0
      endalign
      $$



      So the minimum number of random bits required actually decreases as $N$ goes large. How is this possible?



      Please assume that we are running on a computer where bits is your only source of randomness, so you can't just tose a biased coin.







      algorithms information-theory randomness pseudo-random-generators entropy






      share|cite|improve this question















      share|cite|improve this question













      share|cite|improve this question




      share|cite|improve this question








      edited 6 hours ago







      nalzok

















      asked 6 hours ago









      nalzoknalzok

      432414




      432414




















          1 Answer
          1






          active

          oldest

          votes


















          2












          $begingroup$

          Wow, great question! Let me try to explain the resolution. It'll take three distinct steps.



          The first thing to note is that the entropy is focused more on the average number of bits needed per draw, not the maximum number of bits needed.



          With your sampling procedure, the maximum number of random bits needed per draw is $N$ bits, but the average number of bits needed is 2 bits (the average of a geometric distribution with $p=1/2$) -- this is because there is a $1/2$ probability that you only need 1 bit (if the first bit turns out to be 1), a $1/4$ probability that you only need 2 bits (if the first two bits turn out to be 01), a $1/8$ probability that you only need 3 bits (if the first three bits turn out to be 001), and so on.



          The second thing to note is that the entropy doesn't really capture the average number of bits needed for a single draw. Instead, the entropy captures the amortized number of bits needed to sample $m$ iid draws from this distribution. Suppose we need $f(m)$ bits to sample $m$ draws; then the entropy is the limit of $f(m)/k$ as $m to infty$.



          The third thing to note is that, with this distribution, you can sample $m$ iid draws with fewer bits than needed to repeatedly sample one draw. Suppose you naively decided to draw one sample (takes 2 random bits on average), then draw another simple (using 2 more random bits on average), and so on, until you've repeated this $m$ times. That would require about $2m$ random bits on average.



          But it turns out there's a way to sample from $m$ draws using fewer than $2m$ bits. It's hard to believe, but it's true!



          Let me give you the intuition. Suppose you wrote down the result of sampling $m$ draws, where $m$ is really large. Then the result could be specified as a $m$-bit string. This $m$-bit string will be mostly 0's, with a few 1's in it: in particular, on average it will have about $m/2^N$ 1's (could be more or less than that, but if $m$ is sufficiently large, usually the number will be close to that). The length of the gaps between the 1's are random, but will typically be somewhere vaguely in the vicinity of $2^N$ (could easily be half that or twice that or even more, but of that order of magnitude). Of course, instead of writing down the entire $m$-bit string, we could write it down more succinctly by writing down a list of the lengths of the gaps -- that carries all the same information, in a more compressed format. How much more succinct? Well, we'll usually need about $N$ bits to represent the length of each gap; and there will be about $m/2^N$ gaps; so we'll need in total about $mN/2^N$ bits (could be a bit more, could be a bit less, but if $m$ is sufficiently large, it'll usually be close to that). That's a lot shorter than a $m$-bit string.



          And if there's a way to write down the string this succinctly, perhaps it won't be too surprising if that means there's a way to generate the string with a number of random bits comparable to the length of the string. In particular, you randomly generate the length of each gap; this is sampling from a geometric distribution with $p=1/2^N$, and that can be done with roughly $sim N$ random bits on average (not $2^N$). You'll need about $m/2^N$ iid draws from this geometric distribution, so you'll need in total roughly $sim Nm/2^N$ random bits. (It could be a small constant factor larger, but not too much larger.) And, notice is that this is much smaller than $2m$ bits.



          So, we can sample $m$ iid draws from your distribution, using just $f(m) sim Nm/2^N$ random bits (roughly). Recall that the entropy is $lim_m to infty f(m)/m$. So this means that you should expect the entropy to be (roughly) $N/2^N$. That's off by a little bit, because the above calculation was sketchy and crude -- but hopefully it gives you some intuition for why the entropy is what it is, and why everything is consistent and reasonable.






          share|cite|improve this answer









          $endgroup$












          • $begingroup$
            Wow, great answer! But could you elaborate on why sampling from a geometric distribution with $p=frac12^N$ takes $N$ bits on average? I know such a random variable would have a mean of $2^N$ , so it takes on average $N$ bits to store, but I suppose this doesn't mean you can generate one with $N$ bits.
            $endgroup$
            – nalzok
            52 mins ago











          • $begingroup$
            @nalzok, A fair question! Could you perhaps ask that as a separate question? I can see how to do it, but it's a bit messy to type up right now. If you ask perhaps someone will get to answering quicker than I can. The approach I'm thinking of is similar to arithmetic coding. Define $q_i = Pr[Xle i]$ (where $X$ is the geometric r.v.), then generate a random number $r$ in the interval $[0,1)$, and find $i$ such that $q_i le r < q_i+1$. If you write down the bits of the binary expension $r$ one at a time, usually after writing down $N+O(1)$ bits of $r$, $i$ will be fully determined.
            $endgroup$
            – D.W.
            25 mins ago










          • $begingroup$
            So you're basically using the inverse CDF method to convert a uniformly distributed random variable to an arbitrary distribution, combined with an idea similar to binary search? I'll need to analyze the quantile function of a geometric distribution to be sure, but this hint is enough. Thanks!
            $endgroup$
            – nalzok
            16 mins ago







          • 1




            $begingroup$
            @nalzok, ahh, yes, that's a nicer way to think about it -- lovely. Thank you for suggesting that. Yup, that's what I had in mind.
            $endgroup$
            – D.W.
            15 mins ago











          Your Answer





          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "419"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f106018%2fsimulating-a-probability-of-1-of-2n-with-less-than-n-random-bits%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          2












          $begingroup$

          Wow, great question! Let me try to explain the resolution. It'll take three distinct steps.



          The first thing to note is that the entropy is focused more on the average number of bits needed per draw, not the maximum number of bits needed.



          With your sampling procedure, the maximum number of random bits needed per draw is $N$ bits, but the average number of bits needed is 2 bits (the average of a geometric distribution with $p=1/2$) -- this is because there is a $1/2$ probability that you only need 1 bit (if the first bit turns out to be 1), a $1/4$ probability that you only need 2 bits (if the first two bits turn out to be 01), a $1/8$ probability that you only need 3 bits (if the first three bits turn out to be 001), and so on.



          The second thing to note is that the entropy doesn't really capture the average number of bits needed for a single draw. Instead, the entropy captures the amortized number of bits needed to sample $m$ iid draws from this distribution. Suppose we need $f(m)$ bits to sample $m$ draws; then the entropy is the limit of $f(m)/k$ as $m to infty$.



          The third thing to note is that, with this distribution, you can sample $m$ iid draws with fewer bits than needed to repeatedly sample one draw. Suppose you naively decided to draw one sample (takes 2 random bits on average), then draw another simple (using 2 more random bits on average), and so on, until you've repeated this $m$ times. That would require about $2m$ random bits on average.



          But it turns out there's a way to sample from $m$ draws using fewer than $2m$ bits. It's hard to believe, but it's true!



          Let me give you the intuition. Suppose you wrote down the result of sampling $m$ draws, where $m$ is really large. Then the result could be specified as a $m$-bit string. This $m$-bit string will be mostly 0's, with a few 1's in it: in particular, on average it will have about $m/2^N$ 1's (could be more or less than that, but if $m$ is sufficiently large, usually the number will be close to that). The length of the gaps between the 1's are random, but will typically be somewhere vaguely in the vicinity of $2^N$ (could easily be half that or twice that or even more, but of that order of magnitude). Of course, instead of writing down the entire $m$-bit string, we could write it down more succinctly by writing down a list of the lengths of the gaps -- that carries all the same information, in a more compressed format. How much more succinct? Well, we'll usually need about $N$ bits to represent the length of each gap; and there will be about $m/2^N$ gaps; so we'll need in total about $mN/2^N$ bits (could be a bit more, could be a bit less, but if $m$ is sufficiently large, it'll usually be close to that). That's a lot shorter than a $m$-bit string.



          And if there's a way to write down the string this succinctly, perhaps it won't be too surprising if that means there's a way to generate the string with a number of random bits comparable to the length of the string. In particular, you randomly generate the length of each gap; this is sampling from a geometric distribution with $p=1/2^N$, and that can be done with roughly $sim N$ random bits on average (not $2^N$). You'll need about $m/2^N$ iid draws from this geometric distribution, so you'll need in total roughly $sim Nm/2^N$ random bits. (It could be a small constant factor larger, but not too much larger.) And, notice is that this is much smaller than $2m$ bits.



          So, we can sample $m$ iid draws from your distribution, using just $f(m) sim Nm/2^N$ random bits (roughly). Recall that the entropy is $lim_m to infty f(m)/m$. So this means that you should expect the entropy to be (roughly) $N/2^N$. That's off by a little bit, because the above calculation was sketchy and crude -- but hopefully it gives you some intuition for why the entropy is what it is, and why everything is consistent and reasonable.






          share|cite|improve this answer









          $endgroup$












          • $begingroup$
            Wow, great answer! But could you elaborate on why sampling from a geometric distribution with $p=frac12^N$ takes $N$ bits on average? I know such a random variable would have a mean of $2^N$ , so it takes on average $N$ bits to store, but I suppose this doesn't mean you can generate one with $N$ bits.
            $endgroup$
            – nalzok
            52 mins ago











          • $begingroup$
            @nalzok, A fair question! Could you perhaps ask that as a separate question? I can see how to do it, but it's a bit messy to type up right now. If you ask perhaps someone will get to answering quicker than I can. The approach I'm thinking of is similar to arithmetic coding. Define $q_i = Pr[Xle i]$ (where $X$ is the geometric r.v.), then generate a random number $r$ in the interval $[0,1)$, and find $i$ such that $q_i le r < q_i+1$. If you write down the bits of the binary expension $r$ one at a time, usually after writing down $N+O(1)$ bits of $r$, $i$ will be fully determined.
            $endgroup$
            – D.W.
            25 mins ago










          • $begingroup$
            So you're basically using the inverse CDF method to convert a uniformly distributed random variable to an arbitrary distribution, combined with an idea similar to binary search? I'll need to analyze the quantile function of a geometric distribution to be sure, but this hint is enough. Thanks!
            $endgroup$
            – nalzok
            16 mins ago







          • 1




            $begingroup$
            @nalzok, ahh, yes, that's a nicer way to think about it -- lovely. Thank you for suggesting that. Yup, that's what I had in mind.
            $endgroup$
            – D.W.
            15 mins ago
















          2












          $begingroup$

          Wow, great question! Let me try to explain the resolution. It'll take three distinct steps.



          The first thing to note is that the entropy is focused more on the average number of bits needed per draw, not the maximum number of bits needed.



          With your sampling procedure, the maximum number of random bits needed per draw is $N$ bits, but the average number of bits needed is 2 bits (the average of a geometric distribution with $p=1/2$) -- this is because there is a $1/2$ probability that you only need 1 bit (if the first bit turns out to be 1), a $1/4$ probability that you only need 2 bits (if the first two bits turn out to be 01), a $1/8$ probability that you only need 3 bits (if the first three bits turn out to be 001), and so on.



          The second thing to note is that the entropy doesn't really capture the average number of bits needed for a single draw. Instead, the entropy captures the amortized number of bits needed to sample $m$ iid draws from this distribution. Suppose we need $f(m)$ bits to sample $m$ draws; then the entropy is the limit of $f(m)/k$ as $m to infty$.



          The third thing to note is that, with this distribution, you can sample $m$ iid draws with fewer bits than needed to repeatedly sample one draw. Suppose you naively decided to draw one sample (takes 2 random bits on average), then draw another simple (using 2 more random bits on average), and so on, until you've repeated this $m$ times. That would require about $2m$ random bits on average.



          But it turns out there's a way to sample from $m$ draws using fewer than $2m$ bits. It's hard to believe, but it's true!



          Let me give you the intuition. Suppose you wrote down the result of sampling $m$ draws, where $m$ is really large. Then the result could be specified as a $m$-bit string. This $m$-bit string will be mostly 0's, with a few 1's in it: in particular, on average it will have about $m/2^N$ 1's (could be more or less than that, but if $m$ is sufficiently large, usually the number will be close to that). The length of the gaps between the 1's are random, but will typically be somewhere vaguely in the vicinity of $2^N$ (could easily be half that or twice that or even more, but of that order of magnitude). Of course, instead of writing down the entire $m$-bit string, we could write it down more succinctly by writing down a list of the lengths of the gaps -- that carries all the same information, in a more compressed format. How much more succinct? Well, we'll usually need about $N$ bits to represent the length of each gap; and there will be about $m/2^N$ gaps; so we'll need in total about $mN/2^N$ bits (could be a bit more, could be a bit less, but if $m$ is sufficiently large, it'll usually be close to that). That's a lot shorter than a $m$-bit string.



          And if there's a way to write down the string this succinctly, perhaps it won't be too surprising if that means there's a way to generate the string with a number of random bits comparable to the length of the string. In particular, you randomly generate the length of each gap; this is sampling from a geometric distribution with $p=1/2^N$, and that can be done with roughly $sim N$ random bits on average (not $2^N$). You'll need about $m/2^N$ iid draws from this geometric distribution, so you'll need in total roughly $sim Nm/2^N$ random bits. (It could be a small constant factor larger, but not too much larger.) And, notice is that this is much smaller than $2m$ bits.



          So, we can sample $m$ iid draws from your distribution, using just $f(m) sim Nm/2^N$ random bits (roughly). Recall that the entropy is $lim_m to infty f(m)/m$. So this means that you should expect the entropy to be (roughly) $N/2^N$. That's off by a little bit, because the above calculation was sketchy and crude -- but hopefully it gives you some intuition for why the entropy is what it is, and why everything is consistent and reasonable.






          share|cite|improve this answer









          $endgroup$












          • $begingroup$
            Wow, great answer! But could you elaborate on why sampling from a geometric distribution with $p=frac12^N$ takes $N$ bits on average? I know such a random variable would have a mean of $2^N$ , so it takes on average $N$ bits to store, but I suppose this doesn't mean you can generate one with $N$ bits.
            $endgroup$
            – nalzok
            52 mins ago











          • $begingroup$
            @nalzok, A fair question! Could you perhaps ask that as a separate question? I can see how to do it, but it's a bit messy to type up right now. If you ask perhaps someone will get to answering quicker than I can. The approach I'm thinking of is similar to arithmetic coding. Define $q_i = Pr[Xle i]$ (where $X$ is the geometric r.v.), then generate a random number $r$ in the interval $[0,1)$, and find $i$ such that $q_i le r < q_i+1$. If you write down the bits of the binary expension $r$ one at a time, usually after writing down $N+O(1)$ bits of $r$, $i$ will be fully determined.
            $endgroup$
            – D.W.
            25 mins ago










          • $begingroup$
            So you're basically using the inverse CDF method to convert a uniformly distributed random variable to an arbitrary distribution, combined with an idea similar to binary search? I'll need to analyze the quantile function of a geometric distribution to be sure, but this hint is enough. Thanks!
            $endgroup$
            – nalzok
            16 mins ago







          • 1




            $begingroup$
            @nalzok, ahh, yes, that's a nicer way to think about it -- lovely. Thank you for suggesting that. Yup, that's what I had in mind.
            $endgroup$
            – D.W.
            15 mins ago














          2












          2








          2





          $begingroup$

          Wow, great question! Let me try to explain the resolution. It'll take three distinct steps.



          The first thing to note is that the entropy is focused more on the average number of bits needed per draw, not the maximum number of bits needed.



          With your sampling procedure, the maximum number of random bits needed per draw is $N$ bits, but the average number of bits needed is 2 bits (the average of a geometric distribution with $p=1/2$) -- this is because there is a $1/2$ probability that you only need 1 bit (if the first bit turns out to be 1), a $1/4$ probability that you only need 2 bits (if the first two bits turn out to be 01), a $1/8$ probability that you only need 3 bits (if the first three bits turn out to be 001), and so on.



          The second thing to note is that the entropy doesn't really capture the average number of bits needed for a single draw. Instead, the entropy captures the amortized number of bits needed to sample $m$ iid draws from this distribution. Suppose we need $f(m)$ bits to sample $m$ draws; then the entropy is the limit of $f(m)/k$ as $m to infty$.



          The third thing to note is that, with this distribution, you can sample $m$ iid draws with fewer bits than needed to repeatedly sample one draw. Suppose you naively decided to draw one sample (takes 2 random bits on average), then draw another simple (using 2 more random bits on average), and so on, until you've repeated this $m$ times. That would require about $2m$ random bits on average.



          But it turns out there's a way to sample from $m$ draws using fewer than $2m$ bits. It's hard to believe, but it's true!



          Let me give you the intuition. Suppose you wrote down the result of sampling $m$ draws, where $m$ is really large. Then the result could be specified as a $m$-bit string. This $m$-bit string will be mostly 0's, with a few 1's in it: in particular, on average it will have about $m/2^N$ 1's (could be more or less than that, but if $m$ is sufficiently large, usually the number will be close to that). The length of the gaps between the 1's are random, but will typically be somewhere vaguely in the vicinity of $2^N$ (could easily be half that or twice that or even more, but of that order of magnitude). Of course, instead of writing down the entire $m$-bit string, we could write it down more succinctly by writing down a list of the lengths of the gaps -- that carries all the same information, in a more compressed format. How much more succinct? Well, we'll usually need about $N$ bits to represent the length of each gap; and there will be about $m/2^N$ gaps; so we'll need in total about $mN/2^N$ bits (could be a bit more, could be a bit less, but if $m$ is sufficiently large, it'll usually be close to that). That's a lot shorter than a $m$-bit string.



          And if there's a way to write down the string this succinctly, perhaps it won't be too surprising if that means there's a way to generate the string with a number of random bits comparable to the length of the string. In particular, you randomly generate the length of each gap; this is sampling from a geometric distribution with $p=1/2^N$, and that can be done with roughly $sim N$ random bits on average (not $2^N$). You'll need about $m/2^N$ iid draws from this geometric distribution, so you'll need in total roughly $sim Nm/2^N$ random bits. (It could be a small constant factor larger, but not too much larger.) And, notice is that this is much smaller than $2m$ bits.



          So, we can sample $m$ iid draws from your distribution, using just $f(m) sim Nm/2^N$ random bits (roughly). Recall that the entropy is $lim_m to infty f(m)/m$. So this means that you should expect the entropy to be (roughly) $N/2^N$. That's off by a little bit, because the above calculation was sketchy and crude -- but hopefully it gives you some intuition for why the entropy is what it is, and why everything is consistent and reasonable.






          share|cite|improve this answer









          $endgroup$



          Wow, great question! Let me try to explain the resolution. It'll take three distinct steps.



          The first thing to note is that the entropy is focused more on the average number of bits needed per draw, not the maximum number of bits needed.



          With your sampling procedure, the maximum number of random bits needed per draw is $N$ bits, but the average number of bits needed is 2 bits (the average of a geometric distribution with $p=1/2$) -- this is because there is a $1/2$ probability that you only need 1 bit (if the first bit turns out to be 1), a $1/4$ probability that you only need 2 bits (if the first two bits turn out to be 01), a $1/8$ probability that you only need 3 bits (if the first three bits turn out to be 001), and so on.



          The second thing to note is that the entropy doesn't really capture the average number of bits needed for a single draw. Instead, the entropy captures the amortized number of bits needed to sample $m$ iid draws from this distribution. Suppose we need $f(m)$ bits to sample $m$ draws; then the entropy is the limit of $f(m)/k$ as $m to infty$.



          The third thing to note is that, with this distribution, you can sample $m$ iid draws with fewer bits than needed to repeatedly sample one draw. Suppose you naively decided to draw one sample (takes 2 random bits on average), then draw another simple (using 2 more random bits on average), and so on, until you've repeated this $m$ times. That would require about $2m$ random bits on average.



          But it turns out there's a way to sample from $m$ draws using fewer than $2m$ bits. It's hard to believe, but it's true!



          Let me give you the intuition. Suppose you wrote down the result of sampling $m$ draws, where $m$ is really large. Then the result could be specified as a $m$-bit string. This $m$-bit string will be mostly 0's, with a few 1's in it: in particular, on average it will have about $m/2^N$ 1's (could be more or less than that, but if $m$ is sufficiently large, usually the number will be close to that). The length of the gaps between the 1's are random, but will typically be somewhere vaguely in the vicinity of $2^N$ (could easily be half that or twice that or even more, but of that order of magnitude). Of course, instead of writing down the entire $m$-bit string, we could write it down more succinctly by writing down a list of the lengths of the gaps -- that carries all the same information, in a more compressed format. How much more succinct? Well, we'll usually need about $N$ bits to represent the length of each gap; and there will be about $m/2^N$ gaps; so we'll need in total about $mN/2^N$ bits (could be a bit more, could be a bit less, but if $m$ is sufficiently large, it'll usually be close to that). That's a lot shorter than a $m$-bit string.



          And if there's a way to write down the string this succinctly, perhaps it won't be too surprising if that means there's a way to generate the string with a number of random bits comparable to the length of the string. In particular, you randomly generate the length of each gap; this is sampling from a geometric distribution with $p=1/2^N$, and that can be done with roughly $sim N$ random bits on average (not $2^N$). You'll need about $m/2^N$ iid draws from this geometric distribution, so you'll need in total roughly $sim Nm/2^N$ random bits. (It could be a small constant factor larger, but not too much larger.) And, notice is that this is much smaller than $2m$ bits.



          So, we can sample $m$ iid draws from your distribution, using just $f(m) sim Nm/2^N$ random bits (roughly). Recall that the entropy is $lim_m to infty f(m)/m$. So this means that you should expect the entropy to be (roughly) $N/2^N$. That's off by a little bit, because the above calculation was sketchy and crude -- but hopefully it gives you some intuition for why the entropy is what it is, and why everything is consistent and reasonable.







          share|cite|improve this answer












          share|cite|improve this answer



          share|cite|improve this answer










          answered 2 hours ago









          D.W.D.W.

          102k12127291




          102k12127291











          • $begingroup$
            Wow, great answer! But could you elaborate on why sampling from a geometric distribution with $p=frac12^N$ takes $N$ bits on average? I know such a random variable would have a mean of $2^N$ , so it takes on average $N$ bits to store, but I suppose this doesn't mean you can generate one with $N$ bits.
            $endgroup$
            – nalzok
            52 mins ago











          • $begingroup$
            @nalzok, A fair question! Could you perhaps ask that as a separate question? I can see how to do it, but it's a bit messy to type up right now. If you ask perhaps someone will get to answering quicker than I can. The approach I'm thinking of is similar to arithmetic coding. Define $q_i = Pr[Xle i]$ (where $X$ is the geometric r.v.), then generate a random number $r$ in the interval $[0,1)$, and find $i$ such that $q_i le r < q_i+1$. If you write down the bits of the binary expension $r$ one at a time, usually after writing down $N+O(1)$ bits of $r$, $i$ will be fully determined.
            $endgroup$
            – D.W.
            25 mins ago










          • $begingroup$
            So you're basically using the inverse CDF method to convert a uniformly distributed random variable to an arbitrary distribution, combined with an idea similar to binary search? I'll need to analyze the quantile function of a geometric distribution to be sure, but this hint is enough. Thanks!
            $endgroup$
            – nalzok
            16 mins ago







          • 1




            $begingroup$
            @nalzok, ahh, yes, that's a nicer way to think about it -- lovely. Thank you for suggesting that. Yup, that's what I had in mind.
            $endgroup$
            – D.W.
            15 mins ago

















          • $begingroup$
            Wow, great answer! But could you elaborate on why sampling from a geometric distribution with $p=frac12^N$ takes $N$ bits on average? I know such a random variable would have a mean of $2^N$ , so it takes on average $N$ bits to store, but I suppose this doesn't mean you can generate one with $N$ bits.
            $endgroup$
            – nalzok
            52 mins ago











          • $begingroup$
            @nalzok, A fair question! Could you perhaps ask that as a separate question? I can see how to do it, but it's a bit messy to type up right now. If you ask perhaps someone will get to answering quicker than I can. The approach I'm thinking of is similar to arithmetic coding. Define $q_i = Pr[Xle i]$ (where $X$ is the geometric r.v.), then generate a random number $r$ in the interval $[0,1)$, and find $i$ such that $q_i le r < q_i+1$. If you write down the bits of the binary expension $r$ one at a time, usually after writing down $N+O(1)$ bits of $r$, $i$ will be fully determined.
            $endgroup$
            – D.W.
            25 mins ago










          • $begingroup$
            So you're basically using the inverse CDF method to convert a uniformly distributed random variable to an arbitrary distribution, combined with an idea similar to binary search? I'll need to analyze the quantile function of a geometric distribution to be sure, but this hint is enough. Thanks!
            $endgroup$
            – nalzok
            16 mins ago







          • 1




            $begingroup$
            @nalzok, ahh, yes, that's a nicer way to think about it -- lovely. Thank you for suggesting that. Yup, that's what I had in mind.
            $endgroup$
            – D.W.
            15 mins ago
















          $begingroup$
          Wow, great answer! But could you elaborate on why sampling from a geometric distribution with $p=frac12^N$ takes $N$ bits on average? I know such a random variable would have a mean of $2^N$ , so it takes on average $N$ bits to store, but I suppose this doesn't mean you can generate one with $N$ bits.
          $endgroup$
          – nalzok
          52 mins ago





          $begingroup$
          Wow, great answer! But could you elaborate on why sampling from a geometric distribution with $p=frac12^N$ takes $N$ bits on average? I know such a random variable would have a mean of $2^N$ , so it takes on average $N$ bits to store, but I suppose this doesn't mean you can generate one with $N$ bits.
          $endgroup$
          – nalzok
          52 mins ago













          $begingroup$
          @nalzok, A fair question! Could you perhaps ask that as a separate question? I can see how to do it, but it's a bit messy to type up right now. If you ask perhaps someone will get to answering quicker than I can. The approach I'm thinking of is similar to arithmetic coding. Define $q_i = Pr[Xle i]$ (where $X$ is the geometric r.v.), then generate a random number $r$ in the interval $[0,1)$, and find $i$ such that $q_i le r < q_i+1$. If you write down the bits of the binary expension $r$ one at a time, usually after writing down $N+O(1)$ bits of $r$, $i$ will be fully determined.
          $endgroup$
          – D.W.
          25 mins ago




          $begingroup$
          @nalzok, A fair question! Could you perhaps ask that as a separate question? I can see how to do it, but it's a bit messy to type up right now. If you ask perhaps someone will get to answering quicker than I can. The approach I'm thinking of is similar to arithmetic coding. Define $q_i = Pr[Xle i]$ (where $X$ is the geometric r.v.), then generate a random number $r$ in the interval $[0,1)$, and find $i$ such that $q_i le r < q_i+1$. If you write down the bits of the binary expension $r$ one at a time, usually after writing down $N+O(1)$ bits of $r$, $i$ will be fully determined.
          $endgroup$
          – D.W.
          25 mins ago












          $begingroup$
          So you're basically using the inverse CDF method to convert a uniformly distributed random variable to an arbitrary distribution, combined with an idea similar to binary search? I'll need to analyze the quantile function of a geometric distribution to be sure, but this hint is enough. Thanks!
          $endgroup$
          – nalzok
          16 mins ago





          $begingroup$
          So you're basically using the inverse CDF method to convert a uniformly distributed random variable to an arbitrary distribution, combined with an idea similar to binary search? I'll need to analyze the quantile function of a geometric distribution to be sure, but this hint is enough. Thanks!
          $endgroup$
          – nalzok
          16 mins ago





          1




          1




          $begingroup$
          @nalzok, ahh, yes, that's a nicer way to think about it -- lovely. Thank you for suggesting that. Yup, that's what I had in mind.
          $endgroup$
          – D.W.
          15 mins ago





          $begingroup$
          @nalzok, ahh, yes, that's a nicer way to think about it -- lovely. Thank you for suggesting that. Yup, that's what I had in mind.
          $endgroup$
          – D.W.
          15 mins ago


















          draft saved

          draft discarded
















































          Thanks for contributing an answer to Computer Science Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid


          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.

          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f106018%2fsimulating-a-probability-of-1-of-2n-with-less-than-n-random-bits%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Era Viking Índice Início da Era Viquingue | Cotidiano | Sociedade | Língua | Religião | A arte | As primeiras cidades | As viagens dos viquingues | Viquingues do Oeste e Leste | Fim da Era Viquingue | Fontes históricas | Referências Bibliografia | Ligações externas | Menu de navegação«Sverige då!»«Handel I vikingetid»«O que é Nórdico Antigo»Mito, magia e religião na volsunga saga Um olhar sobre a trajetória mítica do herói sigurd«Bonden var den verklige vikingen»«Vikingatiden»«Vikingatiden»«Vinland»«Guerreiras de Óðinn: As Valkyrjor na Mitologia Viking»1519-9053«Esculpindo símbolos e seres: A arte viking em pedras rúnicas»1679-9313Historia - Tema: VikingarnaAventura e Magia no Mundo das Sagas IslandesasEra Vikinge

          What's the metal clinking sound at the end of credits in Avengers: Endgame?What makes Thanos so strong in Avengers: Endgame?Who is the character that appears at the end of Endgame?What happens to Mjolnir (Thor's hammer) at the end of Endgame?The People's Ages in Avengers: EndgameWhat did Nebula do in Avengers: Endgame?Messing with time in the Avengers: Endgame climaxAvengers: Endgame timelineWhat are the time-travel rules in Avengers Endgame?Why use this song in Avengers: Endgame Opening Logo Sequence?Peggy's age in Avengers Endgame

          Mortes em março de 2019 Referências Menu de navegação«Zhores Alferov, Nobel de Física bielorrusso, morre aos 88 anos - Ciência»«Fallece Rafael Torija, o bispo emérito de Ciudad Real»«Peter Hurford dies at 88»«Keith Flint, vocalista do The Prodigy, morre aos 49 anos»«Luke Perry, ator de 'Barrados no baile' e 'Riverdale', morre aos 52 anos»«Former Rangers and Scotland captain Eric Caldow dies, aged 84»«Morreu, aos 61 anos, a antiga lenda do wrestling King Kong Bundy»«Fallece el actor y director teatral Abraham Stavans»«In Memoriam Guillaume Faye»«Sidney Sheinberg, a Force Behind Universal and Spielberg, Is Dead at 84»«Carmine Persico, Colombo Crime Family Boss, Is Dead at 85»«Dirigent Michael Gielen gestorben»«Ciclista tricampeã mundial e prata na Rio 2016 é encontrada morta em casa aos 23 anos»«Pagan Community Notes: Raven Grimassi dies, Indianapolis pop-up event cancelled, Circle Sanctuary announces new podcast, and more!»«Hal Blaine, Wrecking Crew Drummer, Dies at 90»«Morre Coutinho, que editou dupla lendária com Pelé no Santos»«Cantor Demétrius, ídolo da Jovem Guarda, morre em SP»«Ex-presidente do Vasco, Eurico Miranda morre no Rio de Janeiro»«Bronze no Mundial de basquete de 1971, Laís Elena morre aos 76 anos»«Diretor de Corridas da F1, Charlie Whiting morre aos 66 anos às vésperas do GP da Austrália»«Morreu o cardeal Danneels, da Bélgica»«Morreu o cartoonista Augusto Cid»«Morreu a atriz Maria Isabel de Lizandra, de "Vale Tudo" e novelas da Tupi»«WS Merwin, prize-winning poet of nature, dies at 91»«Atriz Márcia Real morre em São Paulo aos 88 anos»«Mauritanie: décès de l'ancien président Mohamed Mahmoud ould Louly»«Morreu Dick Dale, o rei da surf guitar e de "Pulp Fiction"»«Falleció Víctor Genes»«João Carlos Marinho, autor de 'O Gênio do Crime', morre em SP»«Legendary Horror Director and SFX Artist John Carl Buechler Dies at 66»«Morre em Salvador a religiosa Makota Valdina»«مرگ بازیکن‌ سابق نساجی بر اثر سقوط سنگ در مازندران»«Domingos Oliveira morre no Rio»«Morre Airton Ravagniani, ex-São Paulo, Fla, Vasco, Grêmio e Sport - Notícias»«Morre o escritor Flavio Moreira da Costa»«Larry Cohen, Writer-Director of 'It's Alive' and 'Hell Up in Harlem,' Dies at 77»«Scott Walker, experimental singer-songwriter, dead at 76»«Joseph Pilato, Day of the Dead Star and Horror Favorite, Dies at 70»«Sheffield United set to pay tribute to legendary goalkeeper Ted Burgin who has died at 91»«Morre Rafael Henzel, sobrevivente de acidente aéreo da Chapecoense»«Morre Valery Bykovsky, um dos primeiros cosmonautas da União Soviética»«Agnès Varda, cineasta da Nouvelle Vague, morre aos 90 anos»«Agnès Varda, cineasta francesa, morre aos 90 anos»«Tania Mallet, James Bond Actress and Helen Mirren's Cousin, Dies at 77»e