Algorithm to convert a fixed-length string to the smallest possible collision-free representation?Smallest string length to contain all types of beadsFree description of the Kameda-Weiner algorithm?Wagner Fischer Algorithm and the Empty StringCan a Markov Algorithm Replace the Empty String?Get the nth lexicographic string of “all” possible combinations of an alphabetHow to efficiently represent any possible mutations to a string of a given length?How to merge all 3-grams into smallest possible string?Finding the smallest string that contains a given set of substringsAlgorithm to generate repetitive sequence of fixed lengthAlgorithm to separate circles to reduce collision the maximum between them

They call me Inspector Morse

How strictly should I take "Candidates must be local"?

Making a sword in the stone, in a medieval world without magic

Offered promotion but I'm leaving. Should I tell?

Subset counting for even numbers

Low budget alien movie about the Earth being cooked

Why would one plane in this picture not have gear down yet?

Best approach to update all entries in a list that is paginated?

Could you please stop shuffling the deck and play already?

A three room house but a three headED dog

Accountant/ lawyer will not return my call

Is there an equal sign with wider gap?

Word for a person who has no opinion about whether god exists

Finding algorithms of QGIS commands?

Things to avoid when using voltage regulators?

BitNot does not flip bits in the way I expected

Single word request: Harming the benefactor

Extra alignment tab has been changed to cr. } using table, tabular and resizebox

Why is this plane circling around the Lucknow airport every day?

Do I really need to have a scientific explanation for my premise?

My story is written in English, but is set in my home country. What language should I use for the dialogue?

Grey hair or white hair

How to pass a string to a command that expects a file?

PTIJ: How can I halachically kill a vampire?

Algorithm to convert a fixed-length string to the smallest possible collision-free representation?

Smallest string length to contain all types of beadsFree description of the Kameda-Weiner algorithm?Wagner Fischer Algorithm and the Empty StringCan a Markov Algorithm Replace the Empty String?Get the nth lexicographic string of “all” possible combinations of an alphabetHow to efficiently represent any possible mutations to a string of a given length?How to merge all 3-grams into smallest possible string?Finding the smallest string that contains a given set of substringsAlgorithm to generate repetitive sequence of fixed lengthAlgorithm to separate circles to reduce collision the maximum between them

I have a US-based telephone number (in the format 000-000-0000) and need to convert it to a "shorter" representation. Using base32/64 produces too long of a string. I've discovered CRC-16, which produces a string of length 4, however, I'm not sure it's a suitable tool for the job (I want to prevent collisions).

Basically, if the input is 000-000-0000 (without the dashes), I want the output to be something like 4xg4. In this use case, the advantage is that I have 10 digits, thus I could "convert" them to something smaller that uses letters as well.

Are you aware of any algorithm implementation for this?

edited 11 hours ago

greybeard

4131314

asked 13 hours ago

anemaria20

1061

New contributor

1

$begingroup$
Check out Huffman coding.
$endgroup$
– Pål GD
10 hours ago

1

$begingroup$
The answers below are mathematically correct. With the requirement of "no collisions" and no further information which could reduce your possible inputs, those are the best you can do. But if we understood why you needed a shorter representation, we might be able to offer other solutions. Disk space or memory seem unlikely constraints if you have the CPU to compress/decompress on the fly this kind of data. So is it some sort of UI constraint where the display needs to be shorter? Or something else?
$endgroup$
– JesseM
3 hours ago

add a comment |

Are you aware of any algorithm implementation for this?

edited 11 hours ago

greybeard

4131314

asked 13 hours ago

anemaria20

1061

New contributor

1

$begingroup$
Check out Huffman coding.
$endgroup$
– Pål GD
10 hours ago

1

$begingroup$
The answers below are mathematically correct. With the requirement of "no collisions" and no further information which could reduce your possible inputs, those are the best you can do. But if we understood why you needed a shorter representation, we might be able to offer other solutions. Disk space or memory seem unlikely constraints if you have the CPU to compress/decompress on the fly this kind of data. So is it some sort of UI constraint where the display needs to be shorter? Or something else?
$endgroup$
– JesseM
3 hours ago

add a comment |

Are you aware of any algorithm implementation for this?

edited 11 hours ago

greybeard

4131314

asked 13 hours ago

anemaria20

1061

New contributor

Are you aware of any algorithm implementation for this?

algorithms data-compression base-conversion

edited 11 hours ago

greybeard

4131314

asked 13 hours ago

anemaria20

1061

New contributor

edited 11 hours ago

greybeard

4131314

asked 13 hours ago

anemaria20

1061

New contributor

edited 11 hours ago

greybeard

4131314

edited 11 hours ago

greybeard

4131314

edited 11 hours ago

greybeard

4131314

asked 13 hours ago

anemaria20

1061

New contributor

asked 13 hours ago

anemaria20

1061

asked 13 hours ago

anemaria20

1061

New contributor

anemaria20 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

1

$begingroup$
Check out Huffman coding.
$endgroup$
– Pål GD
10 hours ago

1

$begingroup$
The answers below are mathematically correct. With the requirement of "no collisions" and no further information which could reduce your possible inputs, those are the best you can do. But if we understood why you needed a shorter representation, we might be able to offer other solutions. Disk space or memory seem unlikely constraints if you have the CPU to compress/decompress on the fly this kind of data. So is it some sort of UI constraint where the display needs to be shorter? Or something else?
$endgroup$
– JesseM
3 hours ago

add a comment |

1

$begingroup$
Check out Huffman coding.
$endgroup$
– Pål GD
10 hours ago

1

$begingroup$
The answers below are mathematically correct. With the requirement of "no collisions" and no further information which could reduce your possible inputs, those are the best you can do. But if we understood why you needed a shorter representation, we might be able to offer other solutions. Disk space or memory seem unlikely constraints if you have the CPU to compress/decompress on the fly this kind of data. So is it some sort of UI constraint where the display needs to be shorter? Or something else?
$endgroup$
– JesseM
3 hours ago

Check out Huffman coding.

– Pål GD
10 hours ago

The answers below are mathematically correct. With the requirement of "no collisions" and no further information which could reduce your possible inputs, those are the best you can do. But if we understood why you needed a shorter representation, we might be able to offer other solutions. Disk space or memory seem unlikely constraints if you have the CPU to compress/decompress on the fly this kind of data. So is it some sort of UI constraint where the display needs to be shorter? Or something else?

– JesseM
3 hours ago

add a comment |

2 Answers
2

active

oldest

votes

You have 10 base10 digits. This is $10^10$ possible values.

Encoding this in an alphabet with 64 tokens you need $lceil log_64(10^10) rceil = 6$ characters.

If you encode it in straight binary you only need $lceil log_2(10^10) rceil = 34$ bits total. 4 bytes + 2 bits.

Unless there is a pattern in the number itself this is the best you can do.

Note that this looks at the number in a vacuum. If you encode more than just that number you can use arithmetic encoding to use some of the "spare room" the rounding creates.

answered 13 hours ago

ratchet freak

2,86388

add a comment |

As ratchet freak says, you have ten decimal digits, which should give $10^10$ possible values. But in practice, there are a few more restrictions. The format of a North American telephone number looks something like this:

[2-9][0-8]d - [2-9]dd - dddd

This gives 8*9*10 * 8*10*10 * 10*10*10*10 values. In addition, the fifth and sixth characters can't both be 1, which removes 1% of these. This means there are 5,702,400,000 possible telephone numbers. (Let's call this number $T$.)

If you want to encode these using an alphabet with $b$ different characters, you'll need a code $lceil log_b T rceil$ characters long. So if you want a code four characters long, you'll need $b geq 275$. If you want a code five characters long, you'll need $b geq 90$. And if you want a code six characters long, you'll need $b geq 43$.

Unfortunately, this is the best you can do, unless you have additional information about what numbers you'll be getting. This comes down to the pigeonhole principle, a remarkably straightforward theorem that's quite useful in CS. It's mathematically impossible to use fewer characters than this without ever having any collisions. You can make collisions extremely unlikely, but never fully prevent them.

(You can also make some of the codes be shorter without collisions, but as a consequence some of them will have to be longer. Again, pigeonhole principle.)

P.S. Note that, while base64 also produces a six-character string (since 64 is between 43 and 90), you can take advantage of the fact that you only need 43 to choose a "nicer" character set. For example, you can get rid of o, O, 0, Q, I, l, and 1 to cut down on confusion in writing. My suggestion would be 23456789ABCDEFGHJKLMNPRSTVXYZabdeghknpqrt-=.

edited 8 hours ago

answered 11 hours ago

Draconis

5,252820

$begingroup$
how does "the fifth and sixth characters can't both be 1" remove 8 possible numbers? shouldn't that be 1 out of every 100? So 5,702,400,000 total possible numbers.
$endgroup$
– ratchet freak
9 hours ago

$begingroup$
@ratchetfreak D'oh, of course, you're right. Let me adjust that.
$endgroup$
– Draconis
9 hours ago

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "419"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

anemaria20 is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f105488%2falgorithm-to-convert-a-fixed-length-string-to-the-smallest-possible-collision-fr%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

You have 10 base10 digits. This is $10^10$ possible values.

Encoding this in an alphabet with 64 tokens you need $lceil log_64(10^10) rceil = 6$ characters.

If you encode it in straight binary you only need $lceil log_2(10^10) rceil = 34$ bits total. 4 bytes + 2 bits.

Unless there is a pattern in the number itself this is the best you can do.

Note that this looks at the number in a vacuum. If you encode more than just that number you can use arithmetic encoding to use some of the "spare room" the rounding creates.

answered 13 hours ago

ratchet freak

2,86388

add a comment |

You have 10 base10 digits. This is $10^10$ possible values.

Encoding this in an alphabet with 64 tokens you need $lceil log_64(10^10) rceil = 6$ characters.

If you encode it in straight binary you only need $lceil log_2(10^10) rceil = 34$ bits total. 4 bytes + 2 bits.

Unless there is a pattern in the number itself this is the best you can do.

Note that this looks at the number in a vacuum. If you encode more than just that number you can use arithmetic encoding to use some of the "spare room" the rounding creates.

answered 13 hours ago

ratchet freak

2,86388

add a comment |

You have 10 base10 digits. This is $10^10$ possible values.

Encoding this in an alphabet with 64 tokens you need $lceil log_64(10^10) rceil = 6$ characters.

If you encode it in straight binary you only need $lceil log_2(10^10) rceil = 34$ bits total. 4 bytes + 2 bits.

Unless there is a pattern in the number itself this is the best you can do.

Note that this looks at the number in a vacuum. If you encode more than just that number you can use arithmetic encoding to use some of the "spare room" the rounding creates.

answered 13 hours ago

ratchet freak

2,86388

You have 10 base10 digits. This is $10^10$ possible values.

Encoding this in an alphabet with 64 tokens you need $lceil log_64(10^10) rceil = 6$ characters.

If you encode it in straight binary you only need $lceil log_2(10^10) rceil = 34$ bits total. 4 bytes + 2 bits.

Unless there is a pattern in the number itself this is the best you can do.

Note that this looks at the number in a vacuum. If you encode more than just that number you can use arithmetic encoding to use some of the "spare room" the rounding creates.

answered 13 hours ago

ratchet freak

2,86388

answered 13 hours ago

ratchet freak

2,86388

answered 13 hours ago

ratchet freak

2,86388

answered 13 hours ago

ratchet freak

2,86388

add a comment |

[2-9][0-8]d - [2-9]dd - dddd

(You can also make some of the codes be shorter without collisions, but as a consequence some of them will have to be longer. Again, pigeonhole principle.)

edited 8 hours ago

answered 11 hours ago

Draconis

5,252820

$begingroup$
how does "the fifth and sixth characters can't both be 1" remove 8 possible numbers? shouldn't that be 1 out of every 100? So 5,702,400,000 total possible numbers.
$endgroup$
– ratchet freak
9 hours ago

$begingroup$
@ratchetfreak D'oh, of course, you're right. Let me adjust that.
$endgroup$
– Draconis
9 hours ago

add a comment |

[2-9][0-8]d - [2-9]dd - dddd

(You can also make some of the codes be shorter without collisions, but as a consequence some of them will have to be longer. Again, pigeonhole principle.)

edited 8 hours ago

answered 11 hours ago

Draconis

5,252820

$begingroup$
how does "the fifth and sixth characters can't both be 1" remove 8 possible numbers? shouldn't that be 1 out of every 100? So 5,702,400,000 total possible numbers.
$endgroup$
– ratchet freak
9 hours ago

$begingroup$
@ratchetfreak D'oh, of course, you're right. Let me adjust that.
$endgroup$
– Draconis
9 hours ago

add a comment |

[2-9][0-8]d - [2-9]dd - dddd

(You can also make some of the codes be shorter without collisions, but as a consequence some of them will have to be longer. Again, pigeonhole principle.)

edited 8 hours ago

answered 11 hours ago

Draconis

5,252820

[2-9][0-8]d - [2-9]dd - dddd

(You can also make some of the codes be shorter without collisions, but as a consequence some of them will have to be longer. Again, pigeonhole principle.)

edited 8 hours ago

answered 11 hours ago

Draconis

5,252820

edited 8 hours ago

answered 11 hours ago

Draconis

5,252820

answered 11 hours ago

Draconis

5,252820

answered 11 hours ago

Draconis

5,252820

$begingroup$
how does "the fifth and sixth characters can't both be 1" remove 8 possible numbers? shouldn't that be 1 out of every 100? So 5,702,400,000 total possible numbers.
$endgroup$
– ratchet freak
9 hours ago

$begingroup$
@ratchetfreak D'oh, of course, you're right. Let me adjust that.
$endgroup$
– Draconis
9 hours ago

add a comment |

$begingroup$
how does "the fifth and sixth characters can't both be 1" remove 8 possible numbers? shouldn't that be 1 out of every 100? So 5,702,400,000 total possible numbers.
$endgroup$
– ratchet freak
9 hours ago

$begingroup$
@ratchetfreak D'oh, of course, you're right. Let me adjust that.
$endgroup$
– Draconis
9 hours ago

how does "the fifth and sixth characters can't both be 1" remove 8 possible numbers? shouldn't that be 1 out of every 100? So 5,702,400,000 total possible numbers.

– ratchet freak
9 hours ago

@ratchetfreak D'oh, of course, you're right. Let me adjust that.

– Draconis
9 hours ago

add a comment |

anemaria20 is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

anemaria20 is a new contributor. Be nice, and check out our Code of Conduct.

Thanks for contributing an answer to Computer Science Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Yhyjyk

2 Answers
2

Your Answer

Post as a guest

2 Answers
2

2 Answers
2

Post as a guest

Popular posts from this blog

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

2 Answers 2

2 Answers 2

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

2 Answers
2

2 Answers
2

2 Answers
2