How to figure out whether the data is sample data or population data apart from the client's information?
What ways are available to figure out whether data is sample data or population data, other than information from the client?
Tags: sample, population
asked 1 hour ago by Akarsh (edited 28 mins ago by Richard Hardy)
3 Answers
I think there is no way to know just by looking at the data.
In general, the population may be small or large and the sample may be small or large, so in some situations the sample may cover almost the whole population. Imagine collecting 90%, 95%, 99% and 100% of the population: I would not expect the results to change in any fundamental way when you reach 100% (that is, the full population).
But maybe you know something about the population? If you know that the population consists of all customers of the company, and you know how many customers they have per month, you may be able to estimate how big the population is.
My question would be why you want to know this, and why you don't know it already. Usually one should know something about the data one is supposed to analyse. Keep in mind that inferential statistics tries to draw conclusions about the population from information in the sample. This means that if you have the population data, there is no need for inferential statistics (significance tests, confidence intervals, ...) and you can simply report the descriptive statistics. So the analyst should know this about the data.
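As a rough illustration of the 90%, 95%, 99% and 100% thought experiment, the sketch below computes the standard error of the mean with the finite population correction, which shrinks to zero once the "sample" is the whole population. Everything in it is an assumption made for illustration: the made-up customer-spend numbers, the helper name `standard_error_of_mean`, and simple random sampling without replacement.

```python
import math
import random
import statistics


def standard_error_of_mean(sample, population_size):
    """Standard error of the sample mean under simple random sampling
    without replacement, including the finite population correction."""
    n = len(sample)
    if n < 2 or n > population_size:
        raise ValueError("need 2 <= len(sample) <= population_size")
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    # The correction factor is 0 when n equals the population size.
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return s / math.sqrt(n) * fpc


# Hypothetical population: monthly spend of 1,000 customers (made-up numbers).
rng = random.Random(0)
population = [rng.gauss(100.0, 20.0) for _ in range(1000)]

for fraction in (0.90, 0.95, 0.99, 1.00):
    n = round(fraction * len(population))
    sample = rng.sample(population, n)
    se = standard_error_of_mean(sample, len(population))
    print(f"{fraction:.0%} sampled: mean = {statistics.mean(sample):.2f}, SE = {se:.3f}")
```

The mean barely moves as the sampled fraction grows, while the standard error falls to exactly zero at 100%, where there is nothing left to infer. Note that the code needs the population size as an input: it cannot be recovered from the data alone.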
answered 37 mins ago by stats.and.r (edited 31 mins ago)
A sample is just a subset of the population. If the sample is representative (which it should be), the only main difference between sample and population is their size.
However, for any real-life analysis it is very important to know where the data comes from, and the process of collecting it needs to be well documented. Not even knowing whether the data is a sample is a serious red flag.
answered 38 mins ago by Pere
There is no way: the "population" of interest is part of the specification of the problem.
Statistical problems involving inference about a "population" require specification of the group of interest, about which we are making the inference. Only a proper specification of the problem (in this case, from a briefing by the client) can give you this. Of course, there may be situations where the client does not know how to specify their problem in a well-posed way; in that case, part of the statistician's responsibility is to elicit contextual information that helps the client formulate a well-posed problem. In some cases, the source of existing sample data may also suggest a natural "population" about which we can make a valid inference. (Generally, a random sample allows us to make an inference about characteristics of the corresponding sampling frame, which may be close to some population of common interest.) Sample data cannot formulate your statistical problem for you. The problem must arise from some objective or context.
Whether data is "sample data" or "population data" also depends on context and on the specification of the group of interest. For example, suppose we consider data on the driving records (demerit points, fines, years with license, etc.) of a random sample of people with driver's licenses registered in a particular State. That data would be "sample data" from the associated sampling frame from which it was drawn (i.e., all people who hold a driver's license registered in that State), and the data on all people with a driver's license registered in that State would be the "population". However, that "population" can also be regarded as (non-randomised) "sample data" from the larger class of all people with driver's licenses registered anywhere in the country, which can in turn be regarded as (non-randomised) "sample data" from the larger class of all people with driver's licenses registered anywhere in the world.
All of this goes back to a fundamental aspect of sampling problems. In any such problem, there must be a specified "population" of interest, about which we wish to make an inference, and there must be "sample data" that bears somehow on that inference. (Ideally, we would like the sample data to be a random sample from a sampling frame that is close to the population of interest.)
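To make the point about context concrete, here is a minimal sketch in which the same numbers are summarised under two different problem specifications. The demerit-point values, the function name `summarise` and the flag `is_whole_population` are all hypothetical; the flag stands in for what the problem specification tells you, which is exactly the information the data cannot supply.

```python
import statistics


def summarise(values, is_whole_population):
    """Summarise values; the caller, not the data, decides how to read them."""
    if is_whole_population:
        # The values are the entire group of interest: describe them directly.
        variance = statistics.pvariance(values)  # divides by n
    else:
        # The values are a sample from a larger group: estimate its variance.
        variance = statistics.variance(values)  # divides by n - 1
    return {"mean": statistics.mean(values), "variance": variance}


# Hypothetical demerit points for eight license holders in one State.
demerit_points = [0, 2, 0, 5, 1, 3, 0, 4]

as_population = summarise(demerit_points, is_whole_population=True)
as_sample = summarise(demerit_points, is_whole_population=False)
print(as_population)
print(as_sample)
```

The list passed in is identical in both calls. Only the stated group of interest changes which variance formula applies and whether any inference is needed.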
answered 7 mins ago by Ben