Questions (abstracted and sub-edited)
Q1: How are subsamples determined and why is it necessary?
Q2: Why are there so many cases with missing data (“NA”s) within variables? Does it mean that the respondents were not asked the question?
Q3: What is the “number of raters”, and how does this differ from the subsample?
Q4: Is there a codebook with more details available?
Reply from HKPORI
A1: When a question is handled by subsampling, we usually set the selection probability (p) at 0.5 or 0.6, so that the system randomly selects that proportion of respondents to answer the question. This shortens the effective length of the questionnaire for each respondent, which minimises respondent fatigue and reduces non-response bias.
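As an illustration of the subsampling described in A1, here is a minimal Python sketch. It is not HKPORI's actual implementation: the function name `assign_subsample`, the respondent IDs and the seeds are hypothetical, and it assumes each respondent is selected independently with probability p, so the subsample size is only approximately p times the sample size.

```python
import random


def assign_subsample(respondent_ids, p, seed=None):
    """Return the respondents selected to be asked one question.

    Assumption: each respondent is selected independently with
    probability p, so the subsample size varies around p * n.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return [rid for rid in respondent_ids if rng.random() < p]


# Hypothetical sample of 1,000 respondents and two subsampled questions.
respondents = list(range(1, 1001))
asked_q1 = assign_subsample(respondents, p=0.5, seed=1)
asked_q2 = assign_subsample(respondents, p=0.6, seed=2)
print(len(asked_q1), len(asked_q2))
```

Because each question is drawn separately, a given respondent may be asked one subsampled question and skipped for another.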
A2: A lack of any data (“NA”) generally means the respondent was not asked the question. A value of “-99” means the question was asked but the respondent did not answer it. Questions are generally not asked for one of two reasons. First, the respondent may not have been selected by the subsampling technique explained above. Second, the question may not apply to the respondent, given his/her answers to previous questions. For example, if a respondent was born in Hong Kong, we would not ask “How long have you been living in Hong Kong?”
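The distinction in A2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the values are assumed to be already parsed, with a missing entry loaded as `None` or NaN, and the function name `question_status` and the example column are hypothetical. Only the “-99” code comes from the reply above.

```python
import math
from collections import Counter

NO_ANSWER = -99  # code stated in A2: question asked, but not answered


def question_status(value):
    """Classify one cell of a variable according to the rules in A2."""
    # Assumption: missing data is loaded as None or as a float NaN.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "not asked"
    if value == NO_ANSWER:
        return "asked, no answer"
    return "answered"


# Hypothetical column for one question, five respondents.
column = [7, None, -99, 5, float("nan")]
statuses = Counter(question_status(v) for v in column)
print(statuses)
```

The sketch cannot tell whether a “not asked” case was due to subsampling or to the question being not applicable; that depends on the respondent's answers to other questions.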
A3: “Number of raters” counts only respondents who gave a “numerical” answer when we asked for a rating. In our definition, “subsample” counts all respondents who were asked the question, which also includes people who gave “non-numerical” answers such as “don’t know / hard to say”, as well as those who refused to answer.
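To make the difference in A3 concrete, the following Python sketch counts both figures for one rating question. It rests on assumptions that are not in the reply: missing entries are loaded as `None`, refusals are stored as -99 (following A2), and “don’t know / hard to say” is stored as the hypothetical label `"DK"`. The actual coding should be checked against the value labels in the dataset.

```python
from statistics import mean

NO_ANSWER = -99    # asked, but not answered (code from A2)
DONT_KNOW = "DK"   # hypothetical label for "don't know / hard to say"


def summarise_rating(values):
    """Return the subsample size, number of raters and mean rating."""
    # Subsample: everyone who was asked, whatever they answered.
    asked = [v for v in values if v is not None]
    # Raters: only those who gave a numerical rating.
    ratings = [
        v for v in asked
        if isinstance(v, (int, float))
        and not isinstance(v, bool)
        and v != NO_ANSWER
    ]
    return {
        "subsample": len(asked),
        "raters": len(ratings),
        "mean": mean(ratings) if ratings else None,
    }


# Hypothetical data: two not asked, one "don't know", one refusal, three ratings.
values = [None, 60, DONT_KNOW, NO_ANSWER, 75, None, 45]
summary = summarise_rating(values)
print(summary)
```

In this example the subsample is 5 and the number of raters is 3, so the mean rating is based on three respondents only.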
A4: Our datasets, provided in sav and csvy formats, already include descriptions and question wording for each variable, as well as the labels used for all possible values of all variables.