問題(經簡化及修改)
Q1:民研如何抽取次樣本?有何必要?
Q2:為何有那麼多個案有缺數?是否代表受訪者沒有被問到相關問題?
Q3:甚麽是「評分人數」?與次樣本數目有何分別?
Q4:有沒有編碼簿(codebook)可提供?
香港民研回覆
A1:在使用抽取次樣本(subsampling)方式處理一條問題時,我們通常會設定抽取機率為0.5或0.6,讓系統隨機抽取該比例的受訪者回答該問題。做法可縮短問卷的實際長度,避免受訪者回應疲勞(respondent fatigue),並減低因受訪者拒絕回應所造成的統計偏差(non-response bias)。
A2:如沒有任何數值,一般代表受訪者沒有被問到相關問題。數值「-99」則代表我們有問該問題,但受訪者沒有回答。一般而言,不提問的原因有兩種。其一,是上述的抽取次樣本方法引起。另一原因,則是受訪者早前的回應,已顯示相關問題並不適用。例如,當我們知道受訪者於香港出生,我們便不會問「你來了香港多少年?」
A3:評分題目的「評分人數」只包括能夠給出一個實際分數的受訪者。而我們的「次樣本數目」,則包括所有被問到該問題的受訪者,當中包含給出非分數答案如「唔知/難講」的受訪者,以及拒絕回答的人。
A4:我們以sav及csvy格式提供的數據集已包含每個變量的描述和問題字眼,以及所有變量中可能出現的數值所代表的意思。
Questions (abridged and edited)
Q1: How are subsamples drawn, and why is subsampling necessary?
Q2: Why are there so many cases with missing data (“NA”s) within variables? Does it mean that the respondents were not asked the question?
Q3: What is the “number of raters”, and how does it differ from the subsample size?
Q4: Is there a codebook with more details available?
Reply from HKPORI
A1: When we apply the subsampling technique to a question, we usually set the selection probability (p) at 0.5 or 0.6, so that the system randomly selects that proportion of respondents to answer the question. This shortens the effective length of the questionnaire for each respondent, which helps to avoid respondent fatigue and to reduce non-response bias, i.e. the statistical bias that arises when respondents decline to answer.
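The selection described in A1 can be illustrated with a minimal sketch. It assumes that each respondent is included independently with probability p; the reply does not say whether the system works this way or draws a fixed proportion, so treat the scheme, the function name and the respondent IDs as illustrative assumptions only.

```python
import random

def draw_subsample(respondent_ids, p=0.5, seed=None):
    """Return the respondents selected to be asked one question.

    Assumption: each respondent is included independently with
    probability p. The actual selection scheme is not documented here.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return [rid for rid in respondent_ids if rng.random() < p]

respondents = list(range(1, 1001))  # hypothetical respondent IDs
asked = draw_subsample(respondents, p=0.5, seed=42)
print(len(asked))  # roughly half of 1000; the exact count depends on the seed
```

Under this assumption the subsample size varies slightly from question to question, so different questions in the same survey can have different bases.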
A2: The absence of any value generally means that the respondent was not asked the question. A value of “-99” means that the question was asked but the respondent did not answer it. Generally, there are two reasons why a question is not asked. First, the use of the subsampling technique explained above. Second, the question is not applicable to the respondent, based on their answers to earlier questions. For example, if a respondent was born in Hong Kong, we would not ask “How long have you been living in Hong Kong?”
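For data users, the distinction in A2 could be applied along the following lines. This is only a sketch: it assumes that a missing value has been loaded as None or NaN, and the variable names in the example row are hypothetical.

```python
import math

NO_ANSWER = -99  # per A2: the question was asked but not answered

def response_status(value):
    """Classify one cell of the dataset.

    Assumption: a cell with no data has been loaded as None or NaN.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "not asked"
    if value == NO_ANSWER:
        return "asked, no answer"
    return "answered"

# Hypothetical row: variable names are for illustration only.
row = {"born_in_hk": 1, "years_in_hk": None, "rating": -99}
for name, value in row.items():
    print(name, "->", response_status(value))
```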
A3: “Number of raters” counts only respondents who gave a numerical answer when we asked for a rating. In our definition, “subsample” counts all respondents who were asked the question, which also includes those who gave non-numerical answers such as “don’t know / hard to say”, as well as those who refused to answer.
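The two counts in A3 can be worked through on a small made-up example. The sketch below assumes that not-asked cases are loaded as None and that non-numerical answers are stored as special codes; the code used here for “don’t know / hard to say” is hypothetical, so check the value labels in the dataset for the real ones.

```python
def count_subsample_and_raters(values, non_numeric_codes):
    """Return (subsample size, number of raters) for one rating question."""
    asked = [v for v in values if v is not None]               # everyone who was asked
    raters = [v for v in asked if v not in non_numeric_codes]  # gave an actual score
    return len(asked), len(raters)

DONT_KNOW = -98  # hypothetical code, NOT taken from the dataset
NO_ANSWER = -99  # per A2: asked but not answered

# Made-up ratings: None means the respondent was not asked.
values = [70, 55, None, DONT_KNOW, NO_ANSWER, None, 80]
subsample_n, raters_n = count_subsample_and_raters(values, {DONT_KNOW, NO_ANSWER})
print(subsample_n, raters_n)  # 5 3
```

Here five respondents were asked the question, but only three gave a score, so a mean rating would be based on three cases rather than five.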
A4: The datasets we provide in sav and csvy formats already include a description and the question wording for each variable, as well as labels for all possible values of every variable.
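As a rough illustration of where that information sits in a csvy file, the sketch below separates the metadata header from the data rows using only the Python standard library. It assumes the common CSVY layout, in which a YAML block delimited by "---" lines precedes ordinary CSV; the sample content and field names are invented, and the actual files may be laid out differently.

```python
import csv
import io

def split_csvy(text):
    """Split CSVY text into (raw YAML header, list of row dicts).

    Assumption: the file opens with a '---' line and the YAML block is
    closed by another '---' (or '...') line, followed by plain CSV.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("no YAML header found")
    for i in range(1, len(lines)):
        if lines[i].strip() in ("---", "..."):
            header = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return header, list(csv.DictReader(io.StringIO(body)))
    raise ValueError("YAML header is not closed")

# Invented sample: the field names and wording are hypothetical.
sample = """---
fields:
  - name: Q1
    title: hypothetical question wording
---
id,Q1
1,70
2,-99
"""
header, rows = split_csvy(sample)
print(header)
print(rows)
```

The header is returned as raw text so that it can be passed to a YAML parser of your choice. For the sav files, statistical packages that read SPSS data normally expose the variable and value labels directly.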