sweights {heR.Misc}    R Documentation
Calculate per-record stratification or post-stratification probabilities for an arbitrary data set with pre-existing per-record probabilities, and either equal or user-specified group probabilities.
sweights(f, w, tab, max.levels=25)
| Argument | Description |
|---|---|
| f | a list or data frame containing equal-length factors that define subgroups; the factors correspond to the elements of an arbitrary vector or to the rows of an arbitrary data frame. |
| w | pre-existing weights for each record in f; by default, all records are assigned equal weights. |
| tab | an optional data frame containing a weight for each unique combination of factors (subgroup). Its leading columns must have names exactly matching the factors in f, followed by one additional column containing the weights. By default, equal weights are assigned to each subgroup. This data frame can easily be formed from a frequency array generated by table or xtabs, e.g., with the as.data.frame.table function. |
| max.levels | the maximum number of levels allowed for each factor (default 25). Too many factor levels may signal that something is awry with the factor specification, e.g., the wrong factor was given or a numeric vector was cut improperly. |
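
The conversion of a frequency array into tab can be sketched as follows. This is a minimal illustration assuming a hypothetical reference sample ref whose sex-by-age mix should be reproduced; the column name weight is an arbitrary choice (the text above only requires one additional column holding the weights), and the counts are rescaled to proportions so that the desired group probabilities sum to 1.

```r
# Hypothetical reference sample whose sex-by-age mix we want to reproduce
ref <- data.frame(
  sex = factor(c("F", "F", "F", "M", "M", "M", "M", "M")),
  age = factor(c("old", "young", "young", "old", "old", "young", "young", "young"))
)
freq <- table(ref)  # 2 x 2 frequency array; dimnames are named sex and age
# Flatten to one row per sex/age combination; factor columns keep the names sex, age
tab <- as.data.frame.table(freq, responseName = "weight")
tab$weight <- tab$weight / sum(tab$weight)  # counts -> group probabilities
tab
```

Because the dimnames of freq are named after the columns of ref, the leading columns of tab automatically carry the factor names that must match those in f.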
We calculate a weight for every record in the set of equal-length factors in f. Because each factor has one element per record, the factors can correspond to any data object of the same length, such as a vector or the rows of a data frame. The per-record stratification weights are calculated so that the probability of selecting a record in a given subgroup (defined by the factors in f) equals the desired probability for that subgroup, as specified by the flattened frequency table tab; if tab is not given, every subgroup receives equal probability. Within each subgroup, the relative pre-existing probabilities of the records, given by w, are preserved. If w is not specified, all records start with equal pre-existing probabilities. The calculation of the weights is described in more detail below.
Note that the names of the dimensions (factor columns) in tab must match the names of the factors in f. Records whose factor combination in f has no matching row in tab are assigned a weight of zero.
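The behavior described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the heR.Misc implementation: the function name sweights_sketch is hypothetical, the weights in tab are assumed to be its last column, each factor combination is assumed to appear at most once in tab, the groups are the factor combinations actually observed in f, and every group is assumed to have positive pre-existing weight.

```r
# Hypothetical sketch of the weighting described above; not the heR.Misc source.
sweights_sketch <- function(f, w = NULL, tab = NULL, max.levels = 25) {
  f <- lapply(f, factor)                      # accept a list or a data frame of factors
  n <- length(f[[1]])
  stopifnot(all(lengths(f) == n))             # factors must have equal length
  too_many <- vapply(f, nlevels, integer(1)) > max.levels
  if (any(too_many))
    stop("too many levels in factor(s): ", paste(names(f)[too_many], collapse = ", "))
  if (is.null(w)) w <- rep(1 / n, n)          # default: equal pre-existing weights
  # One key per record identifying its factor combination (subgroup)
  key <- do.call(paste, c(unname(f), sep = "\t"))
  # Pre-existing group probabilities P_i: sum of w within each group
  P <- vapply(split(w, key), sum, numeric(1))
  if (is.null(tab)) {
    # Default: equal desired probability p_i = 1/N for the N observed groups
    p <- setNames(rep(1 / length(P), length(P)), names(P))
  } else {
    # Assumption: factor columns are named as in f; weights are the last column
    tab_key <- do.call(paste, c(unname(as.list(tab[names(f)])), sep = "\t"))
    p <- setNames(tab[[ncol(tab)]], tab_key)
  }
  ratio <- p[key] / P[key]                    # p_i / P_i for each record's group
  ratio[is.na(ratio)] <- 0                    # combinations missing from tab get zero
  unname(w * ratio)                           # relative weights within a group are kept
}

# Example: 2 "F" and 4 "M" records, equal group probabilities by default.
# Each F record gets 1/(2*2) = 0.25 and each M record 1/(2*4) = 0.125.
f <- list(sex = factor(c("F", "F", "M", "M", "M", "M")))
sweights_sketch(f)
```

Matching records to rows of tab through pasted keys keeps the sketch short; any combination absent from tab yields NA on lookup, which is then set to a weight of zero as the text describes.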
A vector of stratification (or post-stratification) weights equal in length to the factors in f.
The pre-existing probability of each group $i$, denoted $P_i$, is calculated from the pre-existing per-record weights w as follows:
$$P_i = \sum_{j=1}^{n_i} w_{ij}$$
where $n_i$ is the number of records in group $i$, $j$ indexes the records within that group, and $w_{ij}$ is the pre-existing weight of record $j$ in group $i$.
To obtain the desired per-group weights, denoted $p_i$, the pre-existing weights of the records in group $i$ are multiplied by the ratio $p_i/P_i$. After multiplication, the new weights within group $i$ sum to $p_i$, and the relative weights of the members of each group are unchanged.
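In symbols, writing $w'_{ij}$ for the new weight of record $j$ in group $i$ (notation introduced here only for this derivation), the rescaling and both properties stated above follow directly from the definition of $P_i$:

$$w'_{ij} = \frac{p_i}{P_i}\,w_{ij}, \qquad \sum_{j=1}^{n_i} w'_{ij} = \frac{p_i}{P_i}\sum_{j=1}^{n_i} w_{ij} = \frac{p_i}{P_i}\,P_i = p_i, \qquad \frac{w'_{ij}}{w'_{ik}} = \frac{w_{ij}}{w_{ik}} \quad (w_{ik} \neq 0).$$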
As a simple example, assume that the pre-existing weights are equal for all records and that the new group probabilities are to be equal across all groups. With $N$ groups and $M$ records in total, each per-record probability is $1/M$ and each desired group probability $p_i$ is $1/N$. The pre-existing group probability $P_i$ equals the number of records in the group divided by the total number of records, $n_i/M$. The factor applied to each member of group $i$ is then:
$$\frac{p_i}{P_i} = \frac{1/N}{n_i/M} = \frac{M}{N\,n_i}$$
Multiplying this factor by the per-record probability $1/M$, which is the same for all records in this example, gives a new per-record probability of $1/(N n_i)$ for every record in group $i$: simply the reciprocal of the product of the number of groups and the number of records in the group. Each group therefore sums to $n_i \cdot 1/(N n_i) = 1/N$, as required.
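The arithmetic of this example can be checked numerically. The sketch below assumes a hypothetical grouping g of $M = 10$ records into $N = 3$ groups and applies the factor $p_i/P_i$ to equal pre-existing per-record probabilities; it does not call sweights itself.

```r
# Hypothetical grouping: N = 3 groups (sizes 2, 3, 5), M = 10 records
g <- factor(c("a", "a", "b", "b", "b", "c", "c", "c", "c", "c"))
M <- length(g)
N <- nlevels(g)
n_i <- as.vector(table(g))[as.integer(g)]  # size of each record's group
w_old <- rep(1 / M, M)         # equal pre-existing per-record probabilities
p_i <- 1 / N                   # equal desired group probabilities
P_i <- n_i / M                 # pre-existing probability of each record's group
w_new <- w_old * p_i / P_i     # rescale each record by p_i / P_i
stopifnot(isTRUE(all.equal(w_new, 1 / (N * n_i))))  # matches 1/(N n_i)
tapply(w_new, g, sum)          # each group sums to 1/N = 1/3
```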
Neil Klepeis