We tested the stability of interrater reliability of psychiatric symptoms over a quarter of a century using 2 rating scales. The interrater reliabilities of the items of 2 psychiatric rating scales employed in 2 consecutive follow-ups were compared. Interrater reliabilities proved to be largely stable. Interrater reliability depends on the standard deviation of the item scores. In addition to the traditional approach, we also present a new statistical method for unifying the assessments of multiple raters. Using this method, we demonstrated that the probability of a correct rating is higher in the absence of manifest symptoms, or in the presence of symptoms, than in cases characterized by middle scores. To designate the relationship revealed in this setting, we introduce the term “validity of reliability,” which we recommend for evaluating the results of rating scales in the context of psychiatric nosology.
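The abstract refers to interrater reliability of scale items without specifying an index. As a purely illustrative sketch (not the authors' method), one standard agreement statistic for two raters scoring the same cases is Cohen's kappa, which corrects observed agreement for chance agreement; the rater data below are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters.

    rater_a, rater_b: equal-length sequences of category scores
    (e.g., item scores 0, 1, 2) assigned to the same cases.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal score frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical item scores from two raters on 8 cases.
scores_a = [0, 0, 1, 1, 2, 2, 0, 1]
scores_b = [0, 0, 1, 2, 2, 2, 0, 0]
print(round(cohens_kappa(scores_a, scores_b), 3))  # moderate agreement
```

A value of 1 indicates perfect agreement, 0 agreement no better than chance; an intraclass correlation coefficient would be the analogous choice for continuous or ordinal scores treated as interval data.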