Ratings

Ratings are a central component of the television industry, almost a household word. They are important in television because they indicate the size of an audience for specific programs. Networks and stations then set their advertising rates based on the number of viewers of their programs. Network revenue is thus directly related to the ratings. The word “ratings,” however, is actually rather confusing because it has both a specific and a general meaning. Specifically, a rating is the percentage of all the people (or households) in a particular location tuned to a particular program. In a general sense, the term is used to describe a process (also referred to as “audience measurement”) that endeavors to determine the number and types of viewers watching TV.

One common rating (in the specific sense) is the rating of a national television show. This calculation measures the number of households—out of all the households in the United States that have TV sets—watching a particular show. There are approximately 100 million households in the United States, and most of them have TV sets. If 20 million of those households are watching NBC at 8:00 P.M., then NBC's rating for that time period is 20 (20 million / 100 million = 20 percent). Another way to describe the process is to say that one rating point is worth 1 million households.
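As a quick illustrative sketch (not any ratings company's actual software), the rating arithmetic can be written in a few lines of Python, using the round numbers from the example above:

```python
def rating(viewing_households: float, total_households: float) -> float:
    """Rating = percentage of ALL TV households tuned to a program."""
    return viewing_households / total_households * 100

# The national example above: 20 million of 100 million households
print(rating(20_000_000, 100_000_000))  # 20.0
```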

Ratings are also taken for areas smaller than the entire nation. For example, if a particular city (Yourtown) has 100,000 households and 15,000 of them are watching the local news on station KAAA, that station would have a rating of 15. If Yourtown has a population of 300,000 and 30,000 people are watching KAAA, the station’s rating would be 10. And because television viewing is becoming less and less of a group activity with the entire family gathered around the living-room TV set, most ratings are expressed in terms of people rather than households.

Many calculations are related to the rating. Sometimes people, even professionals in the television business, confuse them. One of these calculations is the share. This figure reports the percentage of households (or people) watching a show out of all the households (or people) who have the TV set on. So if Yourtown has 100,000 households but only 50,000 of them have the TV set on and 15,000 of those are watching KAAA, the share is 30 (15,000 / 50,000 = 30 percent). Shares are always higher than ratings unless, of course, everyone in the country is watching television.
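The share uses the same arithmetic as the rating but with a smaller denominator: only the households whose sets are actually on. A minimal sketch, again using the Yourtown figures:

```python
def share(viewing_households: float, households_using_tv: float) -> float:
    """Share = percentage of households with a set ON that watch a program."""
    return viewing_households / households_using_tv * 100

# Yourtown: 15,000 viewers out of 50,000 households using television
print(share(15_000, 50_000))  # 30.0
```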

Another calculation is the cume, which reflects the number of different persons who tune in a particular station or network over a period of time. This number is used to show advertisers how many different people hear their message if it is aired at different times, such as 7:00 P.M., 8:00 P.M., and 9:00 P.M. If the total number of people available is 100, five of them view at 7:00, those five still view at 8:00 but three new people join them, and then two people turn the TV off but four new ones join the audience at 9:00, the cume would be 12 (5 + 3 + 4 = 12). Cumes are particularly important to cable networks because their ratings are very low. Two networks with ratings of 1.2 and 1.3 cannot really be differentiated, but if the measurement is taken over a wider time span, a greater difference will probably surface.
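Because the cume counts different people rather than summed audiences, computing it amounts to taking the union of the audiences at each time point. A sketch of the example above, with viewer IDs invented for illustration:

```python
# Hypothetical viewer IDs for each hour (IDs invented for illustration)
audience_by_hour = {
    "7:00": {1, 2, 3, 4, 5},                    # five initial viewers
    "8:00": {1, 2, 3, 4, 5, 6, 7, 8},           # same five plus three new
    "9:00": {1, 2, 3, 4, 5, 6, 9, 10, 11, 12},  # two leave, four join
}

# Cume = number of DIFFERENT people reached over the whole span
cume = len(set().union(*audience_by_hour.values()))
print(cume)  # 12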

Average quarter hours (AQH) are another measurement. This calculation is based on the average number of people viewing a particular station (network, program) for at least 5 minutes during a 15-minute period. For example, if, out of 100 people, 10 view for at least 5 minutes between 7:00 and 7:15, 7 view between 7:15 and 7:30, 11 view between 7:30 and 7:45, and 4 view between 7:45 and 8:00, the AQH rating would be 8 (10 + 7 + 11 + 4 = 32; 32 / 4 = 8).
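Given the four quarter-hour audience counts, the AQH is simply their mean; a one-line sketch of the arithmetic above:

```python
# Audience per quarter hour (each person counted only if he or she
# viewed at least 5 minutes within that quarter hour)
quarter_hours = [10, 7, 11, 4]  # 7:00-7:15, 7:15-7:30, 7:30-7:45, 7:45-8:00

aqh = sum(quarter_hours) / len(quarter_hours)
print(aqh)  # 8.0
```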

Many other calculations are possible. For example, if the proper data have been collected, it is easy to calculate the percentage of women between the ages of 18 and 34, or of men in urban areas, who watch particular programs. Networks and stations gather as much information as is economically possible. They then try to use the numbers that present their programming strategies in the best light.

The general ratings (audience measurement) process has varied greatly over the years. Audience measurement started in the early 1930s with radio. A group of advertising interests joined together as a nonprofit entity to support ratings known as "Crossleys," named after Archibald Crossley, the man who conducted them. Crossley drew numbers at random from telephone directories and called people in about 30 cities to ask them what radio programs they had listened to the day before his call. This method became known as the recall method, because people were remembering what they had listened to the previous day. Crossleys existed for about 15 years but ended in 1946 because several for-profit commercial companies began offering similar services that were considered better.

One of these, the Hooper ratings, was begun by C.E. Hooper. Hooper’s methodology was similar to Crossley’s, except that respondents were asked what programs they were listening to at the time of the call—a method known as the coincidental telephone technique. Another service, the Pulse, used face-to-face interviewing. Interviewees selected by random sampling were asked to name the radio stations they had listened to over the past 24 hours, the past week, and the past five midweek days. If they could not remember, they were shown a roster containing station call letters to aid their memory. This was referred to as the roster-recall method.

Today the main radio audience measurement company is Arbitron. The Arbitron method requires people to keep diaries in which they write down the stations they listen to at various times of the day. In these diaries, they also indicate demographic features—their age, sex, marital status, etc.—so that ratings can be broken down by subaudiences.

The main television audience measurement company is the A.C. Nielsen Company. For many years Nielsen used a combination of diaries and a meter device called the Audimeter. The Audimeter recorded the times when a set was on and the channel to which it was tuned. The diaries were used to collect demographic data and list which family members were watching each program. Nielsen research in some markets still uses diaries, but for most of its data collection, Nielsen now attaches Peoplemeters to TV sets in selected homes. Peoplemeters collect both demographic and channel information because they are equipped with remote control devices. These devices accommodate a number of buttons, one for each person in the household and one for guests. Each person watching TV presses his or her button, which has been programmed with demographic data, to indicate viewing choices and activities.

There are also companies that gather and supply specialized ratings. For example, one company specializes in data concerning news programs and another tracks Latino viewing.

All audience measurement is based on samples. At present, there is no economical way of finding out what every person in the entire country is watching. Diaries, meters, and phone calls are all expensive, so sometimes samples are small. In some cases, no more than .004 percent of the population is being surveyed. However, the rating companies try to make their samples as representative of the larger population as possible. They consider a wide variety of demographic features—size of family, gender and age of head of household, access to cable TV, income, education—and try to construct a sample comprising the same percentage of the various demographic traits as in the general population.

In order to select a representative sample, the companies attempt to locate every housing unit in the country (or city or viewing area), mainly by using readily available government census data. Once all the housing units are accounted for, a computer program is used to randomly select the sample group in such a way that each location has an equal chance of being selected. Company representatives then write or phone people in the households that have been selected, trying to secure their cooperation. About 50 percent of those selected agree to participate. People are slightly more likely to allow meters in their house and to answer questions over the phone than they are to keep diaries. Very little face-to-face interviewing is now conducted because people are reluctant to allow strangers into their houses. When people refuse to cooperate, the computer program selects additional households until enough have agreed to participate to fill the sample.
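The equal-chance selection step can be sketched as a simple random draw from the housing-unit frame; real samples also involve the demographic balancing described above, and the frame and sample sizes here are invented for illustration:

```python
import random

# Hypothetical frame: every housing unit in the survey area,
# e.g. assembled from census records (IDs invented for illustration)
housing_units = [f"unit-{n}" for n in range(100_000)]

# Simple random sample: every unit has an equal chance of selection
sample = random.sample(housing_units, k=500)

# If a selected household refuses, a replacement is drawn the same way
# until enough households have agreed to participate.
```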

Once sample members have agreed to participate, they are often contacted in person. In the case of a diary, someone may show them how to fill it out. In other cases, the diary and instructions may simply be sent in the mail. For a meter, a field representative goes to the home (apartment, dorm room, vacation home, etc.) and attaches the meter to the television set. This person must take into account the entire video configuration of the home—multiple TV sets, VCRs, satellite dishes, cable TV, and anything else that might be attached to the receiver set. The field representative also trains family members in the use of the meter.

People participating in audience measurement are usually paid, but only a small amount, such as five dollars. Ratings companies have found that paying people something makes them feel obligated, but paying them a large amount does not make them more reliable.

Ratings companies try to see that no one remains in the sample very long. Participants become weary of filling out diaries or pushing buttons and cease to take the activities seriously. Soliciting and changing sample members is expensive, however, so companies do keep an eye on the budget when determining how to update the sample.

Once the sample is in order, the data must be collected from the participants. For phone or face-to-face interviews, the interviewer fills in a questionnaire and the data are later entered into a computer. For meters, the data collected are sent over phone lines to a central computer. People keeping diaries mail them back to the company, and employees then enter the data into a computer. Usually, only about 50 percent of diaries are usable; the rest are never mailed back or are so incorrectly filled out that they cannot be used.

From the data collected and calculated by the computer, ratings companies publish reports. These vary according to what was surveyed. Nielsen covers commercial networks, cable networks, syndicated programming, public broadcasting, and local stations. Other companies cover more limited aspects of television. Reports on each night’s prime-time national commercial network programming, based on Nielsen Peoplemeters, are usually ready about 12 hours after the data are collected. It takes considerably longer to generate a report based on diaries. The reports dealing with stations are published less frequently than those for prime-time network TV. Generally, station ratings are undertaken four times a year—November, February, May, and July—periods that are often referred to as “sweeps.” The weeks of the sweeps are very important to local stations because the numbers produced then determine advertising rates for the following three months. Most reports give not only the total ratings and shares but also information broken down into various demographic categories—age, sex, education, income. The various reports are purchased by networks, stations, advertisers, and any other companies with a need to know audience statistics. The cost is lower for small entities, such as TV stations, than for larger entities, such as commercial networks. The latter usually pay several million dollars a year to receive a ratings service.

While current ratings methods may be the best yet devised for calculating audience size and characteristics, audience measurement is far from perfect. Many of the flaws of ratings should be recognized, particularly by those employed in the industry who make significant decisions based on ratings.

Sample size is one aspect of ratings that is frequently questioned in relation to rating accuracy. Statisticians know that the smaller the sample size, the more chance there is for error. Ratings companies admit this and do not claim that their figures are totally accurate; most are accurate only to within 2 or 3 rating points. This was of little concern when ratings primarily centered on three networks, each of which was likely to have a rating of 20 or better. Even if CBS's 20 rating at 8:00 P.M. on Monday was really only 18, this was not likely to disturb the network balance. In all likelihood, CBS's 20 rating at 8:00 Tuesday evening was really a 22, so the numbers evened out. Now that there are many sources of programming, however, and ratings for each are much lower, statistical inaccuracies are more significant. A cable network with a 2 rating might actually be a 4, an increase that might double its income.
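A rough sense of why sample size matters comes from the standard margin-of-error formula for a proportion; the sample sizes below are invented for illustration and are not taken from any ratings service:

```python
import math

def margin_of_error(rating_pct: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error, in rating points, for a measured rating."""
    p = rating_pct / 100
    return z * math.sqrt(p * (1 - p) / sample_size) * 100

# A cable network's 2 rating measured with a 1,000-household sample...
print(round(margin_of_error(2, 1_000), 2))   # ~0.87 points either way
# ...versus the same rating measured with a 10,000-household sample
print(round(margin_of_error(2, 10_000), 2))  # ~0.27 points
```

The error shrinks only with the square root of the sample size, which is why meaningfully tighter figures require much larger, and much more expensive, samples.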

Audience measurement companies are willing to increase sample size, but doing so would greatly increase their costs, and customers for ratings do not seem willing to pay. In fact, Arbitron, which had previously undertaken TV ratings, dropped them in 1994 because they were unprofitable.

As access to interactive communication increases, it may be easier to obtain larger samples. Wires from consumer homes back to cable systems could be used to send information about what each cable TV household is viewing. Many of these wires are already in place. Consumers wishing to order pay-per-view programming, for example, can push a button on the remote control that tells the cable system to unscramble the channel for that particular household. Using this technology to determine what is showing on the TV set at all times, however, smacks of a “Big Brother” type of surveillance. Similarly, by the 1970s, a technology existed that enabled trucks to drive along streets and record what was showing on each TV set in the neighborhood. This practice, perceived as an invasion of privacy, was quickly ended.

Sample composition, as well as sample size, is also seen as a weakness in ratings procedures. When telephone numbers are used to draw a sample, households without telephones are excluded and households with more than one phone have a better chance of being included. For many of the rating samples, people who do not speak either English or Spanish are eliminated. Perhaps one of the greatest difficulties for ratings companies is caused by those who eliminate themselves from the sample by refusing to cooperate. Although rating services make every attempt to replace these people with others who are similar in demographic characteristics, the sample’s integrity is somewhat downgraded. Even if everyone originally selected agreed to serve, the sample cannot be totally representative of a larger population. No two people are alike, and even households with the same income and education level and the same number of children of the same ages do not watch exactly the same television shows. Moreover, people within the sample, aware that their viewing or listening habits are being monitored, may act differently than they ordinarily do.

Other problems arise from the fact that each rating technique has specific drawbacks. Households with Peoplemeters may suffer from "button-pushing fatigue," thereby artificially lowering ratings. Additionally, some groups of people are simply more likely to push buttons than others. When the Peoplemeter was first introduced, ratings for sports viewing soared while those for children's programs decreased significantly. One explanation held that men, who were watching sports intently, were very reliable about the button pushing, perhaps, in some cases, out of fear that the TV would shut off if they didn't push the button. Children, on the other hand, were confused or apathetic about the button, thereby underreporting the viewing of children's programming. Another theory held that the women of the household had previously kept the diaries, and although they were not always aware of what their husbands were actually viewing, they were quite conscious of what their children were watching. Under the diary system, in this explanation, sports programming was underrated.

But diaries have their own problems. The return rate is low, intensifying the problem of the number of uncooperative people in the sample. Even the diaries that are returned often have missing data. Many people do not fill out the diaries as they watch TV. They wait until the last minute and try to remember details—perhaps aided by a copy of TV Guide. Some people are simply not honest about what they watch. Perhaps they do not want to admit to watching a particular type of program.

With interviews, people can be influenced by the tone or attitude of the interviewer or, again, they can be less than truthful about what they watched out of embarrassment or in an attempt to project themselves in a favorable light. People are also hesitant to give information over the phone because they fear the person calling is really a salesperson.

Beyond sampling and methodological problems, ratings can be subject to technical problems: computers that go down, meters that function improperly, cable TV systems that shift the channel numbers of their program services without notice, station antennas struck by lightning.

Additionally, rating methodologies are often complicated and challenged by technological and sociological changes. Videocassette recorders, for example, have presented difficulties for the ratings companies. Generally, programs are counted as being watched if they are recorded. However, many programs that are recorded are never watched, and some are watched several times. In addition, people replaying tape often skip through commercials, destroying the whole purpose of ratings. And ratings companies have yet to decide what to do with sets that show four pictures at once.

Another major deterrent to the accuracy of ratings is the fact that electronic media programmers often try to manipulate the ratings system. Local television stations program their most sensational material during ratings periods. Networks preempt regular series and present star-loaded specials so that their affiliates will fare well in ratings and can therefore adjust their advertising rates upward. Cable networks show new programs as opposed to reruns. All of this, of course, negates the real purpose of determining which electronic media entities have the largest regular audience. It simply indicates which can design the best programming strategy for sweeps week.

Because of the possibility for all these sampling, methodological, technological, and sociological errors, ratings have been subjected to numerous tests and investigations. In fact, in 1963, the House of Representatives became so skeptical of ratings methodologies that it held hearings to investigate the procedures. Most of the skepticism had arisen because of a cease-and-desist order from the Federal Trade Commission (FTC) requiring several audience measurement companies to stop misrepresenting the accuracy and reliability of their reports. The FTC charged the rating companies with relying on hearsay information, making false claims about the nature of their sample populations, improperly combining and reporting data, failing to account for nonresponding sample members, and making arbitrary changes in the rating figures.

The main result of the hearings was that broadcasters themselves established the Electronic Media Rating Council (EMRC) to accredit rating companies. This group periodically checks rating companies to make sure their sample design and implementation meet preset standards that electronic media practitioners have agreed upon, to determine whether interviewers are properly trained, to oversee the procedures for handling diaries, and in other ways to ensure that the rating companies are compiling their reports as accurately as possible. All the major rating companies have EMRC accreditation.

The EMRC and other research institutions have continued various studies to determine the accuracy of ratings. These studies have shown that people who cooperate with rating services watch more TV, have larger families, and are younger and better educated than those who will not cooperate; telephone interviewing gets a 13 percent higher cooperation rate than diaries; Hispanics included in the ratings samples watch less TV and have smaller families than Hispanics in general.

Both electronic media practitioners and audience measurement companies want their ratings to be accurate, so both groups undertake testing to the extent they can afford it. In 1989, for example, broadcasters initiated a study to conduct a thorough review of the Peoplemeter. The result was a list of recommendations to Nielsen that included changing the amount of time people participate from two years to one year to eliminate button-pushing fatigue, metering all sets including those on boats and in vacation homes, and simplifying the procedures by which visitors log into the meter.

Still, the weakest link in the system, at present, seems to be how the ratings are used. Networks tout rating superiorities that show .1 percent differences, differences that certainly are not statistically significant. Programs are canceled because their ratings fall one point. Sweeps weeks tend to become more and more sensationalized. At stake, of course, are advertising fees that can translate into millions of dollars. Advertisers and their agencies need to remain vigilant so that they are not paying rates based on artificially stimulated ratings that bear no resemblance to the programs in which the sponsor is actually investing.

At this time all parties in the system seem invested in some form of audience measurement. So long as the failures and inadequacies of these systems are accepted by these major participants, the numbers will remain a valid type of “currency” in the system of television.
