Operant conditioning

From Wikipedia, the free encyclopedia

Operant conditioning is a form of learning. In it, an individual changes its behaviour because of the consequences (results) of the behaviour.

The person or animal learns its behaviour has a consequence. That consequence may be

  1. Reinforcement: a positive or rewarding event. This causes the behaviour to occur more often.
  2. Punishment: a negative or punishing event. This causes the behaviour to occur less often.
  3. Extinction: no event follows, so the behaviour has no consequences. When a behaviour has no consequences, it will occur less frequently.

There are four different contexts in operant conditioning. Here, the terms 'positive' and 'negative' are not used in their basic sense; positive means that something is added, and negative means something is taken away:

  1. Positive reinforcement (often just "reinforcement") occurs when there is a reward for a form of behaviour. This will increase the frequency at which the behaviour occurs. In the Skinner box experiment, the reward is in the form of food when the rat presses a lever.
  2. Negative reinforcement (sometimes "escape") occurs when an aversive stimulus is removed. This will increase the frequency at which the behaviour occurs. In the Skinner box experiment, there was a loud noise, which was removed when the rat pressed the lever.
  3. Positive punishment occurs when a stimulus is added, which results in the behaviour occurring less often. Example stimuli may be loud noise, electric shock (rat), or a spanking (child).
  4. Negative punishment occurs when a stimulus is taken away, which results in the behaviour occurring less often. An example might be a child's toy being taken away after the child behaves in an undesired way.
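
The four contexts form a two-by-two grid: a stimulus is either added or removed, and the behaviour then occurs either more or less often. The short Python sketch below spells out that grid. The function name `classify_context` and its string arguments are made up for this illustration and are not standard terms.

```python
def classify_context(stimulus: str, behaviour: str) -> str:
    """Name the operant-conditioning context.

    stimulus:  "added" or "removed"  (what happens after the behaviour)
    behaviour: "more" or "less"      (how often the behaviour then occurs)
    """
    # 'positive' means something is added, 'negative' means something is taken away
    sign = {"added": "positive", "removed": "negative"}[stimulus]
    # reinforcement makes the behaviour more frequent, punishment less frequent
    kind = {"more": "reinforcement", "less": "punishment"}[behaviour]
    return f"{sign} {kind}"


# The four examples from the list above
print(classify_context("added", "more"))    # food given for a lever press
print(classify_context("removed", "more"))  # loud noise stops after a lever press
print(classify_context("added", "less"))    # electric shock
print(classify_context("removed", "less"))  # toy taken away
```

Note that "negative reinforcement" still increases behaviour: the word "negative" only says that a stimulus was removed.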

The idea of operant conditioning was first discovered by Edward Thorndike,[1] and analyzed by B.F. Skinner.

Operant conditioning is different from Pavlov's classical conditioning. Operant conditioning deals with the voluntary modification of behaviour; classical conditioning with training a reflex.

Thorndike's law of effect

Operant conditioning, sometimes called instrumental learning, was first extensively studied by Edward L. Thorndike (1874–1949).[2] He observed the behaviour of cats trying to escape from home-made puzzle boxes.[1] When first put in the boxes, cats took a long time to escape. With experience, successful responses occurred more frequently, enabling the cats to escape in less time. In his law of effect, Thorndike theorized that behaviours followed by satisfying consequences tend to be repeated, and those that produce unpleasant consequences are less likely to be repeated. In short, some consequences strengthened behaviour and some consequences weakened behaviour. Thorndike produced the first known learning curves by this procedure.
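
The strengthening half of the law of effect can be shown with a toy simulation. This is only a sketch under invented assumptions: the three response names, the starting weights and the `boost` value are made up, and it is not Thorndike's procedure or data. A simulated cat picks responses at random, and the one response that opens the box becomes more likely each time it succeeds.

```python
import random


def run_trials(n_trials, rng, boost=1.0):
    """Toy puzzle box: only 'press_latch' opens the box (hypothetical responses)."""
    weights = {"scratch": 1.0, "meow": 1.0, "press_latch": 1.0}
    attempts_per_trial = []
    for _ in range(n_trials):
        attempts = 0
        while True:
            attempts += 1
            # pick a response with probability proportional to its weight
            action = rng.choices(list(weights), weights=list(weights.values()))[0]
            if action == "press_latch":
                # satisfying consequence (escape): strengthen this response
                weights[action] += boost
                break
        attempts_per_trial.append(attempts)
    return weights, attempts_per_trial


weights, attempts = run_trials(20, random.Random(0))
print(weights)
print(attempts)  # attempts needed on each trial, a rough learning curve
```

Because the successful response gains weight after every escape, later trials tend to need fewer attempts. The sketch does not model the weakening of responses by unpleasant consequences.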

B.F. Skinner (1904–1990) worked out a more detailed analysis of operant conditioning. Skinner invented the operant conditioning chamber (also called the Skinner box), which allowed him to measure rate of response as a key dependent variable. He used a record of lever presses or key pecks.[3]

Principles of operant conditioning: [4]

  1. Discrimination, generalization and the importance of context.
    1. Learning takes place in contexts.
    2. Most behaviour is under stimulus control: a particular response only occurs when an appropriate stimulus is present.
    3. Stimulus control is effective even if the stimulus has no meaning to the respondent.
  2. Extinction: operant behaviour undergoes extinction when the reinforcement stops.
    1. The reinforcements only occur when the proper response has been made, and may not occur even then. Behaviours do not weaken and extinguish because of this.
    2. Results depend partly on how often reinforcement is received.
  3. Schedules of reinforcement: the pattern in which reinforcements appear is crucial.
    1. Fixed interval schedule: reinforcers are presented at fixed time periods, provided that the appropriate response is made.
    2. Variable interval schedule: a behaviour is reinforced after a varying amount of time, set around an average, has passed since the last reinforcement. The remaining two schedules are ratio schedules, which are based on the ratio of responses to reinforcements.
    3. Fixed ratio schedule: reinforcement is delivered after a specific number of responses have been made. The special case of presenting reinforcement after each response is called continuous reinforcement.
    4. Variable ratio schedule: the delivery of reinforcement is based on a particular average number of responses.
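
The schedules differ in what they count: the two interval schedules depend on time since the last reinforcement, and the two ratio schedules depend on the number of responses. The Python sketch below marks which responses would be reinforced under each one. The function names, the way random requirements are drawn, and the choice to start the clock at time 0 are assumptions made for this illustration only.

```python
import random


def fixed_ratio(n_responses, n):
    """Reinforce every n-th response (n = 1 is continuous reinforcement)."""
    return [i % n == 0 for i in range(1, n_responses + 1)]


def variable_ratio(n_responses, mean_n, rng):
    """Reinforce after a random number of responses that averages mean_n."""
    reinforced = []
    count, needed = 0, rng.randint(1, 2 * mean_n - 1)
    for _ in range(n_responses):
        count += 1
        hit = count >= needed
        reinforced.append(hit)
        if hit:  # start counting towards a newly drawn requirement
            count, needed = 0, rng.randint(1, 2 * mean_n - 1)
    return reinforced


def fixed_interval(response_times, interval):
    """Reinforce the first response made once `interval` time units have
    passed since the last reinforcement (assumed: the clock starts at 0)."""
    reinforced, last = [], 0.0
    for t in response_times:
        hit = t - last >= interval
        reinforced.append(hit)
        if hit:
            last = t
    return reinforced


def variable_interval(response_times, mean_interval, rng):
    """Like fixed_interval, but each waiting time is drawn at random
    so that it averages mean_interval."""
    reinforced, last = [], 0.0
    wait = rng.uniform(0, 2 * mean_interval)
    for t in response_times:
        hit = t - last >= wait
        reinforced.append(hit)
        if hit:
            last, wait = t, rng.uniform(0, 2 * mean_interval)
    return reinforced


rng = random.Random(0)
times = [1, 5, 9, 12, 21, 30, 33]  # hypothetical times of lever presses
print(fixed_ratio(6, 3))           # every third response is reinforced
print(fixed_interval(times, 10))   # first response after each 10-unit wait
print(variable_ratio(10, 3, rng))
print(variable_interval(times, 10, rng))
```

In every schedule a reinforcement is only given when a response is actually made. Under the interval schedules, extra responses before the waiting time has passed earn nothing.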

References

  1. Thorndike E.L. 1901. Animal intelligence: an experimental study of the associative processes in animals. Psychological Review Monograph Supplement, 2, 1–109.
  2. It had first been mentioned by Lloyd Morgan.
  3. Mecca Chiesa 2004. Radical behaviorism: the philosophy and the science.
  4. Schacter, Daniel L.; Daniel T. Gilbert, and Daniel M. Wegner. 2011. B.F. Skinner: the role of reinforcement and punishment, in Psychology. 2nd ed, New York: Worth.