15. Group By: split-apply-combine

By “group by” we are referring to a process involving one or more of the following steps:

  • Splitting the data into groups based on some criteria
  • Applying a function to each group independently
  • Combining the results into a data structure
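
The three steps above can be sketched by hand and then compared with a single groupby call. This is a minimal illustration rather than a recommended pattern: the toy DataFrame and its column names ("key", "value") are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical toy data; the column names "key" and "value" are illustrative only.
df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "value": [1, 2, 3, 4, 5],
})

# Split: iterating over a GroupBy object yields (group name, sub-frame) pairs.
pieces = {name: group for name, group in df.groupby("key")}

# Apply: compute a summary (here, a sum) for each piece independently.
sums = {name: group["value"].sum() for name, group in pieces.items()}

# Combine: collect the per-group results into a single Series.
manual = pd.Series(sums, name="value")

# The same three steps expressed as one groupby call.
combined = df.groupby("key")["value"].sum()
```

Both manual and combined hold one sum per group. The explicit version is mainly useful when you want to work with the individual groups yourself.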

Of these, the split step is the most straightforward. In fact, in many situations you may wish to split the data set into groups and do something with those groups yourself. In the apply step, we might wish to do one of the following:

  • Aggregation: computing a summary statistic (or statistics) about each group. Some examples:

    • Compute group sums or means
    • Compute group sizes / counts
  • Transformation: perform some group-