
Using Spark SQL for creating pivot tables
Pivot tables create alternate views of your data and are commonly used during data exploration. In the following examples, we demonstrate pivoting using Spark DataFrames:
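Since the original code listings are not reproduced here, the sketches below are minimal reconstructions. They assume a hypothetical bank marketing dataset with columns such as marital, housing, job, duration, campaign, month, and deposit; the file name and schema are illustrative, not taken from the original text:

```scala
// Minimal setup sketch; the path, options, and column names are assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("PivotTableExamples")
  .master("local[*]") // assumption: local execution for exploration
  .getOrCreate()

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("bank-full.csv") // hypothetical bank marketing dataset
```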

The first example pivots on whether a housing loan was taken and computes the counts by marital status:
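A minimal sketch of this pivot, using the assumed housing and marital columns:

```scala
// Counts by marital status, pivoted on whether a housing loan was taken.
df.groupBy("marital").pivot("housing").count().show()
```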

In the next example, we create a DataFrame with appropriate column names for the total and average number of calls:
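One way to express this, assuming the number of calls per customer is held in a campaign column and grouping by marital status (both assumptions):

```scala
// Total and average number of calls, with descriptive column names.
val callStats = df.groupBy("marital")
  .agg(
    sum("campaign").alias("TotalCalls"),
    avg("campaign").alias("AverageCalls")
  )
callStats.show()
```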

In the following example, we create a DataFrame with appropriate column names for the total and average duration of calls for each job category:
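A sketch under the same assumed schema, aggregating the duration column per job:

```scala
// Total and average call duration for each job category.
val durationByJob = df.groupBy("job")
  .agg(
    sum("duration").alias("TotalDuration"),
    avg("duration").alias("AverageDuration")
  )
durationByJob.show()
```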

In the following example, we show pivoting to compute average call duration for each job category, while restricting the pivot to a subset of marital statuses:
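Passing an explicit list of pivot values restricts the output columns and lets Spark skip computing the distinct values; the chosen subset here is illustrative:

```scala
// Average call duration per job, pivoted on a subset of marital statuses.
df.groupBy("job")
  .pivot("marital", Seq("married", "single"))
  .agg(avg("duration"))
  .show()
```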

The following example is the same as the preceding one, except that we also split the average call duration values by the housing loan field:
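One way to add this split, a sketch that includes the housing-loan flag in the grouping key:

```scala
// As above, but each job row is further split by the housing-loan flag.
df.groupBy("job", "housing")
  .pivot("marital", Seq("married", "single"))
  .agg(avg("duration"))
  .show()
```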

Next, we show how you can create a pivot table of term deposits subscribed by month as a DataFrame, save it to disk, and read it back into an RDD:
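A sketch, assuming the subscription flag is a deposit column with yes/no values and writing to a hypothetical monthlyDeposits directory:

```scala
// Pivot term-deposit subscriptions by month, write the result to disk
// as CSV, and read it back as an RDD of text lines.
val monthlyDeposits = df.groupBy("month").pivot("deposit").count()

monthlyDeposits.write
  .option("header", "true")
  .mode("overwrite")
  .csv("monthlyDeposits")

val depositRDD = spark.sparkContext.textFile("monthlyDeposits")
```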

We then use the RDD from the preceding step to compute quarterly totals of customers who did and did not subscribe to term deposits:
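Building on the RDD above, a sketch that assumes lowercase three-letter month names and CSV lines of the form month,no,yes:

```scala
// Map each month to its quarter and sum the no/yes counts per quarter.
val quarterOf = Map(
  "jan" -> "Q1", "feb" -> "Q1", "mar" -> "Q1",
  "apr" -> "Q2", "may" -> "Q2", "jun" -> "Q2",
  "jul" -> "Q3", "aug" -> "Q3", "sep" -> "Q3",
  "oct" -> "Q4", "nov" -> "Q4", "dec" -> "Q4"
)

val quarterlyTotals = depositRDD
  .filter(line => !line.startsWith("month")) // drop header line(s)
  .map(_.split(","))
  .map(f => (quarterOf(f(0)), (f(1).toLong, f(2).toLong)))
  .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))

quarterlyTotals.collect().foreach(println)
```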

Later in this book, we will present detailed analyses of other types of data, including streaming data, large-scale graphs, and time-series data.