
Defining and using custom data sources in Spark
You can define your own data sources and combine their data with data from other, more standard sources (for example, relational databases, Parquet files, and so on). In Chapter 5, Using Spark SQL in Streaming Applications, we define a custom data source for streaming data from the public APIs available on the Transport for London (TfL) site.
For a good example of defining a data source for Jira and creating a Spark SQL DataFrame from it, refer to the video Spark DataFrames: Simple and Fast Analysis of Structured Data, by Michael Armbrust (Databricks), at https://www.youtube.com/watch?v=xWkJCUcD55w.
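To make the idea concrete, the following is a minimal sketch of a custom data source, not the TfL source from Chapter 5 and not the Jira source from the video. It assumes Spark's Data Source V1 interfaces (RelationProvider, BaseRelation, and TableScan). The names IssueRelation, IssueSource, CustomSourceExample, the project option, and the hard-coded rows are all hypothetical and stand in for a real call to an external system.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, SparkSession}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical relation: exposes a fixed set of issue records as rows.
// A real source would fetch these records from an external API instead.
class IssueRelation(override val sqlContext: SQLContext, project: String)
    extends BaseRelation with TableScan {

  // The schema Spark SQL sees for this source.
  override def schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("project", StringType, nullable = false),
    StructField("status", StringType, nullable = false)))

  // Full scan: return every row as an RDD[Row] matching the schema above.
  override def buildScan(): RDD[Row] = {
    val rows = Seq(Row(1, project, "open"), Row(2, project, "closed"))
    sqlContext.sparkContext.parallelize(rows)
  }
}

// Hypothetical provider: Spark instantiates this class when it is named
// in format(...), then passes the reader options to createRelation.
class IssueSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new IssueRelation(sqlContext, parameters.getOrElse("project", "DEMO"))
}

object CustomSourceExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("custom-source-sketch")
      .getOrCreate()
    import spark.implicits._

    // Load a DataFrame from the custom source by its provider class name.
    val issues = spark.read
      .format(classOf[IssueSource].getName)
      .option("project", "SPARK")
      .load()

    // Combine it with data from another source (an in-memory DataFrame here;
    // it could equally come from a JDBC table or a Parquet file).
    val owners = Seq(("SPARK", "platform-team")).toDF("project", "owner")
    issues.join(owners, "project").show()

    spark.stop()
  }
}
```

Once loaded, the DataFrame from the custom source behaves like any other: it can be joined, filtered, or registered as a temporary view and queried with SQL. A production source would typically also implement column pruning and filter pushdown rather than a full table scan.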