Update the Databricks exercise
-
Runtime: 11.3 LTS (Scala 2.12, Spark 3.3.0)
-
For
spark.sql(""" SELECT CallType, count(*) FROM fireServiceCalls GROUP BY CallType ORDER BY count(*) DESC """).show()
use
[…].show(n_call_types, False)
instead, so that all call types are listed (n_call_types rows) and the CallType column isn't truncated.
-
Spark 3 datetime patterns:
date_pattern = 'M/d/y'
ts_pattern = 'M/d/y h:m:s a'
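As a quick sanity check of the timestamp layout, the Spark pattern can be approximated with Python's strptime directives. This is a sketch: the sample value below is made up to match the 'M/d/y h:m:s a' layout, and the directive mapping is my own translation of Spark's pattern letters, not taken from the exercise.

```python
from datetime import datetime

# Rough strptime equivalent of Spark's 'M/d/y h:m:s a'
# (%m month, %d day, %Y year, %I 12-hour clock, %p AM/PM):
py_ts_pattern = "%m/%d/%Y %I:%M:%S %p"

# Hypothetical sample value in the same layout:
sample = "01/11/2002 01:51:44 AM"
parsed = datetime.strptime(sample, py_ts_pattern)
print(parsed)  # 2002-01-11 01:51:44
```

Note that Spark's single-letter fields (M, d, h, …) also accept one-digit values, whereas %m/%d/%I expect zero-padded two-digit values, so this mapping only holds for padded data.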
-
Adapt the instruction "Click above (below the notebook title, where it says "Attached") on the name of the cluster" to the current UI.
-
Computing the number of partitions:

%scala
import org.apache.spark.util.Utils

val fileSizeBytes = 1634673683  // from `%fs ls` above
val maxPartitionBytes = Utils.byteStringAsBytes(
  spark.conf.get("spark.sql.files.maxPartitionBytes"))
val numberOfPartitions = fileSizeBytes.toDouble / maxPartitionBytes
// Round up, because we can't have just part of a partition:
val effectiveNumberOfPartitions = numberOfPartitions.ceil.toInt
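The same arithmetic, sketched in plain Python, assuming Spark's default spark.sql.files.maxPartitionBytes of 128 MB (in the notebook, the actual value should be read from the cluster config, as the Scala cell above does):

```python
import math

file_size_bytes = 1634673683             # from `%fs ls`, as in the Scala cell
max_partition_bytes = 128 * 1024 * 1024  # assumed default: 128 MB

number_of_partitions = file_size_bytes / max_partition_bytes
# Round up, because we can't have just part of a partition:
effective_number_of_partitions = math.ceil(number_of_partitions)
print(effective_number_of_partitions)  # 13
```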
Edited by Raphael Das Gupta