Update the Databricks exercise
-
Runtime: 11.3 LTS (Scala 2.12, Spark 3.3.0)
-
For
spark.sql(""" SELECT CallType, count(*) FROM fireServiceCalls GROUP BY CallType ORDER BY count(*) DESC """).show()
use
[…].show(n_call_types, False)
instead, so that all call types are listed (n_call_types rows) and the CallType column isn't truncated.
-
Spark 3 datetime patterns:
date_pattern = 'M/d/y'
ts_pattern = 'M/d/y h:m:s a'
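As a quick sanity check of the timestamp layout, the Spark pattern can be approximated with Python's strptime directives. This is a sketch: the sample value below is made up to match the 'M/d/y h:m:s a' layout, and the directive mapping is my own translation of Spark's pattern letters, not taken from the exercise.

```python
from datetime import datetime

# Rough strptime equivalent of Spark's 'M/d/y h:m:s a'
# (%m month, %d day, %Y year, %I 12-hour clock, %p AM/PM):
py_ts_pattern = "%m/%d/%Y %I:%M:%S %p"

# Hypothetical sample value in the same layout:
sample = "01/11/2002 01:51:44 AM"
parsed = datetime.strptime(sample, py_ts_pattern)
print(parsed)  # 2002-01-11 01:51:44
```

Note that Spark's single-letter fields (M, d, h, …) also accept one-digit values, whereas %m/%d/%I expect zero-padded two-digit values, so this mapping only holds for padded data.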
-
Adapt the instruction "Click above (below the notebook title, where it says "Attached") on the name of the cluster" to the current UI.
-
Computing the number of partitions:

%scala
import org.apache.spark.util.Utils

val fileSizeBytes = 1634673683  // from `%fs ls` above
val maxPartitionBytes = Utils.byteStringAsBytes(
  spark.conf.get("spark.sql.files.maxPartitionBytes"))
val numberOfPartitions = fileSizeBytes.toDouble / maxPartitionBytes
// Round up, because we can't have just part of a partition:
val effectiveNumberOfPartitions = numberOfPartitions.ceil.toInt
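The same arithmetic, sketched in plain Python, assuming Spark's default spark.sql.files.maxPartitionBytes of 128 MB (in the notebook, the actual value should be read from the cluster config, as the Scala cell above does):

```python
import math

file_size_bytes = 1634673683             # from `%fs ls`, as in the Scala cell
max_partition_bytes = 128 * 1024 * 1024  # assumed default: 128 MB

number_of_partitions = file_size_bytes / max_partition_bytes
# Round up, because we can't have just part of a partition:
effective_number_of_partitions = math.ceil(number_of_partitions)
print(effective_number_of_partitions)  # 13
```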
Edited by Raphael Das Gupta