Data Analytics issues: https://gitlab.ost.ch/db/datana/-/issues

Issue #4: "Databricks-Übung aktualisieren" (Update the Databricks exercise)
https://gitlab.ost.ch/db/datana/-/issues/4
Author: Raphael Das Gupta, 2023-03-07

- [x] Runtime: 11.3 LTS (Scala 2.12, Spark 3.3.0)
- [x] For
```
spark.sql(
"""
SELECT CallType, count(*)
FROM fireServiceCalls
GROUP BY CallType
ORDER BY count(*) DESC
"""
).show()
```
use `[…].show(n_call_types, False)` instead (see the sketch right after this item).
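
A minimal sketch of the intended change, for context: the plain `.show()` prints at most 20 rows and truncates long strings, so rarer call types get cut off. Assuming the DataFrame behind the `fireServiceCalls` view is available in a variable (here called `fire_df`, a name not taken from the issue), `n_call_types` could be obtained and used like this:

```
# Assumption: fire_df is the DataFrame registered as the "fireServiceCalls" temp view.
n_call_types = fire_df.select("CallType").distinct().count()

spark.sql(
    """
    SELECT CallType, count(*)
    FROM fireServiceCalls
    GROUP BY CallType
    ORDER BY count(*) DESC
    """
).show(n_call_types, False)  # show all call types, don't truncate long strings
```
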
- [x] Spark 3 [patterns](https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html) (applied in the sketch after the code below):
```
date_pattern = 'M/d/y'
ts_pattern = 'M/d/y h:m:s a'
```
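
For illustration only, a sketch of how these patterns would typically be applied with `to_date`/`to_timestamp`; the column names `CallDate` and `AvailableDtTm` are assumptions for the example, not taken from the issue:

```
from pyspark.sql.functions import to_date, to_timestamp

date_pattern = 'M/d/y'
ts_pattern = 'M/d/y h:m:s a'

# Assumption: fire_df holds the raw CSV data; CallDate is a date column and
# AvailableDtTm a timestamp column (illustrative names).
fire_df_parsed = (
    fire_df
    .withColumn("CallDate", to_date("CallDate", date_pattern))
    .withColumn("AvailableDtTm", to_timestamp("AvailableDtTm", ts_pattern))
)
```
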
- [x] Adapt the instruction
> Click at the top (below the notebook title, where it says "Attached") on the name of the cluster

to the current UI.
- [x] Calculation of the number of partitions (cross-checked in the sketch after the Scala cell):
```
%scala
import org.apache.spark.util.Utils
val fileSizeBytes = 1634673683 // from `%fs ls` above
val maxPartitionBytes = Utils.byteStringAsBytes(spark.conf.get("spark.sql.files.maxPartitionBytes"))
val numberOfPartitions = fileSizeBytes.toDouble / maxPartitionBytes
// Round up, because we can't have just part of a partition:
val effectiveNumberOfPartitions = numberOfPartitions.ceil.toInt
```
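
As a hedged cross-check of the Scala cell above: with the default `spark.sql.files.maxPartitionBytes` of 128 MB, ceil(1634673683 / 134217728) = 13, and Spark should create roughly that many partitions when reading the file (the actual split size can also be influenced by `spark.sql.files.openCostInBytes` and the cluster's default parallelism). Assuming the data was read into a DataFrame named `fire_df` (name not from the issue):

```
# Number of partitions Spark actually created when reading the ~1.6 GB file.
actual_partitions = fire_df.rdd.getNumPartitions()
print(actual_partitions)  # expected to be close to effectiveNumberOfPartitions (13 with defaults)
```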