How to get a specific row and column from a DataFrame in Azure Databricks Spark

Question

I have a DataFrame in Azure Databricks that looks like this:

Col a | Col b
------+-------
Marc  | Taylor
John  | McC
Bill  | Gates

I would like to extract a specific column and row. I know how to extract a specific column and assign it to a variable:

result = ds.select("Col a")  # a column name containing a space must be passed as a string

But how do I also get, for example, row number 2 in this line of code?

Answer 1

You can use the monotonically_increasing_id() function to add a new column with an increasing ID, and then use the filter() function to keep only the row you want:

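from pyspark.sql.functions import col, monotonically_increasing_id

# Add a temporary ID column, keep the row whose ID is 1, then drop the helper column
ds.withColumn('sn', monotonically_increasing_id())\
    .filter(col('sn') == 1)\
    .drop('sn')\
    .show(truncate=False)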

which would give you:

+-----+-----+
|Col a|Col b|
+-----+-----+
|John |McC  |
+-----+-----+

Note: monotonically_increasing_id() generates IDs that are guaranteed to be increasing and unique, but not consecutive. Each partition's IDs start at a large offset, so `sn == 1` matches the second row only when the first two rows are in the first partition.
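
To see why, here is a plain-Python model of the ID layout (not Spark code). It assumes the layout described in the Spark API documentation: partition ID in the upper 31 bits, record number within the partition in the lower 33 bits. The helpers `monotonic_id`, `ids_for` and `row_with_sn` and the two partition layouts are hypothetical, chosen only to illustrate the effect.

```python
# Plain-Python model (not Spark code) of how monotonically_increasing_id()
# lays out its IDs: partition ID in the upper 31 bits, record number within
# the partition in the lower 33 bits.


def monotonic_id(partition_id: int, record_number: int) -> int:
    """Modeled ID for the record_number-th row of partition partition_id."""
    return (partition_id << 33) + record_number


def ids_for(partitions):
    """Assign modeled IDs to rows given as a list of partitions (lists of names)."""
    return [
        (monotonic_id(p, i), name)
        for p, rows in enumerate(partitions)
        for i, name in enumerate(rows)
    ]


def row_with_sn(partitions, sn):
    """Mimic .filter(col('sn') == sn): names whose modeled ID equals sn (may be empty)."""
    return [name for rid, name in ids_for(partitions) if rid == sn]


# Two hypothetical ways Spark could split the three example rows.
layout_a = [["Marc", "John"], ["Bill"]]  # first two rows share partition 0
layout_b = [["Marc"], ["John", "Bill"]]  # second row starts partition 1

print(ids_for(layout_a))  # [(0, 'Marc'), (1, 'John'), (8589934592, 'Bill')]
print(ids_for(layout_b))  # [(0, 'Marc'), (8589934592, 'John'), (8589934593, 'Bill')]
print(row_with_sn(layout_a, 1))  # ['John']
print(row_with_sn(layout_b, 1))  # []
```

With `layout_b`, the same `filter(col('sn') == 1)` would return no rows at all, which is the situation the note above warns about.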

To sum up, filter() (or its alias where()) selects rows from a DataFrame, while select() selects columns; chaining the two gives you a specific row and column.
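
The difference between filtering rows and selecting columns can be sketched in plain Python. This is an analogy, not Spark code: the question's table is modelled as a list of dicts, and `filter_rows` and `select_cols` are hypothetical helpers.

```python
# Plain-Python analogy (not Spark code): the question's table as a list of dicts.
rows = [
    {"Col a": "Marc", "Col b": "Taylor"},
    {"Col a": "John", "Col b": "McC"},
    {"Col a": "Bill", "Col b": "Gates"},
]


def filter_rows(rows, predicate):
    """Like DataFrame.filter()/where(): keep whole rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def select_cols(rows, *names):
    """Like DataFrame.select(): keep every row, but only the named columns."""
    return [{n: r[n] for n in names} for r in rows]


# Combine both to get one cell: the row where "Col a" is "John", column "Col b".
print(select_cols(filter_rows(rows, lambda r: r["Col a"] == "John"), "Col b"))
# [{'Col b': 'McC'}]
```

In PySpark the same combination would read `ds.filter(col("Col a") == "John").select("Col b")`, and appending `.first()["Col b"]` returns the cell value itself rather than a one-row DataFrame.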

I hope the answer is helpful.

Answer 2

I can get the value with Python using this:

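from pyspark.sql.functions import col, collect_list
df_sample = yourDataFrame.select(collect_list("Col b").alias("a"))  # all "Col b" values in one array
value = df_sample.select(col("a").getItem(1).alias("x"))  # element 1 (0-based) = second value
display(value)  # Databricks notebook helper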
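
The two steps in this answer can be modelled in plain Python as below (a sketch, not Spark code; `rows`, `col_b` and `value` are hypothetical names). Two caveats apply to the real version: collect_list() pulls the whole column into a single array, so it suits small DataFrames only, and Spark documents it as non-deterministic because the order of collected values depends on row order, which can change after a shuffle.

```python
# Plain-Python model (not Spark code) of the collect_list + getItem steps.
rows = [("Marc", "Taylor"), ("John", "McC"), ("Bill", "Gates")]

# Step 1, like collect_list("Col b"): gather one column into a single list.
# Spark does not guarantee this order; the model simply keeps input order.
col_b = [b for _, b in rows]

# Step 2, like col("a").getItem(1): index into that list. Indexing is
# 0-based, so index 1 is the second element.
value = col_b[1]
print(value)  # McC
```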

Hope it helps.


Answer 1: score 0, posted 2018-08-26 08:32:56
Answer 2: score 0, posted 2019-01-31 16:57:47
