How to get a specific row and column from a DataFrame in Azure Databricks Spark
I have a DataFrame in Azure Databricks which looks like this:
Col a| Col b
------------
Marc | Taylor
John | McC
Bill | Gates
I would like to extract a specific column and row. I know how to extract a specific column and assign it to a variable:
result = ds.select("Col a")
But how do I get row number 2, for example, in this line of code?
You can use the monotonically_increasing_id() function to generate a new column of increasing IDs, and then use the filter() function to keep the row you want:
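from pyspark.sql.functions import col, monotonically_increasing_id

# Tag each row with a generated ID, keep the row whose ID is 1, then drop the helper column
ds.withColumn('sn', monotonically_increasing_id())\
    .filter(col('sn') == 1)\
    .drop('sn')\
    .show(truncate=False)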
which would give you
+-----+-----+
|Col a|Col b|
+-----+-----+
|John |McC  |
+-----+-----+
Note: monotonically_increasing_id() generates unique, monotonically increasing IDs, but they are not guaranteed to be consecutive, so a filter such as sn == 1 may match nothing when the data is spread over several partitions.
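To see why the generated IDs need not be consecutive, here is a small pure-Python model of the layout that the Spark documentation describes for the current implementation: the partition ID goes in the upper 31 bits and the record number within the partition in the lower 33 bits. The helper name simulated_ids is hypothetical and only mimics that layout; it does not call Spark.

```python
# Model of the ID layout documented for monotonically_increasing_id():
# upper 31 bits = partition ID, lower 33 bits = record number in that partition.
RECORD_BITS = 33


def simulated_ids(partition_sizes):
    """Return the IDs this layout would produce, given the row count of each partition.

    Hypothetical helper for illustration only; it does not use Spark.
    """
    ids = []
    for partition_id, size in enumerate(partition_sizes):
        for record_number in range(size):
            ids.append((partition_id << RECORD_BITS) + record_number)
    return ids


# All three rows in one partition: the IDs happen to be consecutive.
single = simulated_ids([3])

# The same three rows split as 1 + 2 across two partitions.
split = simulated_ids([1, 2])

print(single)      # [0, 1, 2]
print(split)       # [0, 8589934592, 8589934593]
print(1 in split)  # False: a filter on sn == 1 would return no row
```

With the second layout, the filter on sn == 1 from the answer above would return an empty result even though the DataFrame has a second row.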
To sum up, filter() (or its alias where()) selects rows from a DataFrame, and select() selects columns.
I hope the answer is helpful.
I can get the value with Python using this:
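from pyspark.sql.functions import col, collect_list

# Gather "Col b" into a single array column, then pick the element at index 1 (0-based)
df_sample = yourDataFrame.select(collect_list("Col b").alias("a"))
value = df_sample.select(col("a").getItem(1).alias("x"))
display(value)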
Hope it helps.