How to get a specific row and column from a DataFrame in Azure Databricks Spark

I have a DataFrame in Azure Databricks that looks like this:

Col a| Col b
------------
Marc | Taylor
John | McC
Bill | Gates

I would like to extract a specific column and row. I know how to extract a specific column and assign it to a variable:

result = ds.select("Col a")

But how do I also get, for example, row number 2 in this line of code?

You can use the monotonically_increasing_id() function to add a column of row IDs, and then use filter() to keep only the row you want:

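from pyspark.sql.functions import col, monotonically_increasing_id

# Add a temporary ID column, keep the row whose ID is 1, then drop the ID again
ds.withColumn('sn', monotonically_increasing_id())\
    .filter(col('sn') == 1)\
    .drop('sn')\
    .show(truncate=False)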

which would give you:

+-----+-----+
|Col a|Col b|
+-----+-----+
|John |McC  |
+-----+-----+

Note: monotonically_increasing_id() generates unique, monotonically increasing IDs, but they are not guaranteed to be consecutive, so filtering on sn == 1 is only reliable when the data sits in a single partition.
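To see why the IDs can skip values, here is a small pure-Python sketch that mimics the layout the Spark documentation describes for the current implementation: the partition index goes in the upper bits and the record number within the partition in the lower 33 bits. The function simulated_id and the way the sample rows are split across partitions are made up for illustration; this is not Spark's own code.

```python
# Illustration only: mimics the documented ID layout of
# monotonically_increasing_id(). Nothing here calls Spark.

def simulated_id(partition_index: int, record_number: int) -> int:
    """ID a row would get: partition index in the upper bits,
    record number within the partition in the lower 33 bits."""
    return (partition_index << 33) + record_number

# Hypothetical placement: the three sample rows split across two partitions.
partitions = [["Marc"], ["John", "Bill"]]

ids = {
    name: simulated_id(p, r)
    for p, rows in enumerate(partitions)
    for r, name in enumerate(rows)
}
print(ids)
```

With this split, no row ends up with the ID 1, so a filter on sn == 1 would return nothing even though the DataFrame has a second row.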

To sum up, filter() (or its alias where()) selects rows from a DataFrame, and select() selects columns.

I hope the answer is helpful.

I can get the value with Python using this:

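from pyspark.sql.functions import col, collect_list

# Collect all values of "Col b" into one array column, then take the element at index 1
df_sample = yourDataFrame.select(collect_list("Col b").alias("a"))
value = df_sample.select(col("a").getItem(1).alias("x"))
display(value)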

Hope it helps.
