Check a string to create a new column in a Spark dataframe

Question

I have a Spark dataframe column with trading pairs, and I need to use it to create a new column populated with the name of the coin.

The first column, "bot", contains values such as "Polkadot/USD". I need a new column called "coin" that contains only the substring "Polkadot" from the bot column, and the same for all other rows. Basically, the new column needs to have the substring "/USD" removed.

What would the code look like to accomplish this? I'm a crypto trader, not a coder, so the more coding detail in the answer the better. Thank you.

Note: The notebook is a Python notebook.

Answer 1

You can use regexp_replace to replace a substring with another substring (here, with an empty string):

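from pyspark.sql import functions as F

# withColumn returns a new dataframe, so assign the result back to df
df = df.withColumn('coin', F.regexp_replace(F.col('bot'), '/USD', ''))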

Example

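from pyspark.sql import functions as F

# sample dataframe (assumes `spark` is the SparkSession the notebook already provides)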
df3 = spark.createDataFrame([
    ('BamBridge/USD', ),
    ('CLV/USD', ),
    ('ETH/USD', ),
    ('Polkadot/USD', ),
], ['bot'])

df3 = df3.withColumn('coin', F.regexp_replace(F.col('bot'), '/USD', ''))

df3.show()

+-------------+---------+
|          bot|     coin|
+-------------+---------+
|BamBridge/USD|BamBridge|
|      CLV/USD|      CLV|
|      ETH/USD|      ETH|
| Polkadot/USD| Polkadot|
+-------------+---------+
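One caveat: regexp_replace treats its second argument as a regular expression, and the unanchored pattern '/USD' also matches inside a longer quote currency. For a hypothetical pair such as 'BTC/USDT' (not in the question's data), it would leave 'BTCT'. Below is a minimal plain-Python sketch for sanity-checking patterns before running them on Spark. It assumes Python's re module behaves like Spark's Java regex for these simple patterns. The anchored pattern '/USD$' only strips a trailing "/USD", and splitting on "/" keeps whatever comes before the slash.

```python
import re

# Sample pairs; "BTC/USDT" is a hypothetical example, not from the question
pairs = ["BamBridge/USD", "CLV/USD", "ETH/USD", "Polkadot/USD", "BTC/USDT"]


def strip_unanchored(pair):
    # Same pattern as the answer: removes "/USD" wherever it appears
    return re.sub("/USD", "", pair)


def strip_anchored(pair):
    # "$" anchors the match to the end of the string,
    # so only a trailing "/USD" is removed
    return re.sub("/USD$", "", pair)


def before_slash(pair):
    # Keeps the part before the first "/", whatever the quote currency is
    return pair.split("/", 1)[0]


for p in pairs:
    print(p, strip_unanchored(p), strip_anchored(p), before_slash(p))
```

In PySpark, the anchored version would be F.regexp_replace(F.col('bot'), '/USD$', ''), and the split version could be written as F.split(F.col('bot'), '/').getItem(0) or F.substring_index(F.col('bot'), '/', 1).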

Answer 1: accepted, 2021-07-26 07:47:45