Inspect a string to create a new column in a Spark dataframe
I have a Spark dataframe column with trading pairs that I need to use to create a new column populated with the name of the coin.
The first column "bot" contains "Polkadot/USD". I need a new column called "coin" that contains only the substring "Polkadot" of the bot column. Same for all other rows. Basically, the new column needs to have the substring "/USD" removed.
What would the code look like to accomplish this? I'm a crypto trader, not a coder, so the more coding detail in the answer the better. Thank you.
Note: The notebook is a Python Notebook.
You can use regexp_replace to replace a substring with another substring (here, with an empty string):
from pyspark.sql import functions as F  # F is the usual alias for the functions module
df = df.withColumn('coin', F.regexp_replace(F.col('bot'), '/USD', ''))
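The second argument of regexp_replace is a regular expression, so '/USD' matches wherever it appears in the string, not only at the end. The sketch below uses Python's standard re module as a stand-in to show what the pattern does to a single value; it does not call Spark. The helper strip_usd and the pair 'ABC/USDT' are made up for illustration.

```python
import re

def strip_usd(pair, anchored=False):
    # Hypothetical helper, not part of Spark: applies the same kind of
    # pattern to one string so the matching behaviour is easy to see.
    # With anchored=True the pattern only matches '/USD' at the end.
    pattern = r'/USD$' if anchored else r'/USD'
    return re.sub(pattern, '', pair)

print(strip_usd('Polkadot/USD'))             # Polkadot
print(strip_usd('ABC/USDT'))                 # ABCT (the '/USD' part is cut out of '/USDT')
print(strip_usd('ABC/USDT', anchored=True))  # ABC/USDT (left unchanged)
```

If the bot column can contain pairs such as '/USDT', passing '/USD$' as the pattern to regexp_replace should avoid the partial match; for data that always ends in '/USD', the unanchored pattern is enough.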
Example
# sample dataframe
df3 = spark.createDataFrame([
    ('BamBridge/USD', ),
    ('CLV/USD', ),
    ('ETH/USD', ),
    ('Polkadot/USD', ),
], ['bot'])
# add the 'coin' column by removing '/USD' from 'bot'
df3 = df3.withColumn('coin', F.regexp_replace(F.col('bot'), '/USD', ''))
df3.show()
+-------------+---------+
|          bot|     coin|
+-------------+---------+
|BamBridge/USD|BamBridge|
|      CLV/USD|      CLV|
|      ETH/USD|      ETH|
| Polkadot/USD| Polkadot|
+-------------+---------+
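If some rows might be quoted in something other than USD, one alternative is to keep everything before the first '/'. This is a plain-Python sketch of that idea on the same sample values, not Spark code, and coin_name is a hypothetical helper. In PySpark the corresponding column expression would be along the lines of F.split(F.col('bot'), '/').getItem(0), which is not tested here.

```python
def coin_name(pair):
    # Hypothetical helper: split once on '/' and keep the left-hand part.
    return pair.split('/', 1)[0]

pairs = ['BamBridge/USD', 'CLV/USD', 'ETH/USD', 'Polkadot/USD']
coins = [coin_name(p) for p in pairs]
print(coins)  # ['BamBridge', 'CLV', 'ETH', 'Polkadot']
```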