
Python regular expression unable to find pattern - using pyspark on Apache Spark

Can someone tell me why the regular expression

df = df2.withColumn("extracted", F.regexp_extract("title", "[Pp]ython", 0))

can find the pattern "Python" or "python" in the following rows of the column named title

title
A fast PostgreSQL client library for Python: 3x faster than psycopg2
A project template for data science in Python
A simple python framework to build/train LUIS models
An Introduction to Stock Market Data Analysis with Python (Part 1)
Asynchronous Python
Cubr  A Rubiks Cube Solver Written in Python and using Webcam Input (2013)
Python 4 Kids: Python for Kids: Python 3  Project 10

but the same regular expression does not find the pattern "Python" or "python" in the rows below

title
Python Core Development Sprint 2016: 3.6 and beyond
Hypothesis.works articles: 3.5.0 and 3.5.1 Releases of Hypothesis for Python
Total pip packages downloaded, separated by Python versions (June  August 2016)
PEP 530: Asynchronous Comprehensions in Python 3.6
Python 2.7 still reigns supreme in pip installs
CheckiO  games for Python and JavaScript coders. ClassRoom support is included
VR Zero, Virtual Reality on the RaspberryPi, in Python

Thanks

Use a case-insensitive regular expression:

(?i) at the start of the pattern turns on ignore-case (case-insensitive) mode
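As a quick sanity check of the flag without a Spark session, the pattern can be tried with Python's built-in re module. This is only a sketch: Spark's regexp_extract uses Java regular expressions, and the assumption here is that the leading (?i) flag behaves the same way in both engines for this simple pattern. The helper name extract_first is hypothetical and not part of any library; it returns an empty string when nothing matches, which is what regexp_extract does.

```python
import re

# Titles copied from the sample data in this answer.
titles = [
    "Python Core Development Sprint 2016: 3.6 and beyond",
    "Hypothesis.works articles: 3.5.0 and 3.5.1 Releases of Hypothesis for Python",
    "CheckiO  games for python and JavaScript coders. ClassRoom support is included",
]


def extract_first(pattern, text):
    """Hypothetical helper: return the first full match, or "" if there is none.

    This mirrors regexp_extract(column, pattern, 0), which yields an
    empty string when the pattern does not match.
    """
    match = re.search(pattern, text)
    return match.group(0) if match else ""


# (?i) makes the whole pattern case-insensitive, so "Python" and "python" both match.
extracted = [extract_first(r"(?i)python", title) for title in titles]
print(extracted)
```

The matched text keeps its original casing, so the extracted column still shows whether the title used "Python" or "python".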

Data

data=[

  (1,"Python Core Development Sprint 2016: 3.6 and beyond"),
  (2,"Hypothesis.works articles: 3.5.0 and 3.5.1 Releases of Hypothesis for Python"),
  (3,"CheckiO  games for python and JavaScript coders. ClassRoom support is included")
  ]
df=spark.createDataFrame(data, ['id','title'])
df.show(truncate=False)

Solution

from pyspark.sql import functions as F
df.withColumn('extract', F.regexp_extract(F.col('title'), '(?i)python', 0)).show()

Result

+---+--------------------+-------+
| id|               title|extract|
+---+--------------------+-------+
|  1|Python Core Devel...| Python|
|  2|Hypothesis.works ...| Python|
|  3|CheckiO  games fo...| python|
+---+--------------------+-------+

