I have a DataFrame column with similar strings. How can I replace each value with the first similar value that appears for the same key? (I tried Levenshtein distance and fuzzywuzzy, but they only return similarity ratios; they don't replace the values.)
Key Value
1 A
1 AA
1 A,AAB
1 AAB
2 B
2 BA
The expected output is:
Key Value
1 A
1 A
1 A
1 A
2 B
2 B
Every time, I get the same result as the input.
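A Levenshtein or fuzzy ratio only scores a pair of strings. To replace values you also need a rule that picks a canonical value and maps the other values onto it. Below is a minimal plain-Python sketch of that mapping step. It assumes two things the question does not state: "similar" means "starts with a value already seen for the same key", and rows arrive in column order. canonicalize is a hypothetical helper, not part of any library.

```python
# Hypothetical helper: the question does not define "similar", so this sketch
# assumes "similar" means "starts with an earlier canonical value for the same key".
def canonicalize(rows):
    """Replace each value with the first earlier value (same key) it starts with.

    rows: list of (key, value) tuples in the order they appear in the column.
    """
    canon_by_key = {}  # key -> canonical values seen so far, in order of appearance
    result = []
    for key, value in rows:
        seen = canon_by_key.setdefault(key, [])
        # Pick the first canonical value that this value starts with, if any.
        match = next((c for c in seen if value.startswith(c)), None)
        if match is None:
            # Nothing similar seen yet for this key: the value becomes canonical.
            seen.append(value)
            match = value
        result.append((key, match))
    return result


rows = [(1, "A"), (1, "AA"), (1, "A,AAB"), (1, "AAB"), (2, "B"), (2, "BA")]
for key, value in canonicalize(rows):
    print(key, value)
```

A fuzzy-ratio threshold could be swapped in for the startswith test if prefix matching is too strict. In Spark, note that a DataFrame has no guaranteed row order, so "the first value seen" only makes sense relative to an explicit ordering column.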
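Use a regular expression to extract the first word character (a letter, digit, or underscore) of each value. This works for the sample data because every value in a group starts with the character you want to keep: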
from pyspark.sql.functions import col, regexp_extract
df = df.withColumn('New_Value', regexp_extract(col('Value'), r'^(\w)', 1))  # group 1 = first word character
+---+-----+---------+
|Key|Value|New_Value|
+---+-----+---------+
|  1|    A|        A|
|  1|   AA|        A|
|  1|A,AAB|        A|
|  1|  AAB|        A|
|  2|    B|        B|
|  2|   BA|        B|
+---+-----+---------+
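To see what the regex does to each row without a Spark session, here is a plain-Python sketch using the re module with an equivalent pattern. first_word_char is a hypothetical helper that mimics regexp_extract with group 1, which returns an empty string when the pattern does not match. For ASCII input, Python's \w and the Java regex \w that Spark uses match the same characters. The approach keeps only one leading character, so it would not work if the canonical value were longer (for example "AB").

```python
import re

# Equivalent to the answer's pattern: group 1 is the first word character
# (letter, digit, or underscore) at the very start of the string.
FIRST_CHAR = re.compile(r"^(\w)")


def first_word_char(value):
    """Mimic regexp_extract(col('Value'), pattern, 1): group 1, or '' if no match."""
    m = FIRST_CHAR.match(value)
    return m.group(1) if m else ""


values = ["A", "AA", "A,AAB", "AAB", "B", "BA"]
print([first_word_char(v) for v in values])
```

A value that starts with a non-word character, such as ",A", yields an empty string rather than "A", because the pattern is anchored at the start.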