
How to get only the word from a string in python?

I am new to pandas and I have an issue with strings. I have a string s = "'hi'+'bikes'-'cars'>=20+'rangers'" and I want only the words from it, not the symbols or the integers. How can I do it?

My input:

s = "'hi'+'bikes'-'cars'>=20+'rangers'"

Expected output:

s = "'hi','bikes','cars','rangers'"

Try this using a regex:

import re

s = "'hi'+'bikes'-'cars'>=20+'rangers'"
samp = re.compile('[a-zA-Z]+')  # A-Z, not A-z: the range A-z also matches [\]^_`
word = samp.findall(s)          # ['hi', 'bikes', 'cars', 'rangers']
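findall returns a list, while the question asks for a comma-separated string with the quotes kept. Here is a minimal sketch, assuming that exact output format is what's wanted. It wraps each word in quotes and joins them with commas:

```python
import re

s = "'hi'+'bikes'-'cars'>=20+'rangers'"
# [a-zA-Z]+ matches runs of ASCII letters only (digits and symbols are skipped)
words = re.findall(r"[a-zA-Z]+", s)

# Re-add the single quotes and join with commas to match the expected output
result = ",".join(f"'{w}'" for w in words)
print(result)  # 'hi','bikes','cars','rangers'
```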

Not sure about pandas, but you can also do it with a regex. Here is a solution:

import re

s = "'hi'+'bikes'-'cars'>=20+'rangers'"
# Non-greedy match of each quoted token, quotes included
words = re.findall(r"'.+?'", s)
output = ','.join(words)

print(output)  # 'hi','bikes','cars','rangers'
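The two regexes above are not equivalent. [a-zA-Z]+ picks up any run of letters, quoted or not, and cuts a token at the first digit. '.+?' keeps whole quoted tokens, digits included, and ignores unquoted words. A small sketch with a made-up input (s2 is hypothetical, not from the question) shows the difference:

```python
import re

# Hypothetical input: one unquoted word (x) and one quoted token with a digit ('v2')
s2 = "'hi'+x-'v2'>=20"

letters_only = re.findall(r"[a-zA-Z]+", s2)  # runs of letters anywhere
quoted = re.findall(r"'.+?'", s2)            # whole quoted tokens, quotes kept
inner = re.findall(r"'(.+?)'", s2)           # with a group, findall returns only the group

print(letters_only)  # ['hi', 'x', 'v']
print(quoted)        # ["'hi'", "'v2'"]
print(inner)         # ['hi', 'v2']
```

For the question's input, all three patterns find the same four words. They differ only in whether the quotes are kept.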

For pandas, I would first convert the column in the DataFrame to the string dtype:

df
                                   a  b
0  'hi'+'bikes'-'cars'>=20+'rangers'  1
1      random_string 'with'+random,#  4
2             more,weird/stuff=wrong  6

df["a"] = df["a"].astype("string")

df["a"]
0    'hi'+'bikes'-'cars'>=20+'rangers'
1        random_string 'with'+random,#
2               more,weird/stuff=wrong
Name: a, dtype: string

Now the dtype is string, so you can use the pandas string methods (the .str accessor) on the column, including translate and split. But first you have to build a translation table from punctuation and digits, imported from the standard string module:

from string import digits, punctuation

Then make a dictionary that maps each digit and punctuation character to a space:

from itertools import chain
t = {k: " " for k in chain(punctuation, digits)}
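For reference, string.punctuation holds the 32 ASCII punctuation characters, and string.digits holds 0-9, so the dictionary has 42 entries. The punctuation set includes the underscore, which is why random_string gets split later. dict.fromkeys builds the same mapping without itertools. This is an equivalent alternative, not what the answer uses:

```python
from itertools import chain
from string import digits, punctuation

# Mapping as built in the answer
t = {k: " " for k in chain(punctuation, digits)}

# Equivalent mapping: every punctuation/digit character -> a space
t_alt = dict.fromkeys(punctuation + digits, " ")

print(t == t_alt)          # True
print(len(t))              # 42 (32 punctuation + 10 digits)
print("_" in punctuation)  # True
```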

Create the translation table with str.maketrans (a built-in static method of str in Python 3, so no import is needed). Then apply translate and split to the column, going through the .str accessor each time:

t = str.maketrans(t)

df["a"] = df["a"].str.translate(t).str.split()
df
                                a  b
0      [hi, bikes, cars, rangers]  1
1  [random, string, with, random]  4
2     [more, weird, stuff, wrong]  6

As you can see, you only have the words now.
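Putting the steps together, a self-contained sketch of the same approach might look like this. The DataFrame is rebuilt by hand from the output shown above. The Series.str.findall line is an added alternative, not part of the original answer. It extracts runs of ASCII letters directly instead of translating and splitting:

```python
from itertools import chain
from string import digits, punctuation

import pandas as pd

# Sample data reproducing the DataFrame shown above
df = pd.DataFrame({
    "a": ["'hi'+'bikes'-'cars'>=20+'rangers'",
          "random_string 'with'+random,#",
          "more,weird/stuff=wrong"],
    "b": [1, 4, 6],
})
df["a"] = df["a"].astype("string")

# Map every punctuation character and digit to a space, then split on whitespace
table = str.maketrans({k: " " for k in chain(punctuation, digits)})
words = df["a"].str.translate(table).str.split()

# Alternative (not from the answer above): pull runs of letters directly
words_re = df["a"].str.findall(r"[A-Za-z]+")

print(words.tolist())
print(words_re.tolist())
```

Both give the same lists for this sample. They can differ on non-ASCII text: translate/split keeps letters such as "é", but [A-Za-z]+ drops them.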

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM