python convert column format with numeric

Question

I have a dataframe df

name
project111
project212

project'09
projectws22

I am trying to replace every value that does not have only digits after 'project' with none. The expected output is:

name
project111
project212
none
none
none

Answer 1

If those missing values are empty strings, pandas doesn't recognise them as null. To fix this, you can convert the empty strings (or whatever is in your empty cells) to np.nan using replace(), and then call dropna() on your DataFrame to delete the rows with null values.
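A minimal sketch of that suggestion, assuming the blank row in the question is an empty string (the sample data below is rebuilt from the question, and `cleaned` is just an illustrative name). Note that this only removes the empty row; the non-numeric names are left untouched.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question; the third row is assumed to be an empty string.
df = pd.DataFrame({"name": ["project111", "project212", "", "project'09", "projectws22"]})

# An empty string is not treated as missing, so nothing is reported as null yet.
null_count_before = int(df["name"].isna().sum())
print(null_count_before)

# Turn empty strings into NaN, then drop the rows that are now null.
cleaned = df.replace("", np.nan).dropna(subset=["name"])
print(cleaned)
```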

Answer 2

Solved using regular expression.

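import re

import pandas as pd

df = pd.DataFrame(['project1', 'project2', 'project.3', 'projectws22'])
df.columns = ['project']

# Raw string so that \d reaches the regex engine as a digit class
pattern = re.compile(r'project\d+')

for i in range(df.shape[0]):
    value = df.iloc[i, 0]
    match = pattern.search(value)
    if match:
        # keep the matched 'project<digits>' part
        df.iloc[i, 0] = match.group(0)
    else:
        # no digits directly after 'project'
        df.iloc[i, 0] = 'none'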

input

df
Out[33]: 
       project
0     project1
1     project2
2    project.3
3  projectws22

output

df
Out[31]: 
    project
0  project1
1  project2
2      none
3      none
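A note on the pattern check: re.search reports a match as soon as some part of the value fits the pattern, so a value with extra characters after the digits would be trimmed rather than replaced. The sketch below compares it with re.fullmatch, which requires the whole value to fit. The value project12abc is made up for illustration and does not appear in the question.

```python
import re

pattern = r"project\d+"
values = ["project1", "project.3", "projectws22", "project12abc"]

# For each value, record whether search and fullmatch find a match.
results = {}
for value in values:
    found_anywhere = bool(re.search(pattern, value))
    whole_value = bool(re.fullmatch(pattern, value))
    results[value] = (found_anywhere, whole_value)
    print(value, found_anywhere, whole_value)
```

If values with trailing characters should become none as well, re.fullmatch is probably the stricter choice.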

Answer 3

There are a few ways to approach this. The most direct is to use a regular expression to match the values that consist of "project" followed only by digits, then invert the resulting boolean array and overwrite the values that did not match:

df.loc[~df["name"].str.fullmatch(r"project\d+"), "name"] = "none"
print(df)
         name
0  project111
1  project212
2        none
3        none

step by step breakdown:

# Within the name column, find values that start with "project"
#  and continue with one or more digits all the way to the end of the string.
>>> matches = df["name"].str.fullmatch(r"project\d+")

# matches is a boolean array indicating whether or not our pattern matches
#  the value in each cell of the column.
#  Since we want to replace the values that DON'T match the pattern, we use
#  the tilde operator to invert the boolean array (swap the Trues and Falses).
>>> ~matches

# Finally, select the rows of the "name" column that do not match the
#  pattern and overwrite those cells with "none".
>>> df.loc[~matches, "name"] = "none"
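For reference, the same steps as a self-contained script. This is a sketch that assumes the blank row in the question is a missing value (None); passing na=False is an addition not in the answer above, and it makes missing values count as non-matching so the mask stays purely boolean and can be inverted with the tilde.

```python
import pandas as pd

# Data rebuilt from the question; the blank row is assumed to be None.
df = pd.DataFrame({"name": ["project111", "project212", None, "project'09", "projectws22"]})

# True only where the whole value is "project" followed by digits.
# na=False treats missing values as "no match" instead of propagating NaN.
matches = df["name"].str.fullmatch(r"project\d+", na=False)

# Overwrite every row that did not match.
df.loc[~matches, "name"] = "none"
print(df)
```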

Answer 4

An approach using list comprehension

Code:

df['name'] = [df['name'][i] if (len(x)>1 and x[1].isdigit()) else None for i, x in enumerate(df['name'].str.split('project').values)]

Output:

         name
0  project111
1  project212
2        None
3        None
4        None

Explanation:

df['name'].str.split('project').values splits each value in the name column on the string project and returns a list per row
enumerate the lists to get the row index and the result of the split for that row
check that the split result has more than one element, which filters out empty rows
check that the string after project, i.e. the second element of the split result, consists only of digits
if both checks pass, return the original value in the row, else return None
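The checks above can also be written out as a small function, which may be easier to read than the one-liner. This is only a sketch: keep_if_numeric_suffix is a hypothetical helper name, the isinstance guard for non-string cells is an addition, and the sample data assumes the blank row is an empty string.

```python
import pandas as pd

def keep_if_numeric_suffix(value, prefix="project"):
    """Return value if the text after the prefix is all digits, else None."""
    if not isinstance(value, str):
        # Added guard: missing or non-string cells cannot be split.
        return None
    parts = value.split(prefix)
    # More than one part means the prefix was present; parts[1] is what follows it.
    if len(parts) > 1 and parts[1].isdigit():
        return value
    return None

df = pd.DataFrame({"name": ["project111", "project212", "", "project'09", "projectws22"]})
df["name"] = [keep_if_numeric_suffix(v) for v in df["name"]]
print(df)
```

Like the one-liner, this only looks at the text after the first occurrence of project, so a value with characters in front of project would still be kept.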

4 answers

solution1
0 2021-03-22 18:34:42

solution2
0 2021-03-22 18:44:28

solution3
0 2021-03-22 18:46:01

solution4
0 2021-03-22 18:56:04
