python convert column format with numeric

I have a dataframe df

name
project111
project212
project'09
projectws22

I am trying to replace every value in the column where the part after 'project' is not numeric with none. The expected output is as below:

name
project111
project212
none
none
none

Are those missing values empty strings? Pandas doesn't recognise empty strings as null. To fix this, you can convert the empty strings (or whatever is in your empty cells) to np.nan using replace(), and then call dropna() on your DataFrame to delete the rows with null values.
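As a minimal sketch of that suggestion: the sample data below is hypothetical (one cell is assumed to hold an empty string), and it only illustrates that replace() followed by dropna() removes such rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the third cell is an empty string, not a real null.
df = pd.DataFrame({"name": ["project111", "project212", "", "projectws22"]})

# An empty string is not counted as missing by pandas.
print(df["name"].isna().sum())

# Turn empty strings into NaN so that pandas treats them as missing.
cleaned = df.replace("", np.nan)
print(cleaned["name"].isna().sum())

# dropna() now removes the row that held the empty string.
dropped = cleaned.dropna()
print(dropped["name"].tolist())
```

Note that this drops rows rather than overwriting them with none, so it addresses empty cells only, not values such as projectws22.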

Solved using a regular expression.

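import pandas as pd
import re

df = pd.DataFrame(['project1', 'project2', 'project.3', 'projectws22'])
df.columns = ['project']

# 'project' followed by one or more digits (raw string, so \d is not read as a string escape)
pattern = re.compile(r'project\d+')

for i in range(df.shape[0]):
    value = df.iloc[i, 0]
    match = pattern.search(value)
    # keep the matched text, otherwise overwrite the cell with 'none'
    df.iloc[i, 0] = match.group(0) if match else 'none'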

input

df
Out[33]: 
       project
0     project1
1     project2
2    project.3
3  projectws22

output

df
Out[31]: 
    project
0  project1
1  project2
2      none
3      none

There are a few ways to approach this. The most direct is to use a regular expression to match the values that consist of "project" followed only by digits, then invert the resulting boolean array and overwrite the values that do not match:

df.loc[~df["name"].str.fullmatch(r"project\d+"), "name"] = "none"
print(df)
         name
0  project111
1  project212
2        none
3        none
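The one-liner above assumes df already exists and has no missing values. A self-contained sketch follows. The na=False argument and the use of Series.where are additions for illustration, not part of the answer itself, and the sample values are taken from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["project111", "project212", "project'09", "projectws22"]}
)

# na=False makes missing cells count as "no match" instead of
# putting NaN into the boolean mask.
mask = df["name"].str.fullmatch(r"project\d+", na=False)

# where() keeps the values where mask is True and replaces the rest.
df["name"] = df["name"].where(mask, "none")
print(df["name"].tolist())
```

Calling where(mask) without a second argument would leave a real missing value in those cells instead of the string "none".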

Step-by-step breakdown:

# Within the name column, find values that start with "project"
#  and continue with one or more digits all the way to the end of the string.
#  (str.fullmatch is needed for this; str.match only anchors at the start.)
>>> matches = df["name"].str.fullmatch(r"project\d+")

# matches is a boolean array, indicating whether or not our pattern matches
#  the value in each cell of the column.
#  Since we want to replace the values that DON'T match the pattern, we use
#  the tilde operator to invert the boolean array (swap the Trues and Falses).
>>> ~matches

# Finally, select the rows of the "name" column that do not match the
#  pattern and overwrite those cells with "none".
>>> df.loc[~matches, "name"] = "none"
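The choice between str.match and str.fullmatch matters for values with trailing characters after the digits. The short sketch below shows the difference; the value project12abc is a made-up example, not from the question.

```python
import pandas as pd

# "project12abc" is a hypothetical value used to show the difference.
s = pd.Series(["project111", "project12abc", "projectws22"])

# str.match anchors only at the start, so trailing characters are allowed.
print(s.str.match(r"project\d+").tolist())      # [True, True, False]

# str.fullmatch requires the entire string to match the pattern.
print(s.str.fullmatch(r"project\d+").tolist())  # [True, False, False]
```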

An approach using a list comprehension

Code:

df['name'] = [df['name'][i] if (len(x)>1 and x[1].isdigit()) else None for i, x in enumerate(df['name'].str.split('project').values)]

Output:

         name
0  project111
1  project212
2        None
3        None
4        None

Explanation:

  1. df['name'].str.split('project').values splits each value in the name column on the string project and returns an array of lists, one list per row
  2. enumerate that array to get the row index and the result of the split operation on that row
  3. check whether the split result has more than one element, which filters out empty rows (and any value that does not contain project)
  4. check whether the string after project, i.e. the second element of the split result, consists only of digits
  5. if yes, return the original value in the row; otherwise return None
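The steps above can also be sketched as a small helper function. The name keep_if_numeric_suffix is hypothetical, the isinstance check for non-string cells is an added assumption, and the last (empty) row in the sample data is assumed for illustration.

```python
import pandas as pd

def keep_if_numeric_suffix(value, prefix="project"):
    """Return value if it is prefix followed only by digits, else None."""
    if not isinstance(value, str):
        return None  # NaN and other non-string cells
    parts = value.split(prefix)
    # Two parts with an empty first part: the value starts with the
    # prefix and the prefix occurs exactly once.
    if len(parts) == 2 and parts[0] == "" and parts[1].isdigit():
        return value
    return None

# The final empty string is an assumed example of an "empty row".
df = pd.DataFrame(
    {"name": ["project111", "project212", "project'09", "projectws22", ""]}
)

result = [keep_if_numeric_suffix(v) for v in df["name"]]
print(result)  # ['project111', 'project212', None, None, None]

df["name"] = result
```

Iterating over the column directly avoids looking values up by index, so the sketch does not depend on the DataFrame having a default integer index.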
