I have a dataframe df
name |
---|
project111 |
project212 |
project'09 |
projectws22 |
I am trying to replace every value that does not have only digits after 'project' with none. The expected output is as below:
name |
---|
project111 |
project212 |
none |
none |
none |
If those missing values are empty strings, pandas doesn't recognise them as null. To fix this, you can convert the empty strings (or whatever is in your empty cells) to np.nan using replace(), and then call dropna() on your DataFrame to delete the rows with null values.
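A minimal sketch of the replace-then-dropna idea from the comment above. The sample values and the name `cleaned` are made up for illustration, and it assumes the empty cells really are empty strings:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two of the cells are empty strings.
df = pd.DataFrame({"name": ["project111", "", "project212", ""]})

# Empty strings are not treated as missing, so turn them into NaN first.
cleaned = df.replace("", np.nan)

# dropna() now removes the rows whose "name" is missing.
cleaned = cleaned.dropna()

print(cleaned)
```

Note that this drops the rows entirely, which is different from overwriting them with none as the question asks.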
Solved using regular expression.
import pandas as pd
import re
df=pd.DataFrame(['project1','project2','project.3','projectws22'])
df.columns =['project']
for i in range(df.shape[0]):
    value = df.iloc[i, 0]
    # look for 'project' immediately followed by one or more digits
    match = re.search(r'project\d+', value)
    if match:
        df.iloc[i, 0] = match.group(0)
    else:
        df.iloc[i, 0] = 'none'
input
df
Out[33]:
project
0 project1
1 project2
2 project.3
3 projectws22
output
df
Out[31]:
project
0 project1
1 project2
2 none
3 none
There are a few ways to approach this. The most direct is to use a regular expression to match the values that consist of "project" followed only by digits, then invert the resulting boolean array and overwrite the values that do not match:
df.loc[~df["name"].str.fullmatch(r"project\d+"), "name"] = "none"
print(df)
name
0 project111
1 project212
2 none
3 none
step by step breakdown:
# within the name column, find values that start with "project"
# and continue with one or more digits all the way to the end of the string.
>>> matches = df["name"].str.fullmatch(r"project\d+")
# matches is a boolean array, indicating whether or not our pattern matches
# the value in each cell of the column.
# since we want to replace the values that DON'T match the pattern, we use
# the tilde operator to invert the boolean array (swap Trues and Falses)
>>> ~matches
# Finally we replace those values with "none" by selecting the rows from the "name"
# column that do not match the aforementioned pattern. Overwrite those cells with "none"
>>> df.loc[~matches, "name"] = "none"
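The steps above can be put together into a self-contained sketch. It assumes the four sample values shown in the question and a pandas version that provides str.fullmatch (1.1 or later):

```python
import pandas as pd

# Sample values taken from the question.
df = pd.DataFrame(
    {"name": ["project111", "project212", "project'09", "projectws22"]}
)

# True where the whole value is "project" followed by one or more digits.
matches = df["name"].str.fullmatch(r"project\d+")

# Overwrite every value that does not match the pattern.
df.loc[~matches, "name"] = "none"

print(df)
```

fullmatch is used rather than match because match only anchors the pattern at the start of the string, so a value with trailing characters after the digits would still count as a match.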
An approach using list comprehension
Code:
df['name'] = [df['name'][i] if (len(x)>1 and x[1].isdigit()) else None for i, x in enumerate(df['name'].str.split('project').values)]
Output:
name
0 project111
1 project212
2 None
3 None
4 None
Explanation:
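df['name'].str.split('project').values splits each value in the name column at the string project and gives, for each row, a list of the pieces.
enumerate is used to get the row index and the result of the split operation on that row.
len(x)>1 checks that the split result has more than one element, to avoid empty rows.
x[1].isdigit() checks that the string after project, i.e. the second element in the split result, consists only of digits.
Rows that fail either check are set to None.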
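To see what the comprehension works with, here is a small sketch that prints the intermediate split result and the outcome of the check for each row. The three sample values and the names `parts` and `flags` are chosen for illustration only:

```python
import pandas as pd

# A few sample values in the style of the question.
df = pd.DataFrame({"name": ["project111", "project'09", "projectws22"]})

# Each element is the list of pieces left after splitting on "project".
parts = df["name"].str.split("project").values

# The same condition the list comprehension uses to keep or reject a row.
flags = [len(x) > 1 and x[1].isdigit() for x in parts]

for i, x in enumerate(parts):
    print(i, list(x), flags[i])
```

Only the first value passes the check, because the part after project in the other two contains non-digit characters.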