
Replace part of an int or string in a pandas DataFrame column upon condition

I have a pandas DataFrame with a column that represents dates but is stored as int. Several of the dates have a 13th or 14th month. I would like to replace these 13th and 14th months with the 12th month, and then eventually convert the column to datetime format.

Original_date
20190101
20191301
20191401

New_date
20190101
20191201
20191201

I tried converting the column to string and then replacing only the month, based on its position in the string ([4:6]), but it didn't work:

df.original_date.astype(str)
for string in df['original_date']:
    if string[4:6]=="13" or string[4:6]=="14":
        string.replace(string, string[:4]+ "12" + string[6:])
print(df['original_date'])

You can use .str.replace with a regex:

df['New_date'] = df['Original_date'].astype(str).str.replace(r'(\d{4})(13|14)(\d{2})', r'\g<1>12\3', regex=True)
print(df)

   Original_date  New_date
0       20190101  20190101
1       20191301  20191201
2       20191401  20191201
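
The question also asks about eventually converting the cleaned values to datetime. A minimal sketch, assuming the cleaned column holds eight-character YYYYMMDD strings like the New_date output above (the sample values are typed in by hand here rather than recomputed):

```python
import pandas as pd

# Cleaned YYYYMMDD strings, as in the New_date column shown above.
df = pd.DataFrame({'New_date': ['20190101', '20191201', '20191201']})

# Passing an explicit format tells pandas exactly how to read the strings.
df['New_date'] = pd.to_datetime(df['New_date'], format='%Y%m%d')
print(df['New_date'].dt.month.tolist())  # [1, 12, 12]
```

Note that this only clamps the month; a day that does not exist in December would still make the conversion fail.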

Why not just write a regular expression?

import pandas as pd

s = pd.Series('''20190101
20191301
20191401'''.split('\n'))
# The lookahead (?=01) only matches when the day is 01; use (?=\d{2}) for any day.
s.str.replace(r'(?<=\d{4})(13|14)(?=01)', '12', regex=True)

Yielding:

0    20190101
1    20191201
2    20191201
dtype: object

(N.B. You will need to assign the output back to a column for the change to persist.)

You can put the replacement logic in a separate function, which also makes it easy to adapt if you later need to change the year or month as well. apply lets you run that function on each value of the column.

import pandas as pd

def split_and_replace(x):
    # Split a YYYYMMDD string into its parts.
    year = x[0:4]
    month = x[4:6]
    day = x[6:8]
    # Clamp the invalid months 13 and 14 to 12.
    if month in ('13', '14'):
        month = '12'
    return year + month + day


df = pd.DataFrame(
    data={
        'Original_date': ['20190101', '20191301', '20191401']
    }
)

# Series.apply calls the function once per value in the column.
res = df.Original_date.apply(split_and_replace)

print(res)
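
The slicing idea from the question can also be done without a Python-level loop or apply. The following is only a sketch, assuming every value is an eight-digit YYYYMMDD integer as in the sample data; the name bad_month is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Original_date': [20190101, 20191301, 20191401]})

s = df['Original_date'].astype(str)

# True where characters 4-5 (the month) are '13' or '14'.
bad_month = s.str[4:6].isin(['13', '14'])

# Rebuild only the flagged rows as year + '12' + day; other rows are kept as-is.
df['New_date'] = s.mask(bad_month, s.str[:4] + '12' + s.str[6:])
print(df['New_date'].tolist())  # ['20190101', '20191201', '20191201']
```

Unlike the loop in the question, this assigns the result back to a column; calling astype(str) or str.replace without storing the return value leaves the DataFrame unchanged.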

