
How can I replace a timestamp in a dataframe with a string if the column is not all timestamps?

I am trying to build a machine learning model using an Excel spreadsheet that cannot be edited. A few of the columns in the .xls have formatting issues, so some of the data is read as a datetime stamp instead of a str or int. Here is an example from the pandas dataframe:

0     40-49   premeno      15-19                  0-2       yes          3   
1     50-59      ge40      15-19                  0-2        no          1   
2     50-59      ge40      35-39                  0-2        no          2   
3     40-49   premeno      35-39                  0-2       yes          3   
4     40-49   premeno      30-34  **2019-05-03 00:00:00**       yes          2

In row 4, the value 3-5 has been accidentally formatted as a date (shown as 03 May in the .xls), so it ends up as a datetime stamp in the dataframe. I have tried several methods to replace 2019-05-03 00:00:00 with 3-5, including:

df['column'] = df['column'].replace([('2019-05-03 00:00:00')], '3-5') 

and using Timestamp.replace, but neither seems to work. Any ideas on how to replace these misformatted data points with the correct data?

There might be a simpler way, but you could apply re.search with positive lookarounds.

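import re

# both patterns assume the value is wrapped in literal ** as printed in the question
# pat1 captures the day digit, pat2 captures the month digit
pat1 = r'(?<=\*{2}\d{4}-0\d-0)(\d)(?= 00:00:00\*\*)'

pat2 = r'(?<=\*{2}\d{4}-0)(\d)(?=-0\d 00:00:00\*\*)'

# search each cell x (not a hard-coded string) and join as day-month, e.g. 3-5
df['column'] = df['column'].astype(str).apply(
        lambda x: (re.search(pat1, x).group()
                   +'-'+re.search(pat2, x).group())
                   if '**' in x else x
     )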
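The patterns above only fire when the cell text is literally wrapped in `**`, which looks like emphasis added to the printed example, not part of the data. As a rough sketch of the same regex idea without that assumption, the version below matches the plain string form of a midnight timestamp and rebuilds "day-month" from the captured groups. The helper name `date_string_to_range` and the sample frame are hypothetical, and it assumes every date-like cell really is a mangled range.

```python
import re

import pandas as pd

# Matches the default string form of a midnight timestamp, e.g. "2019-05-03 00:00:00".
DATE_RE = re.compile(r"^\d{4}-(\d{2})-(\d{2}) 00:00:00$")


def date_string_to_range(value):
    """Hypothetical helper: turn '2019-05-03 00:00:00' back into '3-5'."""
    text = str(value)
    match = DATE_RE.match(text)
    if match is None:
        return text  # not date-like, keep the cell text unchanged
    month, day = match.groups()
    # int() strips the zero padding: "03" -> 3, "05" -> 5
    return f"{int(day)}-{int(month)}"


# Illustrative data only, mimicking the mixed column from the question.
df = pd.DataFrame({"column": ["0-2", "0-2", pd.Timestamp("2019-05-03")]})
df["column"] = df["column"].apply(date_string_to_range)
```

Unlike the one-digit lookarounds, the two-digit groups also cover values such as 10-14, which Excel would display as 10 Dec or similar. Note that this sketch converts every cell to str, which may not be wanted for numeric columns.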

You can iterate over the column with apply and check whether each element is an instance of pd.Timestamp; if so, build a "day-month" string, otherwise leave the value as it is.

Example:

import pandas as pd

# what you have is something like (mixed datatype column/Series)
df = pd.DataFrame({'label': ['0-2', '1-3', pd.Timestamp('2019-05-03')]})

# iterate the column with an apply, extract day-month string if pd.Timestamp
df['label1'] = df['label'].apply(lambda x: f"{x.day}-{x.month}" if isinstance(x, pd.Timestamp) else x)

# ... to get
df['label1'] 
0    0-2
1    1-3
2    3-5
Name: label1, dtype: object
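If several columns are affected, the same check can be wrapped in a small function and applied column by column. The following is only a sketch: the column names and the helper `restore_range` are made up for illustration, and it tests against `datetime.datetime` on the assumption that, depending on how the spreadsheet is read, a cell may arrive as a plain `datetime.datetime` instead of a `pd.Timestamp` (`pd.Timestamp` is a subclass of it, so one check covers both).

```python
import datetime as dt

import pandas as pd


def restore_range(value):
    """Hypothetical helper: rebuild 'day-month' from a date-like cell."""
    # pd.Timestamp subclasses datetime.datetime, so this covers both types.
    if isinstance(value, dt.datetime):
        return f"{value.day}-{value.month}"
    return value  # strings and numbers pass through untouched


# Illustrative data only; the column names are assumptions.
df = pd.DataFrame({
    "tumor_size": ["15-19", "35-39", "30-34"],
    "inv_nodes": ["0-2", dt.datetime(2019, 5, 3), pd.Timestamp("2019-08-06")],
})

# Apply the fix to every column; non-date cells keep their original type.
for col in df.columns:
    df[col] = df[col].apply(restore_range)
```

Because the check is on the type and not on the text, cells that were never misread keep their original type, which matters if the column is later encoded for a model.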

