
Python df.loc with regex

I have a DataFrame in which I am changing row values based on conditions.

Current DataFrame:

import pandas as pd
import re
data = [['ACK_ID','TEXT',30],
        ['TOT_ACTIVE_PARTCP_CNT','NUMERIC'],
        ['ADMIN_SIGNED_DATE', "TEXT", 30],
        ['BENEF_RCVG_BNFT_CNT','NUMERIC'],
        ['SPONS_SIGNED_DATE','TEXT',30]]
df = pd.DataFrame(data, columns=['FIELD_NAME', 'TYPE','SIZE (only for text fields)'])

#Change all "NUMERIC" to "FLOAT" in ['TYPE'] column.
df.loc[df["TYPE"] == "NUMERIC", "TYPE"] = "FLOAT"

I also want to change the ['TYPE'] value of every row whose ['FIELD_NAME'] entry contains 'DATE'. I want to use regex to capture 'DATE'.

Code attempt with regex:

df.loc[df["FIELD_NAME"] == r'^.*DATE+$', "TYPE"] = "DATE"

This code does not change the DataFrame at all.

The desired output is:

data = [['ACK_ID','TEXT',30],
        ['TOT_ACTIVE_PARTCP_CNT','FLOAT'],
        ['ADMIN_SIGNED_DATE', "DATE", 30],
        ['BENEF_RCVG_BNFT_CNT','FLOAT'],
        ['SPONS_SIGNED_DATE','DATE',30]]
df = pd.DataFrame(data, columns=['FIELD_NAME', 'TYPE','SIZE (only for text fields)'])

The == comparison tests each value for literal equality with the pattern string, so no row matches. You can simply use .str.contains:

df.loc[df["FIELD_NAME"].str.contains("DATE"), "TYPE"] = "DATE"
print(df)

Prints:

              FIELD_NAME   TYPE  SIZE (only for text fields)
0                 ACK_ID   TEXT                         30.0
1  TOT_ACTIVE_PARTCP_CNT  FLOAT                          NaN
2      ADMIN_SIGNED_DATE   DATE                         30.0
3    BENEF_RCVG_BNFT_CNT  FLOAT                          NaN
4      SPONS_SIGNED_DATE   DATE                         30.0
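
To see why the original attempt changes nothing, it can help to look at the two boolean masks side by side. The following is a minimal sketch on a trimmed copy of the question's data (the SIZE column is left out here for brevity, which is my simplification, not part of the original):

```python
import pandas as pd

# Trimmed copy of the question's DataFrame (SIZE column omitted for brevity).
df = pd.DataFrame({
    'FIELD_NAME': ['ACK_ID', 'TOT_ACTIVE_PARTCP_CNT', 'ADMIN_SIGNED_DATE',
                   'BENEF_RCVG_BNFT_CNT', 'SPONS_SIGNED_DATE'],
    'TYPE': ['TEXT', 'NUMERIC', 'TEXT', 'NUMERIC', 'TEXT'],
})

# == compares each value with the literal pattern text, so every entry is False.
equality_mask = df['FIELD_NAME'] == r'^.*DATE+$'

# .str.contains checks whether the pattern occurs inside each value.
contains_mask = df['FIELD_NAME'].str.contains('DATE')

print(equality_mask.tolist())
print(contains_mask.tolist())

# Only the rows selected by the mask are updated.
df.loc[contains_mask, 'TYPE'] = 'DATE'
print(df['TYPE'].tolist())
```

Since df.loc only assigns to rows where the mask is True, an all-False mask leaves the DataFrame untouched.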

You can use str.contains with a regular expression:

df.loc[df['FIELD_NAME'].str.contains(r'^.*DATE+$'), 'TYPE'] = 'DATE'
print(df)

              FIELD_NAME   TYPE  SIZE (only for text fields)
0                 ACK_ID   TEXT                         30.0
1  TOT_ACTIVE_PARTCP_CNT  FLOAT                          NaN
2      ADMIN_SIGNED_DATE   DATE                         30.0
3    BENEF_RCVG_BNFT_CNT  FLOAT                          NaN
4      SPONS_SIGNED_DATE   DATE                         30.0
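
Because str.contains searches anywhere in the string, the leading ^.* is not strictly needed; a pattern such as DATE$ should select the same rows here. The sketch below compares the two patterns and also passes na=False, which treats a missing name as a non-match. The Series with a missing entry is a made-up example, not data from the question:

```python
import pandas as pd

# Hypothetical field names, including one missing value.
names = pd.Series(['ADMIN_SIGNED_DATE', 'ACK_ID', None])

# Pattern from the answer above: anything, then "DAT" plus one or more "E", at the end.
original_pattern = names.str.contains(r'^.*DATE+$', na=False)

# Shorter pattern: "DATE" anchored at the end of the string.
short_pattern = names.str.contains(r'DATE$', na=False)

print(original_pattern.tolist())
print(short_pattern.tolist())
```

Note that E+ in the original pattern would also accept a name ending in DATEE, which the shorter pattern would not.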

If DATE is always at the end, you could also just use str.endswith:

df.loc[df['FIELD_NAME'].str.endswith('DATE'), 'TYPE'] = 'DATE'
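
The choice between str.contains and str.endswith matters when a name merely happens to include the letters DATE. Here is a small sketch of the difference, using UPDATED_FLAG as a hypothetical field name that does not appear in the question:

```python
import pandas as pd

# 'UPDATED_FLAG' is a made-up name that includes the substring "DATE".
names = pd.Series(['ADMIN_SIGNED_DATE', 'UPDATED_FLAG', 'ACK_ID'])

# Matches "DATE" anywhere in the name, including inside "UPDATED".
anywhere = names.str.contains('DATE')

# Matches only names whose final four characters are "DATE".
at_end = names.str.endswith('DATE')

print(anywhere.tolist())
print(at_end.tolist())
```

If the fields follow a naming convention where date columns end in _DATE, the stricter suffix check is probably the safer of the two.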
