pandas dataframe condition based on regex expression

   TTT
1. 802010001-999-00000285-888-
2. 256788
3. 1940
4. NaN
5. NaN
6. 702010001-X-2YZ-00000285-888-

I want to fill the GGT column with every value from TTT except the amounts (the entries that are purely numeric).

The required table would look like this:

   TTT                                GGT
1. 802010001-999-00000285-888-        802010001-999-00000285-888-
2. 256788                             NaN
3. 1940                               NaN
4. NaN                                NaN
5. NaN                                NaN
6. 702010001-X-2YZ-00000285-888-      702010001-X-2YZ-00000285-888-

The original table has more than 200 thousand rows.

If you want to exclude the values that consist only of digits, you can use the str.match() method on column TTT. You can use code like this:

df["GGT"] = df["TTT"][df["TTT"].str.match(r'^(\d)+$')==False]
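For reference, here is a minimal self-contained sketch of the same idea on the sample data from the question. It assumes the values in TTT are stored as strings (the question does not say), and it uses Series.where with na=False rather than boolean indexing, which is a stylistic choice of this sketch and not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; assumed to be strings plus NaN.
df = pd.DataFrame({
    "TTT": [
        "802010001-999-00000285-888-",
        "256788",
        "1940",
        np.nan,
        np.nan,
        "702010001-X-2YZ-00000285-888-",
    ]
})

# True where the whole value is made of digits; missing values count as False.
digits_only = df["TTT"].str.match(r"^\d+$", na=False)

# Keep TTT where it is not digits-only, otherwise NaN.
# Rows where TTT is already NaN stay NaN because the source value is NaN.
df["GGT"] = df["TTT"].where(~digits_only)

print(df)
```

Passing na=False makes the mask a plain boolean Series, so it can be negated with ~ without the NaN comparison trick.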

Use Series.mask:

df['GGT'] = df['TTT'].mask(pd.to_numeric(df['TTT'], errors='coerce').notna())

Or:

df['GGT'] = df['TTT'].mask(df["TTT"].astype(str).str.contains(r'^\d+$', na=True))
print(df)
                             TTT                            GGT
0    802010001-999-00000285-888-    802010001-999-00000285-888-
1                         256788                            NaN
2                           1940                            NaN
3                            NaN                            NaN
4  702010001-X-2YZ-00000285-888-  702010001-X-2YZ-00000285-888-
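The two Series.mask variants agree on the sample data, but they are not equivalent in general: pd.to_numeric also accepts decimals, signs and scientific notation, while the regex only matches strings made entirely of digits. The sketch below illustrates the difference. The input values are hypothetical and chosen only to show where the two tests disagree; they do not come from the question.

```python
import pandas as pd

# Hypothetical values (not from the question) to contrast the two tests.
s = pd.Series(["1940", "12.5", "1e3", "-7", "802010001-999-", None], name="TTT")

# Variant 1: anything pd.to_numeric can parse is treated as an amount.
numeric_mask = pd.to_numeric(s, errors="coerce").notna()

# Variant 2: only strings consisting entirely of digits are treated as amounts.
digits_mask = s.astype(str).str.contains(r"^\d+$", na=False)

comparison = pd.DataFrame({
    "TTT": s,
    "via_to_numeric": s.mask(numeric_mask),
    "via_regex": s.mask(digits_mask),
})
print(comparison)
```

If the amounts in the real table can contain decimal points or signs, the to_numeric variant is probably the closer fit; if the codes could themselves parse as numbers, the regex is the stricter test.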
