Python/Pandas: How to process a column of data when it meets certain criteria

I have a CSV like this:

userlabel|country
SZ5GZTD_[56][13631808]|russia
YZ5GZTC-3_[51][13680735]|uk
XZ5GZTA_12-[51][13574893]|usa
testYZ5GZWC_11-[51][13632101]|cuba

I read this CSV with pandas and would like to add a new column ci. Its value comes from userlabel, and the following conditions must be met:

  1. the value is converted to lowercase
  2. it starts with 'yz' or 'testyz'

The code is like this:

(df['userlabel'].str.lower()).str.extract(r"(test)?([a-z]+).*", expand=True)[1]

When it matches, ci is the number between the first "-" or "_" and the second "-" or "_" in userlabel.

The pseudocode is like this:

ci = (userlabel,r'.*(\_|\-)(\d+)(\_|\-).*',2)

Finally, the result should look like this:

userlabel                      ci country
SZ5GZTD_[56][13631808]            russia
YZ5GZTC-3_[51][13680735]       3  uk
XZ5GZTA_12-[51][13574893]         usa
testYZ5GZWC_11-[51][13632101]  11 cuba
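
import re

def get_val(s):
    # Labels must start with yz or testyz (any case); capture the digits
    # between the first and the second "-" or "_".
    m = re.match(r'(?:test)?yz[^_-]*[_-](\d+)[_-]', s, flags=re.IGNORECASE)
    return m.group(1) if m else None

df['ci'] = df['userlabel'].apply(get_val)
df = df[['userlabel', 'ci', 'country']]
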
                       userlabel    ci country
0         SZ5GZTD_[56][13631808]  None  russia
1       YZ5GZTC-3_[51][13680735]     3      uk
2      XZ5GZTA_12-[51][13574893]  None     usa
3  testYZ5GZWC_11-[51][13632101]    11    cuba

You can use

import pandas as pd
df = pd.DataFrame({'userlabel':['SZ5GZTD_[56][13631808]','YZ5GZTC-3_[51][13680735]','XZ5GZTA_12-[51][13574893]','testYZ5GZWC_11-[51][13632101]'], 'country':['russia','uk','usa','cuba']})
df['ci'] = df['userlabel'].str.extract(r"(?i)^(?:yz|testyz)[^_-]*[_-](\d+)[-_]", expand=True)
>>> df['ci']
0    NaN
1      3
2    NaN
3     11
Name: ci, dtype: object
# To rearrange columns, add the following line:
df = df[['userlabel', 'ci', 'country']]
>>> df
                       userlabel   ci country
0         SZ5GZTD_[56][13631808]  NaN  russia
1       YZ5GZTC-3_[51][13680735]    3      uk
2      XZ5GZTA_12-[51][13574893]  NaN     usa
3  testYZ5GZWC_11-[51][13632101]   11    cuba
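
Since the question starts from a pipe-delimited CSV rather than a hand-built DataFrame, the same extraction can be sketched end to end. This is only a sketch under a few assumptions: the file is assumed to use | as its separator, io.StringIO stands in for the real file, and the conversion to the nullable Int64 dtype is an optional extra that is not part of the answer above.

```python
import io

import pandas as pd

# Stand-in for the real CSV file (assumption: fields are separated by "|").
csv_text = """userlabel|country
SZ5GZTD_[56][13631808]|russia
YZ5GZTC-3_[51][13680735]|uk
XZ5GZTA_12-[51][13574893]|usa
testYZ5GZWC_11-[51][13632101]|cuba
"""

df = pd.read_csv(io.StringIO(csv_text), sep="|")

# With a single capture group, expand=False returns a Series.
pattern = r"(?i)^(?:yz|testyz)[^_-]*[_-](\d+)[-_]"
df["ci"] = df["userlabel"].str.extract(pattern, expand=False)

# Optional: store the digits as nullable integers instead of strings.
df["ci"] = pd.to_numeric(df["ci"]).astype("Int64")

# Put ci between userlabel and country, as in the desired result.
df = df[["userlabel", "ci", "country"]]
print(df)
```

Rows that do not match stay missing, so the column can still be filtered with df["ci"].notna().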

See the regex demo.

Regex details:

  • (?i) - makes the pattern case insensitive (no need to use str.lower())
  • ^ - start of string
  • (?:yz|testyz) - a non-capturing group matching either yz or testyz
  • [^_-]* - zero or more chars other than _ and -
  • [_-] - the first _ or -
  • (\d+) - Group 1 (Series.str.extract requires a capturing group since it only returns the captured substring): one or more digits
  • [-_] - a - or _.
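
The breakdown above can be checked outside pandas with Python's built-in re module. The following is a small sketch on the four sample labels from the question; the helper name extract_ci is made up for illustration, and the pattern is rewritten in verbose mode (with re.IGNORECASE in place of the inline (?i)) so that each part carries its own comment.

```python
import re

# Same pattern as in the answer, spelled out in verbose mode.
PATTERN = re.compile(
    r"""
    ^(?:yz|testyz)   # label must start with yz or testyz
    [^_-]*           # anything up to the first _ or -
    [_-]             # the first _ or -
    (\d+)            # group 1: the digits to keep
    [-_]             # the second _ or -
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_ci(label):
    """Return the captured digits as a string, or None if the label does not match."""
    m = PATTERN.search(label)
    return m.group(1) if m else None


labels = [
    "SZ5GZTD_[56][13631808]",
    "YZ5GZTC-3_[51][13680735]",
    "XZ5GZTA_12-[51][13574893]",
    "testYZ5GZWC_11-[51][13632101]",
]

for label in labels:
    print(label, "->", extract_ci(label))
```

Because of the ^ anchor, labels such as XZ5GZTA_12-[51][13574893] are rejected even though they contain a digit run between two separators.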
