Extract non-digit characters before certain character in pandas dataframe

I have a pandas dataframe that looks like this:

> row   extract_column
> 0 412952266-desiredtext1»randtext-irrelevant
> 1 512952766-desiredtext1»randtext-irrelevant
> 2 212952766-desiredtext1»randtext-irrelevant
> 3 112953066-desiredtext1»randtext-irrelevant
> 4 712953066-desiredtext1»randtext-irrelevant
> 5 612953366-desiredtext1»randtext-irrelevant
> 6 912953366-desiredtext1»randtext-irrelevant
> 7 412954866-desiredtext1»randtext-irrelevant
> 8 312954966-desiredtext1»randtext-irrelevant
> 9 212954966-desiredtext1»randtext-irrelevant
> 10    612955866-desiredtext1»randtext-irrelevant
> 11    912256266-desiredtext1»randtext-irrelevant
> 12    812256366-desiredtext1»randtext-irrelevant
> 13    512256566-desiredtext1»randtext-irrelevant
> 14    412256566-desiredtext1»randtext-irrelevant
> 15    312256566-desiredtext1»randtext-irrelevant
> 16    212256566-desiredtext1»randtext-irrelevant
> 17    612256566-desiredtext1»randtext-irrelevant
> 18    812956666-desiredtext2»randtext-irrelevant
> 19    912957166-desiredtext2»randtext-irrelevant
> 20    012957866-desiredtext2»randtext-irrelevant
> 21    12952966-desiredtext2»randtext-irrelevant
> 22    2012953066-desiredtext2»randtext-irrelevant
> 23    012953066-desiredtext2»randtext-irrelevant
> 24    312953066-desiredtext2»randtext-irrelevant
> 25    112254166-desiredtext2»randtext-irrelevant
> 26    712254166-desiredtext2»randtext-irrelevant

I want to extract the desiredtext1 and desiredtext2 values from extract_column. The desired text is always preceded by 9 digits and a dash, and followed by the » symbol.

Try extract:

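# Capture everything between the first '-' and the '»'; expand=False returns a Series
df.extract_column.str.extract(r'-([^\.]*)\»', expand=False)
# Or capture the run of word characters after the first '-'; returns a one-column DataFrame
df.extract_column.str.extract(r'-(\w+)')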
Out[100]: 
               0
0   desiredtext1
1   desiredtext1
2   desiredtext1
3   desiredtext1
4   desiredtext1
5   desiredtext1
6   desiredtext1
7   desiredtext1
8   desiredtext1
9   desiredtext1
10  desiredtext1
11  desiredtext1
12  desiredtext1
13  desiredtext1
14  desiredtext1
15  desiredtext1
16  desiredtext1
17  desiredtext1
18  desiredtext2
19  desiredtext2
20  desiredtext2
21  desiredtext2
22  desiredtext2
23  desiredtext2
24  desiredtext2
25  desiredtext2
26  desiredtext2
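
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds four of the rows from the question and uses a slightly stricter pattern that is not from the original answer: it anchors on the leading digits and stops at the » symbol. The column name `desired` and the variable `pattern` are made up for illustration, and `\d+` is used instead of `\d{9}` on the assumption that the digit count can vary, since rows 21 and 22 in the sample do not have exactly 9 digits.

```python
import pandas as pd

# Small sample copied from the question's data (rows 0, 18, 21 and 22)
df = pd.DataFrame({
    "extract_column": [
        "412952266-desiredtext1»randtext-irrelevant",
        "812956666-desiredtext2»randtext-irrelevant",
        "12952966-desiredtext2»randtext-irrelevant",
        "2012953066-desiredtext2»randtext-irrelevant",
    ]
})

# ^\d+-   : the leading run of digits followed by the dash
# ([^»]+) : capture everything up to, but not including, the » symbol
pattern = r"^\d+-([^»]+)»"

# With a single capture group and expand=False, str.extract returns a Series
df["desired"] = df["extract_column"].str.extract(pattern, expand=False)

result = df["desired"].tolist()
print(result)
```

The practical difference from `-(\w+)` is that `\w+` stops at the first non-word character, so it would cut the desired text short if it ever contained a space or a hyphen, whereas `[^»]+` keeps everything up to the » symbol.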

