Pandas Series: remove everything before a certain character when the "everything" changes every time

Question

I know questions like this one have been asked in abundance, but I haven't found one that answers mine (maybe I overlooked something, but I gave it my best ;) ). Here's the problem: I have a pandas Series like this:

ingredssplit
    0                          MAGERMILCH 65%
    1                                  Wasser
    2            Keks gemahlen 6% (WEIZENMEHL
    3                   Traubensaftkonzentrat
    4                                 Palmöl)
    5                                  Stärke
    6                              Maiskeimöl
    7                                  Zucker
    8     Antioxidationsmittel Ascorbinsäure¹
    9                  Thiamin (Vitamin B1). 
    dtype: object

Now I want to remove everything in line 2 before the bracket. But this part changes every time: sometimes it's "Keks gemahlen 6%", sometimes it's something completely different. The only thing that is constant in line 2 before the "(" is the "%". So another possibility would be "abc de% (". How can I remove that part? My research led me to regular expressions and then to this snippet:

for line in ingredssplit:
    print(re.sub())  # which arguments go here?

But now I don't know how to fill in the parentheses correctly so that everything before "(WEIZENMEHL" is removed. Maybe there's another way? Also, how do I remove the superscript 1 at "Ascorbinsäure"? Thanks guys, have a nice weekend!

Answer 1

Try str.extract:

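# Assumes ingredssplit is a column of a DataFrame named df.
# Keep only the text after the last "(" in row 2.
df.loc[[2], 'ingredssplit'] = (
    df.loc[[2], 'ingredssplit'].str.extract(r'.*\((.*)')[0]
)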
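
For readers who hold the data as a plain Series (as in the question) rather than as a DataFrame column, the same pattern can be applied to every row at once. The following is only a sketch under that assumption; the shortened sample data and the names `extracted` and `cleaned` are illustrative and not part of the original answer. Rows without a "(" do not match the pattern, so they are filled back in from the original Series.

```python
import pandas as pd

# Shortened sample of the Series from the question (illustrative data).
ingredssplit = pd.Series([
    "MAGERMILCH 65%",
    "Wasser",
    "Keks gemahlen 6% (WEIZENMEHL",
    "Traubensaftkonzentrat",
    "Palmöl)",
])

# str.extract returns a DataFrame with one column per capture group;
# rows that contain no "(" do not match and become NaN.
extracted = ingredssplit.str.extract(r".*\((.*)")[0]

# Fall back to the original text wherever the pattern did not match.
cleaned = extracted.fillna(ingredssplit)
print(cleaned.tolist())
```

Note that the closing bracket in "Palmöl)" is left untouched here, since the pattern only deals with the opening bracket.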

Answer 2

Okay, I found a solution. Thanks jcaliz, the '.*\(' part was golden. This is what I did:

    import re

    item1 = []
    for line in ingredssplit:
        # Drop everything up to and including the last "("
        line = re.sub(r'.*\(', '', line)
        item1.append(line)

    def remove_punc(string):
        # Remove every character listed in punc from the string
        punc = r'''!()-[]{};:'"\,<>./?@#$^&*_~'''
        for ele in string:
            if ele in punc:
                string = string.replace(ele, "")
        return string

    lis = [remove_punc(i) for i in item1]
    lis = list(filter(None, lis))   # drop empty strings
    lis = [i.strip() for i in lis]  # trim leading and trailing whitespace
    lis

This gives me a list:

['MAGERMILCH 65%',
 'Wasser',
 'WEIZENMEHL',
 'Traubensaftkonzentrat',
 'Palmöl',
 'Stärke',
 'Maiskeimöl',
 'Zucker',
 'Antioxidationsmittel Ascorbinsäure¹',
 'Vitamin B1']

which I can easily turn into a DataFrame, e.g.:

lis=pd.DataFrame(lis)
lis
                 0

0   MAGERMILCH 65%
1   Wasser
2   WEIZENMEHL
3   Traubensaftkonzentrat
4   Palmöl
5   Stärke
6   Maiskeimöl
7   Zucker
8   Antioxidationsmittel Ascorbinsäure¹
9   Vitamin B1
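
The result above still carries the superscript ¹ that the question asked about. As a possible alternative, the whole cleanup can be written with vectorized pandas string methods, with one extra step for the superscript. This is a sketch, not the method used in the answer: the shortened sample data and the names `punc_pattern` and `cleaned` are assumptions, and the superscript character class only covers superscript digits.

```python
import re
import pandas as pd

# Shortened sample of the Series from the question (illustrative data).
ingredssplit = pd.Series([
    "Keks gemahlen 6% (WEIZENMEHL",
    "Palmöl)",
    "Antioxidationsmittel Ascorbinsäure¹",
    "Thiamin (Vitamin B1). ",
])

# Same punctuation set as in remove_punc, escaped for use in a character class.
punc = r'''!()-[]{};:'"\,<>./?@#$^&*_~'''
punc_pattern = "[" + re.escape(punc) + "]"

cleaned = (
    ingredssplit
    .str.replace(r".*\(", "", regex=True)       # drop everything up to the last "("
    .str.replace(punc_pattern, "", regex=True)  # strip punctuation
    # strip superscript digits (¹ ² ³ ⁰ ⁴ to ⁹)
    .str.replace(r"[\u00b9\u00b2\u00b3\u2070\u2074-\u2079]", "", regex=True)
    .str.strip()                                # trim surrounding whitespace
)
print(cleaned.tolist())
```

If only the single character from the example needs to go, `ingredssplit.str.replace("¹", "", regex=False)` would be enough.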

Thanks, people! :)

2 answers

Answer 1
1 accepted 2021-04-17 01:35:15

Answer 2
0 2021-04-17 11:32:48
