How to convert a Series object to a DataFrame using string cleaning

Question

I have a Series of strings in which specific trailing characters tell me how the entries relate to each other. For instance, each entry ending in [] is the header for the entries ending in () that follow it.

s = pd.Series(['September[jk]', 'firember hfh(start)','secmber(end)','Last day(hjh)',
              'October[jk]','firober fhfh (start)','thber(marg)','lasber(sth)',
              'December[jk]','anober(start)','secber(start)','Another(hkjl)'])

I can clean the strings easily, but these trailing characters should help me build a resulting data frame like this:

0   September   firember hfh
1   September   secmber
2   September  Last day
3    October   firober fhfh
4    October     thber
5    October    lasber
6   December    anober
7   December    secber
8   December   Another

Answer 1

I don't think there's any magic here, so I recommend parsing the list yourself before creating the dataframe:

import re
import pandas as pd

l = ['September[jk]', 'firember hfh(start)','secmber(end)','Last day(hjh)',
              'October[jk]','firober fhfh (start)','thber(marg)','lasber(sth)',
              'December[jk]','anober(start)','secber(start)','Another(hkjl)']

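month = None   # most recent "[...]" header seen so far
mylist = []
for el in l:
    # Entries ending in [...] start a new group; keep the text before the bracket.
    m = re.match(r'(.*?)\[.*?\]', el)
    if m:
        month = m.groups()[0]
    else:
        # Entries ending in (...) are values that belong to the current month.
        m = re.match(r'(.*?)\(.*?\)', el)
        if m:
            mylist.append({'Month':month, 'Value':m.groups()[0]})
        else:
            print("Cannot find a match for {}".format(el))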

df = pd.DataFrame(mylist)
print(df)

Out:

       Month          Value
0  September   firember hfh
1  September        secmber
2  September       Last day
3    October  firober fhfh 
4    October          thber
5    October         lasber
6   December         anober
7   December         secber
8   December        Another
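
One detail in the output above: 'firober fhfh (start)' has a space before the parenthesis, so the captured value keeps a trailing space (which is why the Value column is one character wider than expected). A minimal sketch, assuming you want that whitespace trimmed, is to call strip() on the captured group. The helper name clean_value is hypothetical and only used for illustration.

```python
import re

def clean_value(el):
    """Return the text before '(...)' with surrounding whitespace removed.

    Returns None when the entry has no parenthesised suffix.
    (clean_value is a hypothetical helper, not part of the original answer.)
    """
    m = re.match(r'(.*?)\(.*?\)', el)
    return m.group(1).strip() if m else None

print(repr(clean_value('firober fhfh (start)')))  # 'firober fhfh'
print(repr(clean_value('secmber(end)')))          # 'secmber'
```

In the loop above, this would amount to appending m.groups()[0].strip() instead of m.groups()[0].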

Side note: I used the re library because regular expressions can be adapted to many more complex situations, but in your case you could just use built-in features, namely the in operator and str.split:

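month = None
mylist = []   # start from an empty list again before re-running the loop
for el in l:
    if '[' in el:
        # Header entry: the month is everything before the '['.
        month = el.split('[')[0]
    elif '(' in el:
        # Value entry: keep everything before the '('.
        mylist.append({'Month':month, 'Value':el.split('(')[0]})
    else:
        print("Cannot find a match for {}".format(el))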
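
If you would rather stay inside pandas and work on the Series directly, the same grouping can probably be expressed with vectorized string methods. This is a sketch of a different technique from the loop above, not the answer's method: it assumes every header ends in [...] and every value ends in (...), uses Series.str.extract to pull out the text before each delimiter, and forward-fills the month onto the rows that follow it. The variable names month and value are illustrative.

```python
import pandas as pd

s = pd.Series(['September[jk]', 'firember hfh(start)', 'secmber(end)', 'Last day(hjh)',
               'October[jk]', 'firober fhfh (start)', 'thber(marg)', 'lasber(sth)',
               'December[jk]', 'anober(start)', 'secber(start)', 'Another(hkjl)'])

# Header rows: text before '[' (NaN for all other rows), then carried forward
# so that each value row inherits the most recent month.
month = s.str.extract(r'^(.*?)\s*\[', expand=False).ffill()

# Value rows: text before '(' with any whitespace before the parenthesis dropped
# (NaN for header rows).
value = s.str.extract(r'^(.*?)\s*\(', expand=False)

# Keep only the value rows and renumber them from 0.
df = (pd.DataFrame({'Month': month, 'Value': value})
        .dropna(subset=['Value'])
        .reset_index(drop=True))
print(df)
```

The explicit loop is easier to extend when the rules get more complicated; the vectorized version is shorter when the two patterns are as regular as they are here.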

Answer 1
0 2016-12-08 23:56:19
