How to convert a Series object into a DataFrame using string cleaning
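I have a Series of strings in which specific trailing characters indicate how the entries belong together. For example, an entry ending in [...] corresponds to the entries ending in (...) that follow it.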
s = pd.Series(['September[jk]', 'firember hfh(start)','secmber(end)','Last day(hjh)',
'October[jk]','firober fhfh (start)','thber(marg)','lasber(sth)',
'December[jk]','anober(start)','secber(start)','Another(hkjl)'])
I can easily clean the data, but in the end these characters should help me build a result DataFrame like this:
0 September firember hfh
1 September secmber
2 September Last day
3 October firober fhfh
4 October thber
5 October lasber
6 December anober
7 December secber
8 December Another
I don't think there is any magic here, so I would suggest parsing the list yourself before creating the DataFrame:
import re
import pandas as pd

l = ['September[jk]', 'firember hfh(start)','secmber(end)','Last day(hjh)',
     'October[jk]','firober fhfh (start)','thber(marg)','lasber(sth)',
     'December[jk]','anober(start)','secber(start)','Another(hkjl)']

month = None  # most recent "[...]" header seen so far
mylist = []
for el in l:
    # Entries like "September[jk]" start a new month group
    m = re.match(r'(.*?)\[.*?\]', el)
    if m:
        month = m.group(1).strip()
    else:
        # Entries like "secmber(end)" are values under the current month
        m = re.match(r'(.*?)\(.*?\)', el)
        if m:
            # strip() drops the space left before "(" in "firober fhfh (start)"
            mylist.append({'Month': month, 'Value': m.group(1).strip()})
        else:
            print("Cannot find a match for {}".format(el))

df = pd.DataFrame(mylist)
print(df)
Output:
Month Value
0 September firember hfh
1 September secmber
2 September Last day
3 October firober fhfh
4 October thber
5 October lasber
6 December anober
7 December secber
8 December Another
Side note: I used the re library for the regex because it adapts to many more complex cases, but in your case the built-in in operator and str.split are enough:
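# Same list l as above; reset the state before looping
month = None
mylist = []
for el in l:
    if '[' in el:
        # Header entry: keep the text before "[" as the current month
        month = el.split('[')[0].strip()
    elif '(' in el:
        # Value entry: keep the text before "(" under the current month
        mylist.append({'Month': month, 'Value': el.split('(')[0].strip()})
    else:
        print("Cannot find a match for {}".format(el))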
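Since the question starts from a pandas Series rather than a list, the same grouping can also be done without an explicit loop. The following is only a sketch of an alternative, not the approach used in the answer above: it assumes every header ends in [...] and every value ends in (...), and the names month and value are chosen here for illustration. It relies on Series.str.extract to pull out the text before the brackets and on ffill to carry each month down to the rows below it.

```python
import pandas as pd

s = pd.Series(['September[jk]', 'firember hfh(start)', 'secmber(end)', 'Last day(hjh)',
               'October[jk]', 'firober fhfh (start)', 'thber(marg)', 'lasber(sth)',
               'December[jk]', 'anober(start)', 'secber(start)', 'Another(hkjl)'])

# Text before "[...]" marks a month header; rows without "[...]" become NaN
month = s.str.extract(r'^(.*?)\s*\[.*\]\s*$', expand=False)
# Text before "(...)" is a value; header rows become NaN
value = s.str.extract(r'^(.*?)\s*\(.*\)\s*$', expand=False)

# Forward-fill each month onto the rows below it, then drop the header rows
df = pd.DataFrame({'Month': month.ffill(), 'Value': value})
df = df.dropna(subset=['Value']).reset_index(drop=True)
print(df)
```

Entries that match neither pattern are dropped silently in this version, whereas the loops above print a message for them, so the loop may be preferable when the input is not known to be clean.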