How to split text in one column of a pandas data frame into the required format (three columns of a data frame)
Given the data below, I need to split it into three columns, Name, Date, and Type, in a data frame.
Data:
ANNAPOLIS INDUSTRIAL LOAN CO - Aug-2002 - Non-Procurable Miscellaneous Non-Procurable Royalties Royalties
PERRY & CO - Apr-2016 - Non-Procurable Miscellaneous Non-Procurable Royalties Royalties
ASSOCIATED BANC-CORP - Jun-2008 - Corporate Services Human Resources Contingent Labor/Temp Labor Contingent Labor/Temp Labor
L-3 COMMUNICATIONS TITAN CORP - Dec-2014 - Store Construction General Contractor General Requirements Final Site Clean Up
AMERACE CORP 1967 QUAL STK OPT PL & 1972 QUAL-NON-QUAL STK O - Jun-2002 - Store Construction Fixtures Store Fixtures Store Fixtures
ASSOCIATED BANC-CORP - Jun-2008 - Corporate Services Human Resources Contingent Labor/Temp Labor Contingent Labor/Temp Labor
AETNA VARIABLE FUND - Apr-2002 - Store Management Real Estate Real Estate Services Real Estate General (Search, Appraisal, Realtor Commission)
FAIRCHILD CORP - Nov-2001 - Store Management Real Estate Real Estate Services Real Estate General (Search, Appraisal, Realtor Commission
CALIFORNIA REAL ESTATE INVESTMENT TRUST - Mar-2013 - Store Management Real Estate Real Estate Services Real Estate General (Search, Appraisal, Realtor Commission)
EDO CORP - Jul-2008 - Store Management Real Estate Real Estate Services Real Estate General (Search, Appraisal, Realtor Commission)
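How can I use a regex to convert this data into three separate columns?
I have only just started learning regular expressions, so I don't know how to go on from here.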
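You can find the date easily with the following pattern: -\s([A-Z][a-z]{2}-[0-9]{4})\s-
Then you only need to take the text before and after the date match on each line to get the names and types.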
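To illustrate the idea on a single line, here is a minimal sketch. It assumes every line contains exactly one " - Mon-YYYY - " separator, and it strips the surrounding whitespace, which the pattern itself does not remove. The variable names are only for illustration.

```python
import re

# One sample line copied from the question's data.
line = "PERRY & CO - Apr-2016 - Non-Procurable Miscellaneous Non-Procurable Royalties Royalties"

# Date pattern from the answer: hyphen, whitespace, captured date, whitespace, hyphen.
date_pattern = re.compile(r"-\s([A-Z][a-z]{2}-[0-9]{4})\s-")

match = date_pattern.search(line)
date = match.group(1)                 # the captured date
name = line[:match.start()].strip()   # everything before the match
type_ = line[match.end():].strip()    # everything after the match

print(name, date, type_, sep=" | ")
```

Hyphens inside words such as "Non-Procurable" are not followed by whitespace, so they should not be mistaken for the separator.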
Here is the code (using the re module):
# Import modules
import re
import pandas as pd
# Read file
with open("temp.txt") as f:
    text = f.read()
# Apply regex rules
names = re.findall(r"(.*?)-\s[A-Z][a-z]{2}-[0-9]{4}\s-", text)
dates = re.findall(r"-\s([A-Z][a-z]{2}-[0-9]{4})\s-", text)
types = re.findall(r"-\s[A-Z][a-z]{2}-[0-9]{4}\s-([^\n]*)", text)
# Create the dataframe
df = pd.DataFrame({"Name": names,
                   "Date": dates,
                   "Type": types})
print(df)
# Name Date Type
# 0 ANNAPOLIS INDUSTRIAL LOAN CO Aug-2002 Non-Procurable Miscellaneous Non-Procurable R...
# 1 PERRY & CO Apr-2016 Non-Procurable Miscellaneous Non-Procurable R...
# 2 ASSOCIATED BANC-CORP Jun-2008 Corporate Services Human Resources Contingent...
# 3 L-3 COMMUNICATIONS TITAN CORP Dec-2014 Store Construction General Contractor General...
# 4 AMERACE CORP 1967 QUAL STK OPT PL & 1972 QUAL-... Jun-2002 Store Construction Fixtures Store Fixtures St...
# 5 ASSOCIATED BANC-CORP Jun-2008 Corporate Services Human Resources Contingent...
# 6 AETNA VARIABLE FUND Apr-2002 Store Management Real Estate Real Estate Serv...
# 7 FAIRCHILD CORP Nov-2001 Store Management Real Estate Real Estate Serv...
# 8 CALIFORNIA REAL ESTATE INVESTMENT TRUST Mar-2013 Store Management Real Estate Real Estate Serv...
# 9 EDO CORP Jul-2008 Store Management Real Estate Real Estate Serv...
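If the text is already in a column of a data frame, as the question title suggests, a pandas-only variant may be simpler. The sketch below is an alternative to the re-based code, not part of the original answer: it uses Series.str.extract with named groups, and the column name "raw" and the combined pattern are assumptions made for illustration.

```python
import pandas as pd

# Two sample rows from the question, in a hypothetical column named "raw".
raw = pd.DataFrame({"raw": [
    "ANNAPOLIS INDUSTRIAL LOAN CO - Aug-2002 - Non-Procurable Miscellaneous Non-Procurable Royalties Royalties",
    "L-3 COMMUNICATIONS TITAN CORP - Dec-2014 - Store Construction General Contractor General Requirements Final Site Clean Up",
]})

# Named groups become the column names of the extracted data frame.
pattern = (
    r"^(?P<Name>.*?)\s+-\s+"
    r"(?P<Date>[A-Z][a-z]{2}-[0-9]{4})\s+-\s+"
    r"(?P<Type>.*)$"
)

df = raw["raw"].str.extract(pattern)
print(df)
```

Rows that do not match the pattern come back as NaN in all three columns, which makes malformed lines easy to spot.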