
How to split text in one column of a pandas DataFrame into the required format (three DataFrame columns)

Given the data below, I need to split it into three columns of a DataFrame: Name, Date, and Type.

Data:

ANNAPOLIS INDUSTRIAL LOAN CO - Aug-2002 - Non-Procurable Miscellaneous Non-Procurable Royalties Royalties

PERRY & CO - Apr-2016 - Non-Procurable Miscellaneous Non-Procurable Royalties Royalties

ASSOCIATED BANC-CORP - Jun-2008 - Corporate Services Human Resources Contingent Labor/Temp Labor Contingent Labor/Temp Labor

L-3 COMMUNICATIONS TITAN CORP - Dec-2014 - Store Construction General Contractor General Requirements Final Site Clean Up

AMERACE CORP 1967 QUAL STK OPT PL & 1972 QUAL-NON-QUAL STK O - Jun-2002 - Store Construction Fixtures Store Fixtures Store Fixtures

ASSOCIATED BANC-CORP - Jun-2008 - Corporate Services Human Resources Contingent Labor/Temp Labor Contingent Labor/Temp Labor

AETNA VARIABLE FUND - Apr-2002 - Store Management Real Estate Real Estate Services Real Estate General (Search, Appraisal, Realtor Commission)


FAIRCHILD CORP - Nov-2001 - Store Management Real Estate Real Estate Services Real Estate General (Search, Appraisal, Realtor Commission

CALIFORNIA REAL ESTATE INVESTMENT TRUST - Mar-2013 - Store Management Real Estate Real Estate Services Real Estate General (Search, Appraisal, Realtor Commission)

EDO CORP - Jul-2008 - Store Management Real Estate Real Estate Services Real Estate General (Search, Appraisal, Realtor Commission)

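How can I use a regex to convert this data into three separate columns?

I have only just started learning regular expressions, so I don't know how to go about writing the pattern.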

You can easily find the date with the following pattern: -\s([A-Z][a-z]{2}-[0-9]{4})\s-

Then you only need to take the text before and after the date pattern on each line to get the names and types.
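As a minimal sketch of that idea on a single row (the variable names here are illustrative, not part of the original answer), the date match can be used as a delimiter: everything before it is the name, and everything after it is the type.

```python
import re

# One sample row taken from the question's data.
line = "PERRY & CO - Apr-2016 - Non-Procurable Miscellaneous Non-Procurable Royalties Royalties"

# The date pattern described above; search() finds its first occurrence in the line.
date_pattern = re.compile(r"-\s([A-Z][a-z]{2}-[0-9]{4})\s-")
match = date_pattern.search(line)

name = line[:match.start()].strip()   # text before the "- Mon-YYYY -" delimiter
date = match.group(1)                 # the captured Mon-YYYY text
type_ = line[match.end():].strip()    # text after the delimiter

print(name, date, type_, sep=" | ")
```

Hyphens inside a name such as BANC-CORP do not trigger a match, because the pattern requires whitespace after the leading hyphen and a month-year token after that.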

Here is the code (using the re module):

# Import modules (pandas is needed for pd.DataFrame below)
import re

import pandas as pd
# Read file
with open("temp.txt") as f:
    text = f.read()

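# Apply regex rules: the "- Mon-YYYY -" date anchors all three captures.
# [ \t]* keeps the spaces around the delimiters out of the captured Name and Type.
names = re.findall(r"(.*?)[ \t]*-\s[A-Z][a-z]{2}-[0-9]{4}\s-", text)
dates = re.findall(r"-\s([A-Z][a-z]{2}-[0-9]{4})\s-", text)
types = re.findall(r"-\s[A-Z][a-z]{2}-[0-9]{4}\s-[ \t]*([^\n]*)", text)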

# Create dataframes
df = pd.DataFrame({"Name": names,
                    "Date": dates,
                    "Type": types})

print(df)
#                                                Name      Date                                               Type
# 0                      ANNAPOLIS INDUSTRIAL LOAN CO   Aug-2002   Non-Procurable Miscellaneous Non-Procurable R...
# 1                                        PERRY & CO   Apr-2016   Non-Procurable Miscellaneous Non-Procurable R...
# 2                              ASSOCIATED BANC-CORP   Jun-2008   Corporate Services Human Resources Contingent...
# 3                     L-3 COMMUNICATIONS TITAN CORP   Dec-2014   Store Construction General Contractor General...
# 4  AMERACE CORP 1967 QUAL STK OPT PL & 1972 QUAL-...  Jun-2002   Store Construction Fixtures Store Fixtures St...
# 5                              ASSOCIATED BANC-CORP   Jun-2008   Corporate Services Human Resources Contingent...
# 6                               AETNA VARIABLE FUND   Apr-2002   Store Management Real Estate Real Estate Serv...
# 7                                    FAIRCHILD CORP   Nov-2001   Store Management Real Estate Real Estate Serv...
# 8           CALIFORNIA REAL ESTATE INVESTMENT TRUST   Mar-2013   Store Management Real Estate Real Estate Serv...
# 9                                          EDO CORP   Jul-2008   Store Management Real Estate Real Estate Serv...
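If the text is already in a DataFrame column, as the question title suggests, a possible alternative is pandas' `Series.str.extract` with named groups, which returns one column per group. This is only a sketch: the column name `raw` and the two sample rows are assumptions for illustration, and the pattern is a combined variant of the three patterns above rather than the answer's exact code.

```python
import pandas as pd

# Two sample rows from the question, held in a hypothetical column named "raw".
df = pd.DataFrame({"raw": [
    "ANNAPOLIS INDUSTRIAL LOAN CO - Aug-2002 - Non-Procurable Miscellaneous Non-Procurable Royalties Royalties",
    "L-3 COMMUNICATIONS TITAN CORP - Dec-2014 - Store Construction General Contractor General Requirements Final Site Clean Up",
]})

# Named groups become the column names of the extracted DataFrame.
pattern = (
    r"^(?P<Name>.*?)\s+-\s+"                 # lazy match up to the first " - Mon-YYYY - "
    r"(?P<Date>[A-Z][a-z]{2}-[0-9]{4})"      # e.g. Aug-2002
    r"\s+-\s+(?P<Type>.*)$"                  # the rest of the row
)
parts = df["raw"].str.extract(pattern)

print(parts)
```

Rows that do not match the pattern come back as NaN in all three columns, which makes malformed lines easy to spot.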

