Python string manipulation without using pandas
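How can I manipulate this dataset in Python without using the pandas package? I can do this with pandas, but doing it with plain string manipulation is new to me and I don't know how to approach it.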
text = """
series_id year period value footnote_codes
LASST180000000000003 1971 M01 6.6 R
LASST180000000000003 1971 M02 6.6 R
LASST180000000000003 1977 M03 6.5 R
LASST180000000000003 1976 M04 6.3 R
LASST180000000000003 1978 M05 6.0 R
LASST180000000000003 1979 M06 5.8 R
LASST180000000000003 1976 M07 5.7 R
"""
##### do not use pandas ####
### 1. replace the footnote_codes column with a month_year column that
# holds a string with the month and year combined. For example, if a row has
# the month at 06 and the year at 2007,
# this column should have the following string: "06_2007"
### 2. only keep the data from 1976 to 1979
One way to make the data easier to work with is the .split() method, which will help you a lot here.
split_text = text.split()  # one flat list of tokens; all the \n and spaces are removed
print([line.split() for line in text.splitlines() if line.strip()])
# splitlines() makes a list with one string per row (blank lines are skipped),
# and then you can split each row to get all of its items
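Building on the split idea, both steps from the question could be handled row by row. This is only a minimal sketch, not the answerer's code: the helper name `transform` and its `first_year` / `last_year` parameters are hypothetical, and it assumes every data row has exactly five whitespace-separated fields with a period of the form `M06`.

```python
text = """
series_id year period value footnote_codes
LASST180000000000003 1971 M01 6.6 R
LASST180000000000003 1971 M02 6.6 R
LASST180000000000003 1977 M03 6.5 R
LASST180000000000003 1976 M04 6.3 R
LASST180000000000003 1978 M05 6.0 R
LASST180000000000003 1979 M06 5.8 R
LASST180000000000003 1976 M07 5.7 R
"""


def transform(raw, first_year=1976, last_year=1979):
    """Hypothetical helper: swap footnote_codes for month_year and filter by year."""
    # One list of fields per non-empty line.
    rows = [line.split() for line in raw.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]

    # Step 1: the last column is renamed to month_year.
    result = [header[:-1] + ["month_year"]]
    for series_id, year, period, value, _footnote in data:
        # Step 2: keep only rows whose year is in the inclusive range.
        if first_year <= int(year) <= last_year:
            month = period[1:]  # "M06" -> "06"
            result.append([series_id, year, period, value, f"{month}_{year}"])
    return result


table = transform(text)
for row in table:
    print(" ".join(row))
```

Comparing the year as an integer rather than as a string keeps the range check explicit; the rows stay lists of strings so they can be joined back into text.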
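I'm not sure exactly what you are looking for, but this code builds a dictionary whose keys are the column headers and whose values are lists of the values in each column.
It also creates the footnotes column.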
text = """
series_id year period value footnote_codes
LASST180000000000003 1971 M01 6.6 R
LASST180000000000003 1971 M02 6.6 R
LASST180000000000003 1977 M03 6.5 R
LASST180000000000003 1976 M04 6.3 R
LASST180000000000003 1978 M05 6.0 R
LASST180000000000003 1979 M06 5.8 R
LASST180000000000003 1976 M07 5.7 R
"""
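# Flatten the text into one list of whitespace-separated tokens
values = text.split()
# The first five tokens are the column headers
headers = values[:5]
# Every fifth token after the header row belongs to the same column; footnote_codes is dropped
columns = {col_name: values[idx + 5::5] for idx, col_name in enumerate(headers[:-1])}
# Build the month_year strings, e.g. period "M06" and year "1979" give "06_1979"
columns['footnotes'] = [period[1:] + '_' + year for year, period in zip(columns['year'], columns['period'])]
print(columns)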
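The dictionary above still contains the 1971 rows, so the second requirement (keep only 1976 to 1979) is not covered yet. One possible way to filter column-oriented data is to work out which row positions to keep and then rebuild every list from those positions. A minimal sketch, assuming a small `columns` dictionary shaped like the one printed above; the shortened sample and the names `keep` and `filtered` are illustrative only:

```python
# A small column-oriented table shaped like the `columns` dict above
# (four of the rows from the question, footnotes column omitted for brevity).
columns = {
    "year": ["1971", "1977", "1976", "1979"],
    "period": ["M01", "M03", "M04", "M06"],
    "value": ["6.6", "6.5", "6.3", "5.8"],
}

# Positions of the rows whose year falls in 1976..1979 inclusive.
keep = [i for i, year in enumerate(columns["year"]) if 1976 <= int(year) <= 1979]

# Rebuild every column from the kept positions so the lists stay aligned.
filtered = {name: [vals[i] for i in keep] for name, vals in columns.items()}
print(filtered)
```

Filtering by index rather than filtering each list separately keeps the columns in step with one another.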