Python string manipulation without pandas

Question

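How can I manipulate this dataset with Python without using the pandas package? I can do it with pandas, but doing it with plain string operations is new to me and I don't know how to approach it.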

    text = """
        series_id                       year    period         value    footnote_codes

        LASST180000000000003            1971    M01          6.6    R

        LASST180000000000003            1971    M02          6.6    R

        LASST180000000000003            1977    M03          6.5    R

        LASST180000000000003            1976    M04          6.3    R

        LASST180000000000003            1978    M05          6.0    R

        LASST180000000000003            1979    M06          5.8    R

        LASST180000000000003            1976    M07          5.7    R

        """

##### do not use pandas ####

### 1. replace the footnote_codes column with a month_year column
# that holds a string with the month and year combination. For example, if a row has
# the month at 06 and the year at 2007,
# this column should have the following string: "06_2007"
### 2. only keep the data from 1976 to 1979

Answer 1

One way to make the data easier to work with is the .split() method, which will help you a lot.

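tokens = text.split()  # flat list of every field; all newlines and spaces are dropped
# split("\n\n") gives one chunk per row, because the rows are separated by blank lines.
# Splitting each chunk again gives the individual fields; whitespace-only chunks are skipped.
rows = [chunk.split() for chunk in text.split("\n\n") if chunk.strip()]
print(rows)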
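Building on the row splitting above, the two tasks from the question could be sketched as follows. This is only a minimal sketch: the function name `transform` and the parameters `first_year` and `last_year` are hypothetical, the sample text is a shortened copy of the question's data, and it assumes every data row has exactly five whitespace-separated fields.

```python
# Shortened copy of the question's data, for illustration only.
text = """
    series_id                       year    period         value    footnote_codes

    LASST180000000000003            1971    M01          6.6    R

    LASST180000000000003            1977    M03          6.5    R

    LASST180000000000003            1976    M04          6.3    R

    LASST180000000000003            1979    M06          5.8    R
"""


def transform(raw, first_year=1976, last_year=1979):
    """Return a header row plus the data rows whose year is in range.

    The footnote_codes column is replaced by a month_year column.
    Assumes each non-blank line has exactly five fields.
    """
    lines = [line.split() for line in raw.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    result = [header[:-1] + ["month_year"]]
    for series_id, year, period, value, _footnote in rows:
        if first_year <= int(year) <= last_year:
            # period looks like "M03"; drop the leading "M" to get the month
            month_year = f"{period[1:]}_{year}"
            result.append([series_id, year, period, value, month_year])
    return result


for row in transform(text):
    print("\t".join(row))
```

Working row by row keeps the filter and the new column in one pass, at the cost of failing loudly (a `ValueError` on unpacking) if a row is missing a field.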

Answer 2

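I'm not sure exactly what you're looking for, but this code builds a dictionary whose keys are the column headers and whose values are lists of the values in each column.

It also creates the footnotes column, which holds the month_year strings.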

text = """
    series_id                       year    period         value    footnote_codes

    LASST180000000000003            1971    M01          6.6    R

    LASST180000000000003            1971    M02          6.6    R

    LASST180000000000003            1977    M03          6.5    R

    LASST180000000000003            1976    M04          6.3    R

    LASST180000000000003            1978    M05          6.0    R

    LASST180000000000003            1979    M06          5.8    R

    LASST180000000000003            1976    M07          5.7    R

    """

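# Split on any whitespace: the first five tokens are the column headers,
# the rest are the cell values in row order (assumes no row has a missing field).
values = text.split()

headers = values[0:5]

# Map every header except footnote_codes to its column by taking every fifth token.
columns = {col_name: values[idx + 5::5] for idx, col_name in enumerate(headers[:-1])}

# Build strings such as "03_1977": the period "M03" without its "M", then the year.
columns['footnotes'] = [period[1:] + '_' + year for year, period in zip(columns['year'], columns['period'])]

print(columns)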
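The dictionary above does not yet apply the second requirement from the question (keeping only 1976 to 1979). One possible way to filter a column dictionary of that shape is sketched below; the helper name `filter_years` and the small sample dictionary are hypothetical and stand in for the real `columns`.

```python
# Hypothetical sample with the same shape as the `columns` dictionary above.
columns = {
    "series_id": ["LASST180000000000003"] * 3,
    "year": ["1971", "1976", "1979"],
    "period": ["M01", "M04", "M06"],
    "value": ["6.6", "6.3", "5.8"],
    "footnotes": ["01_1971", "04_1976", "06_1979"],
}


def filter_years(cols, first_year=1976, last_year=1979):
    """Keep only the positions whose year falls in [first_year, last_year]."""
    keep = [
        i for i, year in enumerate(cols["year"])
        if first_year <= int(year) <= last_year
    ]
    # Apply the same positions to every column so the rows stay aligned.
    return {name: [vals[i] for i in keep] for name, vals in cols.items()}


filtered = filter_years(columns)
print(filtered["year"])
print(filtered["footnotes"])
```

Computing the row positions once and reusing them for each column keeps the lists the same length, which a column-oriented dictionary depends on.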

Solution 1
0 2021-03-20 15:22:22

Solution 2
0 2021-03-20 16:01:13
