
python string manipulation without using pandas

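How can I manipulate this dataset in Python without using the pandas package? I can do it with pandas, but this kind of plain string manipulation is new to me and I don't know how to approach it.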

    text = """
        series_id                       year    period         value    footnote_codes

        LASST180000000000003            1971    M01          6.6    R

        LASST180000000000003            1971    M02          6.6    R

        LASST180000000000003            1977    M03          6.5    R

        LASST180000000000003            1976    M04          6.3    R

        LASST180000000000003            1978    M05          6.0    R

        LASST180000000000003            1979    M06          5.8    R

        LASST180000000000003            1976    M07          5.7    R

        """

##### do not use pandas ####

# 1. Replace the footnote_codes column with a month_year column that holds
#    the month and year as one string. For example, if a row has month 06
#    and year 2007, this column should contain the string "06_2007".
# 2. Only keep the data from 1976 to 1979.

One way to make the data easier to work with is the .split() method, which helps a lot here.

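# split() with no arguments splits on any whitespace, so all "\n" and spaces are removed
split_text = text.split()
# splitlines() gives one string per row; skip the blank lines,
# then split each remaining row to get all of its items
rows = [line.split() for line in text.splitlines() if line.strip()]
print(rows)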
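
Building on the row-splitting idea, here is a minimal sketch that pairs each row with the header names using zip, so fields can be looked up by column name. The names `sample` and `parse_rows` are made up for illustration, and the data is a shortened copy of the question's text; it assumes every field is separated by whitespace.

```python
# Shortened copy of the question's data (assumption: whitespace-separated fields).
sample = """
    series_id                       year    period         value    footnote_codes
    LASST180000000000003            1971    M01          6.6    R

    LASST180000000000003            1976    M04          6.3    R
"""


def parse_rows(raw):
    """Return one dict per data row, keyed by the header names (hypothetical helper)."""
    # Drop blank lines, then split every remaining line into its fields.
    lines = [line.split() for line in raw.splitlines() if line.strip()]
    header, data = lines[0], lines[1:]
    return [dict(zip(header, fields)) for fields in data]


records = parse_rows(sample)
print(records[1]["year"], records[1]["period"])
```

All values stay strings at this point; convert with int() or float() where a numeric comparison is needed.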

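I'm not sure exactly what you're looking for, but this code uses a dictionary in which the keys are the column headers and the values are lists of the values in each column.

It also creates the footnotes column, which holds the month and year combined into one string.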

text = """
    series_id                       year    period         value    footnote_codes

    LASST180000000000003            1971    M01          6.6    R

    LASST180000000000003            1971    M02          6.6    R

    LASST180000000000003            1977    M03          6.5    R

    LASST180000000000003            1976    M04          6.3    R

    LASST180000000000003            1978    M05          6.0    R

    LASST180000000000003            1979    M06          5.8    R

    LASST180000000000003            1976    M07          5.7    R

    """

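# Flat list of every whitespace-separated token in the text
values = text.split()

# The first five tokens are the column names
headers = values[0:5]

# Every fifth token after the header row belongs to the same column;
# headers[:-1] leaves out the original footnote_codes column
columns = {col_name: values[idx + 5::5] for idx, col_name in enumerate(headers[:-1])}

# period[1:] strips the leading "M", producing strings such as "04_1976"
columns['footnotes'] = [period[1:] + '_' + year for year, period in zip(columns['year'], columns['period'])]

print(columns)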
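
To cover both requirements in the question (the month_year column and the 1976 to 1979 filter), a row-based sketch could look like the following. The function name `transform` and its parameters are illustrative, the data is a shortened copy of the question's text, and it assumes every data row has exactly five whitespace-separated fields (that is, footnote_codes is never empty).

```python
# Shortened copy of the question's data.
sample = """
    series_id                       year    period         value    footnote_codes
    LASST180000000000003            1971    M01          6.6    R
    LASST180000000000003            1977    M03          6.5    R
    LASST180000000000003            1976    M04          6.3    R
    LASST180000000000003            1979    M06          5.8    R
"""


def transform(raw, first_year=1976, last_year=1979):
    """Swap footnote_codes for month_year and keep rows within the year range."""
    lines = [line.split() for line in raw.splitlines() if line.strip()]
    # New header: same columns, with the last one renamed to month_year.
    result = [lines[0][:-1] + ["month_year"]]
    # Assumption: each data row unpacks into exactly five fields.
    for series_id, year, period, value, _footnote in lines[1:]:
        if first_year <= int(year) <= last_year:
            # period looks like "M04"; drop the "M" and append the year.
            result.append([series_id, year, period, value, f"{period[1:]}_{year}"])
    return result


for row in transform(sample):
    print("\t".join(row))
```

Rows are kept in their original order; the 1971 row is dropped and each remaining row ends with a value such as 03_1977.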
