
Using Python, need to convert data from multiple columns to a single column and repeat column A

I am still new to Python and need help.

My data is in CSV format and looks like this:

Month YEAR      AZ-Phoenix  CA-Los Angeles  CA-San Diego    CA-San Francisco    CO-Denver   DC-Washington
    January 1987            59.33       54.67       46.61           50.20
    February 1987           59.65       54.89       46.87           49.96       64.77

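This needs to be consolidated by repeating column 1 once for each of the other columns, with the column names shown in column 2 and the values in column 3.

The output should be: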

Month YEAR                           
    January 1987    AZ-Phoenix
    January 1987    CA-Los Angeles      59.33
    January 1987    CA-San Diego        54.67
    January 1987    CA-San Francisco    46.61
    January 1987    CO-Denver       50.20

How can I achieve this with the csv reader?

Use read_csv with a tab separator (\t); if the columns are instead separated by two or more whitespace characters, use piRSquared's solution:

import pandas as pd

# 'data.csv' is a placeholder path; the original snippet omitted the file argument
df = pd.read_csv('data.csv', sep='\t')
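
To try the tab-separated read without a file on disk, a minimal sketch can feed the text through io.StringIO. The sample rows below are a hypothetical subset made up to mirror the question's layout, not the asker's actual file:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample mirroring the question's layout.
raw = (
    "YEAR\tAZ-Phoenix\tCA-Los Angeles\tCA-San Diego\n"
    "January 1987\t59.33\t54.67\t46.61\n"
    "February 1987\t59.65\t54.89\t46.87\n"
)

# With sep="\t", names that contain a single space stay in one column.
df = pd.read_csv(io.StringIO(raw), sep="\t")
print(df.columns.tolist())
```

Because only tabs split the fields, "CA-Los Angeles" and "CA-San Diego" each remain a single column name.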

I think you need:

df = df.set_index('YEAR').stack(dropna=False).reset_index()
df.columns = ['YEAR','A','B']
print (df)
             YEAR                 A      B
0    January 1987        AZ-Phoenix  59.33
1    January 1987    CA-Los Angeles  54.67
2    January 1987            CA-San  46.61
3    January 1987             Diego  50.20
4    January 1987  CA-San Francisco    NaN
5    January 1987         CO-Denver    NaN
6    January 1987     DC-Washington    NaN
7   February 1987        AZ-Phoenix  59.65
8   February 1987    CA-Los Angeles  54.89
9   February 1987            CA-San  46.87
10  February 1987             Diego  49.96
11  February 1987  CA-San Francisco  64.77
12  February 1987         CO-Denver    NaN
13  February 1987     DC-Washington    NaN

# if you need to remove rows with NaN
df = df.set_index('YEAR').stack().reset_index()
df.columns = ['YEAR','A','B']
print (df)
            YEAR                 A      B
0   January 1987        AZ-Phoenix  59.33
1   January 1987    CA-Los Angeles  54.67
2   January 1987            CA-San  46.61
3   January 1987             Diego  50.20
4  February 1987        AZ-Phoenix  59.65
5  February 1987    CA-Los Angeles  54.89
6  February 1987            CA-San  46.87
7  February 1987             Diego  49.96
8  February 1987  CA-San Francisco  64.77
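
The stack approach above can be sketched end to end on a small frame. The data here is a hypothetical subset of the question's table, and the explicit dropna() is an assumption added so the result does not depend on whether the installed pandas version drops NaN inside stack by itself:

```python
import pandas as pd

# Hypothetical subset of the question's data, with one missing value.
wide = pd.DataFrame({
    "YEAR": ["January 1987", "February 1987"],
    "AZ-Phoenix": [59.33, 59.65],
    "CA-Los Angeles": [54.67, 54.89],
    "CO-Denver": [float("nan"), 64.77],
})

# Move the city columns into the index, then drop the missing values explicitly.
long = wide.set_index("YEAR").stack().dropna().reset_index()
long.columns = ["YEAR", "A", "B"]
print(long)
```

Each YEAR value is repeated once per city that has a value, which is the shape the question asks for.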

Another solution with melt:

df = pd.melt(df, id_vars='YEAR', value_name='B', var_name='A')
print (df)
             YEAR                 A      B
0    January 1987        AZ-Phoenix  59.33
1   February 1987        AZ-Phoenix  59.65
2    January 1987    CA-Los Angeles  54.67
3   February 1987    CA-Los Angeles  54.89
4    January 1987            CA-San  46.61
5   February 1987            CA-San  46.87
6    January 1987             Diego  50.20
7   February 1987             Diego  49.96
8    January 1987  CA-San Francisco    NaN
9   February 1987  CA-San Francisco  64.77
10   January 1987         CO-Denver    NaN
11  February 1987         CO-Denver    NaN
12   January 1987     DC-Washington    NaN
13  February 1987     DC-Washington    NaN


# if you need to remove rows with NaN
df = pd.melt(df, id_vars='YEAR', value_name='B', var_name='A').dropna(subset=['B'])
print (df)
            YEAR                 A      B
0   January 1987        AZ-Phoenix  59.33
1  February 1987        AZ-Phoenix  59.65
2   January 1987    CA-Los Angeles  54.67
3  February 1987    CA-Los Angeles  54.89
4   January 1987            CA-San  46.61
5  February 1987            CA-San  46.87
6   January 1987             Diego  50.20
7  February 1987             Diego  49.96
9  February 1987  CA-San Francisco  64.77
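
Note that melt returns the rows grouped by city, while the question wants all cities for one month before moving on to the next. One possible way to get that order, sketched on a hypothetical subset of the data, is to keep the original row index with ignore_index=False (available from pandas 1.1) and stable-sort on it afterwards:

```python
import pandas as pd

# Hypothetical subset of the question's data, with one missing value.
wide = pd.DataFrame({
    "YEAR": ["January 1987", "February 1987"],
    "AZ-Phoenix": [59.33, 59.65],
    "CA-Los Angeles": [54.67, 54.89],
    "CO-Denver": [float("nan"), 64.77],
})

long = (
    pd.melt(wide, id_vars="YEAR", var_name="A", value_name="B", ignore_index=False)
    .dropna(subset=["B"])
    .sort_index(kind="mergesort")  # stable sort keeps the city order within a month
    .reset_index(drop=True)
)
print(long)
```

Sorting on the original row index avoids sorting the month strings themselves, which would put February before January.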

Option 1
Using pd.melt

pd.melt(df, 'YEAR')

             YEAR          variable  value
0    January 1987        AZ-Phoenix  59.33
1   February 1987        AZ-Phoenix  59.65
2    January 1987    CA-Los Angeles  54.67
3   February 1987    CA-Los Angeles  54.89
4    January 1987      CA-San Diego  46.61
5   February 1987      CA-San Diego  46.87
6    January 1987  CA-San Francisco  50.20
7   February 1987  CA-San Francisco  49.96
8    January 1987         CO-Denver    NaN
9   February 1987         CO-Denver  64.77
10   January 1987     DC-Washington    NaN
11  February 1987     DC-Washington    NaN

Option 2
Rebuild with numpy tools

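import numpy as np

pd.DataFrame(dict(
        YEAR=df.YEAR.values.repeat(len(df.columns) - 1),
        # the positional axis form drop('YEAR', 1) no longer works on pandas >= 2.0
        B=df.drop(columns='YEAR').values.ravel(),
        # sort=False keeps the labels in the same order as the raveled values
        A=np.tile(df.columns.difference(['YEAR'], sort=False).values, len(df)),
    ))[['YEAR', 'A', 'B']]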


             YEAR          variable  value
0    January 1987        AZ-Phoenix  59.33
1   February 1987        AZ-Phoenix  59.65
2    January 1987    CA-Los Angeles  54.67
3   February 1987    CA-Los Angeles  54.89
4    January 1987      CA-San Diego  46.61
5   February 1987      CA-San Diego  46.87
6    January 1987  CA-San Francisco  50.20
7   February 1987  CA-San Francisco  49.96
8    January 1987         CO-Denver    NaN
9   February 1987         CO-Denver  64.77
10   January 1987     DC-Washington    NaN
11  February 1987     DC-Washington    NaN
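
For reference, here is a self-contained sketch of the numpy rebuild on a hypothetical subset of the data. It takes the value columns in their original order with a plain list comprehension, an assumption made here because Index.difference sorts the labels by default, which would only line up with the raveled values when the columns already happen to be sorted:

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's data, with one missing value.
wide = pd.DataFrame({
    "YEAR": ["January 1987", "February 1987"],
    "AZ-Phoenix": [59.33, 59.65],
    "CA-Los Angeles": [54.67, 54.89],
    "CO-Denver": [float("nan"), 64.77],
})

# Value columns in their original order.
value_cols = [c for c in wide.columns if c != "YEAR"]

long = pd.DataFrame({
    # each YEAR once per value column
    "YEAR": np.repeat(wide["YEAR"].to_numpy(), len(value_cols)),
    # the column labels once per row
    "A": np.tile(value_cols, len(wide)),
    # values flattened row by row, matching the two arrays above
    "B": wide[value_cols].to_numpy().ravel(),
})
print(long)
```

Unlike melt, this produces the rows month by month, and it keeps the NaN entries.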

Setup

# 'data.csv' is a placeholder path; the original snippet omitted the file argument
df = pd.read_csv('data.csv', sep=r'\s{2,}', engine='python')
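
A quick way to check the regex separator is to run it on an in-memory sample. The text below is a hypothetical, space-aligned subset of the question's data, assuming columns are separated by at least two spaces while names contain at most one:

```python
import io

import pandas as pd

# Hypothetical space-aligned sample; columns are separated by two or more spaces.
raw = (
    "YEAR           AZ-Phoenix  CA-Los Angeles  CA-San Diego\n"
    "January 1987   59.33       54.67           46.61\n"
    "February 1987  59.65       54.89           46.87\n"
)

# r"\s{2,}" splits only on runs of two or more whitespace characters,
# so "January 1987" and "CA-San Diego" are not broken apart.
df = pd.read_csv(io.StringIO(raw), sep=r"\s{2,}", engine="python")
print(df.columns.tolist())
```

The regex separator needs engine="python", since the default C engine does not support it.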

