
Fill columns with the values of a column from another dataframe, depending on conditions

I have a dataframe that looks like this (my input data on COVID cases):

data:

    date      state  cases
0   20200625  NY     300
1   20200625  CA     250
2   20200625  TX     200
3   20200625  FL     100
5   20200624  NY     290
6   20200624  CA     240
7   20200624  TX     100
8   20200624  FL     80
...

Worth noting: the "date" column in the data above is a number (an integer such as 20200625), not a datetime.
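For anyone who wants to reproduce this, the sample above can be rebuilt as a DataFrame (only the rows shown, with a default 0..7 index). The sketch also shows one way to parse the integer dates into real datetimes, assuming they always follow a YYYYMMDD layout; the extra column name `day` is a hypothetical choice for illustration.

```python
import pandas as pd

# Rebuild the sample input shown above (only the listed rows;
# the default 0..7 index replaces the original labels)
data = pd.DataFrame({
    "date":  [20200625, 20200625, 20200625, 20200625,
              20200624, 20200624, 20200624, 20200624],
    "state": ["NY", "CA", "TX", "FL", "NY", "CA", "TX", "FL"],
    "cases": [300, 250, 200, 100, 290, 240, 100, 80],
})

# "date" is a plain integer column, e.g. 20200625
print(data["date"].dtype)

# Optional: parse the YYYYMMDD integers into datetimes
# (hypothetical new column "day"; the original "date" column is left as-is)
data["day"] = pd.to_datetime(data["date"].astype(str), format="%Y%m%d")
print(data["day"].dtype)
```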

I want to turn it into a time series like this (desired output), with the dates as the index and each state's COVID cases as a column:

          NY     CA     TX     FL
20200625  300    250    200    100
20200626  290    240    100    80
...

So far I have only managed to create the skeleton of the output, with the following code:

import pandas as pd

states = ['NY', 'CA', 'TX', 'FL']
days = [20200625, 20200626]

columns = states
positives = pd.DataFrame(columns=columns)

# add one row per day, storing the day in a new "date" column
for i, day in enumerate(days):
    positives.loc[i, "date"] = day

# use the dates as the row index and drop the index name
positives.set_index('date', inplace=True)
positives = positives.rename_axis(None)
print(positives)

which prints:

             NY   CA   TX   FL
20200625.0  NaN  NaN  NaN  NaN
20200626.0  NaN  NaN  NaN  NaN
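As an aside, the same empty skeleton can be built without the loop, because the `DataFrame` constructor accepts `index` and `columns` directly. A minimal sketch (passing `dtype=float` is an optional choice so the empty cells are float NaN rather than object):

```python
import pandas as pd

states = ['NY', 'CA', 'TX', 'FL']
days = [20200625, 20200626]

# Days as the row index, states as the columns, every cell NaN
positives = pd.DataFrame(index=days, columns=states, dtype=float)
print(positives)
```

Unlike the loop version, the dates stay integers in the index (no `.0`), since they never pass through a float column.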

How can I fill it with the value of the "cases" column from the "data" dataframe when:

(i) the value in data["state"] equals the column header of "positives", and

(ii) the value in data["date"] equals the row index of "positives"?
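Taken literally, conditions (i) and (ii) describe a cell-by-cell lookup. The sketch below fills the skeleton exactly that way; it is deliberately naive, fine for a handful of states and days but slow on large data. It assumes the skeleton's days are taken from `data` (20200625 and 20200624 here): a date with no matching rows, such as 20200626 in the code above, would simply stay NaN.

```python
import pandas as pd

data = pd.DataFrame({
    "date":  [20200625, 20200625, 20200625, 20200625,
              20200624, 20200624, 20200624, 20200624],
    "state": ["NY", "CA", "TX", "FL", "NY", "CA", "TX", "FL"],
    "cases": [300, 250, 200, 100, 290, 240, 100, 80],
})

states = ["NY", "CA", "TX", "FL"]
days = [20200625, 20200624]  # assumption: days that actually occur in `data`

# Empty skeleton: dates as the index, states as the columns
positives = pd.DataFrame(index=days, columns=states, dtype=float)

for day in positives.index:
    for state in positives.columns:
        # (ii) date equals the row index and (i) state equals the column header
        match = data.loc[(data["date"] == day) & (data["state"] == state), "cases"]
        if not match.empty:
            positives.loc[day, state] = match.iloc[0]

print(positives)
```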

You can do the following (here `df` is your "data" dataframe):

df = df.set_index(['date', 'state']).unstack().reset_index()

# fix column names
df.columns = df.columns.get_level_values(1)

state               CA     FL     NY     TX
0      20200624  240.0    NaN  290.0    NaN
1      20200625  250.0  100.0  300.0  200.0

Later, to make the dates the index again, note that the date column is now named "" (an empty string). Refer to it by that name, then set the index name explicitly:

df = df.set_index("")
df.index.name = "date"
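A slightly shorter variant of the same idea: selecting the `cases` Series before calling `unstack()` produces single-level columns (one per state) with `date` kept as the index, so neither the column-name fix nor the `reset_index`/`set_index` round trip is needed. A minimal sketch on part of the sample data:

```python
import pandas as pd

data = pd.DataFrame({
    "date":  [20200625, 20200625, 20200624, 20200624],
    "state": ["NY", "CA", "NY", "CA"],
    "cases": [300, 250, 290, 240],
})

# Series with a (date, state) MultiIndex -> unstack the "state" level into columns
wide = data.set_index(["date", "state"])["cases"].unstack()
print(wide)
```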

The transformation you are interested in is called a pivot. You can achieve this in Pandas as follows:

import pandas as pd

# Reproduce part of the data
data = pd.DataFrame({'date': [20200625, 20200625, 20200624, 20200624], 
                     'state': ['NY', 'CA', 'NY', 'CA'], 
                     'cases': [300, 250, 290, 240]})
print(data)

#        date state  cases
# 0  20200625    NY    300
# 1  20200625    CA    250
# 2  20200624    NY    290
# 3  20200624    CA    240

# Pivot: dates become the index, states become the columns
print(data.pivot(index='date', columns='state', values='cases'))

# state      CA   NY
# date              
# 20200624  240  290
# 20200625  250  300
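To get closer to the layout in the question (states in the order NY, CA, TX, FL and no "date"/"state" labels above the axes), the pivoted result can be reordered and its axis names cleared. A minimal sketch on the full sample data; the `states` list is taken from the question's skeleton code:

```python
import pandas as pd

data = pd.DataFrame({
    "date":  [20200625, 20200625, 20200625, 20200625,
              20200624, 20200624, 20200624, 20200624],
    "state": ["NY", "CA", "TX", "FL", "NY", "CA", "TX", "FL"],
    "cases": [300, 250, 200, 100, 290, 240, 100, 80],
})

states = ["NY", "CA", "TX", "FL"]

positives = data.pivot(index="date", columns="state", values="cases")
positives = positives.reindex(columns=states)  # column order as in the desired output
positives.index.name = None    # drop the "date" label above the index
positives.columns.name = None  # drop the "state" label above the columns
print(positives)
```

If the real data can contain more than one row for the same date and state, `pivot` raises a `ValueError`; in that case `pivot_table` with an `aggfunc` (for example `aggfunc="sum"`) is the usual alternative.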
