Pandas: shift values from one column to another, and drop duplicates using Python
Suppose I am getting a dataframe like this:
Name value
Umicore 470
889
19
912
1.68
Shopify 19
500
17
51
1.44
How do I get a dataframe so that I am left with this output?
Name value
Umicore 1.68
Shopify 1.44
This is how I am getting my dataframe:
#my_df['Name'].replace('', np.nan, inplace=True)
#my_df['Name'].replace('', np.nan).ffill(inplace=True) #tried just now fails
#my_df['value'].replace('', np.nan, inplace=True)
#my_df.dropna(subset=['Name', 'value'], inplace=True)
my_df.drop_duplicates(keep='last', inplace=True)
my_df.to_csv('output.csv', index=False)
How do I shift the numbers from the Name column to the value column? Please help!
Use:
import numpy as np
import pandas as pd

#if the Name column contains empty strings or NaNs, remove these rows
df['Name'] = df['Name'].replace('', np.nan)
df = df.dropna(subset=['Name'])
#create default index
df = df.reset_index(drop=True)
print (df)
Name value
0 Umicore 470.0
1 889 NaN
2 19 NaN
3 912 NaN
4 1.68 NaN
5 Shopify 19.0
6 500 NaN
7 17 NaN
8 51 NaN
9 1.44 NaN
#convert values to numeric; anything that is not a number becomes NaN
s = pd.to_numeric(df['Name'], errors='coerce')
#keep only the non-numeric entries (the names) and forward fill them over the numeric rows
df['Name'] = df['Name'].where(s.isna()).ffill()
#assign the numeric values to the value column
df['value'] = s
print (df)
Name value
0 Umicore NaN
1 Umicore 889.00
2 Umicore 19.00
3 Umicore 912.00
4 Umicore 1.68 <- last value of Umicore
5 Shopify NaN
6 Shopify 500.00
7 Shopify 17.00
8 Shopify 51.00
9 Shopify 1.44
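To see why the where/ffill step works, it may help to look at the intermediate mask on a small Series. This is only a sketch: the sample data is a shortened copy of the question's Name column, and the variable names `names`, `is_name` and `filled` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: a shortened copy of the Name column from the question
names = pd.Series(['Umicore', '889', '1.68', 'Shopify', '500', '1.44'])

# Numeric strings are parsed; anything else becomes NaN
s = pd.to_numeric(names, errors='coerce')

# s.isna() is True exactly on the rows that hold a company name
is_name = s.isna()

# where() keeps the names and blanks out the numeric rows;
# ffill() then copies each name down until the next name appears
filled = names.where(is_name).ffill()

print(is_name.tolist())
print(filled.tolist())
```

Note that this relies on the names not being parseable as numbers; a name that pd.to_numeric can convert would be treated as a value instead.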
#keep only the last row for each Name
df = df.drop_duplicates('Name', keep='last')
print (df)
Name value
4 Umicore 1.68
9 Shopify 1.44
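For anyone who wants to run the whole thing end to end, here is a minimal self-contained sketch of the steps above. The function name `last_value_per_name` and the `raw` frame are assumptions made for the example (the frame is rebuilt by hand from the sample in the question); the pandas calls are the same ones used in the answer.

```python
import numpy as np
import pandas as pd


def last_value_per_name(df):
    """Return one row per name, holding the last number listed under it."""
    # Treat empty strings as missing and drop rows with nothing in Name
    out = df.assign(Name=df['Name'].replace('', np.nan)).dropna(subset=['Name'])
    out = out.reset_index(drop=True)
    # Numbers stored in Name are parsed; the names themselves become NaN
    s = pd.to_numeric(out['Name'], errors='coerce')
    # Keep the names, blank the numeric rows, then fill each name downwards
    out['Name'] = out['Name'].where(s.isna()).ffill()
    # Move the parsed numbers into the value column
    out['value'] = s
    # The last row of each name carries the number we want
    return out.drop_duplicates('Name', keep='last').reset_index(drop=True)


# Hypothetical input rebuilt from the sample in the question
raw = pd.DataFrame({
    'Name': ['Umicore', '889', '19', '912', '1.68',
             'Shopify', '500', '17', '51', '1.44'],
    'value': [470] + [np.nan] * 4 + [19] + [np.nan] * 4,
})

result = last_value_per_name(raw)
print(result)
```

As in the answer, the numbers that were originally in the value column (470 and 19) are overwritten, which is fine here because only the last number under each name is wanted.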