How to move data from one column to another?

I have the following data:

            id                 date   oked_1     oked_2  KPS  address    type
225  001041004832         2000-10-12  71209     01111  105  196430100    3
225  001041004832         2000-10-12  71209     46211  105  196430100    3
225  001041004832         2000-10-12  71209     52101  105  196430100    3

I need to move "oked_2" into "oked_1" in such a way that all other columns are replicated. For example, below you can see how the oked_2 values are copied to oked_1, while the other column data stay the same. I want only oked_1 in my final dataframe (all oked_2 data have to be moved to oked_1). I expect:

               id                 date   oked_1     oked_2  KPS  address    type
    225  001041004832         2000-10-12  71209     01111  105  196430100    3
    225  001041004832         2000-10-12  01111     46211  105  196430100    3
    225  001041004832         2000-10-12  46211     52101  105  196430100    3
    225  001041004832         2000-10-12  52101     52101  105  196430100    3

How can I do that? I have not tried anything, because I do not have any clue how to approach it...

If you look at the expected dataframe, you can see that the values from oked_2 are copied to oked_1. Furthermore, one row was added, because there were 3 different values in oked_2 and one in oked_1: 4 unique values in total.

You can try this:

import pandas as pd

df=pd.DataFrame({"oked_1":["71209","71209","71209"],"oked_2":["01111","46211","52101"]})

print(df)
"""
   oked_1 oked_2
0  71209  01111
1  71209  46211
2  71209  52101
"""
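# Append a copy of the last row so there is room for the fourth unique value
df.loc[len(df.index)] = df.loc[len(df.index)-1]

# Overwrite oked_1 with the unique values found across both columns
# (this assumes the number of unique values equals the number of rows)
df["oked_1"]=pd.unique(df[["oked_1","oked_2"]].values.ravel('K'))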
print(df)
"""
  oked_1  oked_2
0  71209  01111
1  01111  46211
2  46211  52101
3  52101  52101
"""

I don't think I have completely understood your logic, but as far as I can tell it gives the expected result.

Edit: I have tested it with this dataset:

id,date,oked_1,oked_2,KPS,address,type
001041004832,2000-10-12,71209,01111,105,196430100,3
001041004832,2000-10-12,71209,46211,105,196430100,3
001041004832,2000-10-12,71209,52101,105,196430100,3

And the output is:

           id        date  oked_1  oked_2  KPS    address  type
0  1041004832  2000-10-12   71209    1111  105  196430100     3
1  1041004832  2000-10-12   71209   46211  105  196430100     3
2  1041004832  2000-10-12   71209   52101  105  196430100     3

           id        date  oked_1  oked_2  KPS    address  type
0  1041004832  2000-10-12   71209    1111  105  196430100     3
1  1041004832  2000-10-12    1111   46211  105  196430100     3
2  1041004832  2000-10-12   46211   52101  105  196430100     3
3  1041004832  2000-10-12   52101   52101  105  196430100     3

And it is working as expected!
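One caveat about the output above: the leading zeros of id and oked_2 are gone (1111 instead of 01111), because the columns were parsed as integers. A minimal sketch, assuming the same CSV text is read from a string (the variable names here are only for illustration), of how dtype=str keeps the values as text:

```python
from io import StringIO
import pandas as pd

csv_text = """id,date,oked_1,oked_2,KPS,address,type
001041004832,2000-10-12,71209,01111,105,196430100,3
001041004832,2000-10-12,71209,46211,105,196430100,3
001041004832,2000-10-12,71209,52101,105,196430100,3
"""

# Default parsing infers integer columns, so leading zeros are dropped
as_numbers = pd.read_csv(StringIO(csv_text))

# dtype=str keeps every column as text, so the codes stay intact
as_text = pd.read_csv(StringIO(csv_text), dtype=str)

print(as_numbers["oked_2"].tolist())  # [1111, 46211, 52101]
print(as_text["oked_2"].tolist())     # ['01111', '46211', '52101']
```

If only some columns are codes, passing a dict such as dtype={"id": str, "oked_1": str, "oked_2": str} should limit the effect to those columns.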

from io import StringIO
import pandas as pd

data = """
_            id                 date   oked_1     oked_2  KPS  address    type
225  001041004832         2000-10-12  71209     01111  105  196430100    3
225  001041004832         2000-10-12  71209     46211  105  196430100    3
225  001041004832         2000-10-12  71209     52101  105  196430100    3
"""

# sep=r"\s+" splits on whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_csv(StringIO(data), dtype=str, sep=r"\s+")

df['oked_1'] = df[['oked_1', 'oked_2']].to_numpy().tolist()

df = (df.explode('oked_1')
        .drop_duplicates('oked_1', ignore_index=True)
        .drop('oked_2', axis=1)
     )

Output for df:

     _            id        date oked_1  KPS    address type
0  225  001041004832  2000-10-12  71209  105  196430100    3
1  225  001041004832  2000-10-12  01111  105  196430100    3
2  225  001041004832  2000-10-12  46211  105  196430100    3
3  225  001041004832  2000-10-12  52101  105  196430100    3
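The same reshaping can also be sketched with DataFrame.melt, which stacks the two columns into one. This is an alternative, not the method used above; the frame is built inline for illustration, and the temporary column name "oked" is an arbitrary choice (melt's value_name may not match an existing column, so it is renamed afterwards).

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["001041004832"] * 3,
    "date": ["2000-10-12"] * 3,
    "oked_1": ["71209"] * 3,
    "oked_2": ["01111", "46211", "52101"],
    "KPS": ["105"] * 3,
    "address": ["196430100"] * 3,
    "type": ["3"] * 3,
})

# Every column except the two being merged is carried along unchanged
id_cols = [c for c in df.columns if c not in ("oked_1", "oked_2")]

result = (
    df.melt(id_vars=id_cols, value_vars=["oked_1", "oked_2"], value_name="oked")
      .drop(columns="variable")            # drop the "which column it came from" label
      .drop_duplicates(ignore_index=True)  # dedupe on the whole row
      .rename(columns={"oked": "oked_1"})
)
print(result)
```

Because duplicates are dropped on the whole row rather than on oked_1 alone, the same code appearing under two different ids would be kept for both.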

You can create separate dataframes for oked_1 and oked_2, then drop duplicates and combine them, as shown below.

import pandas as pd

df = pd.read_csv(filepath, dtype=str)  # filepath is the path to your CSV; this is your main dataframe

df1 = df.drop(columns = ['oked_2']).drop_duplicates(subset=['oked_1'])
df2 = df.drop(columns = ['oked_1']).drop_duplicates(subset=['oked_2']).rename(columns = {'oked_2': 'oked_1'})

data = pd.concat([df1,df2]).reset_index()
print(data)

which looks like this:

   index          id        date oked_1  KPS    address type
0      0  1041004832  2000-10-12  71209  105  196430100    3
1      0  1041004832  2000-10-12  01111  105  196430100    3
2      1  1041004832  2000-10-12  46211  105  196430100    3
3      2  1041004832  2000-10-12  52101  105  196430100    3
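The extra index column in that output comes from reset_index(), which keeps the old index as a column. A minimal self-contained sketch of the same split-and-concat idea, assuming a small inline frame instead of a CSV file (the variable names are illustrative) and using reset_index(drop=True) to discard the old index:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["001041004832"] * 3,
    "oked_1": ["71209"] * 3,
    "oked_2": ["01111", "46211", "52101"],
    "KPS": ["105"] * 3,
})

# Rows keyed by oked_1, one per distinct value
first = df.drop(columns=["oked_2"]).drop_duplicates(subset=["oked_1"])

# Rows keyed by oked_2, renamed so the column lines up with oked_1
second = (
    df.drop(columns=["oked_1"])
      .drop_duplicates(subset=["oked_2"])
      .rename(columns={"oked_2": "oked_1"})
)

# drop=True discards the old index instead of keeping it as a column
combined = pd.concat([first, second]).reset_index(drop=True)
print(combined)
```

Note that the two pieces are deduplicated separately, so a value present in both oked_1 and oked_2 would appear twice; a final drop_duplicates on the combined frame would handle that case.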
