Moving data from specific columns to different columns in a new row

Question

I have a CSV containing details of documents located on Google Drive. I am trying to make it easier to read and work with, as this example has over 400 columns.

Each row in the CSV represents a file on Google Drive. There are multiple columns to denote who owns the file and who it is shared with.

Every time a file has been shared, the details of the person it has been shared with are appended as new columns to that row.

I have loaded the data into a Pandas data frame and I'm struggling to move the contents of certain columns to a new row.

Below is an example:


Input:
owner  |  id  |  title | permissions.0.name | permissions.0.email | permissions.1.name | permissions.1.email
value     1      doc1    Tommy                tommy@office.com      Timmy                 timmy@office.com
value     2      doc2    Tommy                tommy@office.com
value     3      doc3    Timmy                timmy@office.com
value     4      doc4    Tammy                tammy@office.com      Tommy                 tommy@office.com

Output:
owner  |  id  |  title | permissions.0.name | permissions.0.email 
value     1      doc1    Tommy                tommy@office.com      
value     2      doc2    Tommy                tommy@office.com
value     3      doc3    Timmy                timmy@office.com
value     4      doc4    Tammy                tammy@office.com      
value     5      doc1    Timmy                timmy@office.com
value     6      doc4    Tommy                tommy@office.com

I began by creating a list of the column names and finding the maximum number in the column headings (it is 46 in the full data). My plan was then to loop from 1 to 46, building the column name to look at and moving the contents of that column to a different column on a new row. But I had no idea how to move the contents...

import pandas

df = pandas.read_csv('input.csv')

cols = list(df) #list of column names

maxcol =[]

for c in cols:
    if '.' in c:
        n = c.split('.')[1]
        maxcol.append(int(n))

maxval = max(maxcol)

for i in range(1, maxval + 1):
    colname = 'permissions.' + str(i) + '.name'
    # move contents from this column to permissions.0.name in new row somehow

There are many more columns (over 400), and they do not appear in an organised structure, because columns are created only when required. So we have columns like this:

permissions.5.email | permissions.1.withPhoto | permissions.6.name | permissions.6.email
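Because the columns arrive in no particular order, it may help to parse each heading rather than build names in a loop. The following is only a minimal sketch: the helper `group_permission_columns` and the regular expression are illustrative names, and it assumes every variable column follows the `permissions.<number>.<field>` pattern. Headings that do not match (including any misspelled prefix) are simply skipped.

```python
import re

# Assumed pattern: 'permissions.<number>.<field>', e.g. 'permissions.6.email'
PERMISSION_COL = re.compile(r"^permissions\.(\d+)\.(.+)$")


def group_permission_columns(columns):
    """Map each permission number to {field: original column name}."""
    groups = {}
    for name in columns:
        match = PERMISSION_COL.match(name)
        if match is None:
            continue  # constant columns such as 'owner', 'id', 'title'
        index, field = int(match.group(1)), match.group(2)
        groups.setdefault(index, {})[field] = name
    return groups


# Column headings in the unordered style described above
columns = [
    "owner", "id", "title",
    "permissions.5.email", "permissions.1.withPhoto",
    "permissions.6.name", "permissions.6.email",
]
groups = group_permission_columns(columns)
highest = max(groups)  # largest permission number found in the headings
```

With real data, `group_permission_columns(df.columns)` would give the same kind of mapping, and `max(groups)` replaces the manual search for the highest number.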

Answer 1

You can use this code. There may be easier ways to do this, but it seems to give the required result:

import pandas as pd

cols = list(df)
# Get the variable column names, e.g. 'permissions.1.name'
per_cols = [c for c in cols if '.' in c]
# Get the constant column names, e.g. 'title'
main_cols = [c for c in cols if '.' not in c]
# Get the permission numbers present in the columns, in numeric order
col_no = sorted({c.split('.')[1] for c in per_cols}, key=int)

parts = []
for col in col_no:
    # Get the list of columns with the current number
    part = [c for c in per_cols if c.split('.')[1] == col]
    # Add the constant columns to get a complete df
    temp = df[main_cols + part].copy()
    # Rename only the number segment to '0' for a unified result
    # (a plain c.replace(col, '0') would also change digits elsewhere in a name)
    temp.columns = main_cols + [c.replace('.' + col + '.', '.0.', 1) for c in part]
    # Drop the NaN rows (preferably use a subset you are certain wouldn't be null)
    temp = temp.dropna()
    # Collect the chunk; a chunk with a new column adds that column to the result
    parts.append(temp)

# DataFrame.append was removed in pandas 2.0, so concatenate the chunks instead
result = pd.concat(parts, ignore_index=True)
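As a possible alternative to looping over the numbers, the same reshaping can be sketched with `melt` and `pivot`. This is not the method from the answer above, only an illustration under some assumptions: every variable column has the form `permissions.<number>.<field>`, the constant columns together identify a file uniquely, and the helper names `column`, `value`, `slot` and `field` do not clash with real column names. The sample frame reproduces the example from the question.

```python
import pandas as pd

# Sample data mirroring the example in the question
df = pd.DataFrame({
    "owner": ["value"] * 4,
    "id": [1, 2, 3, 4],
    "title": ["doc1", "doc2", "doc3", "doc4"],
    "permissions.0.name": ["Tommy", "Tommy", "Timmy", "Tammy"],
    "permissions.0.email": ["tommy@office.com", "tommy@office.com",
                            "timmy@office.com", "tammy@office.com"],
    "permissions.1.name": ["Timmy", None, None, "Tommy"],
    "permissions.1.email": ["timmy@office.com", None, None, "tommy@office.com"],
})

main_cols = [c for c in df.columns if "." not in c]

# One row per (file, permission column); empty cells are discarded
long_df = df.melt(id_vars=main_cols, var_name="column", value_name="value")
long_df = long_df.dropna(subset=["value"])

# Split 'permissions.<n>.<field>' into prefix, number and field
pieces = long_df["column"].str.split(".", n=2, expand=True)
long_df["slot"] = pieces[1].astype(int)
long_df["field"] = pieces[0] + ".0." + pieces[2]

# One row per (file, permission number), one column per field
result = long_df.pivot(index=main_cols + ["slot"], columns="field", values="value")
result = result.reset_index().sort_values(["slot", "id"]).reset_index(drop=True)
result.columns.name = None
```

Here `result` holds one row per file and permission, with the original permission number kept in the `slot` column. Unlike a plain `dropna()` on each chunk, a row is kept as long as at least one field for that permission is filled, which may matter when optional fields such as `withPhoto` are often empty.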

Solution 1
0 2020-10-15 17:52:54

将数据从特定列移动到新行中的不同列

问题描述

1 个解决方案

解决方案1 0 2020-10-15 17:52:54

解决方案1
0 2020-10-15 17:52:54