Moving data from specific columns to a different column in a new row

I have a CSV containing details of documents located on Google Drive. I am trying to make it easier to read and work with, as this example has over 400 columns.

Each row in the CSV represents a file on Google Drive. There are multiple columns to denote who owns the file and who it is shared with.

Every time a file is shared, the details of the person it is shared with are appended as new columns to that row.

I have loaded the data into a pandas DataFrame and I'm struggling to move the contents of certain columns to a new row.

Below is an example:


Input:
owner  |  id  |  title | permissions.0.name | permissions.0.email | permissions.1.name | permissions.1.email
value     1      doc1    Tommy                tommy@office.com      Timmy                 timmy@office.com
value     2      doc2    Tommy                tommy@office.com
value     3      doc3    Timmy                timmy@office.com
value     4      doc4    Tammy                tammy@office.com      Tommy                 tommy@office.com

Output:
owner  |  id  |  title | permissions.0.name | permissions.0.email 
value     1      doc1    Tommy                tommy@office.com      
value     2      doc2    Tommy                tommy@office.com
value     3      doc3    Timmy                timmy@office.com
value     4      doc4    Tammy                tammy@office.com      
value     5      doc1    Timmy                timmy@office.com
value     6      doc4    Tommy                tommy@office.com

I began by creating a list of the column names and finding the maximum number in the column headings (it is 46 in the full data). The plan was then to loop from 1 to 46, build the column name to look at, and move the contents of that column to a different column on a new row. But I had no idea how to move the contents...

import pandas

df = pandas.read_csv('input.csv')

cols = list(df)  # list of column names

maxcol = []

for c in cols:
    if '.' in c:
        # 'permissions.1.name' -> '1'
        n = c.split('.')[1]
        maxcol.append(int(n))

maxval = max(maxcol)

# range() excludes its end value, so add 1 to include maxval
for i in range(1, maxval + 1):
    colname = 'permissions.' + str(i) + '.name'
    # move contents from this column to permissions.0.name in new row somehow

There are many more columns (over 400), and they do not appear in an organised order, because columns are created only when they are required. So we have columns like this:

permissions.5.email | permissions.1.withPhoto | permissions.6.name | permissions.6.email
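Because the headers arrive in no particular order, it may help to group them by the number embedded in each name before reshaping anything. The following is only a minimal sketch: the `columns` list is a hypothetical sample, and the pattern assumes every variable column looks like `prefix.number.field`.

```python
import re
from collections import defaultdict

# Hypothetical header list in the irregular order described above
columns = [
    "owner", "id", "title",
    "permissions.5.email", "permissions.1.withPhoto",
    "permissions.6.name", "permissions.6.email",
]

# Assumed naming scheme: <prefix>.<number>.<field>
pattern = re.compile(r"^(?P<prefix>[^.]+)\.(?P<number>\d+)\.(?P<field>.+)$")

# Map each number to the columns that carry it
groups = defaultdict(list)
for name in columns:
    m = pattern.match(name)
    if m:
        groups[int(m["number"])].append(name)

# Print the groups in numeric order, regardless of header order
for number in sorted(groups):
    print(number, groups[number])
```

Columns that do not match the pattern (here `owner`, `id` and `title`) are simply left out of `groups`, so they can be treated as the constant columns.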

You can use this code. I think there are easier ways to do this, but it seems to give the required result:

import pandas as pd

cols = list(df)
# Get the variable column names, e.g. permissions.1.name
per_cols = [c for c in cols if '.' in c]
# Get the constant column names, e.g. title
main_cols = [c for c in cols if '.' not in c]
# Get the set of numbers present in the column names
col_no = set(c.split('.')[1] for c in per_cols)

chunks = []
for col in col_no:
    # Get the list of columns with the current number
    part = [c for c in per_cols if c.split('.')[1] == col]
    # Add the constant columns to get a complete df
    temp = df[main_cols + part]
    # Change only the number segment to '0' for a unified result
    # (a plain c.replace(col, '0') would also touch the same digits elsewhere in a name)
    new_names = {c: '.'.join([c.split('.')[0], '0'] + c.split('.')[2:]) for c in part}
    temp = temp.rename(columns=new_names)
    # Drop rows containing NaN (preferably pass a subset of columns you are certain wouldn't be null)
    temp = temp.dropna()
    # Collect the chunk; if a chunk has a new column, concat adds it to the result
    chunks.append(temp)

# DataFrame.append was removed in pandas 2.0, so concatenate the chunks instead
result = pd.concat(chunks)
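As one possible alternative to looping over the numbers, the same reshaping can be sketched with `melt` followed by `pivot`. This is only an illustration under assumptions: the sample frame below is a hypothetical reconstruction of the question's input table, the constant columns are assumed to identify each file uniquely and to contain no missing values, and passing a list as `index` to `pivot` needs pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical sample mirroring the question's input table
df = pd.DataFrame({
    "owner": ["value"] * 4,
    "id": [1, 2, 3, 4],
    "title": ["doc1", "doc2", "doc3", "doc4"],
    "permissions.0.name": ["Tommy", "Tommy", "Timmy", "Tammy"],
    "permissions.0.email": ["tommy@office.com", "tommy@office.com",
                            "timmy@office.com", "tammy@office.com"],
    "permissions.1.name": ["Timmy", None, None, "Tommy"],
    "permissions.1.email": ["timmy@office.com", None, None, "tommy@office.com"],
})

main_cols = [c for c in df.columns if "." not in c]
per_cols = [c for c in df.columns if "." in c]

# One row per (file, original column) pair; empty cells are dropped
long = df.melt(
    id_vars=main_cols, value_vars=per_cols,
    var_name="column", value_name="cell",
).dropna(subset=["cell"])

# Split 'permissions.1.name' into prefix, number and field,
# then rebuild the name with the number fixed to 0
parts = long["column"].str.split(".", n=2, expand=True)
long = long.assign(
    number=parts[1].astype(int),
    field=parts[0] + ".0." + parts[2],
)

# Each (file, number) pair becomes one output row
result = (
    long.pivot(index=main_cols + ["number"], columns="field", values="cell")
    .reset_index()
    .sort_values(["number"] + main_cols)
    .drop(columns="number")
    .rename_axis(columns=None)
    .reset_index(drop=True)
)
print(result)
```

Unlike the expected output in the question, this sketch keeps each file's original `id` on the extra rows rather than assigning new ones. A row survives as long as at least one of its permission fields is filled in, so partly empty entries are not lost.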
