
More than one value with comma in dataframe pandas

I have the following dataframe with columns colum1 and colum2. The values in colum1 are comma-separated:

    colum1  colum2
    a,b,c   30
    b,c,f   40
    a,g,z   50
    . . .

From it I'd like to build the dataframe below, with columns column3 and column4. column3 holds each individual value from colum1, and column4 is the sum of colum2 over all rows in which that value appears:

    column3  column4
    a        80
    b        70
    c        70
    f        40
    g        50
    z        50
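For anyone who wants to reproduce this, here is a minimal sketch of the input, assuming only the three rows shown in the question (the elided rows are left out) and assuming the column names are `colum1` and `colum2`, as used in the answer's code:

```python
import pandas as pd

# Sample input: only the three rows visible in the question.
df = pd.DataFrame({
    "colum1": ["a,b,c", "b,c,f", "a,g,z"],  # comma-separated values
    "colum2": [30, 40, 50],                 # numbers to be summed per value
})
print(df)
```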

Use:

df = (df.set_index('colum2')['colum1']
        .str.split(',', expand=True)
        .stack()
        .reset_index(name='column3')
        .groupby('column3', as_index=False)['colum2']
        .sum()
        .rename(columns={'colum2':'column4'})
      )
print (df)
  column3  column4
0       a       80
1       b       70
2       c       70
3       f       40
4       g       50
5       z       50

Explanation:

  1. First, set_index on column colum2
  2. Create a DataFrame with str.split (expand=True)
  3. Reshape with stack
  4. Turn the index back into columns with reset_index
  5. groupby and aggregate with sum
  6. Finally, rename the column if necessary
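The steps above can also be run one at a time to see what each produces. This is only a sketch on the three sample rows from the question; the intermediate names (`s`, `wide`, `long`, `flat`, `result`) are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "colum1": ["a,b,c", "b,c,f", "a,g,z"],
    "colum2": [30, 40, 50],
})

# 1. colum2 becomes the index, so it travels along with every split value.
s = df.set_index("colum2")["colum1"]

# 2. One column per comma-separated item (columns 0, 1, 2).
wide = s.str.split(",", expand=True)

# 3. Wide to long: a Series with a MultiIndex (colum2, item position).
long = wide.stack()

# 4. Index levels become columns: colum2, level_1, column3.
flat = long.reset_index(name="column3")

# 5. + 6. Sum colum2 per value, then rename the summed column.
result = (flat.groupby("column3", as_index=False)["colum2"]
              .sum()
              .rename(columns={"colum2": "column4"}))
print(result)
```

Printing `wide`, `long` and `flat` in turn shows how the comma-separated strings are spread out and then stacked into one value per row before the aggregation.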

Another solution:

import pandas as pd
from itertools import chain

a = df['colum1'].str.split(',')
lens = a.str.len()

df = pd.DataFrame({
    'column3' : list(chain.from_iterable(a)), 
    'column4' : df['colum2'].repeat(lens)
}).groupby('column3', as_index=False)['column4'].sum()

print (df)
  column3  column4
0       a       80
1       b       70
2       c       70
3       f       40
4       g       50
5       z       50

Explanation:

  1. Create lists with str.split
  2. Get the lengths of the lists with str.len
  3. Finally, repeat the colum2 values and flatten colum1
  4. groupby and aggregate with sum
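Broken into its intermediate pieces, the second approach looks roughly like this. It is a sketch on the three sample rows from the question; the variable names are made up for illustration, and `.to_numpy()` is a small addition of mine that drops the repeated index before the new DataFrame is built:

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "colum1": ["a,b,c", "b,c,f", "a,g,z"],
    "colum2": [30, 40, 50],
})

# 1. Each cell becomes a Python list: ['a', 'b', 'c'], ...
lists = df["colum1"].str.split(",")

# 2. Number of items in each list (3 for every sample row).
lens = lists.str.len()

# 3. Flatten the lists and repeat each colum2 value once per item,
#    so both sequences line up element by element.
letters = list(chain.from_iterable(lists))
values = df["colum2"].repeat(lens).to_numpy()

long = pd.DataFrame({"column3": letters, "column4": values})

# 4. Sum per value.
result = long.groupby("column3", as_index=False)["column4"].sum()
print(result)
```

Unlike the first solution, this one never builds the wide intermediate table, which may matter when the lists differ a lot in length.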

