
Python pandas: repeat cells from a column and concatenate

I have a DataFrame like this:

***Out[112]:*** 
  Cell Name   Site  Cell Count
0    04301A  04301           3
1    04301B  04301           3
2    04301C  04301           3
3    04302A  04302           3
4    04302B  04302           3
5    04302C  04302           3
6    04303A  04303           2
7    04303B  04303           2
8    04304A  04304           3

I want to repeat the 'Cell Name' column according to the 'Cell Count' column, then concatenate each cell name with the other cell names in the same Site, so that the output looks like this:

***Out[119]:*** 
  Repeated Cells   Site  Cell-Neighbor
0         04301A  04301  04301A-04301B
1         04301A  04301  04301A-04301C
2         04301B  04301  04301B-04301A
3         04301B  04301  04301B-04301C
4         04301C  04301  04301C-04301A
5         04301C  04301  04301C-04301B
6         04302A  04302  04302A-04302B
7         04302A  04302  04302A-04302C
8         04302B  04302  04302B-04302A
9         04302B  04302  04302B-04302C

I managed to repeat the cells and put them in a new DataFrame using the following line:

repeated_cells = df_gcell['Cell Name'].repeat(df_gcell['Cell Count'] - 1).values

I subtracted 1 from the count because I don't need a cell to be concatenated with itself.

My problem now is how to bring in the other cells from the same Site and concatenate them with each cell.
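
For illustration, here is a minimal sketch of what the repeat step above produces. The small frame is an assumption built from two Sites of the sample data, with `Site` stored as strings; only `df_gcell` and `repeated_cells` come from the question.

```python
import pandas as pd

# Toy frame (assumption): Sites 04301 and 04303 copied from the sample data.
df_gcell = pd.DataFrame(
    {
        "Cell Name": ["04301A", "04301B", "04301C", "04303A", "04303B"],
        "Site": ["04301", "04301", "04301", "04303", "04303"],
        "Cell Count": [3, 3, 3, 2, 2],
    }
)

# Each name is repeated (Cell Count - 1) times: once per neighbor in its Site.
repeated_cells = df_gcell["Cell Name"].repeat(df_gcell["Cell Count"] - 1).values
print(list(repeated_cells))
```

This gives the left-hand side of each pair; the neighbor on the right-hand side still has to come from the other rows of the same Site.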

  • Looking at your input and output, it's clear you need permutations within each Site.
  • To simplify a little, I have defined the target column names without spaces.
  • Your data for Site 04304 is inconsistent (a Cell Count of 3 but only one cell listed), so it gets dropped.
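
As a quick sketch of the permutation idea in the points above, this is what `itertools.permutations` yields for the cells of one Site. The names `cells`, `pairs` and `single` are illustrative only.

```python
import itertools

# Cells of one Site, copied from the sample data.
cells = ["04301A", "04301B", "04301C"]

# Ordered pairs of distinct cells, joined with "-".
pairs = ["-".join(p) for p in itertools.permutations(cells, 2)]
print(pairs)

# A Site with a single cell has no pairs at all.
single = list(itertools.permutations(["04304A"], 2))
print(single)
```

A Site with only one cell yields an empty list, which is why Site 04304 ends up with no rows.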
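import io
import itertools

import pandas as pd

# Rebuild the sample frame; columns are separated by two or more spaces.
# Note: Site is parsed as an integer here, so its leading zero is not kept.
df = pd.read_csv(io.StringIO("""  Cell Name   Site  Cell Count
0    04301A  04301           3
1    04301B  04301           3
2    04301C  04301           3
3    04302A  04302           3
4    04302B  04302           3
5    04302C  04302           3
6    04303A  04303           2
7    04303B  04303           2
8    04304A  04304           3"""), sep=r"\s\s+", engine="python")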

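# 1. Per Site, build every ordered pair of distinct cell names as "A-B".
# 2. explode() gives each pair its own row; a Site with one cell has an
#    empty list, which becomes NaN and is removed by dropna().
# 3. The repeated cell is the part of the pair before the "-".
df.groupby("Site", as_index=False).agg(
    CellNeighbor=(
        "Cell Name",
        lambda s: ["-".join(c) for c in itertools.permutations(s, 2)],
    )
).explode("CellNeighbor").dropna().assign(
    RepeatedCells=lambda d: d["CellNeighbor"].str.split("-").str[0]
)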

Output:

   Site   CellNeighbor RepeatedCells
0  4301  04301A-04301B        04301A
0  4301  04301A-04301C        04301A
0  4301  04301B-04301A        04301B
0  4301  04301B-04301C        04301B
0  4301  04301C-04301A        04301C
0  4301  04301C-04301B        04301C
1  4302  04302A-04302B        04302A
1  4302  04302A-04302C        04302A
1  4302  04302B-04302A        04302B
1  4302  04302B-04302C        04302B
1  4302  04302C-04302A        04302C
1  4302  04302C-04302B        04302C
2  4303  04303A-04303B        04303A
2  4303  04303B-04303A        04303B
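
An alternative sketch, not the approach used above: a self-merge on `Site` pairs every cell with every cell of the same Site, and the self-pairs are then filtered out. It assumes `Site` is stored as a string so the leading zero is kept, and it uses the column names from the question. The `_nbr` suffix and the names `pairs` and `result` are illustrative.

```python
import pandas as pd

# Sample rows from the question; Site kept as strings (assumption).
df = pd.DataFrame(
    {
        "Cell Name": ["04301A", "04301B", "04301C", "04303A", "04303B", "04304A"],
        "Site": ["04301", "04301", "04301", "04303", "04303", "04304"],
    }
)

# Pair every cell with every cell of the same Site.
pairs = df.merge(df, on="Site", suffixes=("", "_nbr"))

# Drop the pairs where a cell is matched with itself.
pairs = pairs[pairs["Cell Name"] != pairs["Cell Name_nbr"]]

result = (
    pd.DataFrame(
        {
            "Repeated Cells": pairs["Cell Name"],
            "Site": pairs["Site"],
            "Cell-Neighbor": pairs["Cell Name"] + "-" + pairs["Cell Name_nbr"],
        }
    )
    .sort_values(["Repeated Cells", "Cell-Neighbor"])
    .reset_index(drop=True)
)
print(result)
```

Site 04304 drops out here as well, because its only pair is the cell with itself. The merge builds all n × n pairs per Site before filtering, so it may use more memory than the permutation approach on Sites with many cells.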
